Intro

 

There are reports that progress in frontier model development may be hitting a wall. OpenAI’s next flagship model is showing smaller performance improvements than previous versions, The Information recently reported. Debates around the feasibility of achieving AGI and the durability of AI scaling laws (the improvements gained from adding more data and more compute to models, without major architectural changes) are taking center stage in Silicon Valley and across the industry as a whole.

 

Although many had previously sounded the alarm that the ‘exponential’ scaling trend would not continue forever, they were often dismissed as ‘deep learning skeptics.’ One such ‘skeptic,’ Gary Marcus, offered the following analogy to describe the current dynamics: “in the long term, science isn’t majority rule. In the end, the truth generally outs. Alchemy had a good run, but it got replaced by chemistry.”

 

But if these reports are true and the so-called ‘skeptics’ turn out to be right, what does this mean for your organization? Maybe you have invested heavily in AI transformation, or you are just beginning to explore it. Perhaps you are on the cusp of a major strategic decision and are wondering if this changes things.

 

We believe that investing in an AI strategy is still as important as ever, but if you and your organization are serious about AI transformation, you will need to make strategic adjustments in light of this news.

 

We can no longer rely on exponential frontier gains to solve today’s problems. Promises that AGI, GPT-5, and other next-generation models will be free of hallucinations and outperform humans at many job tasks will likely not pan out.

 

Let’s take a second to zoom out, look at the market as a whole, and understand how your organization can be successful with AI given the current state of affairs.  

Even if Frontier Progress Is Hitting a Wall, Cost and Efficiency Are Not

 

Know that plenty of progress is still occurring outside of frontier model development, with big strategic implications for your organization.

 

We’re frequently taught that bigger is better, but that’s not necessarily true for AI. It’s all about cost and efficiency: the performance of every model must be weighed against its training and inference cost.

 

If model performance improvements were stalling while costs remained flat, we might have cause for alarm. In reality, costs continue to drop, for a variety of reasons. According to a recent blog from Wing Venture Capital, these include:

 

- Market forces and competition: more companies are entering the AI space, and pricing reflects that competition.

- More efficient compute: hardware and infrastructure optimizations continue to reduce the cost of running inference.

- Smaller, smarter models: models like GPT-4o mini and Llama 3.1 8B outperform GPT-3.5 at a fraction of the size.

- New architectures: new architectures and algorithmic optimizations require less compute to train and apply models.

- Edge inference: smaller models using new architectures and optimized hardware-software combinations can run efficiently at the edge without burdening the cloud or networks.

 

These cost and efficiency improvements are extremely important and should not be overlooked. With these changes, we can greatly reduce the cost of inference, scale AI systems, and bring AI closer to the edge than ever before.

But what does this mean in the context of the ‘brick wall’, and should we be concerned that these gains will be canceled out?

3 Tips to Make Sure Your AI Strategy Doesn’t Falter

 

The following recommendations may resonate differently depending on your current strategy and where your organization is in its AI transformation, but one thing holds true for everyone: the hype cycle is winding down, and it’s time to be prepared for it.

 

1. We can no longer rely on exponential frontier improvements to fix current POC issues. Successful production use cases will require upfront resources and planning.

 

Lots of promises have been made about the future of AI and deep learning, including reduced hallucinations, the ability to steer models away from bias, and massive gains that would let models outperform humans at certain cognitive tasks.

 

Ideally, model gains would allow all our POCs to function with little upfront investment. Low-code/no-code solutions and agents built on top of prompt engineering, all built for pennies on the dollar, would work out of the box, without the need for programmers or humans to step into the loop. At least, that was the promise.

 

Those of us who have banked on these gains arriving soon may need to rethink our use cases, data sources, and engineering resources to ensure we are getting the most value out of AI in the near term.

 

Current LLM performance is incredibly impressive compared to what we had even 2-3 years ago. Given the rapid advances in efficiency, we can now get the same level of performance at a fraction of the size and inference cost, allowing us to do things that were previously impossible.

 

However, small and efficient AI models are not as plug-and-play as a hypothetical GPT-5 with near-AGI capabilities. It’s a double-edged sword: getting from POC to production will require more upfront resources and planning, but use cases that successfully make it to production should be more stable, reliable, and cost-effective.

 

2. Your data is more important than ever.

 

We’ve heard many say that structured data no longer matters, that AI and LLMs will be able to synthesize facts and create new information from all data regardless of what it is, how it’s organized, and where it came from. There are some who believe that synthetic data will even eliminate the need for any new data at all, that AI will be able to learn from itself.  

 

To be honest, we never really believed this. But every time we see big gains in deep learning, the question of synthetic data always seems to come up.

 

Not to mention, an increasing portion of data on the web, the same data all the AI labs use to train their models, is being generated by LLMs. This includes text, code, images, and more.

 

It has been shown that when models are fed increasing amounts of self-generated data, they begin to exhibit unstable properties, eventually collapsing altogether. This may explain why some of the latest models are rumored not to be improving, and potentially to be performing even worse, when fed more data.
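
To see why training on self-generated data is unstable, here is a toy illustration (a simplified analogue for intuition, not a claim about any lab’s actual training runs). It repeatedly fits a one-dimensional Gaussian ‘model’ to samples drawn from the previous generation’s fit; because each finite sample under-represents the tails, the fitted spread tends to shrink generation after generation.

```python
import random
import statistics

random.seed(0)

def fit_gaussian(samples):
    """Fit a 1-D Gaussian 'model' to data via maximum likelihood."""
    return statistics.mean(samples), statistics.pstdev(samples)

# Generation 0: 'real' data drawn from a standard normal distribution.
data = [random.gauss(0.0, 1.0) for _ in range(50)]

for gen in range(31):
    mu, sigma = fit_gaussian(data)
    if gen % 5 == 0:
        print(f"gen {gen:2d}: mean={mu:+.3f}  std={sigma:.3f}")
    # Each new generation trains only on samples from the previous fit.
    data = [random.gauss(mu, sigma) for _ in range(50)]
```

Run it and the estimated spread tends to drift toward zero: each generation loses a little of the tail information of the one before it, which is the essence of the model-collapse argument.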

 

Techniques like RAG and fine-tuning will become even more important. Data governance, while somewhat overlooked during the hype cycle, will be front and center as organizations realize that those with the highest-quality data, the best labeling, and well-structured facts will be poised to unlock the most from these technologies.
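
To make the RAG point concrete, here is a minimal sketch of the retrieval half of the pattern, showing why data quality matters: the model can only be grounded in what you can retrieve. The corpus, query, and prompt template are invented for the example, and plain keyword overlap stands in for the embedding-based similarity search a production system would use.

```python
import re

# Minimal retrieval-augmented generation (RAG) sketch. Keyword overlap
# stands in for real embedding similarity; the corpus and prompt are
# invented purely for illustration.

def tokens(text: str) -> set[str]:
    """Lowercase and split text into word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query: str, doc: str) -> float:
    """Crude relevance score: fraction of query words present in the doc."""
    q = tokens(query)
    return len(q & tokens(doc)) / max(len(q), 1)

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

corpus = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
    "Enterprise plans include a dedicated account manager.",
]

query = "What is the refund policy?"
context = "\n".join(retrieve(query, corpus))

# The grounded prompt would then go to whichever LLM you use.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```

If the corpus is poorly labeled or unstructured, retrieval surfaces the wrong context and the generation step confidently answers from it, which is exactly why governance over data quality pays off directly in RAG systems.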

3. Standards, processes, and controls around AI adoption will be required to achieve ROI.

 

Recently, the Associated Press published an article highlighting major issues with OpenAI’s Whisper transcription tool. The issue? Whisper was inventing chunks of text and even entire conversations, some of which contained “racial commentary, violent rhetoric, and even imagined medical treatments.”

 

Look closely at the terms of use of your current foundation model provider. You might see something like the following excerpt from Google’s MedLM customer responsibilities section:

“LLMs and Generative AI are inherently probabilistic and may not always be accurate. Without adequate consideration or controls by customers, use of Generative AI models in healthcare may constitute a hazard to patients due to inaccurate content, missing content, or misleading, biased content.”

 

The stark difference between the marketing material and the terms of use is sure to cause confusion. However, the truth about what is required to implement AI successfully is coming out.

 

Platforms like Fairo, which focus on embedding holistic AI Governance, Risk Management, and Compliance into AI product lifecycles, are a wise choice for any organization looking to undergo AI transformation.

 

Using a platform like Fairo, you can onboard AI vendors with more transparency and consistency, knowing that you have thoroughly evaluated them and understand their technology and terms of use. Additionally, lengthy and complex assessments and questionnaires can be split up and sent to the relevant stakeholders, greatly reducing the time it takes to evaluate use cases for impacts and risks.

 

Technical challenges, like performance and hallucinations, can be addressed with Fairo as well. Managing testing and change control for all AI models, vendors, and use cases centrally will ensure you never again deploy a system you haven’t evaluated. And when tests identify gaps, you can rest assured that the required ‘human-in-the-loop’ controls are implemented and documented.

 

Handling regulatory compliance around AI use cases, as well as meeting contractual obligations, is straightforward and automated with an AI governance platform like Fairo. Reporting to your stakeholders through high-quality reports and model cards will ensure that no one has any surprises when using an AI system.

 

Strategic oversight, including cost and ROI, can be centralized with an AI GRC platform like Fairo. This can greatly improve the success of your AI strategy, alongside the operational and financial health of your organization.

 

Implementing robust and holistic AI GRC goes beyond regulations; it is a necessary component of using AI consistently, responsibly, and profitably, not just for the wellbeing of your end users but for your business as a whole.

Key Takeaways

 

Now is not the time to shy away from AI technology. Organizations with rich data, strong talent, and a solid existing business model have a unique opportunity to pull ahead of the competition and create lasting value.

 

It’s time to take a breath and realize that your position in the market is not going to be dethroned by someone who knows how to write a clever prompt or a ‘wrapper around GPT.’ Your organization has time to set up the proper controls and governance around AI to ensure consistent, risk-assured, and profitable deployments.

 

If this recent news about frontier models hitting a brick wall tells us anything, it’s that it’s finally time to go back to basics. We’ve had our fun trying every POC we can think of, but if we want our AI solutions to “stick” we have to leverage data, domain experts, and capable teams within our organization to build AI the right way. We need to document, track, and understand the risks, both to individuals and to systems, and make sure we continue to lead in developing valuable, consistent, and responsible AI for all.