10 trillion parameters of nothing
Details of Claude Mythos 5 just leaked out of Anthropic. Ten trillion parameters. That's several times the rumored size of GPT-4, and a number so large it exists primarily to impress people who confuse magnitude with meaning. Google responded with Gemini 3.1. The parameter arms race rolls on, each new release accompanied by breathless coverage and benchmark charts trending upward. But here's the question nobody in the hype cycle wants to sit with: can you actually tell the difference? In a typical conversation, a typical coding task, a typical business workflow, does a model with 10 trillion parameters produce output that's meaningfully better than one with 70 billion? The honest answer, for most use cases, is no. And that gap between expectation and experience is where the real story lives.
The biggest number wins
The AI industry has a scaling addiction. For years, the playbook was simple: more parameters, more data, more compute, better results. OpenAI's landmark 2020 paper by Jared Kaplan and colleagues formalized this into scaling laws, showing that model performance improves predictably as you increase these inputs. The formula worked. It powered the leap from GPT-3 to GPT-4, arguably the most impressive single jump in AI capability the public has ever witnessed. So the industry kept pushing. More GPUs. More training data. More billions of dollars in infrastructure. And when Anthropic accidentally exposed 3,000 unpublished assets about Claude Mythos 5, a 10-trillion-parameter model described internally as "by far the most powerful AI model we have ever developed," the narrative seemed vindicated. Bigger is better. The curve only goes up. Except it doesn't. Not like it used to.
The diminishing returns are already here
The uncomfortable truth is that pure pre-training scaling, the strategy that produced GPT-4, has already hit a wall. Research from multiple directions confirms this. A study published in PNAS found that language model persuasiveness shows "sharply diminishing returns" with scale, and that once you adjust for basic task completion, the association between model size and performance "shrinks toward zero." Research compiled by BuildFastWithAI shows that different capabilities plateau at different parameter counts: knowledge tasks around 30 billion parameters, reasoning around 70 billion, code generation around 34 billion, and language understanding as early as 13 billion. Only creative tasks continue to benefit significantly from larger scales. GPT-5's launch in mid-2025 was the canary in the coal mine. Cal Newport, writing in The New Yorker, described the improvements as "more like the targeted improvements you'd expect from a software update than like the broad expansion of capabilities in earlier breakthroughs." Users were less diplomatic. Gary Marcus called it "overdue, overhyped and underwhelming." OpenAI's internal project Orion, which was supposed to be a blockbuster successor to GPT-4, reportedly disappointed. According to The Information, "the increase in quality was far smaller compared with the jump between GPT-3 and GPT-4." Even Elon Musk's xAI tried brute-forcing ahead with Grok 3, training on roughly 100,000 H100 GPUs, many times the compute used for GPT-4. It didn't significantly outperform competitors. The pattern is consistent. Each generation costs exponentially more and delivers incrementally less. The curve is flattening, and no amount of marketing can disguise that.
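To see how sharply the curve bends, it helps to plug numbers into the original scaling law itself. Here is a minimal Python sketch using the parameter-scaling fit from the 2020 Kaplan paper, L(N) = (N_c/N)^α_N, with the published constants α_N ≈ 0.076 and N_c ≈ 8.8×10^13. The absolute loss values are illustrative of the shape of the curve, not a prediction about any specific model:

```python
# Kaplan et al. (2020) parameter-scaling law: L(N) = (N_c / N) ** alpha_N
# Constants are the published fits; absolute loss values are illustrative only.
ALPHA_N = 0.076
N_C = 8.8e13

def predicted_loss(n_params: float) -> float:
    """Predicted pre-training cross-entropy loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

for n in (7e9, 7e10, 1e12, 1e13):
    print(f"{n:.0e} params -> predicted loss {predicted_loss(n):.3f}")

# Each 10x in parameters shaves only ~16% off the loss (a factor of 10 ** -0.076),
# while the compute and dollar cost of training grow by roughly that same 10x.
```

The takeaway is in the exponent: a constant ~16 percent loss reduction per decade of parameters means every equally sized improvement costs ten times more than the last one.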
Ten trillion parameters for whom?
Here's what the "10 trillion parameters" framing obscures: who actually benefits from a model this large? Not individual developers. A model with 10 trillion parameters can't run locally. It can't run on a single GPU cluster. It exists exclusively behind an API, controlled by the company that built it, priced at whatever they decide. The cost cascade is real: bigger models require more compute, which means higher API prices, which means a higher barrier to entry for startups and independent builders. The AI API market in 2026 already illustrates this stratification. Frontier model pricing from OpenAI and Anthropic runs orders of magnitude higher than mid-tier or open-source alternatives. For a startup processing thousands of requests daily, the difference between calling a 10-trillion-parameter model and a fine-tuned 7-billion-parameter model isn't just a line item. It's the difference between a viable business and burning through runway. Meanwhile, small language models keep closing the gap. Research shows that SLMs require 80 to 95 percent less compute than large models while achieving competitive performance on focused tasks. A study comparing small and large models for requirements classification found that SLMs "almost reach LLM performance across all datasets and even outperform them in recall," despite being up to 300 times smaller. For most production use cases, a well-tuned small model isn't a compromise. It's the right tool. So when Anthropic announces 10 trillion parameters, the question isn't whether the model is impressive. It's whether the impressiveness translates to anything that matters for the people who actually build things.
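The runway math above is easy to make concrete. A back-of-envelope sketch; the per-million-token prices here are hypothetical placeholders, not any vendor's actual rates, chosen only to show the order-of-magnitude gap:

```python
# Back-of-envelope API cost comparison. Prices are hypothetical placeholders,
# not actual vendor pricing; the point is the order-of-magnitude gap.
FRONTIER_PRICE_PER_MTOK = 15.00   # $ per million tokens, frontier API (assumed)
SMALL_PRICE_PER_MTOK = 0.20       # $ per million tokens, hosted 7B model (assumed)

def monthly_cost(requests_per_day: int, tokens_per_request: int,
                 price_per_mtok: float) -> float:
    """Approximate monthly spend, assuming a 30-day month."""
    tokens = requests_per_day * tokens_per_request * 30
    return tokens / 1e6 * price_per_mtok

# A startup doing 10,000 requests/day at ~2,000 tokens each:
print(f"frontier: ${monthly_cost(10_000, 2_000, FRONTIER_PRICE_PER_MTOK):,.0f}/mo")
print(f"7B model: ${monthly_cost(10_000, 2_000, SMALL_PRICE_PER_MTOK):,.0f}/mo")
```

Under these assumed prices the same workload costs $9,000 a month on the frontier API and $120 on the small model, a 75x gap that compounds with every new user.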
The real innovation is making models smaller
While the headlines chase parameter counts upward, the most consequential work is moving in the opposite direction. In March 2026, Google Research released TurboQuant, a compression algorithm that cuts KV cache memory sixfold and delivers up to 8x faster attention computation on H100 GPUs, with zero accuracy loss. No retraining required. No fine-tuning. It works on existing models out of the box. Google released it as open research, free for anyone to use. The market reaction was telling. Memory chip stocks dropped immediately. If a software breakthrough can cut your hardware demand to a sixth overnight, the entire "we need more HBM" narrative starts to crack. Morgan Stanley analysts noted that while core GPU memory remains necessary, the efficiency gains from techniques like TurboQuant could fundamentally reshape infrastructure economics. This is the real frontier. Not making models bigger, but making existing models dramatically more efficient. DeepSeek showed you could build competitive models at a fraction of the cost. TurboQuant showed you could run them at a fraction of the memory. The trajectory isn't toward 100 trillion parameters. It's toward making 7 billion parameters do what 70 billion used to. And that's a trajectory that actually helps people build things.
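Order-of-magnitude arithmetic shows why a sixfold KV-cache reduction moves markets. A rough sizing sketch; the transformer shape below is a generic 70B-class configuration assumed for illustration, and the only TurboQuant-specific number used is the reported 6x factor:

```python
# Rough KV-cache sizing, to show why a 6x compression matters.
# Model shape is a generic 70B-class transformer (assumed, not any specific model).
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    # Two tensors (K and V) per layer; fp16 = 2 bytes per element by default.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem

baseline = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128,
                          seq_len=128_000, batch=8)
compressed = baseline / 6  # applying the reported 6x reduction

print(f"fp16 KV cache: {baseline / 2**30:.1f} GiB")
print(f"after 6x:      {compressed / 2**30:.1f} GiB")
```

For this assumed shape, a batch of eight 128k-token contexts needs about 312 GiB of KV cache in fp16, more than three H100s' worth of HBM for the cache alone; a sixfold cut brings it down to roughly 52 GiB.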
The cybersecurity paradox
Anthropic is reportedly positioning Claude Mythos 5 for "advanced cybersecurity," among other use cases. It's a reasonable pitch. Larger models with broader training data could theoretically identify more complex attack patterns, analyze more sophisticated threats, and reason about security architectures with greater depth. But there's a paradox embedded in this framing. Bigger models also create bigger attack surfaces. A 10-trillion-parameter model with broad capabilities is a more powerful tool for adversaries, not just defenders. The same reasoning ability that helps identify vulnerabilities can be used to exploit them. The same code generation capacity that helps patch software can help craft malware. This isn't a hypothetical concern. As models grow more capable, the dual-use problem intensifies. The companies building these models know this, which is why Anthropic described Mythos as potentially "too dangerous to release." But the leak itself demonstrated the fragility of containment. If a model this powerful can't be kept secret before launch, the security story around its deployment deserves scrutiny. The cybersecurity value of a 10-trillion-parameter model isn't zero. But it's not obviously better than a well-deployed ensemble of smaller, specialized models, each tuned for a specific security domain, easier to audit, and presenting a narrower attack surface.
A number designed for investors, not users
"Ten trillion parameters" is not a technical specification aimed at developers. It's a marketing number aimed at investors and media. It exists in the same category as megapixel counts on cameras and GHz numbers on processors, a metric that once correlated with meaningful improvement but has long since decoupled from the user experience. The AI industry is locked in a signaling game. Each lab needs to demonstrate that it's pushing the frontier, because that's what attracts funding, talent, and enterprise contracts. Parameter count is the simplest way to signal progress, even when the actual capability gains are modest. Anthropic called Mythos "a step change." Morgan Stanley called the scaling trajectory a reason to deploy capital at unprecedented scale. The narrative feeds itself. But the people building products on these models tell a different story. The gap between the best model and the fifth-best model matters far less than it did two years ago. Multiple labs produce models of roughly comparable capability. Prices are falling. Access is democratizing. Intelligence is becoming a commodity. What isn't commoditizing is knowing what to do with it. The ability to identify the right problem, design a system that solves it, and ship something people actually want to use. No amount of parameters produces that.
The practical test
Here's an exercise worth trying. Take a real task you do regularly, something concrete like summarizing a document, writing a function, drafting an email, or analyzing a dataset. Run it through a 7-billion-parameter open-source model. Then run it through whatever frontier API you have access to. For most tasks, the difference will range from negligible to nonexistent. The small model might be slightly less polished. It might miss an edge case. But it will complete the task, often in a fraction of the time and at a fraction of the cost. Now multiply that comparison by a thousand. That's a product. And for the product, the 7B model wins, because it's faster, cheaper, and you control it. The marginal quality improvement from a model 1,000 times larger doesn't justify the cost, latency, and dependency tradeoffs for the vast majority of real-world applications. This isn't an argument against frontier research. Pushing the boundaries of what's possible matters enormously for science, for safety research, and for the small percentage of use cases that genuinely require maximum capability. But for the 95 percent of AI applications that are being built right now, 10 trillion parameters is a solution to a problem nobody has.
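If you want to run the comparison systematically rather than by eyeball, a few lines of harness code suffice. This is a sketch; `run_small` and `run_frontier` are placeholder callables you would wire to your own local-model and frontier-API clients:

```python
# Sketch of the side-by-side test described above. The two callables are
# placeholders: wire in your own local-model and frontier-API clients.
import time

def compare(task_prompt: str, run_small, run_frontier) -> dict:
    """Run one prompt through both models and record output plus wall-clock latency."""
    results = {}
    for name, fn in (("7B local", run_small), ("frontier API", run_frontier)):
        start = time.perf_counter()
        output = fn(task_prompt)
        results[name] = {"seconds": time.perf_counter() - start, "output": output}
    return results

# Example with a stub backend (replace with real model clients):
stub = lambda prompt: f"summary of: {prompt[:40]}"
report = compare("Summarize this quarterly report ...", stub, stub)
for name, r in report.items():
    print(f"{name}: {r['seconds']:.2f}s -> {r['output']}")
```

Run it over a sample of your real prompts, then judge the outputs blind; the latency and cost columns usually settle the argument before the quality column does.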
What actually matters
The parameter arms race is a spectator sport. It's exciting to watch, fun to debate, and almost entirely irrelevant to the work of building useful AI systems. What matters is reliability, making sure your system works the same way on the thousandth request as it did on the first. What matters is evaluation, knowing whether your system is actually working. What matters is user experience, designing interfaces that make AI useful rather than frustrating. What matters is cost efficiency, building something that's economically sustainable at scale. None of these improve with more parameters. They improve with better engineering, clearer thinking, and the judgment to know which problems are worth solving. Anthropic built a 10-trillion-parameter model, and it might genuinely be impressive. But the number on the box tells you almost nothing about whether it will help you build something that matters. The era of "bigger is better" served the industry well for a few years. It produced genuine breakthroughs. But we're past the inflection point now, and clinging to the old playbook is a sign that the industry is running out of new ideas, not generating them. Ten trillion parameters. And for most of us, not a single one that changes what we build tomorrow.
References
- Kaplan, J. et al., "Scaling Laws for Neural Language Models," OpenAI, January 2020. Link
- "Scaling language model size yields diminishing returns for single-message political persuasion," PNAS, 2025. Link
- "LLM Scaling Laws Explained: Will Bigger AI Models Always Win?," BuildFastWithAI, 2026. Link
- Newport, C., "What if A.I. Doesn't Get Much Better Than This?," The New Yorker, August 12, 2025. Link
- Marcus, G., "Confirmed: LLMs have indeed reached a point of diminishing returns," November 2024. Link
- "Exclusive: Anthropic acknowledges testing new AI model representing 'step change' in capabilities," Fortune, March 26, 2026. Link
- "TurboQuant: Redefining AI efficiency with extreme compression," Google Research, March 24, 2026. Link
- "Google's TurboQuant AI-compression algorithm can reduce LLM memory usage by 6x," Ars Technica, March 2026. Link
- "Anthropic Claude Mythos AI: 10-Trillion Parameter," Geeky Gadgets, 2026. Link
- "New AI Model Releases News, April 2026," Mean CEO. Link
- "The AI Industry's Scaling Obsession Is Headed for a Cliff," WIRED, October 15, 2025. Link
- "Does Model Size Matter? A Comparison of Small and Large Language Models for Requirements Classification," arXiv, 2025. Link
- "Small Language Models vs Large Language Models: Key Advantages for Engineering Teams," Augment Code, October 2025. Link
- Zeff, M., "AI Scaling Laws Are Showing Diminishing Returns, Forcing AI Labs to Change Course," TechCrunch, November 20, 2024. Link