Nineteen models in seventeen days
On April 1, 2026, there were maybe five frontier AI models worth knowing about. By April 17, there were at least nineteen new ones. GPT-5.4 in three flavors. Gemini 3.1 Pro. Grok 4.20 with its four-agent system. Claude Opus 4.7. Meta's Muse Spark. And somewhere in the background, a model so powerful that its own creators refused to release it, only for it to leak anyway. This isn't a story about which model is best. That question stopped being useful somewhere around day five. This is a story about what happens when the pace of change outstrips everyone's ability to process it.
The seventeen-day pile-up
The compression started in early March. OpenAI launched GPT-5.4 on March 5 with Standard, Thinking, and Pro variants, followed by mini and nano versions on March 17. The week of March 10-16 alone saw twelve major model releases from OpenAI, Google, xAI, and others, a release density the industry had never seen. Then April hit. Google DeepMind shipped Gemini 3.1 Pro with native multimodal reasoning. xAI dropped Grok 4.20, built around a novel four-agent architecture in which specialized sub-agents handle research, math, code, and creative tasks under a captain agent. On April 8, Meta released Muse Spark. On April 16, Anthropic shipped Claude Opus 4.7. Each one of these would have been a headline in 2024. In April 2026, they blurred together.
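xAI hasn't published the internals, but the captain-and-specialists pattern it describes is simple enough to sketch. Below is a minimal, hypothetical Python illustration; the agent names, the hand-written plan, and the stub handlers are all assumptions for the sake of the example, not xAI's implementation (a real captain would plan and route with a model, not a lookup table).

```python
# Hypothetical sketch of a captain/sub-agent architecture in the style xAI
# describes for Grok 4.20. Nothing here reflects xAI's actual system; the
# routing table, agent names, and handlers are invented for illustration.

def research_agent(task: str) -> str:
    return f"[research] gathered sources for: {task}"

def math_agent(task: str) -> str:
    return f"[math] derived a result for: {task}"

def code_agent(task: str) -> str:
    return f"[code] drafted an implementation for: {task}"

def creative_agent(task: str) -> str:
    return f"[creative] produced a draft for: {task}"

SPECIALISTS = {
    "research": research_agent,
    "math": math_agent,
    "code": code_agent,
    "creative": creative_agent,
}

def captain(request: str, plan: list[tuple[str, str]]) -> str:
    """Dispatch (specialist, subtask) pairs, then assemble the answers."""
    results = [SPECIALISTS[name](subtask) for name, subtask in plan]
    return f"Answer to {request!r}:\n" + "\n".join(results)

print(captain(
    "benchmark a sorting library",
    [("research", "find prior benchmarks"),
     ("code", "write the harness"),
     ("math", "compute confidence intervals")],
))
```

The appeal of the pattern is that each specialist can be tuned, evaluated, and swapped independently; the captain is the only component that has to understand the whole request.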
When benchmarks become noise
Here's the uncomfortable truth about nineteen models in seventeen days: nobody can properly evaluate any of them. The benchmark numbers themselves show why. On Visual Capitalist's tracking of the Mensa Norway IQ benchmark, Grok 4.20 Expert Mode and GPT-5.4 Pro tied at the top with scores of 145. The gap between the top dozen models has compressed to just a few points. Stanford's AI Index reports that on Humanity's Last Exam, a benchmark of expert-level questions built to resist frontier models, the best models went from 8.8% accuracy in early 2025 to over 50% by April 2026. These numbers are impressive in isolation. But when every model claims state-of-the-art performance on different benchmarks measured in different ways, the signal dissolves. As Stanford HAI researchers put it, the era of AI evangelism is giving way to an era of AI evaluation. The question is no longer "Can AI do this?" but "How well, at what cost, and for whom?" The problem is that evaluation takes time, and time is exactly what this pace doesn't allow.
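To see why compressed leaderboards stop carrying signal, it helps to simulate one. The sketch below uses invented scores and ordinary Gaussian measurement noise (none of these numbers come from the benchmarks above): when the top models sit a point or two apart, the "winner" reshuffles on nearly every evaluation run.

```python
# Toy illustration of score compression: with top models a few points apart,
# plausible run-to-run noise reshuffles the ranking almost every time.
# All scores below are invented, not benchmark data.
import random

models = {          # invented benchmark scores, a few points apart
    "model-a": 88.1,
    "model-b": 87.4,
    "model-c": 86.9,
    "model-d": 86.2,
}

def noisy_rank(noise_sd: float = 1.0) -> list[str]:
    """Rank models after adding Gaussian run-to-run noise to each score."""
    jittered = {m: s + random.gauss(0, noise_sd) for m, s in models.items()}
    return sorted(jittered, key=jittered.get, reverse=True)

random.seed(0)
winners = [noisy_rank()[0] for _ in range(1000)]
for m in models:
    print(f"{m} 'wins' {winners.count(m) / 10:.1f}% of noisy evaluations")
```

With noise comparable to normal evaluation variance, every model "wins" a meaningful share of the runs. That is what a dissolved signal looks like in practice.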
Meta closes the door
The most significant shift in April wasn't a model launch. It was a philosophy change. On April 8, Meta released Muse Spark, the first model from Meta Superintelligence Labs, a new division led by Alexandr Wang, who joined through Meta's $14.3 billion acquisition of a stake in Scale AI. Muse Spark is natively multimodal, processing text, images, video, and audio, and it operates in three distinct reasoning modes. But the headline isn't the architecture. It's the license. Muse Spark is proprietary. No open weights. No community access to the underlying model. It is available only through Meta's platforms: the Meta AI app, Instagram, Facebook, WhatsApp, Messenger, and Ray-Ban Meta AI glasses. For three years, Meta was the loudest champion of open-weight AI. Llama models powered thousands of startups, research labs, and independent projects. Developers built entire companies on the assumption that Meta would keep shipping open weights. That assumption died on April 8. The pivot had been telegraphed. Llama 4's disappointing reception in early 2025 led to internal reshuffling. Zuckerberg signaled in mid-2025 that Meta might not open-source all of its "superintelligence" models. By December 2025, Bloomberg reported on a proprietary model codenamed Avocado. But seeing it actually happen, seeing the pricing page where there used to be a download link, still landed differently. The economic logic is straightforward. OpenAI, Google, and Anthropic charge for access to their best models; Meta needed its massive AI investment to start generating revenue of its own. Open source, it turned out, was a strategy for a different era.
The most dangerous model nobody was supposed to see
And then there's Claude Mythos. On March 25, two security researchers discovered roughly 3,000 unpublished Anthropic assets sitting in a publicly searchable database. A misconfigured CMS, no hack, no whistleblower, just a default setting nobody changed. Among the leaked files were draft announcements for a model called Claude Mythos, internally codenamed Capybara, described as "by far the most powerful AI model we've ever developed." Anthropic confirmed the model was real and called it "a step change" in capabilities. The leaked documents described a system that is, in their own words, "currently far ahead of any other AI model in cyber capabilities." In internal testing, engineers with no formal security training could ask Mythos to hunt for remote code execution vulnerabilities in the evening and wake up to a complete, working exploit. Anthropic decided not to release it publicly. Instead, they launched Project Glasswing, a coalition with Google, Cisco, Broadcom, and the Linux Foundation, committing up to $100 million in Claude usage credits to help secure open-source and private infrastructure before models like Mythos become widespread. The irony writes itself. A company that branded itself as the safety-first AI lab had its most dangerous model leaked through a configuration error. Then, on the same day Anthropic announced it would offer Mythos to a select group of companies for controlled testing, a small group of unauthorized users gained access anyway, using a mix of contractor credentials and basic internet sleuthing. This created a new pattern: capability as announcement, withholding as virtue, leaking as inevitability.
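The root cause is mundane enough to test for mechanically. A minimal sketch of the kind of smoke check that would have caught it, assuming an HTTP-fronted service that is supposed to demand credentials (the URL is a placeholder, not any real Anthropic endpoint): request your own endpoint anonymously and fail loudly if it answers.

```python
# Minimal "is this accidentally public?" smoke check: hit your own endpoint
# with no credentials and alarm if it serves data anyway. The URL below is a
# placeholder; point it at whatever CMS or database API you actually run.
import urllib.error
import urllib.request

def is_accidentally_public(url: str) -> bool:
    """Return True if an unauthenticated GET succeeds with an HTTP 2xx."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return 200 <= resp.status < 300
    except urllib.error.HTTPError:
        return False  # 401/403 and friends: authentication is enforced
    except urllib.error.URLError:
        return False  # unreachable from here: not publicly exposed

if is_accidentally_public("https://cms.example.internal/api/documents"):
    raise SystemExit("endpoint serves data without credentials; lock it down")
```

A check like this in CI is no substitute for a real security review, but it is exactly the kind of cheap tripwire that catches "a default setting nobody changed."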
Model selection is now a tax
For anyone building products on top of these models, the pace creates a specific kind of problem. It's not that the models aren't good. They're remarkably good. The problem is that choosing between them has become a meaningful cost. Every week there's a new "best" model. Switching costs are real: different APIs, different context window behaviors, different strengths and failure modes. The frontier models (GPT-5.4 Pro, Gemini 3.1 Pro, Claude Opus 4.7) are separated by single-digit percentage points on most benchmarks. The practical differences often come down to pricing, latency, and which specific tasks a team cares about most. This compression at the top means the decision about which model to use is increasingly a function of ecosystem lock-in, not raw capability. Do you use Azure? You're probably on GPT-5.4. Google Cloud? Gemini 3.1. Building agents? Claude might have the edge. The "best model" changes weekly; your infrastructure shouldn't. The practical advice, unfashionable as it sounds, is to pick a tier (frontier, mid, small), pick a provider, and ship. Optimize later. The cost of perpetual model evaluation is higher than the cost of being on last week's best model.
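"Pick a tier and ship" translates directly into code. A minimal sketch, assuming hypothetical model IDs, context sizes, and prices (none are real catalog entries or quoted rates): keep the choice in one config table behind a tier name, so chasing this week's leaderboard winner is a one-line edit rather than a refactor.

```python
# Minimal sketch of pinning model choice behind a tier abstraction. The
# model IDs, context sizes, and prices below are illustrative assumptions,
# not real catalog entries; `call` wraps whatever provider SDK you use.
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class ModelConfig:
    provider: str
    model_id: str
    max_context_tokens: int
    usd_per_1m_input: float  # invented pricing, for routing decisions only

# The single place that changes when "the best model" changes.
TIERS: dict[str, ModelConfig] = {
    "frontier": ModelConfig("openai", "gpt-5.4-pro", 400_000, 15.00),
    "mid":      ModelConfig("google", "gemini-3.1-pro", 1_000_000, 2.50),
    "small":    ModelConfig("openai", "gpt-5.4-nano", 128_000, 0.10),
}

def complete(tier: str, prompt: str,
             call: Callable[[ModelConfig, str], str]) -> str:
    """Route a prompt to the tier's configured model via an SDK wrapper."""
    return call(TIERS[tier], prompt)

# Application code names tiers, never models; weekly churn hits one table.
result = complete("frontier", "summarize this incident report",
                  lambda cfg, p: f"(stub) {cfg.model_id} would answer: {p}")
print(result)
```

The design choice is that nothing outside the table knows a model's name, so swapping providers touches the config and the SDK wrapper, not every call site.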
The consolidation question
There's a reasonable counter-argument that this pace is unsustainable. Nineteen models in seventeen days requires enormous capital expenditure, and most of these companies aren't profitable on their AI products. Meta is laying off 10% of its workforce while pouring billions into AI infrastructure. Anthropic is raising round after round of funding. OpenAI's costs continue to climb. At some point, the economics have to resolve. Either these models start generating enough revenue to justify the investment, or the release cadence slows. The semiconductor supply chain adds its own constraints: training these models requires chips that don't exist in unlimited quantities. But even if consolidation comes, and it likely will, the current moment is important because it's setting the competitive landscape. The companies that establish themselves as essential infrastructure now will have durable advantages when the dust settles. Meta going closed-source isn't just a licensing decision. It's a bet that the proprietary model business can sustain itself, and that open source was a market-building strategy, not a permanent identity.
What this pace actually means
The gap between announcement and obsolescence is now measured in days, not months. A model that is state-of-the-art on Monday may be second-best by Friday. This has downstream effects that go beyond benchmarks. For researchers, it means publishing cycles can't keep up. A paper analyzing GPT-5.4's capabilities is out of date before peer review. For regulators, it means policy frameworks designed around annual model assessments are inadequate. The EU AI Act's risk classification system assumes a pace of change that no longer exists. For users, it means the AI assistant in your phone might meaningfully change its behavior three or four times a quarter, with no changelog you'd ever read. And for the companies building these models, it means the competition isn't really about who has the best model at any given moment. It's about who can sustain the pace, who can convert capability into revenue, and who can do both without a security breach making headlines. Nineteen models in seventeen days. Not a release cadence. A pile-up.