Anthropic built a god and locked the door
In late March 2026, Anthropic accidentally leaked internal documents describing a model they called Claude Mythos, internally codenamed Capybara. It was the first AI model to cross the 10-trillion-parameter threshold, a staggering leap from the estimated 1.8 trillion parameters of GPT-4 roughly three years earlier. Within days, cybersecurity stocks posted one of their worst single-day sell-offs in recent memory. But the real story wasn't the leak. It was what Anthropic did next: nothing. Or rather, the deliberate, calculated opposite of what every other frontier lab has done when sitting on a capability breakthrough. They refused to ship it. On April 7, 2026, Anthropic formally announced Claude Mythos Preview alongside Project Glasswing, a tightly controlled initiative giving roughly 50 organizations access to the model for defensive cybersecurity work. No public API. No general availability. No launch day fanfare. In an industry defined by the race to release, Anthropic chose to hold its most powerful creation behind closed doors.
The model that broke the framework
Anthropic's Responsible Scaling Policy (RSP) defines AI Safety Levels, or ASLs, modeled after the biosafety levels used for handling dangerous biological materials. Each level corresponds to a capability threshold that triggers increasingly strict safeguards. ASL-2 covers current commercially deployed models. ASL-3 applies when a model could provide "meaningful uplift" to attackers in dangerous domains. ASL-4, which had never been triggered before, was designed for models representing a qualitative leap in autonomous capability. Claude Mythos Preview triggered ASL-4. The system card tells the story in clinical detail. When tested against roughly 7,000 entry points across a thousand open-source repositories from the OSS-Fuzz corpus, previous frontier models like Opus 4.6 managed a single tier-3 crash each. Mythos Preview achieved 595 crashes at tiers 1 and 2, further crashes at tiers 3 and 4, and full control flow hijack, the most severe category (tier 5), on ten separate, fully patched targets. According to Anthropic, more than 99 percent of the thousands of vulnerabilities Mythos identified remain unpatched, many having gone unnoticed for decades. This isn't a model that's marginally better at finding bugs. It's a model that can autonomously discover and develop exploits for zero-day vulnerabilities across every major operating system and web browser. The gap between Mythos and its predecessors isn't incremental. It's categorical.
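For readers outside the fuzzing world, an OSS-Fuzz "entry point" is a small harness that feeds attacker-controlled bytes into a single library routine; the corpus aggregates thousands of them across open-source projects. The sketch below shows what such a harness typically looks like. The parse_record() function is hypothetical, a stand-in for whatever real library API a given harness wraps, with a deliberate out-of-bounds read planted so the harness has something to find.

```c
/* A minimal libFuzzer-style harness of the kind OSS-Fuzz builds by the
 * thousand; each such harness is one "entry point" in the corpus the
 * system card describes. parse_record() is a hypothetical stand-in for
 * a real library routine.
 * Build: clang -g -fsanitize=fuzzer,address harness.c -o harness */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser under test: reads a 1-byte length header, then
 * copies that many payload bytes. Bug: it never checks that the input
 * actually contains `len` bytes after the header. */
static int parse_record(const uint8_t *buf, size_t size) {
  if (size < 1) return -1;
  uint8_t len = buf[0];
  if (len == 0) return 0;
  uint8_t payload[256];
  memcpy(payload, buf + 1, len);  /* OOB read of buf when size < 1 + len */
  return payload[0];
}

/* libFuzzer's entry point: called repeatedly with mutated inputs.
 * Any crash or sanitizer report here counts as a finding, which triage
 * then grades by severity, from a clean abort up to exploitable memory
 * corruption. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  parse_record(data, size);
  return 0;
}
```

On the article's telling, Opus 4.6 and Mythos were pointed at targets like this one; the difference is presumably that Mythos can reason its way from a raw crash, like the bad memcpy above, to a working exploit, which is what separates a low-tier finding from a tier-5 control flow hijack.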
Project Glasswing and the logic of restraint
Instead of a product launch, Anthropic assembled a coalition. Project Glasswing partners include Amazon Web Services, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorganChase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks, along with roughly 40 additional organizations responsible for critical software infrastructure. Their mandate: use Mythos to scan and secure systems, share findings with the wider industry, and study the model's implications before any broader deployment. The logic is straightforward, at least on the surface. If a model can find vulnerabilities faster than any human team, you want defenders to have it first. You patch the holes before the capability proliferates. It's the AI equivalent of responsible disclosure in security research, scaled up to an unprecedented degree. But there's a tension embedded in this approach. By concentrating access among a handful of elite organizations, Anthropic is creating an asymmetry. The companies inside Glasswing get to harden their systems. Everyone else has to wait, trusting that the benefits will trickle down through shared learnings and patched code. Whether that trust is warranted depends on how seriously these partners take the "share findings" mandate and how quickly patches actually reach production.
Safety leadership or strategic positioning?
The cynical reading is obvious: Anthropic can't yet monetize Mythos at scale without catastrophic liability risk, so they've reframed restraint as virtue. By positioning themselves as the responsible lab, they earn regulatory goodwill, attract safety-conscious enterprise customers, and differentiate from competitors who ship first and ask questions later. There's probably some truth to this. Anthropic is reportedly valued at around $61.5 billion. Walking away from the revenue a 10-trillion-parameter model could generate isn't just a safety decision; it's a business calculation. The company needs to believe that the long-term value of trust exceeds the short-term value of an API launch. But dismissing the decision as pure strategy misses something important. Anthropic's RSP was published years before Mythos existed. The ASL framework was designed precisely for this moment: a model capable enough to trigger the highest safety classification. When that moment arrived, they followed through. In an industry where safety commitments are routinely treated as marketing copy, actually honoring a self-imposed constraint is noteworthy, even if the motives are mixed. The more interesting question isn't whether Anthropic is sincere. It's whether sincerity matters when the structural incentives point in the same direction.
The Economist and the "five geeks" problem
The Economist's April 16 cover story framed the situation with characteristic drama: "America wakes up to AI's dangerous power." The accompanying rhetoric about "five geeks with godlike command" captures a real anxiety but oversimplifies the reality. The "five geeks" framing suggests a small group making unilateral decisions about technology that affects everyone. That's partially accurate, but it elides the fact that these decisions are happening in a specific institutional context. Anthropic has external review processes, published risk reports, and a compliance framework. These aren't perfect mechanisms, but they're not nothing either. The deeper problem the framing points to is legitimate: there's no democratic process for deciding whether a model like Mythos should exist, who should have access to it, or what safeguards are sufficient. Anthropic's internal governance, however sophisticated, is still a private company making decisions about public risk. The question isn't whether they're making those decisions well. It's whether any private company should be making them at all.
The open-source shadow
While Anthropic locks Mythos away, the rest of the AI ecosystem is moving in the opposite direction. DeepSeek V4, a 1-trillion-parameter open-weights model, dropped in early 2026 and reportedly matches frontier closed models on several benchmarks. Meta's Llama continues to iterate. Google released Gemma for consumer hardware. The gap between frontier and open-source models, once measured in years, has shrunk to months. This creates a paradox for Anthropic's approach. Withholding Mythos only works as a safety measure if the capability remains scarce. But the underlying techniques, architectures, and training approaches that made Mythos possible aren't secrets. They're emerging from research labs and open-source projects worldwide. If another lab, or an open-source effort, reaches similar cybersecurity capabilities without the same safety infrastructure, Anthropic's restraint could look less like leadership and more like unilateral disarmament. The counterargument is that the specific combination of scale, training data, and fine-tuning that produces Mythos-level cybersecurity capability may be harder to replicate than general language capabilities. A model that matches GPT-4 on coding benchmarks isn't necessarily one that can achieve tier-5 control flow hijacks on patched software. The gap between "good at code" and "can autonomously discover zero-days" might be wider than the benchmark leaderboards suggest. Still, the clock is ticking. Every month that Mythos remains locked away is a month where the broader ecosystem moves closer to similar capabilities without the same guardrails.
The competitive calculus
For Anthropic's business, the decision is a double-edged sword. On one side, Glasswing partners like CrowdStrike and Palo Alto Networks gain a genuine competitive advantage in their core markets, strengthening their relationship with Anthropic and validating the model's capabilities. That exclusive access creates a moat that enterprise customers may find compelling, and Anthropic reinforced the positioning by simultaneously releasing Opus 4.7, a publicly available model that improves on Opus 4.6 in software engineering and vision tasks. The message to the market is clear: we'll give you the best model we can responsibly ship, even as we hold back the one we can't. It's a sophisticated play, establishing a ceiling of capability that competitors know exists but can't access. On the other side, competitors may decide not to follow the same playbook. OpenAI, Google, and xAI are all pursuing frontier models aggressively. If one of them reaches comparable capabilities and decides to ship anyway, Anthropic's restraint becomes a competitive disadvantage without a corresponding safety benefit. The precedent only holds if others adopt it.
The precedent that matters
This is ultimately what makes the Mythos decision significant beyond Anthropic's own story. It's the first time a frontier AI lab has built a model, acknowledged it's the most powerful thing they've created, and explicitly chosen not to release it on the grounds that the risk outweighs the benefit. That precedent cuts both ways. If it holds, if other labs adopt similar frameworks and follow through when their models cross safety thresholds, it establishes a norm of restraint that could meaningfully reduce catastrophic AI risk. It proves that the safety rhetoric the industry has been selling for years can translate into operational decisions. If it doesn't hold, if the next lab to build something comparable ships it anyway, the Mythos episode becomes a footnote. A moment when one company tried to set a standard and the market refused to follow. The honest answer is that we don't know which way it goes. What we do know is that the gap between what labs can build and what they're willing to release is now wider than it has ever been. That gap is where the most consequential decisions in AI are being made, not in the architecture papers or the benchmark scores, but in the rooms where someone has to decide whether to press the launch button. Anthropic built something extraordinary and chose not to let it loose. Whether that choice shapes the industry or gets swallowed by it will tell us more about the future of AI than any parameter count ever could.
References
- Anthropic, "Claude Mythos Preview System Card" (April 2026), anthropic.com
- Anthropic, "Project Glasswing: Securing critical software for the AI era" (April 2026), anthropic.com
- Anthropic, "Claude Mythos Preview," red.anthropic.com (April 2026), red.anthropic.com
- Anthropic, "Responsible Scaling Policy Version 3.0" (February 2026), anthropic.com
- Anthropic, "Introducing Claude Opus 4.7" (April 2026), anthropic.com
- The Economist, "America wakes up to AI's dangerous power" (April 2026), economist.com
- Fortune, "Anthropic is giving some firms early access to Claude Mythos to bolster cybersecurity defenses" (April 2026), fortune.com
- Politico, "Anthropic's AI sparks concerns over a new national security risk" (April 2026), politico.com
- PBS NewsHour, "Anthropic's powerful new AI model raises concerns about high-tech risks" (April 2026), pbs.org
- Arctic Wolf, "Project Glasswing Marks a Turning Point for Cybersecurity" (April 2026), arcticwolf.com
- Stanford HAI, "The 2026 AI Index Report: Technical Performance," hai.stanford.edu