10 trillion parameters and nowhere to go
Anthropic built what is reportedly the first 10-trillion-parameter language model, and then decided not to release it. Claude Mythos Preview, announced on April 7, 2026, is the most capable AI system the company has ever produced. It is also, by Anthropic's own assessment, the first model to trigger ASL-4, the highest tier of its Responsible Scaling Policy. Instead of a public launch, the company restricted access to a handful of partner organizations through a new initiative called Project Glasswing. This is not the first time an AI lab has called something "too dangerous to release." But this time, unlike when OpenAI pulled the same move with GPT-2 in 2019, there are receipts.
What Mythos actually does
The standout capability is cybersecurity. During internal testing, Mythos Preview autonomously discovered thousands of high-severity, previously unknown vulnerabilities in real-world software. These weren't contrived test cases. Anthropic's red team reported finding zero-day flaws in every major operating system and web browser, with some vulnerabilities dating back more than two decades. On Cybench, a standardized benchmark for AI cybersecurity capabilities, Mythos achieved a 100% pass rate across all tested challenges. The model didn't just find bugs. It demonstrated the ability to develop working exploits, evade established defenses like sandboxing and memory protection, and chain multiple vulnerabilities together in ways that previously required teams of elite human researchers. Anthropic described this as a "watershed moment" for cybersecurity. Mythos is also a general-purpose frontier model with strong performance across reasoning, coding, and agentic tasks. But it's the security capabilities that forced Anthropic's hand.
What ASL-4 actually means
Anthropic's Responsible Scaling Policy (RSP) defines a ladder of AI Safety Levels. Each level corresponds to a set of capability thresholds and the safeguards required to match them. ASL-2 covers current standard models and aligns roughly with Anthropic's White House voluntary commitments from 2023. ASL-3, activated for Claude Opus 4 in mid-2025, introduces stricter requirements: enhanced security against model weight theft, adversarial red-teaming by world-class experts, and a commitment not to deploy models showing meaningful catastrophic misuse risk. ASL-4 goes further. In Anthropic's original framing, ASL-4 applies when "AI models have become the primary source of risk in a major area (such as cyberattacks or biological weapons)." The RSP speculates that ASL-4 models may require safety cases built on mechanistic interpretability, AI control protocols, or formal incentive analysis, none of which are fully mature today. With Mythos, Anthropic is essentially saying: we built a model that has become the primary source of risk in cybersecurity, and we don't yet have safeguards robust enough to deploy it publicly.
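To make the shape of that ladder concrete, here is a minimal sketch that encodes the levels described above as a plain data mapping. The schema, field names, and wording are my own paraphrase for illustration; they are not Anthropic's policy text or any internal representation of it.

```python
# Illustrative only: the RSP ladder as described above, encoded as a mapping.
# Level summaries paraphrase the public policy; the schema itself is invented.
ASL_LADDER = {
    "ASL-2": {
        "threshold": "current standard models",
        "required_safeguards": [
            "baseline security and deployment practices",
            "roughly the 2023 White House voluntary commitments",
        ],
    },
    "ASL-3": {
        "threshold": "meaningful catastrophic misuse risk",
        "required_safeguards": [
            "enhanced security against model weight theft",
            "adversarial red-teaming by world-class experts",
            "no deployment while meaningful catastrophic misuse risk remains",
        ],
    },
    "ASL-4": {
        "threshold": "model is the primary source of risk in a major area "
                     "(e.g. cyberattacks or biological weapons)",
        "required_safeguards": [
            "safety cases built on mechanistic interpretability",
            "AI control protocols",
            "formal incentive analysis",
        ],  # none of these are considered fully mature today
    },
}


def safeguards_for(level: str) -> list[str]:
    """Look up the safeguards this sketch pairs with a given safety level."""
    return ASL_LADDER[level]["required_safeguards"]
```

The point of the exercise is that each level pairs a capability threshold with the safeguards that must exist before deployment; Mythos crossed the ASL-4 threshold before the right-hand side of that pairing was ready.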
The Glasswing compromise
Rather than shelving Mythos entirely, Anthropic created Project Glasswing, a consortium of technology and financial companies given controlled access to the model for defensive purposes. The founding members include Google, Microsoft, JPMorgan Chase, CrowdStrike, Apple, AWS, and Cisco, along with Linux kernel maintainers. The logic is straightforward. Mythos has already found the vulnerabilities. Those flaws exist whether or not anyone looks for them. By giving defenders a head start, Anthropic argues it can extract the security benefits while limiting offensive misuse. Glasswing partners are using the model to identify and patch critical infrastructure vulnerabilities before similar capabilities proliferate to less responsible actors. Anthropic has also been in discussions with US government officials about the model's implications for national security, framing the initiative as a reason why democratic countries need to maintain a lead in AI development.
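For a sense of what "giving defenders a head start" might look like in practice, here is a purely hypothetical triage sketch: it scores model-reported findings by severity and exposure and sorts them into a patch queue. The record format, field names, and scoring weights are all invented for the example and do not reflect Glasswing's actual tooling or Anthropic's report format.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """A hypothetical model-reported vulnerability record (invented schema)."""
    component: str              # affected software component
    severity: float             # e.g. a CVSS-style base score, 0.0-10.0
    internet_exposed: bool      # reachable without local access
    exploit_demonstrated: bool  # a working proof of concept exists


def patch_priority(finding: Finding) -> float:
    """Toy scoring rule: severity, boosted when exposure or a working
    exploit makes real-world exploitation more likely. Weights are arbitrary."""
    score = finding.severity
    if finding.internet_exposed:
        score += 2.0
    if finding.exploit_demonstrated:
        score += 3.0
    return score


findings = [
    Finding("browser/renderer", 9.1, True, True),
    Finding("kernel/net-stack", 7.8, True, False),
    Finding("desktop-app/parser", 6.5, False, False),
]

# Patch the highest-priority items first, before disclosure windows close.
for f in sorted(findings, key=patch_priority, reverse=True):
    print(f"{patch_priority(f):5.1f}  {f.component}")
```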
Meanwhile, OpenAI ships GPT-5.4
The contrast with OpenAI is striking. GPT-5.4, released on March 5, 2026, is a powerful model in its own right. It introduced native computer use capabilities and a million-token context window, and it scored 75% on the OSWorld desktop navigation benchmark, above the measured human baseline of 72.4%. It absorbed the coding capabilities of GPT-5.3-Codex and improved on reasoning, tool use, and professional workflows. OpenAI shipped all of this to the public without hesitation. This isn't necessarily irresponsible. GPT-5.4 is a different model with different capabilities. It wasn't specifically evaluated against the same cybersecurity benchmarks where Mythos excelled. But the optics are hard to ignore. One company says "this is too powerful to release" while another ships something comparably advanced and says "here you go." The divergence reveals a real philosophical split in the industry. OpenAI has moved toward a model of broad deployment with iterative safety improvements. Anthropic has committed to a framework where capability thresholds trigger hard stops. Both approaches carry risks.
The GPT-2 precedent, and why this is different
In February 2019, OpenAI announced that GPT-2 was too dangerous to release because it could generate convincing fake text. The AI community largely rolled its eyes. Two graduate students recreated the model within months. OpenAI eventually released it fully. The episode is remembered more as a marketing stunt than a genuine safety intervention. Claude Mythos invites the same cynicism. "We built something too powerful" is, whatever else it may be, incredible marketing. It positions Anthropic as both the most capable and the most responsible lab simultaneously. But there are meaningful differences this time. GPT-2's alleged danger was speculative, a language model that could write convincing paragraphs at a time when that felt novel. Mythos's danger is empirical: thousands of real vulnerabilities in production software, documented, verified, and in some cases already being patched through coordinated disclosure. The system card, the red team report, and the Glasswing program all provide evidence that goes well beyond a press release. That said, the skepticism isn't unwarranted. The Decoder article comparing the two episodes noted that while the evidence is stronger this time, the playbook is familiar. And Anthropic has faced criticism for quietly adjusting its RSP commitments when they proved inconvenient, as tracked by the Effective Altruism Forum and others.
The paradox of voluntary restraint
Anthropic's decision puts it in a strange position. By withholding Mythos, the company simultaneously appears more trustworthy and more dangerous. More trustworthy because it's demonstrating that internal safety protocols can actually bite. More dangerous because the very act of withholding implies the model is so capable that releasing it would be reckless. This paradox has historical roots. During the Cold War, the voluntary restraint of nuclear capabilities, from the NPT to unilateral moratoriums, served both genuine security purposes and strategic ones. A country that visibly chooses not to deploy a weapon signals both restraint and the fact that it possesses the weapon in the first place. The restraint is inseparable from the flex. The gain-of-function research moratorium offers another parallel. In 2014, the US government paused federal funding for research that enhanced the transmissibility or pathogenicity of dangerous viruses. The moratorium was lifted in 2017 under a new review framework, then effectively re-imposed by executive order in 2025. The cycle of pause, review, resume, and re-pause reflects the difficulty of maintaining restraint in competitive research environments where the knowledge itself keeps advancing. Anthropic faces the same fundamental problem. Even if it withholds Mythos indefinitely, the capabilities will eventually appear in other models. The question is whether the pause buys enough time for defenses to catch up.
Is withholding a competitive moat?
There's a less charitable reading of the situation. Voluntary withholding could function as a competitive strategy rather than pure safety concern. First, it generates enormous attention. The Fortune exclusive about Mythos's existence, broken after a data leak in late March, and the subsequent coverage from the New York Times, BBC, PBS, and others gave Anthropic a level of media exposure that no product launch could match. Second, it reinforces Anthropic's positioning as the "safety-first" lab, which is directly relevant to its upcoming IPO. Fortune explicitly connected the Mythos announcement to the company's financial trajectory. Third, Project Glasswing creates a privileged inner circle of partners who get access to capabilities that competitors and the public do not. This is a distribution strategy, not just a safety measure. None of this means the safety concerns are fabricated. It's entirely possible, even likely, that Anthropic is both genuinely worried about the model's capabilities and strategically leveraging that worry. The two motivations aren't mutually exclusive.
Who decides the threshold?
The deeper question Mythos raises is about governance. Anthropic's RSP is a voluntary, self-imposed framework. The company defines the capability thresholds, evaluates its own models against them, and decides when to withhold or release. There is no external body with the authority to independently verify these assessments or mandate specific actions. This is the fundamental limitation of voluntary safety commitments in AI. They work only as long as the company chooses to honor them. Critics on the Effective Altruism Forum have noted that Anthropic has already adjusted the specificity of its RSP commitments over time, from detailed "warning sign evaluations" promised in the 2023 version to more flexible language in subsequent updates. The SANS Institute convened an emergency session on Mythos, and cybersecurity experts have called for independent verification of Anthropic's claims. Some have questioned whether the vulnerability counts are inflated or whether the demonstrated capabilities represent a genuine step change versus incremental improvement. As AI capabilities continue to advance, the question of "too capable to deploy" will come up again. Next time, it might not be a company that believes in safety protocols making the call.
What comes next
Anthropic released Claude Opus 4.7 on April 16, just nine days after the Mythos announcement. This model is publicly available and explicitly described as having had its cybersecurity capabilities "differentially reduced" during training. It's a signal that the company views Mythos as a special case, not the beginning of a permanent lockdown. The real test will be whether Glasswing actually delivers on its defensive promise. If the coordinated disclosure effort leads to widespread patching of critical vulnerabilities, the withholding strategy will look prescient. If the vulnerabilities remain largely unpatched while similar AI capabilities proliferate through other labs and open-source projects, the pause will have accomplished little beyond PR. In the meantime, the rest of the industry is watching. Anthropic has established a precedent: a major AI lab can trigger its own safety protocol and voluntarily restrict a frontier model. Whether that precedent holds, or whether it gets quietly walked back like so many voluntary commitments before it, will say a lot about whether AI safety frameworks are real constraints or just aspirational documents. The 10 trillion parameters exist. The vulnerabilities have been found. The only question now is whether the restraint buys us anything, or whether we're just watching the most sophisticated product launch in AI history.
References
- Anthropic, "Project Glasswing: Securing critical software for the AI era," April 7, 2026. https://www.anthropic.com/glasswing
- Anthropic, "Claude Mythos Preview System Card," April 7, 2026. https://www.anthropic.com/claude-mythos-preview-system-card
- Carlini, N. et al., "Assessing Claude Mythos Preview's cybersecurity capabilities," red.anthropic.com, April 7, 2026. https://red.anthropic.com/2026/mythos-preview/
- Anthropic, "Responsible Scaling Policy Version 3.0," February 24, 2026. https://www.anthropic.com/news/responsible-scaling-policy-v3
- Anthropic, "Activating AI Safety Level 3 protections," 2025. https://www.anthropic.com/news/activating-asl3-protections
- Fortune, "What Anthropic's too-dangerous-to-release AI model means for the AI race," April 10, 2026. https://fortune.com/2026/04/10/anthropic-too-dangerous-to-release-ai-model-means-for-its-upcoming-ipo/
- The New York Times, "Anthropic Claims Its New A.I. Model, Mythos, Is a Cybersecurity 'Reckoning'," April 7, 2026. https://www.nytimes.com/2026/04/07/technology/anthropic-claims-its-new-ai-model-mythos-is-a-cybersecurity-reckoning.html
- BBC, "What is Anthropic's Claude Mythos and what risks does it pose?" April 2026. https://www.bbc.com/news/articles/crk1py1jgzko
- PBS NewsHour, "Anthropic's powerful new AI model raises concerns about high-tech risks," April 2026. https://www.pbs.org/newshour/show/anthropics-powerful-new-ai-model-raises-concerns-about-high-tech-risks
- The Decoder, "From GPT-2 to Claude Mythos: The return of AI models deemed 'too dangerous to release'," April 8, 2026. https://the-decoder.com/from-gpt-2-to-claude-mythos-the-return-of-ai-models-deemed-too-dangerous-to-release/
- OpenAI, "Introducing GPT-5.4," March 5, 2026. https://openai.com/index/introducing-gpt-5-4/
- CNBC, "Anthropic releases Claude Opus 4.7, a less risky model than Mythos," April 16, 2026. https://www.cnbc.com/2026/04/16/anthropic-claude-opus-4-7-model-mythos.html
- Slate, "OpenAI says its text-generating algorithm GPT-2 is too dangerous to release," February 2019. https://slate.com/technology/2019/02/openai-gpt2-text-generating-algorithm-ai-dangerous.html
- Science, "NIH lifts 3-year ban on funding risky virus studies," December 2017. https://www.science.org/content/article/nih-lifts-3-year-ban-funding-risky-virus-studies
- Anthropic, "Three Sketches of ASL-4 Safety Case Components," 2024. https://alignment.anthropic.com/2024/safety-cases/
- The Free Press, "How Dangerous Is Anthropic's New AI Model? Its Chief Science Officer Explains," April 16, 2026. https://www.thefp.com/p/how-dangerous-is-anthropics-new-ai
- Times of India, "Explained: Why Anthropic's Claude Mythos is scaring the company," April 2026. https://timesofindia.indiatimes.com/technology/tech-news/explained-why-anthropics-claude-mythos-is-scaring-the-company-so-much-that-it-has-decided-to-not-release-it-to-public/articleshow/130159775.cms