The widening of AI safety
Not long ago, "AI safety" meant one thing: alignment. The field was defined by a core question: how do we make sure an advanced AI system does what we actually want it to do? It was a technical problem, studied mostly by a small community of researchers worried about superintelligence and loss of control. That narrow framing no longer holds. Over the past two years, AI safety has expanded into something much broader, absorbing concerns about misuse, cybersecurity, biosecurity, disinformation, societal resilience, and governance. The term now functions less as a label for a single research agenda and more as an umbrella over a sprawling, interdisciplinary landscape of risks. Understanding how and why this happened matters, because it shapes where resources go, what policies get written, and which problems get prioritized.
From alignment to everything else
The original AI safety community was rooted in a specific worry: that a sufficiently capable AI might pursue goals misaligned with human values, leading to catastrophic or existential outcomes. This is the "control problem," and it generated foundational work on reward modeling, interpretability, and value learning. But as AI systems moved from research labs into products used by billions, the threat landscape shifted. The risks were no longer hypothetical. They were concrete, measurable, and already happening. Deepfakes were being used to manipulate elections. Language models were helping craft more convincing phishing attacks. AI-generated content was flooding social media with misinformation at scale. The field had to widen because the technology had widened first.
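To make one of those foundational threads concrete, here is a minimal sketch of the core objective behind reward modeling: a pairwise preference loss that trains a model to score human-preferred responses above rejected ones. This is an illustrative PyTorch-style sketch, not any lab's actual implementation; the function name and the toy scores are invented for the example.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen: torch.Tensor, r_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise preference loss (Bradley-Terry style): push the reward
    assigned to the human-preferred response above the rejected one."""
    # -log sigmoid(r_chosen - r_rejected), averaged over the batch
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy usage: scores a reward model might assign to pairs of candidate responses.
chosen = torch.tensor([1.2, 0.4, 2.0])
rejected = torch.tensor([0.3, 0.8, 1.1])
print(reward_model_loss(chosen, rejected))  # scalar loss to minimize
```

Alignment research then asks whether optimizing a policy against such a learned reward actually captures what humans want, which is where interpretability and value learning enter.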
The three pillars of modern AI safety
The International AI Safety Report 2026, led by Yoshua Bengio and over 100 contributing experts from more than 30 countries, provides one of the clearest maps of this expanded territory. The report organizes AI risks into three broad categories that capture the new scope.
Malicious use and misuse
Frontier AI models have shown growing potential to facilitate threats in areas such as cybersecurity, biological weapons, and chemical and radiological hazards. The concern is not that AI autonomously decides to cause harm, but that it lowers the barrier for human actors who want to. A person with limited technical knowledge can now use AI tools to generate exploit code, design dangerous biological sequences, or produce targeted disinformation campaigns. Organizations like the Center for AI Safety (CAIS), the National Academies, and the Nuclear Threat Initiative have documented how AI capabilities at the intersection of biology and computation could, within the next few years, enable the design of pathogens more dangerous than anything found in nature. This is not alignment failure. It is capability diffusion.
Systemic and societal risks
Beyond individual misuse, there are risks that emerge from the widespread deployment of AI across society. The World Economic Forum has highlighted how advanced AI and synthetic media are driving a systemic crisis of disinformation that risks destabilizing democracies. Algorithmic recommendation engines create echo chambers. AI-generated content erodes the shared epistemic foundation that societies need to function. These are not problems any single AI system creates. They are emergent properties of an ecosystem: millions of AI systems interacting with billions of people across interconnected platforms.
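The feedback-loop dynamic behind echo chambers can be shown with a toy model. The simulation below is a deliberately simplified illustration, not drawn from any cited source: a greedy recommender keeps showing the topic a user engages with most, and each exposure reinforces that preference until the distribution collapses.

```python
import random

def simulate_echo_chamber(steps: int = 1000, topics: int = 5, lr: float = 0.05) -> list[float]:
    """Toy model: the recommender shows the topic the user most engages with,
    and each exposure nudges the user's preference for that topic upward.
    Returns the final preference distribution, which typically collapses
    onto one topic even when initial interests are nearly uniform."""
    prefs = [1.0 / topics + random.uniform(-0.01, 0.01) for _ in range(topics)]
    for _ in range(steps):
        shown = max(range(topics), key=lambda t: prefs[t])  # greedy recommendation
        prefs[shown] += lr * (1.0 - prefs[shown])           # engagement reinforces preference
        total = sum(prefs)
        prefs = [p / total for p in prefs]                  # renormalize
    return prefs

print(simulate_echo_chamber())  # one entry near 1.0, the rest near 0.0
```

Real recommender systems are far more complex, but the qualitative loop, engagement signals feeding back into exposure, is the systemic mechanism at issue here.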
Loss of control and autonomy risks
The original alignment concerns have not disappeared. They have evolved. Google DeepMind's updated Frontier Safety Framework now includes explicit provisions for "deceptive alignment," the risk that AI systems purposefully undermine human control. Anthropic's Alignment Science team continues to publish open research directions focused on ensuring that increasingly capable systems remain reliably steerable. As AI systems become more agentic, capable of taking multi-step actions in the real world with minimal human oversight, the control problem becomes less theoretical and more operational.
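One operational response to agentic systems is to gate their actions behind human approval. The sketch below is a hypothetical illustration of that pattern, assuming nothing about any particular framework; the `Action` type, the tool names, and the gate policy are all invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Action:
    tool: str
    args: dict
    reversible: bool

# Hypothetical policy: anything irreversible or high-impact needs a human in the loop.
HIGH_IMPACT_TOOLS = {"send_email", "execute_code", "make_payment"}

def requires_approval(action: Action) -> bool:
    """Gate policy: escalate irreversible actions and high-impact tool calls."""
    return (not action.reversible) or action.tool in HIGH_IMPACT_TOOLS

def run_step(action: Action) -> str:
    """Execute one agent step, pausing for human sign-off when the gate triggers."""
    if requires_approval(action):
        answer = input(f"Approve {action.tool}({action.args})? [y/N] ")
        if answer.strip().lower() != "y":
            return "blocked by human overseer"
    return f"executed {action.tool}"  # stand-in for the real tool call

print(run_step(Action("search_web", {"q": "weather"}, reversible=True)))
print(run_step(Action("execute_code", {"src": "cleanup.sh"}, reversible=False)))
```

The design point is that oversight becomes a property of the execution loop itself, not just of the model's training, which is what "operational" means in this context.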
Why the boundaries blurred
Several forces drove the convergence of what were once separate conversations.
Capability jumps made risks tangible. When models could barely write coherent paragraphs, biosecurity risks felt abstract. Once they could reason through complex scientific literature and generate novel molecular structures, the threat became concrete enough to demand attention from biosecurity experts, not just AI researchers.
Policy pressure created a common frame. The 2023 AI Safety Summit at Bletchley Park was a turning point. Governments needed a single term to organize their response to AI risks, and "safety" became the umbrella. The subsequent creation of AI Safety Institutes in multiple countries institutionalized this broader definition. The 2025 AI Safety Index from the Future of Life Institute now evaluates companies across 33 indicators spanning six domains, far beyond what any narrow definition of alignment would cover.
The distinction between safety, security, and ethics started collapsing. A 2025 paper from researchers at multiple institutions, "AI Safety vs. AI Security: Demystifying the Distinction and Boundaries," argued that the two fields are fundamentally interdependent. Security lapses can cause safety failures, as when adversarial attacks bypass a model's safety guardrails. Unaddressed safety vulnerabilities become attack vectors. And ethical concerns about bias, fairness, and manipulation are increasingly treated as safety issues when they occur at scale.
What the widening means in practice
The expansion of AI safety has real consequences for how the field operates.
Research agendas are broader. Anthropic's recommended research directions for 2025 span interpretability, evaluations, societal impacts, and governance, areas that would have been considered outside "safety" a few years ago. The Stanford AI Index Report documents how the frontier is becoming increasingly competitive and crowded, which means safety research must keep pace with a wider range of systems and deployment contexts.
Governance frameworks are catching up. The ITU's AI Governance Report 2025 notes that risk assessment has become a central focus, with growing efforts to institutionalize evaluation practices and build shared safety infrastructure. But approaches to risk classification remain uneven, and few tools address real-world incidents at scale. The EU AI Act, national AI strategies, and sector-specific regulations are all attempting to operationalize a concept of safety that is still being defined.
Talent and funding are spreading thin. As the Center for AI Safety has noted, relatively few people work on AI safety compared to the scale of the challenge. When safety meant alignment, the community was small but focused. Now that it encompasses everything from red-teaming for bioweapons capability to auditing recommendation algorithms for democratic harms, the demand for expertise far outstrips supply.
The tension at the center
There is a genuine tension in the widening of AI safety. On one hand, the broader framing is more honest. The risks from AI are diverse, interconnected, and cannot be reduced to a single technical problem. Treating safety as a narrow discipline while ignoring misuse, systemic effects, and governance failures would be dangerously incomplete. On the other hand, when a term means everything, it risks meaning nothing. If "AI safety" covers alignment research, cybersecurity, content moderation policy, international arms control, and labor market disruption, it becomes difficult to prioritize. Resources get diffused. Accountability blurs. The community that once had a sharp shared agenda now risks fragmenting into subgroups that share a label but not a methodology. The most productive path forward probably involves keeping the umbrella while being precise about what sits underneath it. The International AI Safety Report does this well, categorizing risks into distinct classes while acknowledging their interdependence. The field needs more of this kind of structured thinking, frameworks that are broad enough to capture the real risk landscape but specific enough to guide action.
Where this is heading
The widening of AI safety is not going to reverse. As AI capabilities continue to advance and deployment continues to accelerate, the surface area of risk will only grow. Agentic AI systems that can browse the web, write and execute code, and interact with external services introduce entirely new categories of failure modes that blend safety, security, and misuse concerns. What will matter is whether the expanded field can maintain intellectual coherence while addressing genuinely different kinds of problems. The alignment researcher and the biosecurity analyst and the platform policy expert all have essential perspectives. The challenge is building institutions, frameworks, and research programs that let them work together without losing the specificity that makes each contribution valuable. AI safety widened because it had to. The question now is whether it can hold together.
References
- Yoshua Bengio et al., "International AI Safety Report 2026." https://internationalaisafetyreport.org/publication/international-ai-safety-report-2026
- Future of Life Institute, "2025 AI Safety Index." https://futureoflife.org/ai-safety-index-summer-2025/
- Americans for Responsible Innovation, "AI Safety Research Highlights of 2025." https://ari.us/policy-bytes/ai-safety-research-highlights-of-2025/
- Center for AI Safety, "AI Risks that Could Lead to Catastrophe." https://safe.ai/ai-risk
- Anthropic, "Recommendations for Technical AI Safety Research Directions." https://alignment.anthropic.com/2025/recommended-directions/
- Google DeepMind, "Frontier Safety Framework 2.0." https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/updating-the-frontier-safety-framework/Frontier%20Safety%20Framework%202.0.pdf
- National Academies of Sciences, Engineering, and Medicine, "AI Tools Can Enhance U.S. Biosecurity; Monitoring and Mitigation Will Be Needed to Protect Against Misuse." https://www.nationalacademies.org/news/ai-tools-can-enhance-u-s-biosecurity-monitoring-and-mitigation-will-be-needed-to-protect-against-misuse
- World Economic Forum, "How Cognitive Manipulation and AI Will Shape Disinformation in 2026." https://www.weforum.org/stories/2026/03/how-cognitive-manipulation-and-ai-will-shape-disinformation-in-2026/
- Stanford HAI, "The 2025 AI Index Report." https://hai.stanford.edu/ai-index/2025-ai-index-report
- ITU, "The Annual AI Governance Report 2025: Steering the Future of AI." https://www.itu.int/epublications/publication/the-annual-ai-governance-report-2025-steering-the-future-of-ai
- "AI Safety vs. AI Security: Demystifying the Distinction and Boundaries." https://arxiv.org/html/2506.18932v1