Every agent needs a kill switch
We're in the middle of a strange moment in AI. According to Deloitte's 2026 State of AI in the Enterprise report, 75% of companies plan to invest in agentic AI, but only 11% have agents actually running in production. That gap should tell you something. The companies that have crossed the line from prototype to production are handing autonomous software real access to real systems: API keys, databases, email accounts, payment processors, and more. Meanwhile, the security conversation hasn't kept up. Most of these agents operate with more permissions than the average employee and fewer guardrails than an intern on their first day.
The permissions problem nobody talks about
Here's the uncomfortable truth about how AI agents work in practice: they inherit the identity and permissions of the user who launches them. As BeyondTrust's research on agent identity governance points out, the operating system doesn't distinguish between commands issued by a human and those generated by an AI tool. If a user has the privilege to perform an action, the agent inherits that privilege.
This means that every over-permissioned user account in your organization is now a potential launchpad for an autonomous system that can act at machine speed. Varonis found that 88% of organizations harbor ghost users and tens of thousands of stale permissions. Pair that with AI agents and you've got a blast radius most security teams haven't even begun to map.
The principle of least privilege, one of the oldest ideas in information security, is almost universally ignored when it comes to agents. Most teams spin up an agent, hand it a broad API key, and move on. The agent doesn't need access to every table in the database. It doesn't need the ability to send emails to external addresses. It doesn't need write access to production infrastructure. But it gets all of that anyway because scoping permissions properly takes time and nobody wants to slow down the demo.
The attack surface is enormous
OWASP published their Top 10 for Agentic Applications in late 2025, and the list reads like a catalog of the things most teams haven't thought about yet. Agent goal hijacking, where an attacker redirects the agent's objectives. Tool misuse, where agents chain API calls in ways nobody anticipated. Identity and privilege abuse. Supply chain vulnerabilities from compromised plugins or frameworks. Cascading failures that amplify across multi-agent systems.
Prompt injection remains the most discussed vector, but it's far from the only one. Palo Alto Networks' Unit 42 research documents several additional threat categories: credential theft that lets attackers impersonate agents, code execution attacks that exploit an agent's ability to run code, and communication poisoning where attackers inject malicious data into agent-to-agent channels.
The problem compounds in multi-agent architectures. McKinsey's research on agentic AI security describes how a flaw in one agent cascades across tasks to other agents, amplifying the risk. Their example is telling: a credit data processing agent misclassifies short-term debt as income due to a logic error, and the incorrect output flows downstream to the credit scoring and loan approval agents, resulting in an unjustified approval. No single agent "failed" in the traditional sense. The system failed because nobody designed for error propagation.
Real failures, not hypothetical ones
These aren't theoretical concerns. In March 2026, a Meta software engineer used an internal AI agent to analyze a question on an internal forum. Without explicit approval, the agent autonomously posted its response directly to the forum. A second employee acted on the agent's advice, triggering a chain of events that left internal systems storing sensitive company and user data accessible to unauthorized engineers for nearly two hours.
On the infrastructure side, a widely discussed incident involved two AI agents stuck in a recursive loop that nearly cost $47,000 before it was caught. Another team reported a compliance agent that tried to update a locked contract, failed, retried, and repeated the cycle 3,000 times before anyone noticed.
A developer on Reddit shared how their AI agent, given a long story to process, got "excited" and returned a plan for 22 scenes instead of the requested 3, triggering 22 separate tasks. When some took too long, the cloud provider's self-healing feature restarted them, and without checkpointing, the newly spawned workers started the cycle all over again. That one cost over $700.
A Replit AI agent reportedly deleted an entire production database, followed by nine days of erratic behavior. None of these incidents involved sophisticated attacks. They were ordinary failures of systems that lacked basic constraints.
The visibility gap
Before you can secure something, you need to know it exists. A 2026 Gravitee survey found that only 24.4% of organizations have full visibility into which AI agents are communicating with each other. More than half of all agents run without any security oversight or logging. This is shadow IT all over again, except the shadow systems are autonomous and can take actions at machine speed. As AGAT Software's analysis of enterprise AI agent security notes, many agents operating inside enterprise environments were deployed by individual product and engineering teams without going through security review. They connect to tools, MCP servers, and external APIs that the security team has never mapped, scoped, or approved. You cannot govern what you cannot see.
Every agent needs these five things
The good news is that the solutions aren't exotic. They're borrowed from decades of security engineering and operations practice. Every agent you deploy to production should have these five things at minimum.
Hard spending and action limits
Set explicit budgets for API calls, token usage, and execution time. A compliance agent should not be able to retry an action 3,000 times. A content generation agent should not be able to spin up 22 parallel tasks. These limits should be hard caps enforced at the infrastructure level, not soft guidelines that depend on the agent's own judgment.
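As a rough illustration of what a hard cap looks like, here is a minimal Python sketch of the accounting logic. The names `RunBudget`, `BudgetExceeded`, and `attempts_before_halt`, and all of the default limits, are hypothetical and chosen only for this example. In a real deployment the same check would more likely live in a gateway or proxy that the agent cannot bypass, not inside the agent's own process.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a run hits a hard cap; the caller must stop the agent."""


@dataclass
class RunBudget:
    # Illustrative caps, not recommendations.
    max_tool_calls: int = 50
    max_tokens: int = 100_000
    max_seconds: float = 300.0
    max_retries_per_action: int = 3
    # Running totals for the current run.
    tool_calls: int = 0
    tokens: int = 0
    retries: dict = field(default_factory=dict)
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, action: str, tokens: int = 0, is_retry: bool = False) -> None:
        """Account for one attempted call; raise before it runs if a cap is hit."""
        if is_retry:
            self.retries[action] = self.retries.get(action, 0) + 1
            if self.retries[action] > self.max_retries_per_action:
                raise BudgetExceeded(f"retry cap reached for {action!r}")
        self.tool_calls += 1
        self.tokens += tokens
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call cap reached")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token cap reached")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("time cap reached")


def attempts_before_halt(budget: RunBudget, action: str) -> int:
    """Simulate an action that always fails and count attempts until the cap."""
    attempts = 0
    try:
        budget.charge(action)  # first attempt
        attempts += 1
        while True:  # every later attempt counts as a retry
            budget.charge(action, is_retry=True)
            attempts += 1
    except BudgetExceeded:
        return attempts
```

With the default cap of three retries, `attempts_before_halt(RunBudget(), "update_contract")` stops after four attempts instead of 3,000. The point of the design is that the agent never gets a vote: the exception is raised by code outside the model's control.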
Human approval gates for irreversible actions
Any action that can't be easily undone (deleting data, sending external communications, processing payments, modifying access controls) should require human confirmation. This doesn't mean a human needs to approve every API call. It means you need to classify your agent's available actions by reversibility and put gates on the ones that matter.
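One way to sketch the classification step in Python is a lookup table from action name to reversibility, with a gate in front of anything irreversible. The action names, the `ACTION_CLASSES` table, and the `approve` callback are all assumptions made for illustration. In practice the callback might open a ticket or page a reviewer.

```python
from enum import Enum
from typing import Any, Callable


class Reversibility(Enum):
    REVERSIBLE = "reversible"
    IRREVERSIBLE = "irreversible"


# Hypothetical action names. Anything not listed is treated as irreversible,
# so a newly added tool is gated by default.
ACTION_CLASSES = {
    "read_record": Reversibility.REVERSIBLE,
    "draft_email": Reversibility.REVERSIBLE,
    "delete_record": Reversibility.IRREVERSIBLE,
    "send_external_email": Reversibility.IRREVERSIBLE,
    "process_payment": Reversibility.IRREVERSIBLE,
    "modify_access": Reversibility.IRREVERSIBLE,
}


class ApprovalDenied(PermissionError):
    """Raised when a human reviewer declines an irreversible action."""


def execute(
    action: str,
    handler: Callable[[], Any],
    approve: Callable[[str], bool],
) -> Any:
    """Run handler(), asking for human approval first if the action is irreversible."""
    kind = ACTION_CLASSES.get(action, Reversibility.IRREVERSIBLE)
    if kind is Reversibility.IRREVERSIBLE and not approve(action):
        raise ApprovalDenied(f"{action!r} was not approved")
    return handler()
```

Reversible actions pass straight through without calling `approve`, which keeps the human out of the loop for routine reads. Defaulting unknown actions to irreversible is a deliberate choice here: forgetting to classify a tool should fail closed, not open.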
A kill switch
Every agent needs a defined set of conditions under which it halts, escalates, or rolls back instead of trying to be clever. Not error handling. Not retries. A real kill switch. As one practitioner put it in a discussion about production agent strategies: the agents that work aren't the ones that handle every edge case, they're the ones that know when to stop.
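To make "a defined set of conditions" concrete, here is a minimal Python sketch that covers only the halting part. It stops the agent on an operator request, on a run of consecutive failures, or when the same action repeats too many times. The `KillSwitch` class, its thresholds, and the idea of calling `check` after every step are assumptions for illustration. Escalation and rollback are left to whatever catches `AgentHalted`.

```python
import threading
from dataclasses import dataclass, field
from typing import Optional


class AgentHalted(RuntimeError):
    """Raised to stop the agent loop; callers should escalate, not retry."""


@dataclass
class KillSwitch:
    # Illustrative thresholds.
    max_consecutive_failures: int = 3
    max_identical_actions: int = 5
    # An operator (or a monitor in another thread) can set this at any time.
    stop_event: threading.Event = field(default_factory=threading.Event)
    failures: int = 0
    last_action: Optional[str] = None
    repeats: int = 0

    def trip(self, reason: str) -> None:
        """Latch the switch so every later check also halts, then raise."""
        self.stop_event.set()
        raise AgentHalted(reason)

    def check(self, action: str, succeeded: bool) -> None:
        """Call after every agent step; raises AgentHalted when a condition is met."""
        if self.stop_event.is_set():
            raise AgentHalted("stop requested")
        self.failures = 0 if succeeded else self.failures + 1
        self.repeats = self.repeats + 1 if action == self.last_action else 1
        self.last_action = action
        if self.failures >= self.max_consecutive_failures:
            self.trip(f"{self.failures} consecutive failures")
        if self.repeats >= self.max_identical_actions:
            self.trip(f"{action!r} repeated {self.repeats} times")
```

The switch latches: once tripped, it stays tripped until a human creates a new one. That is the difference from retry logic, which treats every failure as something to recover from automatically.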
Audit logs
Log every decision, every tool call, every input, and every output. When something goes wrong (and it will), you need to reconstruct exactly what happened and why. As teams who've deployed agents to production have observed, once you log decisions, inputs, retries, and outcomes, reviews become factual instead of speculative, and workflows steadily improve because failures are visible and repeatable.
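A minimal sketch of what such a log could look like in Python: one JSON object per line, each tagged with a run identifier so a single run can be reconstructed later. The `AuditLog` class and every field name here are hypothetical. A production system would also want an append-only, tamper-evident sink, not an in-memory buffer.

```python
import io
import json
import time
import uuid
from typing import Any, TextIO


class AuditLog:
    """Writes one JSON line per event to any file-like sink."""

    def __init__(self, sink: TextIO, agent_id: str) -> None:
        self.sink = sink
        self.agent_id = agent_id
        self.run_id = uuid.uuid4().hex  # groups all events from one run

    def record(self, event: str, **fields: Any) -> dict:
        """Append a structured entry and return it."""
        entry = {
            "ts": time.time(),
            "run_id": self.run_id,
            "agent_id": self.agent_id,
            "event": event,
            **fields,
        }
        # default=str keeps logging from crashing on non-JSON values.
        self.sink.write(json.dumps(entry, default=str) + "\n")
        self.sink.flush()
        return entry


# Usage: log the decision, the tool call, its input, and its outcome.
sink = io.StringIO()
log = AuditLog(sink, agent_id="compliance-agent")
log.record("decision", reason="contract flagged as outdated")
log.record(
    "tool_call",
    tool="update_contract",
    input={"contract_id": 7},
    attempt=1,
    outcome="locked",
)
```

Because every line carries the same `run_id`, filtering the log for one run gives the exact sequence of decisions and tool calls, which is what makes a post-incident review factual.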
Scoped permissions
Apply the principle of least privilege rigorously. Each agent should have access to only the specific tools, data sources, and actions it needs for its defined task. Nothing more. This is where the "one agent, one job" philosophy pays off. An agent with a narrow scope has a smaller blast radius when things go wrong. A general-purpose agent with broad permissions is a liability.
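Here is one way the "one agent, one job" idea could be expressed in Python: each agent gets an immutable scope listing exactly which tools and tables it may touch, and every call is checked against it. `AgentScope`, `authorize`, and the tool and table names are assumptions made up for this sketch. Real enforcement belongs in the credentials themselves (scoped API keys, database roles), with a check like this as a second layer.

```python
from dataclasses import dataclass
from typing import Optional


class ScopeViolation(PermissionError):
    """Raised when an agent requests something outside its declared scope."""


@dataclass(frozen=True)
class AgentScope:
    agent_id: str
    allowed_tools: frozenset
    allowed_tables: frozenset = frozenset()
    can_write: bool = False  # read-only unless explicitly granted


def authorize(
    scope: AgentScope,
    tool: str,
    table: Optional[str] = None,
    write: bool = False,
) -> None:
    """Raise ScopeViolation unless the request fits the agent's scope."""
    if tool not in scope.allowed_tools:
        raise ScopeViolation(f"{scope.agent_id} may not use tool {tool!r}")
    if table is not None and table not in scope.allowed_tables:
        raise ScopeViolation(f"{scope.agent_id} may not access table {table!r}")
    if write and not scope.can_write:
        raise ScopeViolation(f"{scope.agent_id} is read-only")


# Hypothetical scope for an agent whose only job is summarizing invoices.
invoice_reader = AgentScope(
    agent_id="invoice-summarizer",
    allowed_tools=frozenset({"query_db"}),
    allowed_tables=frozenset({"invoices"}),
)
```

With this scope, `authorize(invoice_reader, "query_db", table="invoices")` passes, while a request to use `send_email`, read the `users` table, or write to `invoices` raises `ScopeViolation`. Everything is denied unless listed, which is the least-privilege default.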
The model makers are drawing lines
It's worth noting that even the companies building the foundation models are taking security boundaries seriously. Anthropic published version 3.0 of their Responsible Scaling Policy in February 2026, a comprehensive framework for managing catastrophic risks from advanced AI systems. The policy establishes AI Safety Levels with increasingly strict security requirements as model capabilities advance. Their own research on measuring agent autonomy acknowledges that while most agent actions today are low-risk and reversible, the frontier of risk and autonomy will expand.
But there's a gap between what model makers recommend and what deployers actually do. Agent frameworks prioritize capability demos over security defaults. Most agent development tutorials skip permission scoping entirely. The path of least resistance is to give the agent everything it needs (and a lot of what it doesn't) just to get the demo working.
The boring work that matters
None of this is glamorous. Kill switches, audit logs, permission scoping, spending limits: these are the boring parts of building software. They don't make for exciting launch videos or impressive demos. But the 11% of organizations that have agents in production are learning the same lesson that every previous generation of distributed systems engineers learned: the hard part isn't getting the system to work, it's getting it to fail safely.
IDC projects 1.3 billion AI agents by 2028. If even a small fraction of those agents operate with the kind of unchecked access that's common today, we're looking at a security surface that dwarfs anything we've dealt with before. The agents are coming whether we're ready or not. The question isn't whether your agent is smart enough. It's whether you've built the walls, the switches, and the logs that let you trust it.