Agentic problems
Everyone is talking about AI agents. They can book your flights, write your code, manage your inbox, run your business. At least, that's the pitch. The reality is messier. Agents in 2026 are powerful but deeply flawed, and the problems they introduce are just as interesting as the problems they solve. Here are the real challenges holding agents back right now, and why solving them is where the opportunity lies.
Security is the elephant in the room
The moment you give an agent access to your tools, data, and APIs, you are expanding your attack surface in ways most people do not fully appreciate. OWASP published both an AI Agent Security Cheat Sheet and a Top 10 for Agentic Applications for 2026, and the risk categories are sobering: prompt injection, tool abuse, data exfiltration, memory poisoning, goal hijacking, and cascading failures across multi-agent systems.

Prompt injection remains the most persistent threat. Malicious instructions can be embedded in user input, retrieved documents, or even emails that an agent processes. An agent with access to your email and calendar can be tricked into leaking sensitive data through a carefully crafted message it reads. Direct injection is bad enough, but indirect injection, where the payload hides in external content the agent fetches, is harder to detect and defend against.

Then there is the problem of what you give agents access to. API keys, PII, internal documents, financial data. Agents do not inherently understand what is sensitive. According to Microsoft's Cyber Pulse security report, more than 80% of Fortune 500 companies are deploying AI agents, but only 47% have security controls in place to manage them. A 2026 Gravitee survey found that only 24.4% of organizations have full visibility into which AI agents are communicating with each other, and more than half of all agents run without any security oversight or logging.

The OWASP cheat sheet recommends treating all external data as untrusted, applying least-privilege access to every tool, and requiring human approval for high-impact actions. These are sound principles, but most agent builders today skip them entirely.
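To make those principles concrete, here is a minimal sketch in Python of a tool-call gate that combines least privilege with human approval. Everything here is an illustrative assumption, not any real framework's API: the `HIGH_IMPACT` set, the `ToolCall` shape, the `tainted` flag, and the `request_human_approval` hook.

```python
from dataclasses import dataclass

# Illustrative policy: tools that can change state or leak data
# require a human in the loop. These tool names are assumptions.
HIGH_IMPACT = {"send_email", "execute_payment", "delete_record"}

@dataclass
class ToolCall:
    tool: str
    args: dict
    # True when the arguments were derived from external content
    # (web pages, inbound email) rather than the operator's request.
    tainted: bool = False

def request_human_approval(call: ToolCall) -> bool:
    """Placeholder for a real approval flow (Slack prompt, ticket queue)."""
    answer = input(f"Approve {call.tool}({call.args})? [y/N] ")
    return answer.strip().lower() == "y"

def gate_tool_call(call: ToolCall, allowed_tools: set[str]) -> bool:
    # Least privilege: the agent only ever sees an explicit allowlist.
    if call.tool not in allowed_tools:
        return False
    # Treat external data as untrusted: tainted arguments escalate even
    # a normally low-impact tool to require human approval.
    if call.tool in HIGH_IMPACT or call.tainted:
        return request_human_approval(call)
    return True
```

The `tainted` flag is the "treat all external data as untrusted" rule in miniature: provenance, not just the tool name, decides whether a human gets asked.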
Context management is an unsolved problem
Models have finite attention. Even with 200K token context windows, performance degrades as you stuff more information in. This is sometimes called "context rot," where the model loses the ability to focus on what matters because it is drowning in noise.

This plays out in a few ways. Long-running agents need some form of memory management, a way to persist relevant information across sessions without bloating the context window. Most teams do not have this. They either lose context between sessions or carry forward too much, degrading quality over time.

Then there is the tool and skills problem. MCP (Model Context Protocol) and similar frameworks have made it easy to plug tools into agents. Too easy. Teams end up with 20 or 30 tools exposed to a single agent, many of which are rarely or never used. Research shows that agent performance degrades significantly beyond 5 to 10 tools. Every tool definition eats into the context budget, and overlapping tools create ambiguous decision points where the agent does not know which one to use.

The fix is well understood in theory: just-in-time retrieval instead of pre-loading, specialized sub-agents with focused tool sets, and structured note-taking systems for long-term memory. But implementing these patterns is hard, and most teams have not gotten there yet.
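As a rough illustration of just-in-time retrieval, the sketch below scores tool descriptions against the current task and exposes only the top few to the model instead of pre-loading all of them. The keyword-overlap scoring is deliberately naive (a real system would use embedding similarity), and the registry contents are made up for the example.

```python
def score(task: str, description: str) -> int:
    """Naive relevance score: count of shared lowercase words.
    A real system would use embedding similarity instead."""
    task_words = set(task.lower().split())
    return len(task_words & set(description.lower().split()))

def select_tools(task: str, registry: dict[str, str], k: int = 5) -> list[str]:
    """Expose only the k most relevant tools, keeping the context budget
    small instead of pre-loading all 20 or 30 definitions."""
    ranked = sorted(registry, key=lambda name: score(task, registry[name]),
                    reverse=True)
    return ranked[:k]

# Hypothetical registry mapping tool names to their descriptions.
registry = {
    "search_docs": "search internal documents by keyword",
    "send_email": "send an email to a recipient",
    "create_invoice": "create an invoice for a customer",
    "query_db": "run a read-only query against the database",
    "summarize": "summarize a block of text",
    "translate": "translate text between languages",
}
print(select_tools("search the documents for the latest invoice", registry, k=3))
```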
Agents are still too complicated to set up
There is a significant gap between what agents can theoretically do and what most people can actually get them to do. Setting up an agent that works well requires understanding prompt engineering, tool configuration, API integrations, authentication flows, error handling, and orchestration patterns. That is a developer's job, not something a marketing manager or operations lead can do over lunch.

According to KPMG, nearly two-thirds of leaders cite agentic system complexity as the top barrier to adoption, a finding that held steady for two consecutive quarters. Gartner projects that over 40% of agentic AI projects will be canceled before reaching production by 2027.

The real gap in the market is not building more agents. It is making agents accessible: knowing what agents can do, how to set them up, how to connect them to existing workflows, and how to replace manual processes with automated ones. That is where the value is. The people who can bridge the gap between agent capabilities and practical implementation are the ones who will capture the most opportunity. Think of it like the early days of the web: the money was not in building websites, it was in helping businesses understand why they needed one and how to use it.
No visibility into what agents are doing
When a human employee does something, you can ask them why. When an agent does something, you often have no idea what reasoning led to the action. Most agent frameworks provide minimal logging, no audit trails, and no way to replay or inspect decision chains.

This is not just an inconvenience. It is a governance problem. If an agent sends an email on your behalf, approves a purchase order, or modifies a database record, you need to know why it did that and whether the reasoning was sound. Microsoft's Vasu Jakkal put it plainly: "Agent adoption and scaling is pretty significant, but at the same time, the visibility that organizations have on the agents is very limited." Without observability, you cannot debug failures, you cannot improve performance, and you certainly cannot satisfy compliance requirements.

Shadow AI makes this worse. Many agents are deployed by individual teams without security review, connecting to tools and APIs that the security team has never mapped or approved.
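A surprising amount of this gap can be closed with very little code. Here is a hedged sketch of an audit decorator that writes an append-only JSONL record for every tool call; the `send_slack_message` tool is a hypothetical stand-in, and a production version would add trace IDs linking each call back to the reasoning step that triggered it.

```python
import json
import time
import uuid

def audit(tool_fn):
    """Decorator that writes an append-only JSONL audit record for every
    tool call: what was called, with what arguments, and what came back."""
    def wrapper(*args, **kwargs):
        record = {
            "id": str(uuid.uuid4()),
            "ts": time.time(),
            "tool": tool_fn.__name__,
            "args": args,
            "kwargs": kwargs,
        }
        try:
            result = tool_fn(*args, **kwargs)
            record["result"] = repr(result)[:500]  # truncate large outputs
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Append-only log; a real system would ship this to a SIEM.
            with open("agent_audit.jsonl", "a") as f:
                f.write(json.dumps(record, default=str) + "\n")
    return wrapper

@audit
def send_slack_message(channel: str, text: str) -> bool:
    ...  # hypothetical tool body
    return True
```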
Agents act as you, not as themselves
Here is a subtle but important problem: most agents today operate under your identity. When an agent sends a Slack message, it sends it as you. When it accesses a document, it uses your permissions. There is no concept of the agent as a separate identity with its own access controls, like an employee would have. This creates real risks. An agent with your credentials has access to everything you have access to, not just what it needs for its task. There is no kill switch, no way to revoke an agent's access without revoking your own. If the agent is compromised or makes a mistake, the blast radius is your entire digital footprint. CyberArk's 2026 security outlook highlights this exact concern: as organizations accelerate agent adoption, the identity and access management problem becomes critical. Agents need to be treated as distinct identities with scoped permissions, not extensions of a human user.
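What that could look like in miniature: an agent identity object with its own credential, an explicit scope set, and a revocation flag that acts as a kill switch. This is a toy sketch of the idea, not any particular IAM product's API; all names are illustrative.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class AgentIdentity:
    """A minimal stand-in for a first-class agent identity: its own
    credential, an explicit scope set, and a kill switch that is
    independent of the human who deployed it."""
    name: str
    scopes: set[str]
    token: str = field(default_factory=lambda: secrets.token_urlsafe(32))
    revoked: bool = False

    def can(self, scope: str) -> bool:
        return not self.revoked and scope in self.scopes

# The agent gets only what its task needs, not the user's full access.
reporter = AgentIdentity("weekly-report-bot", scopes={"calendar:read", "docs:read"})

assert reporter.can("calendar:read")
assert not reporter.can("email:send")   # never granted

reporter.revoked = True  # kill switch: one flag, not the human's whole account
assert not reporter.can("calendar:read")
```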
Models are expensive, but the future is local
Running agents on frontier models like GPT-4 or Claude is not cheap. API costs add up fast, especially for multi-step agents that make dozens of calls per task. Many organizations underestimate costs by 200 to 500%, according to industry analyses. A medium-complexity agent implementation can run anywhere from £1,800 to £10,500 per month in recurring operational costs.

The counterpoint is local models. Open-source models like Llama, Phi, and DeepSeek have reached a level of quality that makes them viable for many agent tasks. They are not as capable as frontier models on complex reasoning, but for focused, well-scoped agent workflows, they often get the job done at a fraction of the cost.

The important thing to remember about local models is that they are at their worst right now. Every month brings smaller, faster, more capable open models. The trajectory is clear: local inference will become good enough for most agent use cases, and the economics will be compelling. On-premise setups can deliver 30 to 50% savings over three years compared to cloud APIs at high utilization.
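Trying a local model is cheap to experiment with because servers like Ollama expose an OpenAI-compatible endpoint, so the standard client works unchanged. A minimal sketch, assuming Ollama is running locally and the model has already been pulled; the model name is just an example.

```python
# Point the standard OpenAI client at a local server instead of a paid
# API. Assumes Ollama (or a similar server) is running locally and
# exposing its OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="unused",                       # required by the client, ignored locally
)

response = client.chat.completions.create(
    model="llama3.1:8b",  # assumption: whichever open model you have pulled
    messages=[{"role": "user", "content": "Triage this ticket: printer on fire."}],
)
print(response.choices[0].message.content)
```

Because the interface is identical, a team can swap between frontier and local models per task and let cost and quality decide, rather than committing the whole agent to one backend.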
Nobody knows how to measure agent output
Traditional software has clear metrics. Uptime, latency, error rates. But how do you measure whether an agent did a good job? If an agent writes a report, how do you evaluate the quality of that report at scale? If it triages support tickets, how do you know it is making the right calls? The evaluation problem for autonomous systems is genuinely hard. Anthropic's guidance on agent evals describes multi-turn evaluations where you test entire execution traces, not just final outputs. Frameworks like DeepEval break evaluation into layers: the reasoning layer (did the agent think correctly?) and the action layer (did it use tools appropriately?). But these approaches are still emerging, and most teams are flying blind. This is not just a technical problem. It is a management problem. If you cannot measure what an agent is doing, you cannot justify the investment, you cannot improve it systematically, and you cannot hold it accountable. The organizations that figure out agent evaluation first will have a significant advantage.
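To make the two layers concrete, here is a toy sketch of trace-level evaluation. The `Step` shape is an assumption, and the `judge` parameter stands in for an LLM-as-judge callable returning a score between 0 and 1; DeepEval and Anthropic's tooling provide far richer versions of both ideas.

```python
from dataclasses import dataclass

@dataclass
class Step:
    thought: str       # the agent's stated reasoning at this step
    tool: str | None   # tool invoked at this step, if any
    args: dict | None

def eval_action_layer(trace: list[Step], expected_tools: list[str]) -> float:
    """Action layer: did the agent call the tools the task required?
    Scores recall of expected tool calls over the whole trace."""
    called = {s.tool for s in trace if s.tool}
    hits = sum(1 for t in expected_tools if t in called)
    return hits / len(expected_tools) if expected_tools else 1.0

def eval_reasoning_layer(trace: list[Step], rubric: str, judge) -> float:
    """Reasoning layer: hand the full chain of thoughts to an LLM judge
    with a rubric. `judge` is an assumed callable returning 0.0 to 1.0."""
    transcript = "\n".join(s.thought for s in trace)
    return judge(f"Rubric: {rubric}\n\nTrace:\n{transcript}")
```

Note that both functions take the whole trace, not just the final answer: that is the multi-turn shift, grading how the agent got there rather than only where it landed.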
Everyone is building the same thing
Scroll through any AI agent showcase and you will see the same patterns repeated endlessly. Customer support bots. Email assistants. Meeting summarizers. Code generators. Most "agents" are glorified chatbots: they respond to prompts rather than autonomously executing multi-step workflows. The real promise of agents is not answering questions. It is doing things: monitoring systems, making decisions, coordinating across tools, and completing end-to-end workflows without human intervention. But building truly autonomous agents requires a different mindset, one some are calling "agent engineering." It involves designing multi-agent architectures with specialized sub-agents, defining clear handoff protocols, implementing fallback strategies, and thinking carefully about what level of autonomy is appropriate for each task.
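One of those patterns, fallback with an explicit human handoff, fits in a few lines. This is a sketch under assumptions: each agent is a callable returning an answer plus a self-reported confidence, which a real system would replace with a proper verifier.

```python
class NeedsHumanReview(Exception):
    """Raised when no agent clears the confidence floor."""

def run_with_fallback(task, specialist, generalist, confidence_floor=0.8):
    # Try the cheap specialized sub-agent first, then escalate to a
    # stronger generalist. Each agent is an assumed callable returning
    # (answer, self-reported confidence in [0, 1]).
    for agent in (specialist, generalist):
        result, confidence = agent(task)
        if confidence >= confidence_floor:
            return result
    # Explicit handoff: the workflow ends in a human review queue,
    # not a low-confidence guess.
    raise NeedsHumanReview(task)
```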
The hype is doing real damage
CEOs hear "AI agents can do everything" and set unrealistic expectations. Teams scramble to deploy agents for problems that do not need them, or worse, for problems that agents cannot reliably solve yet. This is the AI delusion at work: the gap between what demos show and what production systems deliver. The result is a cycle of hype, disappointment, and abandonment. Gartner's prediction that 40% of agentic AI projects will be canceled is not a commentary on the technology. It is a commentary on misaligned expectations. Agents are genuinely useful for well-scoped, repeatable tasks with clear success criteria. They are not a replacement for human judgment on novel, high-stakes decisions.
Solving these problems is where the money is
Every problem on this list represents a real business opportunity. Security tooling for agents, context management frameworks, no-code agent builders, observability platforms, identity management systems, evaluation infrastructure, agent consulting. These are not hypothetical markets. They are emerging right now, driven by the gap between agent adoption and agent maturity. The companies and individuals who will thrive are not the ones building yet another chatbot wrapper. They are the ones solving the hard infrastructure and operational problems that make agents reliable, secure, and accessible. The plumbing is not glamorous, but it is where the durable value lives.
References
- OWASP, "AI Agent Security Cheat Sheet," https://cheatsheetseries.owasp.org/cheatsheets/AI_Agent_Security_Cheat_Sheet.html
- OWASP, "Top 10 for Agentic Applications for 2026," https://genai.owasp.org/resource/owasp-top-10-for-agentic-applications-for-2026/
- Yahoo Finance, "As AI agents take over, security is becoming a bigger concern," https://finance.yahoo.com/news/as-ai-agents-take-over-security-is-becoming-a-bigger-concern-160109297.html
- AGAT Software, "AI Agent Security in 2026: What Enterprises Are Getting Wrong," https://agatsoftware.com/blog/ai-agent-security-enterprise-2026/
- Inkeep, "Context Engineering: The Real Reason AI Agents Fail in Production," https://inkeep.com/blog/context-engineering-why-agents-fail
- KPMG, "AI at Scale: How 2025 Set the Stage for Agent-Driven Enterprise Reinvention in 2026," https://kpmg.com/us/en/media/news/q4-ai-pulse.html
- Galileo AI, "The Hidden Costs of Agentic AI: Why 40% of Projects Fail Before Production," https://galileo.ai/blog/hidden-cost-of-agentic-ai
- CyberArk, "AI Agents and Identity Risks: How Security Will Shift in 2026," https://www.cyberark.com/resources/blog/ai-agents-and-identity-risks-how-security-will-shift-in-2026
- Anthropic, "Demystifying Evals for AI Agents," https://www.anthropic.com/engineering/demystifying-evals-for-ai-agents
- Confident AI, "AI Agent Evaluation Metrics," https://deepeval.com/guides/guides-ai-agent-evaluation-metrics