Context windows are the new RAM
Foundation model improvements are slowing. The next unlock for agentic AI isn't smarter models, it's bigger, better memory. Context windows and persistent memory are to agents what RAM was to PCs: the bottleneck that, once lifted, changes what's possible.
The 90s called, and they want their bottleneck back
If you built or used PCs in the 1990s, you remember the constraint that defined an era: RAM. With 4 MB, you could run one program at a time. With 8 MB, you could maybe have a word processor and a browser open together. By the time machines shipped with 32 or 64 MB in the late 90s, everything changed. Bigger files, real multitasking, software that could actually think ahead. The CPU was fast enough. The hard drive was big enough. RAM was the thing holding everything back.

Context windows are the RAM of agentic AI. A context window is the amount of text, measured in tokens, that a language model can "see" at once. It includes the prompt, the conversation history, any documents fed in, and the model's own output. Everything outside the window simply doesn't exist to the model.

In 2019, the best models topped out at about 1,024 tokens. By 2024, we hit 1 million. That's a roughly 1,000x increase in six years, a pace that some have compared to Moore's Law for transistors.

But raw window size is only part of the story. Research consistently shows that models perform worse as context grows. Hallucinations spike, retrieval accuracy drops, and critical information gets lost in the noise. One analysis found that most models advertising 200k-token windows become unreliable around 130k, with sudden performance drops rather than gradual degradation. A large context window with poor utilization is like having 64 MB of RAM that your operating system can only address half of.
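The "everything outside the window doesn't exist" mechanic can be sketched as a simple token budget. This is an illustrative sketch, not any model's actual implementation: the `fit_to_window` helper is hypothetical, and whitespace splitting stands in for a real subword tokenizer (e.g. BPE), so the counts are approximate.

```python
# Sketch: fitting a conversation into a fixed token budget, newest-first.
# Whitespace splitting is a crude stand-in for a real subword tokenizer.

def fit_to_window(messages, budget):
    """Keep the most recent messages whose combined token count fits.

    messages: list of strings, oldest first.
    budget: maximum number of (approximate) tokens the model can see.
    Returns surviving messages, oldest first. Everything dropped is
    invisible to the model, exactly as if it never happened.
    """
    kept = []
    used = 0
    for msg in reversed(messages):       # walk newest -> oldest
        cost = len(msg.split())          # crude token estimate
        if used + cost > budget:
            break                        # older history falls off the edge
        kept.append(msg)
        used += cost
    return list(reversed(kept))          # restore chronological order

history = [
    "user: summarize chapter one",
    "assistant: chapter one introduces the narrator",
    "user: now compare it to chapter two",
]
# With a tight budget, only the newest message survives; the rest of
# the conversation no longer exists from the model's point of view.
print(fit_to_window(history, budget=10))
```

Note the eviction order: real systems use far more sophisticated strategies (summarization, relevance ranking), but the hard edge of the window is the same.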
The real gap: agents that forget
Here's the problem most people building with AI agents have hit: agents are still mostly stateless across sessions. They forget. Every conversation starts from scratch. Every workflow re-reads the same context. This is the single biggest gap between demos and production.

Compare an agent that remembers your codebase across sessions with one that re-reads everything each time: the difference is 10x in usefulness. The first can build on prior understanding, recall your preferences, and connect decisions from last week to the task at hand. The second starts from zero every single time, burning through tokens just to get back to where it was.

Developers working on this problem describe it in almost identical terms. "Memory persistence in AI agents is worse than I expected" is a common refrain. The harder question isn't how to store everything; it's how an agent decides what's worth remembering and what's throwaway. That's the same tradeoff operating systems had to make with RAM and virtual memory decades ago: what stays in fast memory, what gets paged out, and what gets discarded entirely.
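The remember-versus-discard tradeoff can be sketched as a bounded store with an eviction policy, much like a page-replacement algorithm. The class name, the importance weights, and the scoring rule here are all illustrative assumptions, not any framework's API.

```python
import time

# Sketch: a bounded agent memory that evicts the lowest-value entry,
# scoring by importance first and recency second -- the RAM/virtual-memory
# tradeoff the text describes. All names and weights are hypothetical.

class AgentMemory:
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = {}  # key -> (importance, last_access_time)

    def remember(self, key, importance):
        if key not in self.entries and len(self.entries) >= self.capacity:
            self._evict()
        self.entries[key] = (importance, time.monotonic())

    def recall(self, key):
        if key in self.entries:
            imp, _ = self.entries[key]
            self.entries[key] = (imp, time.monotonic())  # refresh recency
            return True
        return False

    def _evict(self):
        # "Page out" the entry with the lowest importance; ties broken by
        # least-recent access (tuples compare element by element).
        victim = min(self.entries, key=lambda k: self.entries[k])
        del self.entries[victim]

mem = AgentMemory(capacity=2)
mem.remember("user prefers tabs", importance=0.9)
mem.remember("weather small talk", importance=0.1)
mem.remember("project uses Postgres", importance=0.8)  # evicts small talk
```

The hard part in practice is the `importance` score itself: deciding, at write time, whether a fact is a durable preference or throwaway chatter.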
From bigger models to better memory
The shift in focus from "bigger models" to "better memory" is a maturity signal for the industry. We've hit diminishing returns on parameter scaling for practical tasks. GPT-4 class models handle most reasoning well enough. The bottleneck now is context, not intelligence. Three design philosophies have emerged for tackling agent memory:

Short-term context management. This is the equivalent of CPU cache. It involves carefully curating what goes into the context window for each request, using techniques like just-in-time retrieval instead of dumping entire document libraries into every prompt. The principle is simple: give agents tools to fetch specific data on demand rather than pre-loading everything. Like humans who use search rather than memorizing encyclopedias, agents perform better when they navigate information dynamically.

Persistent session memory. Think of this as RAM that survives a reboot. Session-based memory stores state across multiple interactions, so an agent can pick up where it left off. Modern frameworks like Google's Agent Development Kit now support session state natively, letting developers attach structured memory to ongoing conversations. This is where most production agent systems are headed first.

Long-term learned memory. This is the hard drive, the deep storage that accumulates over weeks and months. Systems that implement this use vector databases, knowledge graphs, or compressed summaries to persist context across time. The agent doesn't remember raw conversations; it remembers distilled understanding. This is still early, but it's where the real compounding value lives.
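The middle tier, "RAM that survives a reboot," can be sketched with nothing more than state serialized to disk between runs. This is a minimal illustration of the principle, not the API of any real framework (the text mentions Google's Agent Development Kit, which exposes richer session primitives); `SessionStore` and its methods are hypothetical.

```python
import json
import os
import tempfile

# Sketch: session state that survives a process restart. Each SessionStore
# instance stands in for one agent "session"; the JSON file is the memory
# that persists between them.

class SessionStore:
    def __init__(self, path):
        self.path = path
        try:
            with open(path) as f:
                self.state = json.load(f)
        except FileNotFoundError:
            self.state = {}

    def set(self, key, value):
        self.state[key] = value
        with open(self.path, "w") as f:
            json.dump(self.state, f)  # write-through: survives a restart

    def get(self, key, default=None):
        return self.state.get(key, default)

path = os.path.join(tempfile.mkdtemp(), "session.json")

# First "session": the agent records what it learned.
s1 = SessionStore(path)
s1.set("repo_language", "Python")

# Second "session" (a fresh object, as after a restart): memory is intact,
# so the agent picks up where it left off instead of starting from zero.
s2 = SessionStore(path)
print(s2.get("repo_language"))  # → Python
```

The long-term tier layers on top of this same idea, swapping the JSON file for vector stores or knowledge graphs and raw values for distilled summaries.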
Why MCP and tool use depend on memory
The Model Context Protocol (MCP) and similar tool-use patterns are rapidly becoming the standard way agents interact with external systems. But here's the catch: effective tool use depends entirely on context. An agent needs to know what tools are available, when to use them, what worked last time, and what the user's goals are. Without memory, every tool call is a shot in the dark.

An agent with a 200k context window but no persistent memory is a goldfish with a library card. It can access vast information in the moment, but it can't build on anything. It can't learn that you prefer a specific API over another, that a certain database query pattern works better for your schema, or that the last three times it tried a particular approach it failed. Memory turns tool use from reactive to strategic.

A memory-first architecture treats context as a first-class citizen, not an afterthought. Local-first, agent-native runtimes where memory persists and compounds are the next evolution.
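"Memory turns tool use from reactive to strategic" can be made concrete with a small sketch: record each tool call's outcome, then let past success rates steer the next choice. The data structure and names here are illustrative assumptions, not part of MCP itself.

```python
from collections import defaultdict

# Sketch: outcome memory for tool selection. Without history, every tool
# scores 0.0 and the choice is a shot in the dark; with history, the agent
# prefers what has actually worked for this kind of task.

class ToolMemory:
    def __init__(self):
        # (task, tool) -> [successes, attempts]
        self.stats = defaultdict(lambda: [0, 0])

    def record(self, task, tool, success):
        entry = self.stats[(task, tool)]
        entry[0] += int(success)
        entry[1] += 1

    def best_tool(self, task, tools):
        def success_rate(tool):
            wins, attempts = self.stats[(task, tool)]
            return wins / attempts if attempts else 0.0
        return max(tools, key=success_rate)

memory = ToolMemory()
# The agent tried the REST API twice for this task and failed both times,
# then succeeded once with a direct SQL query.
memory.record("fetch schema", "rest_api", success=False)
memory.record("fetch schema", "rest_api", success=False)
memory.record("fetch schema", "sql_query", success=True)
print(memory.best_tool("fetch schema", ["rest_api", "sql_query"]))  # → sql_query
```

A production system would also want recency weighting and exploration (so a once-failed tool isn't banned forever), but the core loop is the same: outcomes in, better choices out.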
The next wave won't compete on intelligence
Here's my prediction: the next generation of "10x agent" products will differentiate on memory, not model intelligence. The models are converging. GPT-4, Claude, Gemini: they're all good enough for most practical tasks. What separates a useful agent from a toy is whether it remembers.

Think about it from the user's perspective. You don't switch from one competent colleague to another because one is slightly smarter. You switch because one remembers your project history, your preferences, your constraints. The colleague who has to be re-briefed every morning is exhausting, no matter how brilliant.

The RAM analogy has limits, of course. RAM had hard physical constraints: cost per megabyte, address bus width, motherboard slots. Context windows have different scaling dynamics, governed by attention mechanisms, compute costs, and architectural choices rather than silicon physics. But the pattern is the same. The bottleneck isn't processing power. It's working memory. And the products that solve working memory first will win the next cycle.

We're at the equivalent of the early 2000s in PC history, when RAM went from a luxury to a commodity and software exploded in capability as a result. For agents, that moment is just starting.
References
- Max Petrusenko, "The Illusion of Scale: Why Your LLM's Context Window Is Lying to You," Medium, January 2026. https://medium.com/@max.petrusenko/the-illusion-of-scale-why-your-llms-context-window-is-lying-to-you-454f08c31260
- Jeremy Burton, "LLM Context Window Size: The New Moore's Law?" Platform Studios. https://os.platformstud.io/guild/articles/llm-context-window-size-the-new-moore-s-law-by-jeremy-burton
- Spencer Torene, "Understanding the Impact of Increasing LLM Context Windows," Meibel, April 2025. https://www.meibel.ai/post/understanding-the-impact-of-increasing-llm-context-windows
- "Best LLMs for Extended Context Windows in 2026," AI Multiple. https://aimultiple.com/ai-context-window
- "Why larger LLM context windows are all the rage," IBM Research, July 2024. https://research.ibm.com/blog/larger-context-window
- "Memory for AI Agents: A New Paradigm of Context Engineering," The New Stack. https://thenewstack.io/memory-for-ai-agents-a-new-paradigm-of-context-engineering/
- "Context Engineering: The Real Reason AI Agents Fail in Production," Inkeep. https://inkeep.com/blog/context-engineering-why-agents-fail
- "Stateful Agents: The Missing Link in LLM Intelligence," Letta, February 2025. https://www.letta.com/blog/stateful-agents
- "AI Agent Memory: Building Stateful AI Systems," Redis, February 2026. https://redis.io/blog/ai-agent-memory-stateful-systems/
- "Context Window vs. Memory Architecture: The Next Frontier of LLM Design," Shieldbase AI. https://shieldbase.ai/blog/context-window-vs-memory-architecture-the-next-frontier-of-llm-design