The state of cloud agents
Software development is being reshaped by a new category of tool: cloud coding agents. Unlike the autocomplete assistants and chat sidebars of 2024, these agents operate autonomously in cloud environments, cloning repositories, writing code across multiple files, running tests, and opening pull requests, all while you do something else entirely. The shift has been dramatic. Cursor reports that over 30% of its internally merged PRs are now created by cloud agents. OpenAI says Codex usage has surged among enterprise customers. Anthropic's Claude Code has become the go-to "escalation path" for developers tackling hard problems. And new entrants like Google's Jules and Cognition's Devin are pushing the boundaries of what autonomous coding looks like. This post takes a close look at the major cloud coding agents available in early 2026, compares their architectures and trade-offs, and offers a perspective on which might be the best fit depending on how you work.
What makes a cloud coding agent different
Traditional AI coding assistants live inside your editor. They suggest the next line, answer questions, and help you refactor. Cloud coding agents go further. They run in isolated environments (virtual machines or sandboxes), operate asynchronously in the background, and produce complete artifacts like pull requests, test results, and even video demos of their work. The key characteristics that define this category:
- Autonomous execution in a cloud sandbox or VM, not on your laptop
- Repository-level understanding, not just the file you have open
- Asynchronous workflows where you assign a task and come back later to review
- Parallel processing, running multiple agents on multiple tasks simultaneously
- Self-validation, agents that can build, test, and iterate on their own output
The major cloud coding agents
GitHub Copilot coding agent
GitHub's coding agent is deeply integrated into the GitHub platform itself. You assign a GitHub issue to Copilot, and it autonomously writes code, creates a pull request, and responds to review feedback, all in the background. Each session runs on GitHub Actions infrastructure.

How it works: Assign an issue to Copilot (or ask it in Copilot Chat to create a PR). The agent spins up a secure cloud environment, reads the repository, implements changes, runs security analysis via CodeQL, checks for secrets, and validates dependencies against the GitHub Advisory Database before completing the PR.

Key strengths:
- Native GitHub integration means zero setup friction
- Built-in security protections (CodeQL, secret scanning, dependency checks)
- Simple pricing: one premium request per session regardless of complexity
- Works with third-party agents too (Claude, Codex) through the same GitHub interface
Limitations:
- Less impressive on complex reasoning compared to Claude Code or Codex
- Tied to the GitHub ecosystem
- Power users often find it less flexible than CLI-based agents
Pricing: Available on all Copilot plans including Free (limited), Pro ($10/month), Pro+ ($39/month), Business ($19/user/month), and Enterprise ($39/user/month). The coding agent uses premium requests, with each session costing exactly one premium request.
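The assignment step above can be scripted with the GitHub CLI. A minimal sketch, assuming an authenticated `gh` and the coding agent enabled on the repository; the `copilot` assignee handle is an assumption, so check how Copilot actually appears in your repository's assignee picker:

```shell
# Hand an issue to the Copilot coding agent from the command line.
# Assumes: authenticated `gh` CLI, coding agent enabled on the repo.
# The assignee handle `copilot` is an assumption -- verify it matches
# how Copilot appears in your repository's assignee list.
assign_to_copilot() {
  local issue_number="$1"
  gh issue edit "$issue_number" --add-assignee copilot
}
# Usage: assign_to_copilot 42
```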
OpenAI Codex
Codex is OpenAI's cloud-based software engineering agent, powered by codex-1 (a version of o3 optimized for coding). It was purpose-built for autonomous, parallel task execution in sandboxed environments.

How it works: Through the Codex web interface or CLI, you assign tasks that run in isolated cloud sandboxes preloaded with your repository. Each task gets its own environment, and you can run many in parallel. Codex can read and edit code, run tests, install packages, and produce pull requests. Internet access is toggleable with per-domain allow lists.

Key strengths:
- Exceptional code correctness; practitioners consistently report fewer bugs than competing agents
- Strong deterministic behavior on multi-step tasks
- Secure sandboxing with toggleable internet access
- Parallel task execution across multiple cloud environments
- The new Codex desktop app serves as a "command center" for managing multiple agents
Limitations:
- Slower than Claude Code due to less aggressive sub-agent delegation
- Context window management and parallelism are still catching up
- PR descriptions and commit messages tend to be terse
Pricing: Included with ChatGPT subscriptions: Plus ($20/month), Pro ($200/month), Business ($25/user/month), and Enterprise (custom). Currently available free for a limited time on ChatGPT Free and Go plans, with 2x rate limits on paid plans.
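The CLI side of this workflow can be sketched as a small wrapper. `codex exec` is Codex's non-interactive mode; the task text, log naming, and backgrounding below are illustrative rather than a prescribed pattern:

```shell
# Kick off a Codex task non-interactively and capture its output,
# so several tasks can run side by side from one terminal.
# The prompt, log naming, and backgrounding are illustrative.
run_codex_task() {
  local task="$1"
  codex exec "$task" > "codex-$(date +%s)-$RANDOM.log" 2>&1 &
}
# Usage:
#   run_codex_task "add unit tests for the parser module"
#   run_codex_task "upgrade the lockfile and fix any breakage"
#   wait   # block until all background tasks finish
```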
Claude Code
Claude Code takes a fundamentally different architectural approach from other cloud agents. Rather than moving your code to the cloud, it keeps everything local: your code, environment, MCP servers, and project settings all stay on your machine. The recently launched Remote Control feature (February 2026) lets you start a session in your terminal and continue controlling it from your phone or another device through a secure encrypted bridge.

How it works: Claude Code runs as a CLI tool in your terminal, directly in your project directory. It reads files, runs commands, edits code, manages git operations, and can orchestrate sub-agents across multiple context windows. The Opus model excels at splitting work across parallel sub-agents using tools like Explore (powered by Haiku for fast token processing) and Task calls.

Key strengths:
- Best-in-class context window management and sub-agent orchestration
- Exceptional at planning, debugging, and architectural reasoning
- Superior tool use (git, gh CLI, MCP servers, browser via /chrome)
- "Creative" planning that catches things you might have forgotten
- Human-readable PR descriptions and architecture diagrams
- Remote Control lets you manage sessions from mobile without cloud handoff
Limitations:
- Code correctness is slightly behind Codex according to some practitioners
- Runs locally rather than in a cloud VM (a trade-off, not purely a limitation)
- Cost can add up quickly on Max plans for heavy usage
- The local-first model means you need your machine running (though Remote Control helps)
Pricing: Pro plan at $20/month (roughly 40-80 hours of usage per week), Max 5x at $100/month, Max 20x at $200/month. API access also available through Anthropic Console with pay-per-use billing.
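That terminal workflow also scripts cleanly. A minimal sketch using Claude Code's non-interactive print mode (`claude -p`); the prompt text is illustrative:

```shell
# Run Claude Code non-interactively in the current project directory.
# `claude -p` (print mode) executes a single prompt and exits; the
# prompt below is illustrative, not a prescribed workflow.
fix_failing_tests() {
  claude -p "run the test suite, fix any failing tests, and summarize what changed"
}
# Usage: run `fix_failing_tests` from the repository root
```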
Cursor cloud agents
Cursor made a major leap on February 24, 2026, when it launched upgraded cloud agents that give each agent its own virtual machine with a full development environment. These agents can build software, test their own changes, interact with desktop applications, record video demos of their work, and produce merge-ready pull requests.

How it works: Cloud agents run in isolated VMs that replicate a full development environment. You can launch them from the Cursor desktop app, web, mobile, Slack, or GitHub. Each agent onboards itself to your codebase, implements changes, and can even use browser and desktop applications to verify its work. You can SSH into the VM or use port forwarding to test changes yourself.

Key strengths:
- Full VM isolation means agents can truly build, run, and test software end-to-end
- Computer use capability lets agents interact with browsers, spreadsheets, and desktop apps
- Video and screenshot artifacts for quick validation of agent work
- Available from multiple surfaces (web, mobile, Slack, GitHub)
- MCP server support for extensibility
- Cursor's Bugbot provides automated code review on agent-generated PRs
Limitations:
- Pricing tied to API-level model costs, which can be unpredictable for heavy usage
- Still maturing on very large, complex refactors
- Your code runs on Cursor's infrastructure (a consideration for some teams)
Pricing: Pro plan includes $20 of API agent usage, Pro Plus includes $70, Ultra includes $400. Cloud agents use Max Mode pricing. Additional usage is charged at model API rates.
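The SSH access mentioned above pairs naturally with standard port forwarding. A minimal sketch using OpenSSH local forwarding; the host alias `cursor-agent-vm` is hypothetical, standing in for whatever SSH target Cursor exposes for a given agent's VM:

```shell
# Forward the agent VM's dev server (default port 3000) to localhost
# so you can open the running app in your own browser.
# `cursor-agent-vm` is a hypothetical host alias -- substitute the SSH
# target Cursor provides for the agent's VM.
forward_agent_port() {
  local port="${1:-3000}"
  ssh -N -L "${port}:localhost:${port}" cursor-agent-vm
}
# Usage: forward_agent_port 3000   # then browse http://localhost:3000
```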
Google Jules
Jules is Google's asynchronous coding agent, currently in public beta. It takes a "remote contractor" approach, running tasks in secure Google Cloud VMs and delivering results as GitHub pull requests.

How it works: You describe a task through the Jules web interface or CLI tools. Jules clones your repository into a temporary VM, installs dependencies, runs build scripts, implements changes, and verifies them before creating a pull request with detailed diffs and reasoning. It also supports audio changelogs for team updates.

Key strengths:
- Fully asynchronous, works entirely in the background
- Powered by Gemini 3, Google's latest model
- Secure Google Cloud VM execution
- GitHub integration with detailed PR diffs and reasoning
- Multimodal output including audio changelogs
- CLI tools (Jules Tools) make it scriptable and programmable
Limitations:
- Still in public beta with regional availability restrictions
- Less battle-tested than Claude Code or Codex in production workflows
- Developer community feedback is still limited compared to more established agents
Pricing: Included with Google AI subscriptions. Google AI Ultra provides the highest limits for multi-agent workflows.
Devin
Devin, built by Cognition Labs, was one of the earliest entrants in the autonomous coding agent space and positions itself as a full "AI software engineer" rather than an assistant. It recently gained additional attention after Cognition acquired Windsurf.

How it works: You assign Devin a task through Slack, Microsoft Teams, Linear, Jira, or its web interface. Devin plans the approach, shows you its proposal for review, implements changes in a cloud sandbox, tests them, and creates a pull request. It handles the entire lifecycle from ticket to tested code.

Key strengths:
- End-to-end task completion from issue to tested PR
- Deep integrations with project management tools (Jira, Linear, Slack, Teams)
- Devin Wiki and Devin Search provide codebase documentation and Q&A
- Strong focus on enterprise and government use cases
- Mobile coding support via natural language
Limitations:
- Higher cost, especially for teams needing API access ($500/month for Teams plan)
- ACU (Agent Compute Unit) consumption can be unpredictable
- Some developers report it works better on well-defined tasks than ambiguous ones
Pricing: Core plan at $20/month for individuals, Teams at $500/month with API access, Enterprise with custom pricing.
Comparison at a glance
| Agent | Execution environment | Primary interface | Best for | Starting price | Parallel agents | Code correctness | Context management | Speed | Self-testing | Security model |
|---|---|---|---|---|---|---|---|---|---|---|
| GitHub Copilot | GitHub Actions (cloud) | GitHub Issues, Chat | Teams already on GitHub | Free (limited) | Yes | Good | Good | Fast | CodeQL, dependency checks | Cloud (GitHub infrastructure) |
| OpenAI Codex | Cloud sandbox | Web, CLI, Desktop app | Code correctness, parallel tasks | $20/mo (ChatGPT Plus) | Yes | Excellent | Good (improving) | Slow | Runs tests in sandbox | Isolated sandbox, toggleable internet |
| Claude Code | Local machine + Remote Control | CLI (terminal) | Planning, debugging, complex reasoning | $20/mo (Pro) | Yes (sub-agents) | Very good | Excellent | Fast | Runs tests locally | Local-first (code stays on your machine) |
| Cursor Cloud | Isolated VMs | IDE, Web, Mobile, Slack | End-to-end feature building with verification | $20/mo (Pro) | Yes | Very good | Very good | Fast | Full VM with computer use | Isolated VMs |
| Google Jules | Google Cloud VMs | Web, CLI | Async tasks, Google ecosystem | Free (beta) | Yes | Good | Good | Moderate | Builds and verifies in VM | Google Cloud VMs |
| Devin | Cloud sandbox | Slack, Jira, Linear, Web | End-to-end autonomous engineering | $20/mo (Core) | Yes | Good | Good | Moderate | Full lifecycle testing | Cloud sandbox |
The architectural divide: cloud-first vs. local-first
One of the most interesting tensions in this space is the architectural split between cloud-first and local-first approaches.

Cloud-first agents (Cursor, Codex, Jules, Devin, GitHub Copilot) move the execution environment to the cloud. Each agent gets its own VM or sandbox. This enables true parallelism, eliminates local resource conflicts, and means your laptop does not need to stay connected. The trade-off is that your code runs on someone else's infrastructure.

Local-first agents (Claude Code) keep everything on your machine. Code never leaves your environment; MCP servers, environment variables, and project settings all stay local. The cloud only routes messages between your devices. The trade-off is that you need your machine running, and true cloud-level parallelism requires worktrees and multiple terminal sessions.

Neither approach is strictly superior. For teams with strict data residency requirements or sensitive codebases, local-first is compelling. For teams that want to fire off dozens of tasks and review results in the morning, cloud-first offers clear advantages.
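The worktree setup that local-first parallelism relies on can be sketched with plain git commands, one checkout per agent task:

```shell
# One checkout per task: each worktree gets its own branch and working
# directory, so separate terminal sessions can run agents in parallel
# without stepping on each other's files.
git init -q demo-repo && cd demo-repo
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "initial commit"
git worktree add -q -b agent-task-1 ../demo-task-1
git worktree add -q -b agent-task-2 ../demo-task-2
git worktree list   # the main checkout plus one per task
```

All worktrees share the same object store, so once one agent commits on its branch, the result is immediately visible from the others.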
How practitioners actually use these tools
One of the most revealing perspectives comes from developers who pay for and use multiple agents daily. A common pattern emerging in 2026 is using different agents for different phases of work:
- Planning and architecture: Claude Code (Opus) excels at creating plans, explaining code structure, and catching things you might have forgotten. Its sub-agent orchestration feels fast and natural.
- Writing code: Codex consistently produces fewer bugs. When correctness matters most, practitioners report reaching for Codex over Claude Code, despite it being slower.
- Parallel background tasks: Cursor cloud agents and Codex are popular choices for kicking off multiple tasks before going to bed or stepping away.
- Code review: Cursor's Bugbot and Codex's code review are increasingly trusted for catching subtle bugs that human reviewers miss.
- Quick fixes and bug reproduction: Cursor cloud agents shine here, especially with their ability to record video demos and interact with the actual software.
The idea of picking one tool and using it exclusively is fading. As Calvin French-Owen (who helped launch the Codex web product) puts it: developer time is now the biggest consideration, and the choice of agent is increasingly a function of how much time you have and how long you want it to run autonomously.
Conclusion: which agent is best?
There is no single best cloud coding agent. The answer depends on your priorities:
- If you want the most correct code: OpenAI Codex. Practitioners consistently report fewer bugs in Codex-generated code, though it comes at the cost of speed.
- If you want the best reasoning and planning: Claude Code. Opus is unmatched at understanding complex codebases, orchestrating sub-agents, and producing thoughtful plans and explanations. Its local-first architecture is also ideal for privacy-sensitive work.
- If you want end-to-end autonomy with verification: Cursor cloud agents. The ability to give each agent a full VM, let it build and test software, and produce video proof of its work is genuinely novel. This is the closest thing to a "self-driving codebase" available today.
- If you want the least friction on GitHub: GitHub Copilot coding agent. It is already there, already approved by your company, and the one-premium-request-per-session pricing is hard to beat for predictability.
- If you want full lifecycle automation from ticket to PR: Devin. Its deep integrations with Jira, Linear, and Slack make it the most "project-management-aware" agent.
- If you are in the Google ecosystem: Jules is promising and rapidly improving with Gemini 3, though still in beta.
The most productive developers in 2026 are not choosing one agent. They are building workflows that leverage the strengths of several, switching between planning in Claude Code, coding in Codex, reviewing with Bugbot, and verifying with Cursor cloud agents. The tools are converging in capability, but their architectural choices and trade-offs still make each one better suited to different parts of the development lifecycle. The real question is no longer "which agent writes the best code?" It is "how do I orchestrate these agents to ship better software faster?" And that is a question about workflow design, not tool selection.
References
- GitHub, "GitHub Copilot: Meet the new coding agent," https://github.blog/news-insights/product-news/github-copilot-meet-the-new-coding-agent/
- OpenAI, "Introducing Codex," https://openai.com/index/introducing-codex/
- Cursor, "Cursor agents can now control their own computers," https://cursor.com/blog/agent-computer-use
- Cursor, "Cloud Agents," https://cursor.com/blog/cloud-agents
- Orbilon Tech, "Claude Code Remote Control and mobile coding," https://orbilontech.com/claude-code-remote-control-mobile-coding-2026/
- DevOps.com, "Claude Code Remote Control Keeps Your Agent Local and Puts it in Your Pocket," https://devops.com/claude-code-remote-control-keeps-your-agent-local-and-puts-it-in-your-pocket/
- Google, "Build with Jules, your asynchronous coding agent," https://blog.google/innovation-and-ai/models-and-research/google-labs/jules/
- Google Developers Blog, "Building with Gemini 3 in Jules," https://developers.googleblog.com/jules-gemini-3/
- Cognition Labs, "Introducing Devin," https://cognition.ai/blog/introducing-devin
- Calvin French-Owen, "Coding Agents in Feb 2026," https://calv.info/agents-feb-2026
- Faros AI, "Best AI Coding Agents for Developers in 2026," https://www.faros.ai/blog/best-ai-coding-agents-2026
- CNBC, "Cursor announces major update as AI coding agent battle heats up," https://www.cnbc.com/2026/02/24/cursor-announces-major-update-as-ai-coding-agent-battle-heats-up.html
- Fortune, "OpenAI reports Codex usage is surging," https://fortune.com/2026/03/04/openai-codex-growth-enterprise-ai-agents/
- GitHub, "GitHub Copilot coding agent now uses one premium request per session," https://github.blog/changelog/2025-07-10-github-copilot-coding-agent-now-uses-one-premium-request-per-session/
- Cursor, "The third era of AI software development," https://cursor.com/blog/third-era