[ PLAYBOOK · 10 ] · MAY 14, 2026 · 7 min

Codex async for long-running migrations.

Codex's async cloud mode is the right tool for multi-day, well-scoped refactors. Claude Code is the right tool for hands-on local work. Cursor is the right tool inside the IDE. Pick by the shape of the work, not by vendor loyalty.


The take

Most teams pick a coding agent and use it for everything. That is wrong. The three serious options in 2026 each cover a different shape of work, and the cheapest way to stop fighting your tools is to match each tool to the task. Codex in its cloud-async mode is the right choice when the work is multi-day, well-scoped, and parallelizable across many similar files. Claude Code is the right choice when the work is interactive, codebase-wide, and benefits from a planning step before any file changes. Cursor is the right choice when the work is inline editing inside an open file. Treat them as three different tools, not three competing brands.

What "Codex async" actually means in 2026

Codex went through several shapes between its first release in April 2025 (Codex CLI) and its current form. The 2026 version is three surfaces that share a backend.

Codex CLI. The local terminal agent. Runs against your filesystem, in a sandbox you control, similar in shape to Claude Code. Synchronous; you sit with it.

Codex IDE extension. A panel inside VS Code (and JetBrains as of the Q1 2026 release) for editor-anchored work. Similar in shape to a Cursor or GitHub Copilot Workspace panel.

Codex cloud. The piece this post is about. You submit a task description and a repository. The task runs in an isolated cloud sandbox that clones your repo, installs dependencies, runs commands, edits files, opens a pull request, and waits for review. OpenAI's own engineering blog describes runs spanning hours; in practice we have seen migrations that run for over a day with the agent picking up after intermediate human review.

Underneath all three sits a model family that OpenAI calls Codex-tuned (GPT-5-Codex shipped in September 2025; the GPT-5.2-Codex update in December 2025 added long-horizon context compaction and stronger refactor performance, both of which the cloud surface leans on). The model is the same; the surface determines how much human is in the loop.

Where async cloud wins

Three properties of the work make Codex cloud the right pick. When all three are present, no local agent matches it on cost per outcome.

The work is well-scoped before you start. "Migrate every component from React class syntax to function components, preserving the test suite." "Replace every direct AWS SDK v2 call with the v3 equivalent." "Update every internal service from Node 18 to Node 22, fix the deprecated APIs, get CI green." These are migrations where the intent fits in one paragraph and the scope is mechanical. The agent does not need to ask design questions; it needs to apply the same transformation across many places and verify the tests still pass.

The work is parallelizable across files or modules. Codex cloud lets you fire off many tasks against the same repo in parallel. Each runs in its own sandbox. A migration that touches 200 files is usually faster as 10 tasks of 20 files each than as one task of 200 files, because the failure mode of a long autonomous run is that the agent gets lost in a debugging loop on one tricky file and burns hours that the other 199 do not need.
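The fan-out itself is the easy part. As a sketch in Python, with `submit` standing in for whatever client call actually creates a Codex cloud task (hypothetical here; the real submission interface is not shown):

```python
from concurrent.futures import ThreadPoolExecutor

def fire_batches(batches, submit):
    """Fire one cloud task per batch, in parallel.

    `submit` is a placeholder for whatever call creates a Codex cloud
    task from a batch of files -- a hypothetical stand-in, not a real
    API. Each batch runs in its own sandbox, so one task stuck in a
    debugging loop on a tricky file does not block the other batches.
    """
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        return list(pool.map(submit, batches))
```

The point of the shape: every batch is independent, so a stalled task costs you one batch's worth of time, not the whole migration's.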

You do not need to watch it. The agent is running on OpenAI's infrastructure, not yours. It does not block your laptop, your terminal, or your attention. You queue a task before lunch, review the PR before the next meeting. The async cloud model trades responsiveness for throughput, and for a multi-day migration that is the right trade.

Where Claude Code wins instead

Claude Code's strength is the opposite shape of work.

Hands-on work that needs a plan before files change. Plan mode is the differentiating feature; it separates decide from do in a way that matters when the codebase is large enough that a wrong first edit cascades. The plan is reviewable before any file is touched, which is exactly what an exploratory refactor needs.

Cross-cutting refactors that need codebase-wide context. Claude Code reads files lazily but holds the full project structure in mind through the session, and it asks before touching files outside the area you pointed it at. The session shape rewards a human who can answer architecture questions in the moment rather than write them all into a one-shot spec.

Sensitive repositories that should not leave the laptop. The local sandbox keeps secrets on your machine, which matters when the repo touches private keys, customer data, or proprietary tooling that does not belong on a vendor's cloud. For regulated industries or pre-IPO codebases, that constraint alone decides the tool.

For an exploratory refactor where the right shape is not clear at the start, sitting next to Claude Code in plan mode beats firing off a Codex cloud task and hoping the description was specific enough. The cost of an ambiguous Codex task is wasted compute and a noisy PR. The cost of an ambiguous Claude Code session is a few minutes of clarification before files change.

Where Cursor wins instead

Cursor's strength is the smallest unit of work: editing a file you have open. Tab completion that knows the codebase, in-line edits driven by selection, the agent panel for a 50-line change. We see Cursor used at its best when the developer is the architect and the agent is the typist. It is the wrong tool for a multi-day migration; the loop of "describe, edit, accept, repeat" demands too much human attention to scale to 200 files. It is the right tool when the human is the one making decisions on every change.

The decision in one paragraph

Pick Codex cloud when the work is mechanical, multi-day, and parallelizable. Pick Claude Code when the work is exploratory, interactive, and benefits from a planning step. Pick Cursor when the work is inside one file at a time. A team running a serious migration will use two of the three on the same project: Codex cloud for the bulk mechanical work, Claude Code for the parts that need judgment, Cursor for the file-level cleanups at the end.

A migration playbook with Codex cloud

If you are about to run a multi-day migration and Codex cloud passes the three checks above, this is the sequence we run with clients.

Step 1. Scope on paper before any task runs. Write the migration as a one-paragraph spec the agent will receive. Name the transformation, the verification command (almost always your test suite or CI), the files that are out of scope, the conventions to preserve. We keep this in MIGRATION.md at the repo root and feed it to every task. A bad spec is the single largest predictor of a bad PR; OpenAI's own best-practices page is worth reading before the first run.

Step 2. Split the work into 20-to-50-file batches. Twenty files fits comfortably in one task's context budget; fifty is still reviewable as a single PR. We label them by directory or by transformation pattern. Each batch becomes one Codex cloud task, fired in parallel.
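A minimal splitter for the directory-labeled case, as a sketch (the grouping rule and helper name are ours, not a Codex feature):

```python
def split_batches(files: list[str], max_size: int = 50) -> list[list[str]]:
    """Group files by top-level directory, then chunk each group so
    no batch exceeds max_size. One batch becomes one cloud task."""
    by_dir: dict[str, list[str]] = {}
    for f in sorted(files):
        by_dir.setdefault(f.split("/", 1)[0], []).append(f)
    batches = []
    for d in sorted(by_dir):
        group = by_dir[d]
        batches.extend(group[i:i + max_size]
                       for i in range(0, len(group), max_size))
    return batches
```

Grouping by directory keeps each PR's diff pattern homogeneous, which is what makes the cluster review in step 4 work.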

Step 3. Pin the verification command. The task description should end with "run pnpm test (or your CI command) and only open the PR when it passes." This single sentence is what turns Codex from a code generator into a code shipper. Without it, the agent produces edits that look reasonable and fail in CI. With it, the agent retries until the suite is green or surfaces the failure for human review.

Step 4. Review PRs in small batches. Codex opens a PR per task. Review them in groups of three to five rather than one at a time. A migration's value shows up in the diff pattern, not in any individual file. Reviewing a cluster lets you spot the systemic mistake the agent is making and adjust the spec for the remaining batches before they run.

Step 5. Reserve the last 10% for a human. The tail of a migration is the part that breaks the pattern. Bespoke files that need judgment. Tests that have to be rewritten rather than auto-migrated. CI configuration that depends on the change. Plan to do that 10% with Claude Code or by hand, not with another Codex task. Forcing the agent through the tail costs more than just doing it yourself.

What to measure

Two numbers tell you whether Codex cloud is paying off.

PR-to-merge ratio. Out of every 10 PRs Codex opens, how many merge with no human edits, how many merge with small edits, how many get rejected. Healthy runs sit at 6 to 8 clean merges, 1 to 3 small-edit merges, 0 to 1 rejections. If rejections are above 30%, the spec is too vague or the batch size is too big. Tune before firing the next round.
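The threshold check is simple enough to keep in the runbook. A sketch, with category names of our choosing:

```python
def pr_health(clean: int, small_edit: int, rejected: int) -> str:
    """Classify one review round against the thresholds above:
    rejections over 30% mean the spec or batch size needs tuning
    before the next round of tasks fires."""
    total = clean + small_edit + rejected
    if total == 0:
        return "no-data"
    return "tune-spec" if rejected / total > 0.30 else "healthy"
```

Run it per review cluster, not per migration, so a drifting spec shows up while there are still batches left to fix.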

Wall-clock time per 100 files migrated. The whole point of async cloud is throughput. If a 200-file migration takes the same calendar time as a Claude Code session would, the tool is not paying off. We see well-scoped migrations land at 1 to 3 days for the bulk and 1 to 2 days for the tail; if the bulk is taking longer than that, the spec or the batch size is wrong.

Coding agents in 2026 are not a one-tool choice. They are a portfolio. The teams that ship migrations on schedule are the ones that pick the right surface for each phase of the work and accept that the right tool for the bulk is not the right tool for the tail. Match the shape, not the brand.