OpenAI Codex stopped being a coding tool. Here's what it actually does now

In April 2025, OpenAI launched Codex with a narrow pitch: a coding-specialized model that outperformed general-purpose models on programming tasks. Fourteen months and at least six major updates later, that description no longer covers what the product actually is.

Codex can now open desktop applications, click and type, browse real web pages natively, generate images, remember your preferences between sessions, connect to tools like Sentry and Datadog, and respond automatically to GitHub events without anyone triggering it manually. OpenAI's own framing captures the shift precisely: Codex evolved from an agent that writes code into one that uses code to get things done on your computer.

The basic shape of the product

Codex runs across four interconnected surfaces: a desktop app (Mac and Windows), a code editor extension, a command-line interface, and a cloud version. All four share session history, settings and context — start a task from the CLI and pick up the review in the desktop app later.

The functional difference from regular ChatGPT is this: ChatGPT answers questions and generates content within a conversation. Codex executes complete tasks autonomously. It does the work inside an isolated sandbox, but with real access to your system, your files, and, as of April 2026, your entire desktop environment.

How people actually use it

The default workflow with Codex is delegation, not pair-programming. Instead of writing code alongside the model line by line, users describe a complete task — "fix the TypeScript error in the onboarding validation flow," "migrate the legacy auth middleware to our new session system," "build a racing game with eight maps and pickups triggered by spacebar" — and Codex executes it end-to-end in an isolated environment, without touching local code until the result is ready for review.

While one agent works on a task, users can keep working on something else or launch several agents in parallel across different projects. The desktop app organizes each agent into separate threads with worktree support, so multiple agents can operate on the same repository without stepping on each other.

One developer reviewing daily use at WorkOS described a typical morning: queue four or five Codex tasks before checking messages — fix a validation bug, update a webhook schema, add error boundaries to an admin dashboard, migrate an auth middleware — and by the time coffee is done, two or three pull requests are sitting ready for review. Tasks that used to consume 30-40% of a morning now run unattended in the background.

Computer Use: the update that changed the category

April 17, 2026 was the inflection point. OpenAI integrated Computer Use into Codex, extending it from coding environments to full control of desktop applications. Given a command like "open Figma and update the button colors on the pricing page while I keep working," Codex operates the mouse, keyboard and screen itself and completes the task in the background, without interrupting the user's own work.

Codex also gained a native browser for giving precise instructions on real web pages — select a visual element, describe the desired change. For visual generation, it runs the gpt-image-1.5 model, letting users combine screenshots and code to produce mockups, interfaces or product concepts inside the same workflow.

Memory lets Codex learn from a user's previous actions and retain preferences across sessions, removing the need to re-explain project context every time a new conversation starts.

These computer-control, personalization and memory features rolled out first on macOS for users signed in with a ChatGPT account. Availability in the EU and UK was delayed pending compliance with data-use regulations.

The models powering it

GPT-5.4 became Codex's primary model on March 5, 2026. It is OpenAI's first general-purpose model with Computer Use built in natively. Twelve days later, GPT-5.4 mini arrived: a lighter variant that uses only 30% of GPT-5.4's compute quota, designed for fast, cheap subtasks while GPT-5.4 handles planning and final judgment on complex work.
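To make the quota arithmetic concrete, here is a rough sketch of what that split could save. Only the 30% figure comes from the reporting above; the call counts, the function name and the idea of pricing each call as one flat unit are illustrative assumptions, not how OpenAI actually meters usage.

```python
# Relative quota cost per call, with one GPT-5.4 call normalized to 1.0.
# The 0.30 ratio is the figure quoted in the article; the rest is hypothetical.
FULL_COST = 1.0
MINI_COST = 0.30


def quota_cost(planning_calls: int, subtask_calls: int, delegate: bool) -> float:
    """Return total quota units for one job.

    Planning calls always run on the full model. Subtask calls run on the
    mini model when delegate is True, otherwise on the full model.
    """
    subtask_rate = MINI_COST if delegate else FULL_COST
    return planning_calls * FULL_COST + subtask_calls * subtask_rate


# Hypothetical job: 2 planning calls and 10 small subtasks.
all_full = quota_cost(planning_calls=2, subtask_calls=10, delegate=False)
mixed = quota_cost(planning_calls=2, subtask_calls=10, delegate=True)
savings = 1 - mixed / all_full

print(f"all full: {all_full:.1f} units, mixed: {mixed:.1f} units, saved: {savings:.0%}")
```

Under these assumptions, the saving grows with the share of the job that can be handed to the smaller model, which is the logic behind keeping GPT-5.4 for planning only.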

The previous model, GPT-5.3-Codex, launched in February 2026, had already reached 77.3% autonomy on Terminal-Bench 2.0 — the benchmark measuring an agent's ability to complete complex terminal tasks without human intervention — while executing full development workflows 25% faster than its predecessor.

Plugins, Triggers, and the shift to "teammate"

March's update introduced what OpenAI calls its most competitive features of the year. Plugins connect Codex to development tools like Sentry and Datadog, giving it real context on production errors and metrics. Triggers let Codex respond automatically to GitHub events — a new issue opens, Codex responds within seconds, with no developer manually kicking it off.

OpenAI's own description of Triggers is blunt: "an engineering teammate that doesn't sleep, doesn't ask for vacation, and doesn't argue about tabs versus spaces." The technical difference from earlier automation is that Triggers are event-driven rather than polling-based — they react instantly instead of checking periodically for new work.
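The distinction is easy to see in a toy model. The sketch below is not Codex's API or GitHub's webhook format; the class names, the event payload and the "issue_opened" label are all hypothetical, chosen only to contrast a worker that checks an inbox on a schedule with a bus that runs handlers the moment an event is published.

```python
from collections import defaultdict
from typing import Callable

Event = dict  # hypothetical payload shape, e.g. {"type": "issue_opened", "id": 7}


class PollingWorker:
    """Checks a shared inbox only when tick() is called on a schedule."""

    def __init__(self, inbox: list, handler: Callable[[Event], None]) -> None:
        self.inbox = inbox
        self.handler = handler

    def tick(self) -> int:
        # Events wait in the inbox until the next scheduled check.
        handled = 0
        while self.inbox:
            self.handler(self.inbox.pop(0))
            handled += 1
        return handled


class EventBus:
    """Runs subscribed handlers at the moment an event is published."""

    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._handlers[event["type"]]:
            handler(event)


polled: list = []
pushed: list = []
inbox: list = []

worker = PollingWorker(inbox, polled.append)
bus = EventBus()
bus.subscribe("issue_opened", pushed.append)

event = {"type": "issue_opened", "id": 7}
inbox.append(event)  # polling: the event sits until the next tick
bus.publish(event)   # event-driven: the handler has already run

handled_before_tick = (len(polled), len(pushed))
worker.tick()
handled_after_tick = (len(polled), len(pushed))
```

Before the tick, only the event-driven handler has seen the issue. The polling worker catches up once its schedule comes around, and that delay is the gap an event-driven design removes.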

The most recent shift: Codex for non-developers

On June 2, 2026 — just days ago — OpenAI shipped an update that confirmed an explicit repositioning. Codex added role-specific plugins for non-technical profiles: analysts, creative teams, sales, product design, equity research and investment banking. It also introduced "Sites" — hosted interactive pages that turn an agent's output into something shareable and reviewable without opening any technical tool — and annotations, which let users edit specific parts of a result without redoing the entire task from scratch.

OpenAI stated openly that most Codex users today are no longer developers. That's the company confirming, in its own words, where the product is actually headed: from a niche developer tool to general office-work automation infrastructure.

How autonomous is it, really

The clearest number available is the 77.3% autonomy score that GPT-5.3-Codex posted on Terminal-Bench 2.0's complex terminal tasks. In practice, that means Codex reaches the goal unassisted roughly three times out of four on work such as code refactoring, test automation and repetitive DevOps tasks.
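A quick back-of-the-envelope calculation shows what a rate like that implies for a batch of delegated tasks. This is a sketch under two loud assumptions that the benchmark itself does not guarantee: that the 77.3% figure carries over to everyday tasks, and that each task succeeds or fails independently of the others.

```python
from math import comb

SUCCESS_RATE = 0.773  # benchmark figure from the article, treated as an assumption


def prob_exactly(k: int, n: int, p: float = SUCCESS_RATE) -> float:
    """Binomial probability that exactly k of n independent tasks succeed."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_tasks = 5  # hypothetical batch of delegated tasks
expected_successes = n_tasks * SUCCESS_RATE
prob_all_succeed = prob_exactly(n_tasks, n_tasks)
prob_at_least_one_failure = 1 - prob_all_succeed

print(f"expected unassisted completions: {expected_successes:.2f} of {n_tasks}")
print(f"chance all {n_tasks} succeed: {prob_all_succeed:.1%}")
print(f"chance at least one needs help: {prob_at_least_one_failure:.1%}")
```

Under these assumptions, a batch of five comes back fully clean only a little more than a quarter of the time, so at least one task in most batches will need a human to step in.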

For software architecture decisions or code with highly specific business dependencies, human judgment is still required. And reviewing output before it ships to production remains, according to OpenAI's own documentation and advanced users' reports, a non-negotiable step — Codex executes with growing autonomy, but the final call on what reaches the real product stays human.

Where this is actually headed

The trajectory of the last fourteen months points toward larger effective context windows, richer tool use — running tests, static analysis, package audits — and deeper integration with version control and production environments.

OpenAI also previewed in March 2026 its plan to unify ChatGPT, Codex and the Atlas browser into a single computer platform — a move aimed squarely at competing with Anthropic's product ecosystem. If that unification ships, Codex stops being a standalone product and becomes the task-execution layer inside a much broader AI superapp.

What distinguishes Codex from other coding assistants today isn't just code generation — it's the combination of sustained autonomy, cross-session memory, real desktop application control, and native integration with the tools development teams already use daily. That combination is what turned Codex, in just over a year, into something considerably larger than its name suggests.