Anthropic's Claude Opus 4.6 brings 1M token context and 'agent teams' to take on OpenAI's Codex
Anthropic on Thursday released Claude Opus 4.6, a major upgrade to its flagship artificial intelligence model that the company says plans more carefully, sustains longer autonomous workflows, and outperforms competitors including OpenAI's GPT-5.2 on key enterprise benchmarks — a release that arrives at a tumultuous moment for the AI industry and global software markets.
The launch comes just three days after OpenAI released its own Codex desktop application in a direct challenge to Anthropic's Claude Code momentum, and amid a $285 billion rout in software and services stocks that investors attribute partly to fears that Anthropic's AI tools could disrupt established enterprise software businesses.
For the first time, Anthropic's Opus-class models will feature a 1 million token context window, allowing the AI to process and reason across vastly more information than previous versions. The company also introduced "agent teams" in Claude Code — a research preview feature that enables multiple AI agents to work simultaneously on different aspects of a coding project, coordinating autonomously.
"We're focused on building the most capable, reliable, and safe AI systems," an Anthropic spokesperson told VentureBeat about the announcements. "Opus 4.6 is even better at planning, helping solve the most complex coding tasks. And the new agent teams feature means users can split work across multiple agents — one on the frontend, one on the API, one on the migration — each owning its piece and coordinating directly with the others."
Why OpenAI and Anthropic are locked in an all-out war for enterprise developers
The release intensifies an already fierce competition between Anthropic and OpenAI, the two most valuable privately held AI companies in the world. OpenAI on Monday released a new desktop application for its Codex artificial intelligence coding system, a tool the company says transforms software development from a collaborative exercise with a single AI assistant into something more akin to managing a team of autonomous workers.
AI coding assistants have exploded in popularity over the last year, and OpenAI said more than 1 million developers have used Codex in the past month. The new Codex app is part of OpenAI's ongoing effort to lure users and market share away from rivals like Anthropic and Cursor.
The timing of Anthropic's release, just 72 hours after OpenAI's Codex launch, underscores the breakneck pace of competition in AI development tools. OpenAI faces intensifying competition from Anthropic, which posted the largest share increase of any frontier lab since May 2025, according to a recent Andreessen Horowitz survey. Forty-four percent of enterprises now use Anthropic in production, driven by rapid capability gains in software development since late 2024. OpenAI's desktop launch is a strategic counter to Claude Code's momentum.
According to Anthropic's announcement, Opus 4.6 achieves the highest score on Terminal-Bench 2.0, an agentic coding evaluation, and leads all other frontier models on Humanity's Last Exam, a complex multi-discipline reasoning test. On GDPval-AA, a benchmark measuring performance on economically valuable knowledge work tasks in finance, legal and other domains, Opus 4.6 outperforms OpenAI's GPT-5.2 by approximately 144 Elo points, which translates to obtaining a higher score approximately 70% of the time.
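The step from a 144-point rating gap to "approximately 70% of the time" can be checked with the standard Elo expected-score formula. The sketch below assumes GDPval-AA uses the conventional base-10, 400-point Elo scale, which the article does not state; the function name is illustrative.

```python
def elo_expected_score(rating_gap: float) -> float:
    """Expected score of the higher-rated side for a given Elo gap.

    Assumes the conventional logistic Elo curve (base 10, scale 400).
    """
    return 1.0 / (1.0 + 10.0 ** (-rating_gap / 400.0))


# Approximate gap between Opus 4.6 and GPT-5.2 cited in the article.
gap = 144
print(f"Expected score at a {gap}-point gap: {elo_expected_score(gap):.3f}")
```

Under that assumption the expected score comes out just under 0.70, consistent with the figure quoted above. A gap of zero gives exactly 0.5.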
Inside Claude Code's $1 billion revenue milestone and growing enterprise footprint
The stakes are substantial. Asked about Claude Code's financial performance, the Anthropic spokesperson pointed to the company's November announcement that Claude Code had reached $1 billion in run-rate revenue only six months after becoming generally available in May 2025.
The spokesperson highlighted major enterprise deployments: "Claude Code is used by Uber across teams like software engineering, data science, finance, and trust and safety; wall-to-wall deployment across Salesforce's global engineering org; tens of thousands of devs at Accenture; and companies across industries like Spotify, Rakuten, Snowflake, Novo Nordisk, and Ramp."
That enterprise traction has translated into skyrocketing valuations. Earlier this month, Anthropic signed a term sheet for a $10 billion funding round at a $350 billion valuation. Bloomberg reported that Anthropic is simultaneously working on a tender offer that would allow employees to sell shares at that valuation, offering liquidity to staffers who have watched the company's worth multiply since its 2021 founding.
How Opus 4.6 solves the 'context rot' problem that has plagued AI models
One of Opus 4.6's most significant technical improvements addresses what the AI industry calls "context rot"—the degradation of model performance as conversations grow longer. Anthropic says Opus 4.6 scores 76% on MRCR v2, a needle-in-a-haystack benchmark testing a model's ability to retrieve information hidden in vast amounts of text, compared to just 18.5% for Sonnet 4.5.
"This is a qualitative shift in how much context a model can actually use while maintaining peak performance," the company said in its announcement.
The model also supports outputs of up to 128,000 tokens — enough to complete substantial coding tasks or documents without breaking them into multiple requests.
For developers, Anthropic is introducing several new API features alongside the model: adaptive thinking, which allows Claude to decide when deeper reasoning would be helpful rather than requiring a binary on-off choice; four effort levels (low, medium, high, max) to control intelligence, speed and cost tradeoffs; and context compaction, a beta feature that automatically summarizes older context to enable longer-running tasks.
Anthropic's delicate balancing act: Building powerful AI agents without losing control
Anthropic, which has built its brand around AI safety research, emphasized that Opus 4.6 maintains alignment with its predecessors despite its enhanced capabilities. On the company's automated behavior audit measuring misaligned behaviors such as deception, sycophancy, and cooperation with misuse, Opus 4.6 "showed a low rate" of problematic responses while also achieving "the lowest rate of over-refusals — where the model fails to answer benign queries — of any recent Claude model."
When asked how Anthropic thinks about safety guardrails as Claude becomes more agentic, particularly with multiple agents coordinating autonomously, the spokesperson pointed to the company's published framework: "Agents have tremendous potential for positive impacts in work but it's important that agents continue to be safe, reliable, and trustworthy. We outlined our framework for developing safe and trustworthy agents last year which shares core principles developers should consider when building agents."
The company said it has developed six new cybersecurity probes to detect potentially harmful uses of the model's enhanced capabilities, and is using Opus 4.6 to help find and patch vulnerabilities in open-source software as part of defensive cybersecurity efforts.
Sam Altman vs. Dario Amodei: The Super Bowl ad battle that exposed AI's deepest divisions
The rivalry between Anthropic and OpenAI has spilled into consumer marketing in dramatic fashion. Both companies will feature prominently during Sunday's Super Bowl. Anthropic is airing commercials that mock OpenAI's decision to begin testing advertisements in ChatGPT, with the tagline: "Ads are coming to AI. But not to Claude."
OpenAI CEO Sam Altman responded by calling the ads "funny" but "clearly dishonest," posting on X that his company would "obviously never run ads in the way Anthropic depicts them" and that "Anthropic wants to control what people do with AI" while serving "an expensive product to rich people."
The exchange highlights a fundamental strategic divergence: OpenAI has moved to monetize its massive free user base through advertising, while Anthropic has focused almost exclusively on enterprise sales and premium subscriptions.
The $285 billion stock selloff that revealed Wall Street's AI anxiety
The launch occurs against a backdrop of historic market volatility in software stocks. A new AI automation tool from Anthropic PBC sparked a $285 billion rout in stocks across the software, financial services and asset management sectors on Tuesday as investors raced to dump shares with even the slightest exposure. A Goldman Sachs basket of US software stocks sank 6%, its biggest one-day decline since April's tariff-fueled selloff.
The selloff was triggered in part by a new legal tool from Anthropic, a sign of the AI industry's growing push into sectors that can unlock the lucrative enterprise revenue needed to fund massive investments in the technology. On Friday, Anthropic launched plug-ins for its Claude Cowork agent that automate tasks across legal, sales, marketing and data analysis.
Thomson Reuters plunged 15.83% Tuesday, its biggest single-day drop on record, and Legalzoom.com sank 19.68%. European legal software providers including RELX, owner of LexisNexis, and Wolters Kluwer experienced their worst single-day performances in decades.
Not everyone agrees the selloff is warranted. Nvidia CEO Jensen Huang said on Tuesday that fears AI would replace software and related tools were "illogical" and "time will prove itself." Mark Murphy, head of U.S. enterprise software research at JPMorgan, said in a Reuters report it "feels like an illogical leap" to say a new plug-in from an LLM would "replace every layer of mission-critical enterprise software."
What Claude's new PowerPoint integration means for Microsoft's AI strategy
Among the more notable product announcements: Anthropic is releasing Claude in PowerPoint in research preview, allowing users to create presentations using the same AI capabilities that power Claude's document and spreadsheet work. The integration puts Claude directly inside a core Microsoft product — an unusual arrangement given Microsoft's 27% stake in OpenAI.
The Anthropic spokesperson framed the move pragmatically in an interview with VentureBeat: "Microsoft has an official add-in marketplace for Office products with multiple add-ins available to help people with slide creation and iteration. Any developer can build a plugin for Excel or PowerPoint. We're participating in that ecosystem to bring Claude into PowerPoint. This is about participating in the ecosystem and giving users the ability to work with the tools that they want, in the programs they want."
The data behind enterprise AI adoption: Who's winning and who's losing ground
Data from a16z's recent enterprise AI survey suggests both Anthropic and OpenAI face an increasingly competitive landscape. While OpenAI remains the most widely used AI provider in the enterprise, with approximately 77% of surveyed companies using it in production in January 2026, Anthropic's adoption is rising rapidly — from near-zero in March 2024 to approximately 40% using it in production by January 2026.
The survey data also shows that 75% of Anthropic's enterprise customers are using it in production, with 89% either testing or in production. Both figures exceed OpenAI's rates among its own customer base: 46% in production and 73% testing or in production.
Enterprise spending on AI continues to accelerate. Average enterprise LLM spend reached $7 million in 2025, up 180% from $2.5 million in 2024, with projections suggesting $11.6 million in 2026 — a 65% increase year-over-year.
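The growth percentages follow directly from the dollar figures. A minimal sketch of the arithmetic, using only the numbers quoted above (the helper name is illustrative):

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100.0


# Average enterprise LLM spend in millions of dollars, as quoted in the article.
spend_2024, spend_2025, spend_2026_projected = 2.5, 7.0, 11.6

print(f"2024 to 2025: {pct_increase(spend_2024, spend_2025):.1f}%")
print(f"2025 to 2026 (projected): {pct_increase(spend_2025, spend_2026_projected):.1f}%")
```

The first step is a 180% increase. The projected second step works out to slightly under 66%, in line with the roughly 65% cited.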
Pricing, availability, and what developers need to know about Claude Opus 4.6
Opus 4.6 is available immediately on claude.ai, the Claude API, and major cloud platforms. Developers can access it via claude-opus-4-6 through the API. Pricing remains unchanged at $5 per million input tokens and $25 per million output tokens, with premium pricing of $10/$37.50 for prompts exceeding 200,000 tokens using the 1 million token context window.
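To see what those rates mean per request, here is a rough cost estimator. It is a sketch under one stated assumption: once a prompt exceeds 200,000 tokens, the premium rates apply to every input and output token in that request. The article does not spell out how the premium tier is metered, so treat the function and constant names as hypothetical and check Anthropic's pricing documentation before relying on the numbers.

```python
# Rates in US dollars per million tokens, as quoted in the article.
STANDARD_RATES = (5.00, 25.00)   # (input, output)
PREMIUM_RATES = (10.00, 37.50)   # (input, output) for long-context prompts
LONG_CONTEXT_THRESHOLD = 200_000  # prompt tokens


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request.

    Assumption (not confirmed by the article): a prompt above the
    threshold is billed entirely at the premium rates.
    """
    if input_tokens > LONG_CONTEXT_THRESHOLD:
        in_rate, out_rate = PREMIUM_RATES
    else:
        in_rate, out_rate = STANDARD_RATES
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


print(estimate_cost_usd(100_000, 10_000))  # standard-rate request
print(estimate_cost_usd(500_000, 10_000))  # long-context request
```

Under this assumption, a 100,000-token prompt with a 10,000-token reply costs $0.75, while a 500,000-token prompt with the same reply costs $5.375.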
For users who find Opus 4.6 "overthinking" simpler tasks — a characteristic Anthropic acknowledges can add cost and latency — the company recommends adjusting the effort parameter from its default high setting to medium.
The recommendation captures something essential about where the AI industry now stands. These models have grown so capable that their creators must now teach customers how to make them think less. Whether that represents a breakthrough or a warning sign depends entirely on which side of the disruption you're standing on — and whether you remembered to sell your software stocks before Tuesday.