BipBiz

collapse
Home / Daily News Analysis / Anthropic's Claude Opus 4.8 is four times more honest, Mythos next

Anthropic's Claude Opus 4.8 is four times more honest, Mythos next

May 29, 2026  Twila Rosenbaum  30 views
Anthropic's Claude Opus 4.8 is four times more honest, Mythos next

Key Facts

  • Anthropic released Claude Opus 4.8, its flagship AI model, with improved honesty and reliability.
  • The model is four times less likely to overlook code flaws compared to Opus 4.7.
  • Benchmark scores improved across agentic coding, reasoning, computer use, and knowledge work.
  • New features include effort control, dynamic workflows, and faster API modes.
  • Anthropic teased Mythos-class models, which have already found over 10,000 critical vulnerabilities via Project Glasswing.
  • The company announced a $65 billion Series H round at a $965 billion post-money valuation.
  • Expansion continues with new offices in Milan and Seoul.

Introduction

Anthropic has released Claude Opus 4.8, an upgrade to its flagship AI model that the company says is more honest, more reliable in agentic tasks, and better at catching its own mistakes. The model is available immediately at the same price as its predecessor, $5 per million input tokens and $25 per million output tokens, and is rolling out across all Anthropic products including claude.ai, Claude Code, and the API.

The headline improvement is honesty. Anthropic says Opus 4.8 is around four times less likely than Opus 4.7 to let flaws in code it has written pass unremarked. Early testers report the model is more willing to flag uncertainties about its work and less likely to make unsupported claims, a persistent problem across AI models that tend to project confidence regardless of whether it is warranted.

Benchmark Gains Across the Board

Opus 4.8 improves on its predecessor across Anthropic's published benchmarks. On agentic coding (Terminal-Bench 2.1), the score rises from 64.3% to 69.2%. Multidisciplinary reasoning with tools improves from 54.7% to 57.9%. Agentic computer use moves from 82.8% to 83.4%, and knowledge work scores rise from 1,753 to 1,890. These incremental gains reflect Anthropic's focus on practical reliability rather than raw performance.

Anthropic's alignment assessment found that Opus 4.8 reaches new highs on measures of prosocial traits, including supporting user autonomy and acting in the user's best interest. Rates of misaligned behaviour such as deception or cooperation with misuse are substantially lower than in Opus 4.7, and comparable to Claude Mythos Preview, Anthropic's best-aligned model. This emphasis on alignment aligns with the company's stated mission of building safe AI systems.

Early Testers See Practical Gains

The release is accompanied by endorsements from companies already using the model. Cognition, the company behind the AI coding agent Devin, said Opus 4.8 uses tools cleanly and fixes comment-verbosity and tool-calling issues that appeared in Opus 4.7. Cursor, the AI-powered code editor, reported improvements across every effort level on its CursorBench evaluation.

Harvey, which builds AI for legal work, said Opus 4.8 delivers the highest score recorded on its Legal Agent Benchmark and is the first model to break 10% overall on the all-pass standard. Databricks reported that Opus 4.8 handles deeper multistep questions faster in its Genie AI agent, at 61% cheaper token cost than Opus 4.7.

Thomson Reuters said CoCounsel Legal saw meaningful improvements in consistency and reasoning quality. Hebbia, which builds AI for financial document analysis, noted better citation precision and more token efficiency on retrieval tasks. These real-world validations suggest the model's improvements are not just theoretical but translate into tangible productivity gains for enterprise users.

New Features Alongside the Model

Anthropic is launching several features alongside Opus 4.8. A new effort control in claude.ai and Cowork lets users choose how much computation Claude applies to a response, trading speed against quality. This feature allows developers and knowledge workers to fine-tune the model's behavior for specific tasks, from quick answers to deep analysis.

Claude Code gains a dynamic workflows feature that allows it to plan work and run hundreds of parallel subagents in a single session, enabling codebase-scale migrations across hundreds of thousands of lines of code. This capability represents a significant step toward fully autonomous software engineering, where AI can handle complex refactoring tasks with minimal human oversight.

For developers, the Messages API now accepts system entries inside the messages array, allowing instructions to be updated mid-task without breaking the prompt cache. Fast mode for Opus 4.8, which runs at 2.5 times the speed, is now three times cheaper than it was for previous models. These improvements lower the cost and complexity of integrating Claude into production systems, further driving enterprise adoption.

Mythos Is the Bigger Story

The more significant announcement may be what comes next. Anthropic said it plans to release a new class of model with higher intelligence than Opus, based on the Claude Mythos architecture. A small number of organisations are already using Claude Mythos Preview through Project Glasswing, an initiative focused on using the model for cybersecurity work. Anthropic and roughly 50 partners, including Apple, Google, Microsoft, and Amazon Web Services, have used Mythos Preview to find more than 10,000 high- or critical-severity vulnerabilities across critical software infrastructure.

Mythos-class models require stronger cyber safeguards before general release, Anthropic said, but the company expects to bring them to all customers in the coming weeks. The model sits a full capability tier above Opus 4.7 and can autonomously find zero-day vulnerabilities and create exploits for them, which explains both the excitement and the caution around its deployment. The ability to autonomously discover and weaponize vulnerabilities raises significant safety concerns, which is why Anthropic is proceeding deliberately with additional safeguards.

Project Glasswing represents a new model of collaborative AI safety, where frontier models are tested in controlled environments with trusted partners before wider release. This approach mirrors Anthropic's earlier work on responsible scaling and reflects the growing recognition that advanced AI capabilities require commensurate security measures. The discovery of over 10,000 vulnerabilities highlights both the power of these models and the scale of the cybersecurity challenge they can address.

The development of Mythos also signals Anthropic's confidence in its scaling approach. After Opus 4.7, many observers wondered whether the company had hit a performance plateau. Mythos-class models suggest that significant headroom remains for improvement, particularly in areas requiring deep reasoning and autonomous action. This places Anthropic in direct competition with OpenAI's GPT-5 family and Google's Gemini Ultra, which have also shown rapid capability gains.

A Company Approaching $1 Trillion

The Opus 4.8 launch arrives as Anthropic's valuation continues to climb. The company announced a $65 billion Series H round at a $965 billion post-money valuation on the same day, up from the $380 billion valuation at which it closed its $30 billion Series G in February. Revenue has grown from roughly $1 billion at the end of 2024 to an estimated $30 billion annualised run rate in 2026, driven by enterprise adoption of Claude. This explosive growth reflects the rapid commoditization of AI infrastructure and the increasing willingness of businesses to bet on a single platform for their AI needs.

Anthropic also opened a new office in Milan on 28 May, its sixth in Europe, and appointed KiYoung Choi as Representative Director of Korea ahead of a Seoul office opening. The expansion reflects growing demand for Claude in enterprise markets outside the United States. Europe, in particular, has become a battleground for AI companies due to its stringent regulatory environment and large number of multinational corporations. Anthropic's investment in local offices signals its commitment to compliance and customer support in these markets.

The valuation increase is supported by strong revenue growth and a favorable market climate. Investors are betting that AI will become a foundational technology, much like the internet or mobile computing. Anthropic's focus on safety and reliability differentiates it from competitors, but the company still faces significant risks: competition from well-funded rivals, regulatory scrutiny, and the technical challenge of maintaining alignment as models become more capable. The $65 billion Series H round includes participation from existing investors like Google, Spark Capital, and new sovereign wealth funds, indicating sustained confidence in Anthropic's long-term prospects.

The Competitive Context

Opus 4.8 enters a market where the pace of model releases has accelerated sharply. OpenAI launched GPT-5.5 as its first fully retrained base model since GPT-4.5, and GPT-5.4 set new records on professional benchmarks earlier this year. Google has invested up to $40 billion in Anthropic but continues to develop its own Gemini models. The frontier AI market has consolidated into a three-way race between Anthropic, OpenAI, and Google, with each company releasing incremental model upgrades at an increasing pace.

For Anthropic, the distinction it is trying to draw with Opus 4.8 is not raw capability but reliability. A model that catches its own mistakes, flags its uncertainties, and follows instructions consistently is more useful in agentic workflows where AI systems operate with limited human oversight. Whether that positioning holds as Mythos-class models arrive, promising higher intelligence with new safety constraints, will determine whether Anthropic can maintain its lead in the enterprise market it has worked to dominate.

The competitive landscape also includes emerging players like Mistral AI, Cohere, and xAI, which are targeting specific niches. Mistral has gained traction in Europe with its open-weight models, while Cohere focuses on enterprise search and retrieval. xAI, founded by Elon Musk, is developing Grok with a focus on real-time data and less restricted responses. However, none have yet matched the frontier capabilities of the top three. The consolidation of talent and capital around a few players suggests that the AI industry is following the pattern of previous platform shifts, where a small number of companies capture most of the value.

Anthropic's emphasis on safety and alignment may prove to be a competitive advantage as governments around the world implement AI regulations. The European Union's AI Act, which classifies models by risk level, could create compliance burdens for less cautious competitors. Anthropic's proactive approach to alignment and its willingness to share safety research could position it as a trusted partner for regulated industries like healthcare, finance, and legal services. Opus 4.8's improved honesty and reduced misalignment directly address the concerns of these sectors, making it an attractive option for enterprises that cannot afford AI hallucinations or deceptive behavior.


Source: TNW | Anthropic News


Share:

Your experience on this site will be improved by allowing cookies Cookie Policy