
Anthropic's new flagship Claude Opus 4.8 aced a math problem and shipped a spotless game, then drained the entire token quota in a single prompt. The model performed well on some tests but failed on others, showing inconsistent capability across six evaluations. Builders should expect strong results on focused tasks but risk exhausting tokens on complex prompts.
Tap to vote and see what everyone thinks.
Anthropic Urges Global Pause in AI Development, Flags 'Self-Improvement' Risk
Summary by ByteBrief