AI | VentureBeat | 2 days ago

GPT-5.5 tops new Agents' Last Exam benchmark

6 min read

OpenAI's GPT-5.5 from April, run with the Codex harness, scored 24.0% on the new Agents' Last Exam benchmark, ahead of Anthropic's Mythos-class Claude Fable 5. The benchmark, created by UC Berkeley's RDI and more than 300 experts, measures AI performance on long-horizon professional workflows.

#openai #anthropic #ai-benchmarks
