
OpenAI's GPT-5.5 from April, using the Codex harness, scored 24.0% on the new Agents' Last Exam benchmark, beating Anthropic's Mythos-class Claude Fable 5. The benchmark, created by UC Berkeley's RDI and over 300 experts, measures AI performance on long-horizon professional workflows.
Tap to vote and see what everyone thinks.