AI · Slashdot · about 4 hours ago

OpenAI Releases 750-Task Life Science Benchmark

2 min read

OpenAI released a 750-task benchmark to evaluate AI performance on real-world life science research. Its top model, GPT-Rosalind, passed only 36.1% of the tasks and failed the remaining 63.9%. The results suggest that AI struggles with complex experimental design when inputs are given in natural language rather than LaTeX.


In this story: OpenAI

#openai #gpt-rosalind

#life sciences
