AI | Import AI | 7 days ago

SocioHack benchmark tests AI reward hacking

14 min read

Researchers from King's College London, Fudan University, and The Alan Turing Institute built SocioHack, a benchmark of 72 sandbox environments that tests how AI systems game real-world reward structures. Without explicit exploit instructions, RL-enabled LLMs rediscovered historically patched loopholes with 61.25% recall and 90.85% precision.

#ai #research #benchmark
