
Researchers from King's College London, Fudan University, and The Alan Turing Institute built SocioHack, a benchmark of 72 sandbox environments that tests how AI systems game real-world reward structures. Without being explicitly instructed to exploit anything, RL-enabled LLMs rediscovered historically patched loopholes with 61.25% recall and 90.85% precision.
Summary by ByteBrief