
Researchers at MIT CSAIL and Harvard SEAS trained AI models on BattleshipQA data to improve how they ask questions in a guessing game. Top language models such as GPT-5 complete games in fewer turns than human players, while smaller models such as Llama 4 Scout play far less rationally. The models learn to form better questions by analyzing human interactions in the Collaborative Battleship game. The advance could help AI systems gather more useful information in uncertain environments.
New AI Benchmarks Are Testing Consistency Instead of Memorization
Summary by ByteBrief