
MIT and Harvard researchers improved how AI agents ask questions in a modified Battleship game by training them on human interactions. Their Collaborative Battleship game uses natural-language questions and real-time answers to evaluate how models weigh options. Smaller models such as Llama 4 Scout outperformed leading LMs such as GPT-5 while using fewer turns. The BattleshipQA dataset was built from more than 40 sessions of humans playing the game together.
Summary by ByteBrief