
Sina Weibo researchers published a paper claiming their 3-billion-parameter VibeThinker-3B model matches or exceeds reasoning performance of much larger systems from Google DeepMind, OpenAI, Anthropic, and DeepSeek. VibeThinker-3B scored 94.3 on the AIME 2026 math exam. The claim has sparked debate over benchmark validity.
Tap to vote and see what everyone thinks.
Summary by ByteBrief
New benchmark tests AI models against Russian propaganda