AI · Google Dev Blog · about 10 hours ago

Community trains Gemma models to reason with Tunix and TPUs

1 min read

Gemma-2-2B and Gemma-3-1B models were transformed into reasoning models using Tunix and Kaggle TPU v5e-8. Winners used supervised fine-tuning, SimPO, and GRPO with rubric-based LLM-as-judge rewards. A 9-hour pipeline produced structured reasoning in 1B and 2B models via curriculum-guided GRPO. The winning methods applied ethical reasoning frameworks like IDEA-E and achieved high performance on industry tasks.
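To make "GRPO with rubric-based LLM-as-judge rewards" concrete, here is a minimal sketch in plain Python. It is not the winners' code and does not use the Tunix API. The rubric criteria, weights, and function names are hypothetical, and the judge scores are assumed to arrive already normalized to the range 0 to 1. The only part taken from GRPO itself is the group-relative step: several completions are sampled for the same prompt, and each reward is standardized against the group's mean and standard deviation.

```python
from statistics import mean, pstdev

# Hypothetical rubric: criterion names and weights are illustrative only.
WEIGHTS = {"correctness": 0.6, "reasoning_format": 0.2, "clarity": 0.2}


def rubric_reward(scores, weights):
    """Collapse per-criterion judge scores (assumed 0..1) into one scalar."""
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled completions for one prompt, scored by a (hypothetical) judge.
judged = [
    {"correctness": 1.0, "reasoning_format": 1.0, "clarity": 0.5},
    {"correctness": 0.0, "reasoning_format": 1.0, "clarity": 0.5},
    {"correctness": 1.0, "reasoning_format": 0.0, "clarity": 1.0},
]
rewards = [rubric_reward(scores, WEIGHTS) for scores in judged]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's average get a positive advantage and are reinforced; those below get a negative one. If every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.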


#gemma #tunix #tpu

