Gemma-2-2B and Gemma-3-1B were turned into reasoning models using Tunix on Kaggle TPU v5e-8 hardware. The winning entries used supervised fine-tuning, SimPO, and GRPO with rubric-based LLM-as-judge rewards. A 9-hour pipeline produced structured reasoning in the 1B and 2B models via curriculum-guided GRPO. The winning methods also applied ethical reasoning frameworks such as IDEA-E and reportedly performed well on industry tasks.
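For readers unfamiliar with how a rubric-based judge score feeds into GRPO, here is a minimal sketch of the general idea, not the winners' actual code and not the Tunix API. It assumes a judge has already scored each sampled answer per rubric criterion in [0, 1]. The criterion names, weights, and function names are hypothetical. The rewards within one prompt's group of samples are then normalized to zero mean and unit variance, which is the group-relative advantage GRPO uses in place of a learned value baseline.

```python
from statistics import mean, pstdev


def rubric_reward(scores, weights):
    """Collapse per-criterion judge scores (each in [0, 1]) into one reward.

    Both the criteria and the weights are illustrative assumptions.
    """
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rubric: correctness counts twice as much as the other criteria.
weights = {"correctness": 2.0, "reasoning_structure": 1.0, "format": 1.0}

# Hypothetical judge scores for four sampled answers to a single prompt.
judge_scores = [
    {"correctness": 1.0, "reasoning_structure": 1.0, "format": 1.0},
    {"correctness": 0.0, "reasoning_structure": 1.0, "format": 1.0},
    {"correctness": 1.0, "reasoning_structure": 0.0, "format": 1.0},
    {"correctness": 0.0, "reasoning_structure": 0.0, "format": 0.0},
]

rewards = [rubric_reward(s, weights) for s in judge_scores]
advantages = group_relative_advantages(rewards)

for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.2f}")
```

Samples that score above the group average get a positive advantage and are reinforced; those below get a negative one. A curriculum, as mentioned above, would change which prompts are sampled over time, not this per-group computation.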
Summary by ByteBrief