Google releases Gemma 4 12B, a 12B-parameter multimodal model with a unified, encoder-free architecture. It uses a single decoder-only transformer that shares its structure with Gemma 4 31B, and includes 150M- and 550M-parameter vision models for edge and medium sizes. The model supports automatic speech recognition, diarization, video understanding, coding, and native audio input. Gemma 4 models have reached 150 million downloads across the developer community.
Summary by ByteBrief