AI · Ars Technica · 2 days ago

Google's DiffusionGemma boosts speed 4x

Google DeepMind released DiffusionGemma, a new open model that generates text in parallel blocks rather than token by token. The 26-billion-parameter mixture-of-experts model activates only 3.8 billion parameters during inference. On an Nvidia H100 GPU, it produces over 1,000 tokens per second, roughly four times faster than similarly sized autoregressive Gemma models.
