
Google DeepMind released DiffusionGemma, a new open model that generates text in parallel blocks rather than token by token. The 26-billion-parameter Mixture of Experts model activates only 3.8 billion parameters during inference. On an Nvidia H100, it produces over 1,000 tokens per second, roughly four times faster than similarly sized autoregressive Gemma models.