
Google DeepMind released DiffusionGemma, an open model that generates text in parallel blocks instead of one token at a time. NVIDIA optimized it to run faster across GeForce RTX GPUs, RTX PRO platforms, and DGX Spark systems. The model uses a diffusion approach to produce up to 256 tokens per step, reducing latency for single-user workloads.
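
To make the latency claim concrete, here is a back-of-the-envelope sketch in Python. It assumes the best case, in which every step emits the full 256 tokens. The helper name `min_steps` and the 1,000-token response length are illustrative and do not come from the release. Real step counts depend on how many tokens the model actually produces per step.

```python
import math


def min_steps(num_tokens: int, tokens_per_step: int) -> int:
    """Lower bound on generation steps if every step emits tokens_per_step tokens."""
    if num_tokens < 0 or tokens_per_step <= 0:
        raise ValueError("num_tokens must be >= 0 and tokens_per_step must be > 0")
    return math.ceil(num_tokens / tokens_per_step)


# Hypothetical 1,000-token response (illustrative length, not from the release).
one_at_a_time = min_steps(1000, 1)     # one token per step
block_parallel = min_steps(1000, 256)  # best case: 256 tokens per step

print(one_at_a_time, block_parallel)  # prints "1000 4"
```

Fewer sequential steps is what helps a single user waiting on one response. The sketch counts steps only and says nothing about how long each step takes.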