
NVIDIA's Nemotron Speech team released Nemotron 3.5 ASR, a 600M-parameter streaming model that transcribes 40 language-locales in real time from a single checkpoint. The Cache-Aware FastConformer-RNNT architecture caches encoder activations to avoid reprocessing overlapping audio windows, reducing compute and latency without accuracy loss. Output includes native punctuation and capitalization, eliminating a separate restoration step. The model ships as open weights on Hugging Face under the OpenMDW-1.1 license.
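The caching idea can be shown with a toy sketch. This is not NVIDIA's implementation: `causal_layer`, `encode_streaming`, and `LEFT_CONTEXT` are hypothetical names, and a simple moving average stands in for a real encoder layer. It assumes each output frame depends only on the current frame and a fixed number of past frames, so keeping those few past frames in a cache lets each new chunk be processed once, with the same result as encoding the whole signal at once.

```python
from typing import List

LEFT_CONTEXT = 3  # hypothetical: number of past frames each output may look at


def causal_layer(frames: List[float], context: List[float]) -> List[float]:
    """Toy stand-in for an encoder layer: each output is the mean of the
    current frame and up to LEFT_CONTEXT frames before it."""
    history = context + frames
    offset = len(context)
    out = []
    for i in range(len(frames)):
        j = offset + i
        window = history[max(0, j - LEFT_CONTEXT): j + 1]
        out.append(sum(window) / len(window))
    return out


def encode_full(frames: List[float]) -> List[float]:
    """Offline reference: process the whole utterance in one pass."""
    return causal_layer(frames, [])


def encode_streaming(frames: List[float], chunk_size: int) -> List[float]:
    """Process audio chunk by chunk, carrying a small cache of past frames
    instead of re-feeding an overlapping window each time."""
    cache: List[float] = []
    outputs: List[float] = []
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        outputs.extend(causal_layer(chunk, cache))
        # Keep only the frames the next chunk can still see.
        cache = (cache + chunk)[-LEFT_CONTEXT:]
    return outputs


if __name__ == "__main__":
    audio = [float(x) for x in range(10)]
    assert encode_streaming(audio, chunk_size=4) == encode_full(audio)
    print("streaming output matches full-utterance output")
```

In this toy each frame is computed exactly once, and the cache size is fixed by the context length rather than by how long the audio stream runs.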
Summary by ByteBrief