AI · The Decoder · about 13 hours ago

New open-source voice model responds every 0.4 seconds

A three-billion-parameter model from researchers in China, Hong Kong, and Singapore listens continuously to an audio stream and decides every 0.4 seconds whether to output <silent> or <response>. It handles translation, transcription, and real-time reactions in a single system, processing audio in 0.4-second chunks. The model scores 58.15 on MMAU, outperforming Qwen2.5-Omni-3B, and matches larger 7B models in English-Chinese translation. The training data was built from scene-based events and audio clips generated with tools such as AudioX and ElevenLabs.
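
The per-chunk decision described above can be pictured with a small sketch. This is not the model's actual code: the 16 kHz sample rate, the function names, and the energy-threshold `toy_decide` are all assumptions for illustration, standing in for the real model. Only the 0.4-second chunk length and the two output tokens come from the summary.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

SAMPLE_RATE = 16_000   # assumption: the summary does not state a sample rate
CHUNK_SECONDS = 0.4    # from the summary: one decision per 0.4 s of audio
CHUNK_SAMPLES = round(SAMPLE_RATE * CHUNK_SECONDS)

SILENT = "<silent>"
RESPONSE = "<response>"


def chunks(samples: Sequence[float], size: int = CHUNK_SAMPLES) -> Iterator[Sequence[float]]:
    """Yield consecutive fixed-size chunks; a trailing partial chunk is dropped."""
    for start in range(0, len(samples) - size + 1, size):
        yield samples[start:start + size]


def toy_decide(chunk: Sequence[float], threshold: float = 0.1) -> str:
    """Hypothetical stand-in for the model: respond when mean energy is high."""
    energy = sum(x * x for x in chunk) / len(chunk)
    return RESPONSE if energy > threshold else SILENT


def run_stream(
    samples: Sequence[float],
    decide: Callable[[Sequence[float]], str] = toy_decide,
) -> List[Tuple[float, str]]:
    """Return (chunk end time in seconds, decision) for every full chunk."""
    decisions = []
    for i, chunk in enumerate(chunks(samples)):
        end_time = round((i + 1) * CHUNK_SECONDS, 1)
        decisions.append((end_time, decide(chunk)))
    return decisions


if __name__ == "__main__":
    quiet = [0.0] * CHUNK_SAMPLES
    loud = [0.5] * CHUNK_SAMPLES
    for end_time, decision in run_stream(quiet + loud + quiet):
        print(f"{end_time:.1f}s {decision}")
```

In a real system the `decide` callback would be the model itself, and a <response> decision would presumably trigger generation of a translation, transcript, or reaction; the sketch only shows the fixed-interval gating loop.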

#audio-ai #real-time #open-source
