AI · MarkTechPost · about 3 hours ago

Xiaomi and TileRT Achieve 1200 Tokens Per Second on 1-Trillion-Parameter Model

9 min read

Xiaomi and TileRT report a decode speed of 1200 tokens per second on a 1-trillion-parameter model with MiMo-V2.5-Pro-UltraSpeed. The result was obtained on a standard 8-GPU commodity node using FP4 quantization, DFlash speculative decoding, and TileRT system optimization. Through coordinated design of the model and the serving system, UltraSpeed raises decode speed without reducing output quality.

#mimo #tilert #commodity-gpus
