Xiaomi and TileRT report decoding at 1,200 tokens per second on a 1-trillion-parameter model using MiMo-V2.5-Pro-UltraSpeed. The result runs on a standard 8-GPU commodity node and combines FP4 quantization, DFlash speculative decoding, and TileRT system optimization. UltraSpeed raises decode speed without reducing output quality by designing the model and the serving system together.
Summary by ByteBrief