ByteBrief

Best read upright.

We're a portrait publication through and through. Turn your phone back and your briefing picks up right where you left it.

(We tried widescreen once. It wasn't us.)

ByteBrief

ChipsHacker Newsabout 5 hours ago

GLM5.2 on AMD MI355X at 2626 tok/s/node at over 2x lower cost than Blackwell

1 min read

Wafer deployed GLM-5.2 quantized to MXFP4 using the sglang framework and Quark tooling on an AMD Instinct MI355X GPU cluster. Engineers fixed ROCm compatibility issues by adjusting module prefixes for shared experts and adding a CUDA guard for deep speculative decode kernels. The optimized stack achieved 2626 tokens per second per node at two point four requests per second with under five seconds time to first token on a twenty thousand input workload. This configuration delivers over twice the performance of Blackwell hardware while costing less than half as much.

Level

Hype check

Tap to vote and see what everyone thinks.

In this storyAMD

ByteBrief

ChipsHacker Newsabout 5 hours ago

GLM5.2 on AMD MI355X at 2626 tok/s/node at over 2x lower cost than Blackwell

1 min read

Level

Hype check

Tap to vote and see what everyone thinks.

In this storyAMD