
DeepSeek released a 671-billion-parameter model with a technical report in December 2024. Moonshot AI scaled the design to a trillion parameters, solved a training instability with a new optimizer, and shipped their model. Zhipu AI later integrated a DeepSeek innovation and contributed a new training framework. Open-weight models enable indirect public collaboration between competitors.
Tap to vote and see what everyone thinks.
Summary by ByteBrief
DiffusionGemma