Tech · HuggingFace · 5 days ago

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

1 min read

JetBrains has released Mellum2, a 12B-parameter Mixture-of-Experts (MoE) model designed for low-latency text and code tasks. Because an MoE model activates only a subset of its experts during inference, Mellum2 keeps the capacity of a larger model while reducing compute overhead. According to JetBrains, it delivers more than 2x faster inference than comparable open models and performs competitively on code generation, reasoning, science, and math benchmarks. Mellum2 targets latency-sensitive workflows such as routing, retrieval, summarization, planning, validation, and tool use, which makes it well suited to high-throughput production environments. The model is optimized for efficient deployment and real-time performance in software engineering tasks.
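
To illustrate what "activates only a subset of experts" means, here is a minimal, generic top-k MoE routing sketch in plain Python. It is not Mellum2's actual router or architecture: the linear gate, the number of experts, the value of k, and the toy experts (`make_expert`, `moe_forward`) are hypothetical choices made purely for illustration.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable softmax: subtract the max before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, gate_weights: List[Vector], experts: List[Expert], k: int = 2) -> Vector:
    # Hypothetical linear router: one logit per expert (dot product with its gate vector).
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate scores over the selected experts only.
    weights = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # exactly k expert calls per input
        out = [o + w * y_j for o, y_j in zip(out, y)]
    return out


def make_expert(scale: float, calls: List[int], idx: int) -> Expert:
    # Hypothetical toy expert: scales its input and records how often it runs.
    def expert(x: Vector) -> Vector:
        calls[idx] += 1
        return [scale * v for v in x]
    return expert


random.seed(0)
num_experts, dim = 8, 4
calls = [0] * num_experts
experts = [make_expert(float(i + 1), calls, i) for i in range(num_experts)]
gate = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_experts)]

x = [0.5, -1.0, 0.25, 2.0]
y = moe_forward(x, gate, experts, k=2)
print(sum(calls))  # 2: only two of the eight experts ran
```

Only the k selected experts run for a given input, so compute per input grows with k rather than with the total number of experts, while the full set of experts still contributes model capacity. Standard MoE transformers apply this kind of routing per token inside their MoE layers.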


#mellum2 #jetbrains #mixture-of-experts
