ByteBrief
We're a portrait publication through and through. Turn your phone back and your briefing picks up right where you left it.
(We tried widescreen once. It wasn't us.)

New open-weight LLM releases focus on long-context efficiency. Key architecture tricks include KV sharing and per-layer embeddings in Gemma 4, layer-wise attention budgeting in Laguna XS.2, compressed convolutional attention in ZAYA1-8B, and mHC plus compressed attention in DeepSeek V4.
Tap to vote and see what everyone thinks.
Summary by ByteBrief