
A study by Anthropic and Stanford finds that larger language models learn rare tasks that smaller models miss, because frequent tasks dominate the training dynamics of smaller models. Small models fail to retain rare skills because of update-and-forget loops, in which updates from frequent tasks overwrite the signal from rare tasks. A model with N neurons prioritizes the N most useful features, ranked by task frequency and importance. Only large models master tasks that make up 0.25 percent of the training data. Once a model has mastered the frequent tasks, its capacity shifts to rare tasks, which allows them to be learned stably.
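The capacity argument can be illustrated with a toy sketch. It is not the study's method: the `Task` type, the `learned_tasks` helper, the task names, and the scoring rule (utility taken as frequency times importance) are all assumptions made for illustration, and only the 0.25 percent share comes from the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    frequency: float   # share of training data (illustrative values)
    importance: float  # relative payoff for learning the task (illustrative)


def learned_tasks(tasks, capacity):
    """Return the names of the `capacity` tasks with the highest utility.

    Utility is assumed to be frequency * importance; the study's actual
    ranking criterion may differ.
    """
    ranked = sorted(tasks, key=lambda t: t.frequency * t.importance, reverse=True)
    return [t.name for t in ranked[:capacity]]


# Hypothetical task mix: two frequent tasks and one rare task at 0.25 percent.
TASKS = [
    Task("common_a", 0.60, 1.0),
    Task("common_b", 0.3975, 1.0),
    Task("rare", 0.0025, 1.0),
]

small = learned_tasks(TASKS, capacity=2)  # rare task is crowded out
large = learned_tasks(TASKS, capacity=3)  # spare capacity reaches the rare task
print(small, large)
```

Under these assumptions, the rare task is picked up only once capacity exceeds the number of frequent tasks, which mirrors the qualitative claim in the summary. The sketch ignores the training dynamics (the update-and-forget loop) entirely.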
Summary by ByteBrief