Science | The Decoder | about 3 hours ago

Researchers pinpoint why larger language models pick up skills that small ones miss

ByteBrief summary

1 min read

A study by Anthropic and Stanford explains why larger language models learn rare tasks that smaller ones miss: frequent tasks dominate training dynamics. Small models fail to retain rare skills because they get stuck in update-and-forget loops, in which updates from frequent tasks overwrite the signal from rare ones. A model with N neurons prioritizes the N most useful features, ranked by task frequency and importance. Only large models master tasks that make up 0.25 percent of the training data. Once a model has mastered the frequent tasks, its capacity shifts to rare tasks, allowing stable learning.
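The capacity claim can be pictured with a toy sketch. This is not the study's method: the function name `features_learned`, the task list, and the scoring rule (frequency multiplied by importance) are all assumptions made for illustration. The only figure taken from the summary is the 0.25 percent share of the rare task.

```python
def features_learned(tasks, n_neurons):
    """Return the names of the n_neurons highest-scoring tasks.

    tasks: list of (name, frequency, importance) tuples.
    The score frequency * importance is an illustrative assumption,
    not a formula taken from the study.
    """
    ranked = sorted(tasks, key=lambda t: t[1] * t[2], reverse=True)
    return [name for name, _, _ in ranked[:n_neurons]]


# Hypothetical task mix: two frequent tasks and one rare task that
# makes up 0.25 percent of the training data.
tasks = [
    ("common_a", 0.60, 1.0),
    ("common_b", 0.3975, 1.0),
    ("rare", 0.0025, 1.0),
]

small = features_learned(tasks, n_neurons=2)  # capacity for two features
large = features_learned(tasks, n_neurons=3)  # capacity for three features

print("small model learns:", small)
print("large model learns:", large)
```

In this toy setup the rare task only makes the cut once the model has spare capacity beyond the frequent tasks, which mirrors the pattern the summary describes.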

#anthropic #language-models #training-data
