GitHub published the Multilingual Repositories Dataset, a metadata dataset covering over 80 million classification rows across more than 40 million public repositories. The dataset helps researchers and developers find repositories with non-English natural-language content. Portuguese is the most common non-English README language, appearing in over 3 million repositories. Korean is the most common non-English language in issue text.
Summary by ByteBrief
Give GitHub Copilot CLI real code intelligence with language servers