
A new benchmark called SWE-Explore reveals that AI coding agents often find the correct file but fail to identify the specific lines causing a bug. The dataset draws from 848 problems across 203 open-source projects. Traditional keyword search barely beats chance, exposing a hidden weakness in current AI coding evaluation.
Summary by ByteBrief