Introducing SWE-bench Verified

1 year ago
We’re releasing a human-validated subset of SWE-bench that more reliably evaluates AI models’ ability to solve real-world software issues.