I fell in love with SWE-bench the moment I saw it. What a great idea: have AIs solve real Issues from popular GitHub repos. SWE-bench felt more difficult, grounded, and relevant than other AI benchmarks. But I’ve always wondered how the leaderboard would change if the test set weren’t public. So for this competition we will collect a new test set after the submission deadline.
I also believe in the power of open source communities, so for this competition cash will only be awarded to submissions that use open source code and open weight models.
Automating this task will let human software engineers spend far more time designing new features, refactoring abstractions, interfacing with users, and working on other tasks that are more inherently human (and, for many of us, more fun). If we get this right, we can spend less time fixing bugs and more time building.
Now let’s get AI actually solving our GitHub issues.
Awards:
TOTAL PRIZE FUND: $1,225,000
Leaderboard Prizes for Top-Ranking Teams in this Competition:
1st Place: $50,000
2nd Place: $20,000
3rd Place: $10,000
4th Place: $10,000
5th Place: $10,000
Threshold Prizes for Leaderboard Prize Winners:
If any team in the top 5 places on the leaderboard reaches a score of 30%, an additional pool of $50,000 will be distributed among the winning teams reaching that threshold, in direct proportion to their Leaderboard Prize winnings. For example, if only the first and second place teams reach the 30% threshold, they would receive roughly $35,700 and $14,300 respectively. The same applies to the score thresholds of 40%, 50%, 60%, 70%, 80%, and 90%.
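To make the proportional split concrete, here is a minimal Python sketch (not official competition code; the function and variable names are illustrative) that reproduces the example above for a single threshold pool:

```python
# Illustrative only: split one $50,000 threshold pool among the leaderboard
# prize winners that reach a score threshold, in direct proportion to their
# leaderboard prize winnings.

LEADERBOARD_PRIZES = {
    "1st": 50_000,
    "2nd": 20_000,
    "3rd": 10_000,
    "4th": 10_000,
    "5th": 10_000,
}

THRESHOLD_POOL = 50_000  # one pool per threshold (30%, 40%, ..., 90%)


def split_threshold_pool(teams_reaching_threshold):
    """Split the pool proportionally to each team's leaderboard winnings."""
    total = sum(LEADERBOARD_PRIZES[t] for t in teams_reaching_threshold)
    return {
        t: THRESHOLD_POOL * LEADERBOARD_PRIZES[t] / total
        for t in teams_reaching_threshold
    }


# Example from the text: only 1st and 2nd place reach the 30% threshold.
print(split_threshold_pool(["1st", "2nd"]))
# -> {'1st': 35714.28..., '2nd': 14285.71...}, i.e. roughly $35,700 and $14,300
```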
Grand Prize:
If the first place team reaches a score of 90%, they will receive an additional $775,000. The Grand Prize will bring the first place team’s total winnings to one million dollars.
Deadline: 05-03-2025