This competition challenges you to identify exploits for an LLM-as-a-judge system designed to evaluate the quality of essays. You'll be given a list of essay topics, and your goal will be to submit an essay that maximizes disagreement between the LLM judges. Your work will help build a better understanding of the capabilities and limitations of using LLMs for subjective evaluation tasks at scale.
It's increasingly common to use LLMs for subjective evaluations such as ranking and scoring the quality of generated text. However, any automated rating system is vulnerable to exploits. Different models will have different degrees of self-bias, position-bias, length-bias, and style-bias that might negatively impact their ability to provide robust assessments (Zheng 2023, Wang 2023, Panickssery 2024). Likewise, different models will have different degrees of vulnerability to targeted exploits, such as universal jailbreaks, that can be used to misguide the system (Wallace 2021, Zou 2023, Li 2024, Rando 2024).
One method to improve the robustness of automated judging systems is to include multiple LLM models to form an LLM-judging committee. Each model is only distantly related to the others, which decreases the chance that they share common vulnerabilities. An advantage of LLM-judging committees is that they are less sensitive to exploits that impact only a single model. This competition attempts to answer the question of whether individual LLM judges can be coerced into returning inflated scores that diverge substantially from the group consensus.
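To make the committee idea concrete, here is a minimal Python sketch of how an essay might be scored by several judges and how disagreement with the consensus could be measured. The judge names, the `query_judge` stub, and the score scale are illustrative assumptions, not the competition's actual evaluation code.

```python
"""
Hedged sketch of an LLM-judging committee: score one essay with several
judges, then measure how far each judge strays from the group consensus.
All identifiers below are assumptions for illustration only.
"""
from statistics import mean, pstdev

JUDGES = ["judge_a", "judge_b", "judge_c"]  # hypothetical judge model names


def query_judge(judge: str, topic: str, essay: str) -> float:
    """Stand-in for a real LLM call that returns a quality score.

    A real system would prompt the judge model with the topic and essay
    and parse a numeric score from its response.
    """
    # Dummy scores so the sketch runs end to end; replace with real calls.
    dummy = {"judge_a": 8.0, "judge_b": 3.0, "judge_c": 4.0}
    return dummy[judge]


def committee_report(topic: str, essay: str) -> dict:
    """Score the essay with every judge and quantify disagreement."""
    scores = {j: query_judge(j, topic, essay) for j in JUDGES}
    consensus = mean(scores.values())
    return {
        "scores": scores,
        "consensus": consensus,
        "spread": pstdev(scores.values()),  # overall disagreement
        # Per-judge divergence from the committee consensus:
        "divergence": {j: s - consensus for j, s in scores.items()},
    }


if __name__ == "__main__":
    print(committee_report("The value of homework", "essay text here"))
```

In this sketch, a large positive divergence for one judge alongside a low consensus would correspond to the kind of single-judge inflation the competition is probing for.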
By identifying exploits used to unfairly bias an evaluation in a given direction, you will help the ML community better understand the strengths and weaknesses of using AI systems to make subjective decisions at scale.
Awards:
- 1st Place – $12,000
- 2nd Place – $10,000
- 3rd Place – $10,000
- 4th Place – $10,000
- 5th Place – $8,000
We’re excited to launch this experimental competition and want to be completely transparent about its nature. Because LLM behavior can be unpredictable, we might encounter unexpected issues that affect scoring or the competition’s overall integrity. We plan to award points and medals, but there may be course corrections along the way, and we reserve the right to remove points and medals. We’ll keep participants informed and make any necessary adjustments with ample time remaining in the competition.
Deadline: 25 February 2025