The goal of this competition is to reverse the typical direction of a generative text-to-image model: instead of generating an image from a text prompt, can you create a model which can predict the text prompt given a generated image? You will make predictions on a dataset containing a wide variety of (prompt, image)
pairs generated by Stable Diffusion 2.0, in order to understand how reversible the latent relationship is.
The popularity of text-to-image models has spawned an entirely new field of prompt engineering. Part art, part unsettled science, it has ML practitioners and researchers rapidly grappling with the relationships between prompts and the images they generate. Is adding “4k” to a prompt the best way to make it more photographic? Do small perturbations in prompts lead to highly divergent images? How does the order of prompt keywords affect the resulting generated scene? This competition tasks you with creating a model that can reliably invert the diffusion process that generated a given image.
In order to calculate prompt similarity in a robust way—meaning that “epic cat” is scored as similar to “majestic kitten” in spite of character-level differences—you will submit embeddings of your predicted prompts. Whether you model the embeddings directly or first predict prompts and then convert them to embeddings is up to you! Good luck, and may you create “high quality, sharp focus, intricate, detailed, in the style of unreal robust cross validation” models herein.
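As a rough illustration of the “predict a prompt, then embed it” route, the sketch below converts predicted prompt strings into normalized sentence embeddings and compares them with cosine similarity. The specific embedding model (`sentence-transformers/all-MiniLM-L6-v2`) and the use of cosine similarity are assumptions for illustration; check the competition's data and evaluation pages for the exact embedding model and submission format.

```python
# Minimal sketch: turn predicted prompts into embeddings for submission-style scoring.
# Assumptions (not stated in this overview): the embedding model is
# sentence-transformers/all-MiniLM-L6-v2 and similarity is cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

def prompts_to_embeddings(prompts):
    """Encode a list of prompt strings into unit-norm embedding vectors."""
    return np.asarray(embedder.encode(prompts, normalize_embeddings=True))

def cosine_similarity(a, b):
    """Cosine similarity of two already-normalized vectors."""
    return float(np.dot(a, b))

if __name__ == "__main__":
    # "epic cat" and "majestic kitten" should land close together in embedding space,
    # which is the point of scoring on embeddings rather than raw strings.
    pred, truth = prompts_to_embeddings(["epic cat", "majestic kitten"])
    print(f"similarity: {cosine_similarity(pred, truth):.3f}")
```

Whichever route you take, the final submission is the embedding vectors themselves, so a model that predicts embeddings directly can skip the text-generation step entirely.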
Awards:
- 1st Place – $12,000
- 2nd Place – $10,000
- 3rd Place – $10,000
- 4th Place – $10,000
- 5th Place – $8,000
As a condition to being awarded a Prize, a Prize winner must provide a detailed write-up on their solution in the competition forums within 14 days of the conclusion of the competition.
Deadline: 09-05-2023