Kaggle’s annual Machine Learning & Data Science Survey competition! You can read our executive summary here.
This year, as in 2017, 2018, and 2019 we set out to conduct an industry-wide survey that presents a truly comprehensive view of the state of data science and machine learning. The survey was live for 3.5 weeks in October, and after cleaning the data we finished with 20,036 responses!
There’s a lot to explore here. The results include raw numbers about who is working with data, what’s happening with machine learning in different industries, and the best ways for new data scientists to break into the field. We’ve published the data in as raw a format as possible without compromising anonymization, which makes it an unusual example of a survey dataset.
This year Kaggle is launching the second annual Data Science Survey Challenge, where we will be awarding a prize pool of $30,000 to notebook authors who tell a rich story about a subset of the data science and machine learning community.
In our third year running this survey, we were once again awed by the global, diverse, and dynamic nature of the data science and machine learning industry. This survey data EDA provides an overview of the industry on an aggregate scale, but it also leaves us wanting to know more about the many specific communities comprised within the survey. For that reason, we’re inviting the Kaggle community to dive deep into the survey datasets and help us tell the diverse stories of data scientists from around the world.
The challenge objective: tell a data story about a subset of the data science community represented in this survey, through a combination of both narrative text and data exploration. A “story” could be defined any number of ways, and that’s deliberate. The challenge is to deeply explore (through data) the impact, priorities, or concerns of a specific group of data science and machine learning practitioners. That group can be defined in the macro (for example: anyone who does most of their coding in Python) or the micro (for example: female data science students studying machine learning in masters programs). This is an opportunity to be creative and tell the story of a community you identify with or are passionate about!
Submissions will be evaluated on the following:
- Composition – Is there a clear narrative thread to the story that’s articulated and supported by data? The subject should be well defined, well researched, and well supported through the use of data and visualizations.
- Originality – Does the reader learn something new through this submission? Or is the reader challenged to think about something in a new way? A great entry will be informative, thought provoking, and fresh all at the same time.
- Documentation – Are your code, and notebook, and additional data sources well documented so a reader can understand what you did? Are your sources clearly cited? A high quality analysis should be concise and clear at each step so the rationale is easy to follow and the process is reproducible
To be valid, a submission must be contained in one notebook, made public on or before the submission deadline. Participants are free to use any datasets in addition to the Kaggle Data Science survey, but those datasets must also be publicly available on Kaggle by the deadline for a submission to be valid.
Do you want to stay updated about latest opportunities, scholarships & online competitions, like $30,000 Kaggle ML & DS Survey 2020 then please install our Android or iOS App here.
There will be 5 prizes for the best data storytelling submissions:
- 1st place: $10,000
- 2nd place: $8,000
- 3rd place: $6,000
- 4th place: $4,000
- 5th place: $1,000
Kaggle will also give a Notebook Award of $1,000 to recognize our favorite notebook that gets published prior to 11:59:00PM UTC on Wednesday, December 9th.