Kaggle, the world’s largest data science community, released the Kaggle ML and DS Survey findings for 2022. Here are some insights from the survey:
The survey shows that the number of data scientists working and residing in India has increased every year over the last five years. Japan is among the other countries showing a rising trend, while countries like the US showed near-stagnant growth followed by a jump in the number of data scientists in 2022.
Programming skills and coding infrastructures
JupyterLab remains the most widely used notebook environment, followed by Google Colab and Kaggle notebooks; together these have replaced the traditional RStudio and MATLAB. The survey also reveals that many data scientists have actively shifted to VS Code for software development.
Machine Learning Framework
Scikit-learn stands out as the most popular framework, followed by TensorFlow and XGBoost. Although these remain at the top of data scientists’ lists, their usage has stayed nearly constant, while PyTorch has been growing steadily.
The findings include concrete figures on how many people work with data, trends in machine learning across industries, and the best approaches for aspiring data scientists to enter the profession. It is an intriguing example of a survey dataset because Kaggle provided all the raw response data, not just the aggregated results, allowing analysts to study the data independently.
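Because the raw responses are available, an analyst can reproduce tallies like the ones above rather than rely on the published charts. The following is only a minimal sketch of that idea in Python: the tiny in-memory CSV and its column names (`Country`, `Framework`) are hypothetical stand-ins, not the survey's actual file layout or headers, and the helper functions are illustrative.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for the raw responses file. The column names
# ("Country", "Framework") are illustrative, not the survey's real headers.
SAMPLE_CSV = """Country,Framework
India,Scikit-learn
India,PyTorch
Japan,TensorFlow
United States,Scikit-learn
"""


def count_responses(csv_file, column):
    """Count how many respondents gave each answer in one column."""
    reader = csv.DictReader(csv_file)
    counts = Counter()
    for row in reader:
        answer = (row.get(column) or "").strip()
        if answer:  # skip respondents who left the question blank
            counts[answer] += 1
    return counts


def top_answers(counts, n=3):
    """Return the n most common answers with their share of all answers."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [(answer, count / total) for answer, count in counts.most_common(n)]


country_counts = count_responses(io.StringIO(SAMPLE_CSV), "Country")
for answer, share in top_answers(country_counts):
    print(f"{answer}: {share:.0%}")
```

To try this on the real data, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with the downloaded responses file opened via `open(path, newline="", encoding="utf-8")`, and `"Country"` with whatever header that file actually uses for the question of interest.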
Kaggle ML and DS Survey Competition 2022
Kaggle announced a competition following its sixth annual industry-wide survey, which aims to surface a comprehensive view of the state of machine learning and data science.
It is launching the annual Data Science Survey Challenge and will award US$30,000 in prizes to notebook authors who best describe a particular segment of the data science and machine learning community. The challenge is an opportunity for people to use their imagination and tell the story of a group of people with whom they identify.
The submissions will be evaluated on the following:
- Composition: the narrative and the subject should be well put together, researched, and supported by data and visualizations.
- Documentation: the code and notebooks should be understandable to an ordinary reader, with adequately cited sources and a concise analysis of each step. The documentation should represent the rationale behind your story.
- Originality: the entry should be informative, thought-provoking, and non-plagiarized.
To be considered valid, a submission must be contained in a single notebook and made publicly accessible on Kaggle before the submission deadline. In addition to the Kaggle Data Science survey, participants are welcome to use any other datasets.