Kaggle in the classroom: using R and GitHub to run predictive modeling competitions

rstudio::conf 2018 teaching

Colin Rundel

February 28, 2018

Kaggle is a popular platform that enables companies and researchers to host predictive modeling competitions open to analysts, statisticians, and data scientists all over the world. Contests of this type are compelling as a pedagogical exercise because they let students engage with real data and give automatic feedback on performance in both an absolute way (e.g. root mean squared prediction error, rMSPE) and a relative way (vs. other teams). Kaggle offers tools that allow instructors to run their own competitions on the platform, but this approach is somewhat limiting and requires students to learn how to use Kaggle’s platform. Instead, we have implemented similar predictive modeling competitions in our Statistical Computing and Predictive Modeling courses using R, GitHub, and the Wercker continuous integration platform. We will discuss the implementation details of these contests, as well as how these tools provided the flexibility needed to tackle interesting modeling tasks while complementing the skills and tools students were already learning.
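The scoring step described above, giving absolute feedback (an error score) plus relative feedback (a ranking against other teams), might look something like the following. This is a minimal sketch, not the author's implementation, which used R and Wercker. Two assumptions apply: rMSPE means root mean squared prediction error, computed on held-out truth the students never see, and each team's predictions have already been collected into a dictionary. The function names `rmspe` and `leaderboard` and the team names are hypothetical.

```python
import math


def rmspe(truth, pred):
    """Root mean squared prediction error between held-out truth and predictions."""
    if not truth or len(truth) != len(pred):
        raise ValueError("truth and pred must be non-empty and the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(truth, pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def leaderboard(truth, submissions):
    """Score every team (absolute feedback) and rank teams by score (relative feedback).

    `submissions` is a hypothetical mapping of team name -> list of predictions.
    Lower rMSPE is better, so teams are sorted in ascending order of score.
    """
    scores = {team: rmspe(truth, pred) for team, pred in submissions.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [(rank, team, score) for rank, (team, score) in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Toy held-out values and two illustrative team submissions.
    truth = [3.0, 5.0, 2.0, 7.0]
    submissions = {
        "team_a": [2.5, 5.0, 2.0, 8.0],
        "team_b": [3.0, 4.0, 3.0, 7.0],
    }
    for rank, team, score in leaderboard(truth, submissions):
        print(f"{rank}. {team}: rMSPE = {score:.3f}")
```

On this toy data, `team_a` ranks first (rMSPE about 0.559) ahead of `team_b` (about 0.707). In a CI-based setup, a step like this could run on each push to a team's repository and post the result back. How the actual courses wired this into GitHub and Wercker is not specified here.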

About the speaker

Colin Rundel

Colin is a lecturer in Statistics and Data Science at the University of Edinburgh. He has been teaching statistics and data science courses, with a focus on computing and spatial modeling, for the last 8 years.