Kaggle in the classroom: using R and GitHub to run predictive modeling competitions – Colin Rundel

February 28, 2018

Abstract

Kaggle is a popular platform that enables companies and researchers to host predictive modeling competitions open to analysts, statisticians, and data scientists all over the world. These types of predictive modeling contests are compelling as a pedagogical exercise because they allow students to engage with real data and provide automatic feedback on performance in both an absolute (e.g. rMSPE) and a relative (vs. other teams) way. While Kaggle offers tools that allow instructors to run their own competitions on the platform, this approach is somewhat limiting and requires students to learn how to use Kaggle’s platform. As an alternative, we have implemented similar predictive modeling competitions in our Statistical Computing and Predictive Modeling courses using R, GitHub, and the Wercker continuous integration platform. We will discuss the implementation details of these contests as well as how these tools provided the necessary flexibility to tackle interesting modeling tasks while complementing the skills and tools students were already learning.


About the speaker

Colin Rundel
Assistant Professor of the Practice of Statistical Science

Colin is an assistant professor of the practice in Statistical Science at Duke University. His research interests include applied spatial statistics with an emphasis on Bayesian statistics and computational methods.
