Putting the Fun in Functional Data: A tidy pipeline to identify routes in NFL tracking data

January 30, 2020

In football, many hours are currently spent watching game film to manually label the routes run on passing plays. With tracking data, each route can instead be described as a sequence of spatiotemporal measurements whose length depends on the duration of the play. Such data can be analyzed conveniently using nested columns with tidyr and purrr. We demonstrate how model-based curve clustering, using Bernstein polynomial basis functions (i.e., Bézier curves) fit with the Expectation-Maximization algorithm, can cluster route trajectories. Each cluster can then be labelled, which gives a route name for every route and makes it possible to build route trees for all receivers. The clusters and routes can be visualized using ggplot2 and animated over time using gganimate.
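
As a rough illustration of the Bernstein polynomial basis mentioned above, the sketch below evaluates a Bézier curve as a Bernstein-weighted sum of control points. It is written in plain Python rather than the R tooling the talk covers, and it is not the speakers' implementation. The function names, the control points, and the choice to rescale each route's frames onto [0, 1] so that routes of different durations share one time axis are all assumptions made for the example.

```python
from math import comb


def bernstein_basis(degree, t):
    """Return the degree + 1 Bernstein basis values at t in [0, 1]."""
    return [
        comb(degree, k) * t**k * (1 - t) ** (degree - k)
        for k in range(degree + 1)
    ]


def bezier_point(control_points, t):
    """Evaluate a Bezier curve at t as a Bernstein-weighted sum of control points."""
    weights = bernstein_basis(len(control_points) - 1, t)
    x = sum(w * px for w, (px, _) in zip(weights, control_points))
    y = sum(w * py for w, (_, py) in zip(weights, control_points))
    return (x, y)


def rescale_times(n_frames):
    """Map frame indices 0..n_frames-1 onto [0, 1] (assumed normalization)."""
    if n_frames == 1:
        return [0.0]
    return [i / (n_frames - 1) for i in range(n_frames)]


# Hypothetical control points, in yards, for a route that goes upfield
# and then breaks toward the sideline. These are not from real tracking data.
control_points = [(0.0, 0.0), (0.0, 10.0), (0.0, 10.0), (8.0, 10.0)]

# Sample the curve at five evenly spaced times along the play.
curve = [bezier_point(control_points, t) for t in rescale_times(5)]
```

Because the basis values at any time sum to one, each point on the curve is a weighted average of the control points, so a route's shape can be summarized by a small, fixed number of coefficients regardless of how many frames the play lasted.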

About the speaker

Dani Chu is a second-year master's student in statistics at Simon Fraser University. He plans to graduate in December 2019 and will join the Hockey Research & Development team with NHL Seattle in January. He recently completed an internship in the NBA's Basketball Strategy & Analytics department. At SFU, he is co-president of the SFU Sports Analytics Club with Lucas Wu and Matthew Reyers. Along with Lucas, Matt, and James Thomson, he won the College Division of the 2019 NFL Big Data Bowl and the 2018 Sacramento Kings Case Competition. Dani has also interned as a statistician at Best Buy Canada and Fraser Health Authority.