Data science is threatened by a looming credibility crisis: too many scientific results are not reproducible. Unfortunately, data scientists have accidentally contributed to the problem. We made science look like math, implying that one can prove scientific results (p < 0.05) without reproducing them. We need to adopt a new standard of reproducibility, one that encompasses the data, code, and decisions that underlie scientific work. This change will be a windfall for commercial data scientists because code-based reproducibility is repeatable, automatable, parameterizable, and schedulable.
Garrett is the author of Hands-On Programming with R and co-author of R for Data Science and R Markdown: The Definitive Guide. He is a Data Scientist at RStudio and holds a Ph.D. in Statistics, but specializes in teaching. He’s taught people how to use R at over 50 government agencies, small businesses, and multibillion-dollar global companies, and he’s designed RStudio’s training materials for R, Shiny, R Markdown, and more. Garrett wrote the popular lubridate package for dates and times in R and creates the RStudio cheatsheets.