Creating and Preprocessing a Design Matrix with Recipes

June 8, 2017 Max Kuhn


R has an excellent framework for specifying models using formulas. While elegant and useful, it was designed at a time when models had few terms and complex preprocessing of data was not commonplace, so it has some limitations. This talk introduces a new package, recipes, in which model terms and preprocessing steps are specified sequentially. The recipe can then be estimated and applied to any dataset. Current options include simple transformations (log, Box-Cox, interactions, dummy variables, …), signal extraction (PCA, ICA, MDS), basis functions (splines, polynomials), imputation methods, and others.
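
As a rough illustration of the workflow the abstract describes, the sketch below uses the recipes package as found in current CRAN releases. Function names such as prep(), bake(), step_normalize(), and all_numeric_predictors() come from those later releases and may differ from what was shown in the 2017 talk. The mtcars data, the train/test split, and the particular steps are assumptions chosen only for illustration.

```r
# A minimal sketch, assuming a recent version of the recipes package is installed.
library(recipes)

# Illustrative split of a built-in dataset (an assumption, not from the talk).
train <- mtcars[1:24, ]
test  <- mtcars[25:32, ]

# Enumerate model terms and preprocessing steps sequentially.
rec <- recipe(mpg ~ ., data = train) %>%
  step_log(disp) %>%                                   # simple transformation
  step_normalize(all_numeric_predictors()) %>%         # center and scale predictors
  step_pca(all_numeric_predictors(), num_comp = 3)     # signal extraction

# Estimate the recipe: statistics such as means, standard deviations,
# and PCA loadings are computed from the training rows only.
rec_prepped <- prep(rec, training = train)

# Apply the estimated recipe to another dataset.
baked <- bake(rec_prepped, new_data = test)
```

Separating the estimation step from the application step means the held-out rows are transformed with quantities learned from the training rows, rather than being re-estimated on new data.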

About the Author

Max Kuhn

Dr. Max Kuhn is a Software Engineer at RStudio. He is the author or maintainer of several R packages for predictive modeling, including caret, Cubist, C50, and others. He routinely teaches classes in predictive modeling at rstudio::conf, Predictive Analytics World, and useR!, and his publications include work on neuroscience biomarkers, drug discovery, molecular diagnostics, and response surface methodology. He and Kjell Johnson wrote the award-winning book Applied Predictive Modeling in 2013.


