Working with categorical data in R without losing your mind - Amelia McNamara

Categorical data, called “factor” data in R, presents unique challenges in data wrangling. R users often look down on tools like Excel for automatically coercing variables to incorrect datatypes, but factor data in R can produce very similar issues. The stringsAsFactors=HELLNO movement and standard tidyverse defaults have moved us away from using factors, but they are sometimes still necessary for analysis. This talk will outline common problems arising from categorical variable transformations in R and show strategies to avoid them, using both base R and the tidyverse (particularly dplyr and forcats functions).

(related paper from the DSS collection)
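
As an illustration of the kind of silent coercion problem described above (this example is not taken from the talk itself), here is a minimal base R sketch of one well-known factor pitfall: calling as.numeric() on a factor returns its internal integer codes, not the numbers you see printed. The variable name doses and its values are hypothetical.

```r
# Hypothetical numeric-looking values that have ended up stored as a factor
doses <- factor(c("10", "5", "20"))

# Levels are sorted as character strings, so "5" comes after "20"
levels(doses)                        # returns "10" "20" "5"

# Pitfall: as.numeric() gives the level codes, not the original values
as.numeric(doses)                    # returns 1 3 2

# Safer: convert through the level labels first
as.numeric(as.character(doses))      # returns 10 5 20

# Equivalent idiom: convert the levels once, then index them by the factor's codes
as.numeric(levels(doses))[doses]     # returns 10 5 20
```

Since R 4.0.0, data.frame() and read.csv() default to stringsAsFactors = FALSE, so this mostly appears when factors are created explicitly or by older code and settings.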

About the Author

Amelia McNamara

My work is focused on creating better tools for novices to use for data analysis. I have a theory about what the future of statistical programming should look like, and am working on next steps toward those tools. For more on that, see my dissertation. My research interests include statistics education, statistical computing, data visualization, and spatial statistics. At the moment, I am very interested in the effects of parameter choices on data analysis, particularly data visualizations. My collaborator Aran Lunzer and I have produced an interactive essay on histograms, and an initial foray into the effects of spatial aggregation. I talked more about spatial aggregation in my 2017 OpenVisConf talk, How Spatial Polygons Shape Our World.
