List-columns in data.table: Reducing the cognitive & computational burden of complex data - Tyson S. Barrett

January 31, 2020 Tyson Barrett
The use of list-columns in data frames and tibbles is well documented (e.g. Bryan, 2018). List-columns provide a cognitively efficient way to organize the results of complex data work (e.g. several statistical models, groupings of text, data summaries, or even graphics) alongside the data they correspond to. For example, one can store student information within classrooms, player information within teams, or analyses within groups. This allows the nested data to vary in size without overly complicating the structure of the data or adding redundancies to it. In turn, this can make analyses of the data more reliable. Because data.table is fast and memory-efficient, being able to use it with list-columns would be beneficial in many data contexts (e.g. to reduce memory usage in large data sets). Herein, I demonstrate how one can create list-columns in a data.table using the by argument in data.table and purrr::map(). I compare the behavior of the data.table approaches to dplyr::group_nest() and tidyr::unnest(), two of the several powerful tidyverse nesting and unnesting functions. Results using bench::mark() show the speed and efficiency of using data.table to work with list-columns.
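As a rough illustration of the nesting idea described above, the following is a minimal sketch and not the code from the talk. The toy data and the column names (`classroom`, `student`, `score`) are hypothetical, and base R's `vapply()` is swapped in for `purrr::map()` so that the example depends only on data.table.

```r
library(data.table)

# Hypothetical toy data: students nested within classrooms
dt <- data.table(
  classroom = c("A", "A", "B", "B", "B"),
  student   = c("s1", "s2", "s3", "s4", "s5"),
  score     = c(90, 85, 70, 95, 80)
)

# Nest: one row per classroom; the remaining columns for each group
# are stored as a data.table inside the list-column `data`
nested <- dt[, .(data = list(.SD)), by = classroom]

# Work on the list-column: compute one summary value per nested table
# (vapply() is used here in place of purrr::map_dbl())
nested[, mean_score := vapply(data, function(d) mean(d$score), numeric(1))]

# Unnest: bind each group's nested table back into long form
unnested <- nested[, rbindlist(data), by = classroom]
```

Here `nested` has one row per classroom and `unnested` recovers the original long layout. The tidyverse counterparts named in the abstract would be `dplyr::group_nest()` for the nesting step and `tidyr::unnest()` for the unnesting step.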

About the Author

Tyson Barrett

I am a Research Assistant Professor at Utah State University working in clinical and social data science. My work addresses the research data analytic needs of the college, with an emphasis on using R to work with complex data sets. In addition to my own research, I consult on data science issues for the college (including public health, education, psychology, and other social sciences) and teach statistics and research methods courses. My favorite courses to teach, however, are the Intro to R and Intermediate R courses for undergraduate and graduate students.
