Bridging the Gap between SQL and R - Ian Cook

January 31, 2020 Ian Cook
Like it or not, SQL is the closest thing we have to a universal language for working with structured data. Celebrating its 50th birthday in 2020, SQL today integrates with thousands of applications and has millions of users worldwide. Data analysts using SQL represent a large audience of potential R users motivated to expand their data science skills.

But learning R can be frustrating for SQL users. One major frustration is the inability to directly query R data frames with SQL SELECT statements. Eager to use R for tasks that are not possible with SQL (like data visualization and machine learning), these users are dismayed to find that they must first learn an unfamiliar syntax for data manipulation. The popularity of the sqldf package (which automatically exports an R data frame into an embedded database, then runs a SQL query on it) demonstrates this frustration.

But now there is a way to directly query an R data frame without moving the data out of R. In this talk, I introduce tidyquery, a new R package that runs SQL queries directly on R data frames. tidyquery is powered by dplyr and by queryparser, a new pure-R, no-dependency SQL query parser.
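To make the idea concrete, here is a minimal sketch of what querying a data frame with tidyquery might look like. It assumes the tidyquery package is installed from CRAN and uses its `query()` and `show_dplyr()` functions. The `flights` data frame and its values are made up for illustration and are not taken from the talk.

```r
# Assumes the tidyquery package is installed, e.g. install.packages("tidyquery")
library(tidyquery)

# A small illustrative data frame (hypothetical values, not from the talk)
flights <- data.frame(
  origin = c("RDU", "RDU", "SFO", "SFO", "SFO"),
  dest = c("SFO", "JFK", "RDU", "JFK", "JFK"),
  distance = c(2400, 427, 2400, 2586, 2586),
  stringsAsFactors = FALSE
)

# Run a SQL SELECT statement against the data frame named in the FROM clause
result <- query(
  "SELECT origin, COUNT(dest) AS num_flights, AVG(distance) AS avg_distance
   FROM flights
   GROUP BY origin
   ORDER BY origin"
)
print(result)

# Print the dplyr code that tidyquery generates for a query
show_dplyr(
  "SELECT origin, COUNT(dest) AS num_flights
   FROM flights
   GROUP BY origin"
)
```

The data never leaves the R session: the query is parsed and translated into dplyr calls, which is the contrast with sqldf's approach of copying the data frame into an embedded database first.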

About the Author

Ian Cook

Ian Cook is a data science and machine learning educator at Cloudera and the creator of the Coursera course "Analyzing Big Data with SQL". He has authored and contributed to several R packages and has worked in data scientist roles at TIBCO Software and Advanced Micro Devices. Ian is a cofounder of Research Triangle Analysts, the largest data science meetup group in the Raleigh, North Carolina, area. He received a master's degree in statistics from Lehigh University.
