rlang 0.4.0

June 26, 2019

It is with great excitement that we announce the release of rlang 0.4.0 on CRAN. rlang is a toolkit for working with core R and Tidyverse features, and hosts the tidy evaluation framework. The full set of changes can be found in the changelog.

In this article, we introduce the most important of these, the new tidy evaluation operator {{. We will use a simple dplyr pipeline as a running example. Let’s start by attaching the package:

library(dplyr)

The good and bad of tidy evaluation

Tidy eval powers packages like dplyr and tidyr. It makes it possible to manipulate data frame columns as if they were defined in the workspace:

gender
#> Error in eval(expr, envir, enclos): object 'gender' not found
mass
#> Error in eval(expr, envir, enclos): object 'mass' not found

starwars %>%
  group_by(gender) %>%
  summarise(mass_maximum = max(mass, na.rm = TRUE))
#> # A tibble: 5 x 2
#>   gender        mass_maximum
#>   <chr>                <dbl>
#> 1 <NA>                    75
#> 2 female                  75
#> 3 hermaphrodite         1358
#> 4 male                   159
#> 5 none                   140

We call this syntax data masking. This feature is unique to the R language and greatly streamlines the writing and reading of code in interactive scripts. Unfortunately, it also makes it more complex to reuse common patterns inside functions:

max_by <- function(data, var, by) {
  data %>%
    group_by(by) %>%
    summarise(maximum = max(var, na.rm = TRUE))
}

starwars %>% max_by(mass, by = gender)
#> Error: Column `by` is unknown

Technically, this is because data-masked code needs to be delayed and transported to the data context. Behind the scenes, dplyr verbs achieve this by capturing the blueprint of your code and resuming its evaluation inside the data mask. The example above fails because group_by() captures the wrong piece of blueprint: the argument name by itself, rather than the gender column that the user supplied. To solve this, tidy evaluation provides enquo() to delay the interpretation of code and capture its blueprint, and the surgery operator !! for modifying blueprints. Combining enquo() and !! is called the quote-and-unquote pattern:

max_by <- function(data, var, by) {
  data %>%
    group_by(!!enquo(by)) %>%
    summarise(maximum = max(!!enquo(var), na.rm = TRUE))
}

starwars %>% max_by(mass, by = gender)
#> # A tibble: 5 x 2
#>   gender        maximum
#>   <chr>           <dbl>
#> 1 <NA>               75
#> 2 female             75
#> 3 hermaphrodite    1358
#> 4 male              159
#> 5 none              140
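The two steps can also be written separately, which may make the pattern easier to follow. The following is a minimal sketch on a small made-up data frame; the `pets` tibble and the function name `max_by_steps` are illustrative assumptions, not part of the release:

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
pets <- tibble(
  species = c("cat", "cat", "dog"),
  weight  = c(4, 5, 20)
)

max_by_steps <- function(data, var, by) {
  # Quote: capture the expressions supplied by the caller
  var <- enquo(var)
  by <- enquo(by)

  # Unquote: insert the captured expressions into the dplyr calls
  data %>%
    group_by(!!by) %>%
    summarise(maximum = max(!!var, na.rm = TRUE))
}

result <- pets %>% max_by_steps(weight, by = species)
```

Here `result` should contain one row per species, with the largest weight in each group.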

We have come to realise that this pattern is difficult to teach and to learn because it involves a new, unfamiliar syntax, and because it introduces two new programming concepts (quote and unquote) that are hard to understand intuitively. This complexity is not really justified, because the pattern is more flexible than basic programming needs require.

A simpler interpolation pattern with {{

rlang 0.4.0 provides a new operator, {{ (read: curly curly), which abstracts quote-and-unquote into a single interpolation step. The curly-curly operator should be straightforward to use. When you create a function around a tidyverse pipeline, wrap the function arguments containing data frame variables with {{:

max_by <- function(data, var, by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(maximum = max({{ var }}, na.rm = TRUE))
}

starwars %>% max_by(height)
#> # A tibble: 1 x 1
#>   maximum
#>     <int>
#> 1     264

starwars %>% max_by(height, by = gender)
#> # A tibble: 5 x 2
#>   gender        maximum
#>   <chr>           <int>
#> 1 <NA>              167
#> 2 female            213
#> 3 hermaphrodite     175
#> 4 male              264
#> 5 none              200

This syntax should be reminiscent of string interpolation in the glue package by Jim Hester:

var <- sample(c("woof", "meow", "mooh"), size = 1)
glue::glue("Did you just say {var}?")
#> Did you just say mooh?
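Returning to {{: because the operator forwards whatever the caller supplies, the wrapped argument does not have to be a bare column name. The sketch below, which assumes a made-up `pets` tibble purely for illustration, passes a whole expression through `var`:

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
pets <- tibble(
  species   = c("cat", "cat", "dog"),
  weight_kg = c(4, 5, 20)
)

max_by <- function(data, var, by) {
  data %>%
    group_by({{ by }}) %>%
    summarise(maximum = max({{ var }}, na.rm = TRUE))
}

# `var` receives a full expression rather than a bare column name
result <- pets %>% max_by(weight_kg * 1000, by = species)
```

The expression `weight_kg * 1000` is evaluated inside the data mask, so `maximum` is computed in grams for each species.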

Other simple tidy evaluation patterns

There are a few patterns that aren’t emphasised enough in the existing documentation. We are changing our teaching strategy to focus on these simpler patterns.

  • If you would like to pass multiple arguments to a data-masking verb, pass ... directly:
  summarise_by <- function(data, ..., by) {
    data %>%
      group_by({{ by }}) %>%
      summarise(...)
  }
  
  starwars %>%
    summarise_by(
      average = mean(height, na.rm = TRUE),
      maximum = max(height, na.rm = TRUE),
      by = gender
    )
  #> # A tibble: 5 x 3
  #>   gender        average maximum
  #>   <chr>           <dbl>   <int>
  #> 1 <NA>             120      167
  #> 2 female           165.     213
  #> 3 hermaphrodite    175      175
  #> 4 male             179.     264
  #> 5 none             200      200

You only need quote-and-unquote (with the plural variants enquos() and !!!) when you need to modify the inputs or their names in some way.

  • If you have string inputs, use the .data pronoun:
  max_by <- function(data, var, by) {
    data %>%
      group_by(.data[[by]]) %>%
      summarise(maximum = max(.data[[var]], na.rm = TRUE))
  }
  
  starwars %>% max_by("height", by = "gender")
  #> # A tibble: 5 x 2
  #>   gender        maximum
  #>   <chr>           <int>
  #> 1 <NA>              167
  #> 2 female            213
  #> 3 hermaphrodite     175
  #> 4 male              264
  #> 5 none              200

The . pronoun from magrittr is not appropriate here because it represents the whole data frame, whereas .data represents the subset for the current group.
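For the case mentioned in the first item, where the inputs passed through ... or their names need to be modified, here is a minimal sketch of the plural quote-and-unquote pattern. The helper `summarise_prefixed()`, its `prefix` argument, and the `pets` tibble are hypothetical names introduced only for this illustration:

```r
library(dplyr)

# Hypothetical toy data, used only for illustration
pets <- tibble(
  species = c("cat", "cat", "dog"),
  weight  = c(4, 5, 20)
)

# Hypothetical helper: adds a prefix to the name of every summary column
summarise_prefixed <- function(data, ..., by, prefix = "stat_") {
  # Quote every expression passed through `...`, filling in missing names
  exprs <- rlang::enquos(..., .named = TRUE)

  # Modify the names before the expressions reach summarise()
  names(exprs) <- paste0(prefix, names(exprs))

  data %>%
    group_by({{ by }}) %>%
    summarise(!!!exprs)  # Unquote-splice the renamed expressions
}

result <- pets %>%
  summarise_prefixed(
    average = mean(weight),
    maximum = max(weight),
    by = species
  )
```

The resulting columns should be named `stat_average` and `stat_maximum`. When no such modification is needed, passing ... directly remains the simpler choice.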

To learn more about the different ways of programming around tidyverse pipelines, we recommend reading the new programming vignette in ggplot2, written by Dewey Dunnington, who is currently interning at RStudio.

Thanks!

The following people have contributed to this release by posting issues and pull requests:

@001ben, @asardaes, @BillDunlap, @burchill, @cpsievert, @DavisVaughan, @egnha, @flying-sheep, @gaborcsardi, @gaelledoucet, @GaGaMan1101, @grayskripko, @hadley, @harrysouthworth, @holgerbrandl, @IndrajeetPatil, @jazzmoe, @jennybc, @jjesusfilho, @juangomezduaso, @krlmlr, @lionel-, @Marieag, @mmuurr, @moodymudskipper, @paulponcet, @riccardopinosio, @richierocks, @RolandASc, @romainfrancois, @s-fleck, @siddharthprabhu, @subratiter1, @wch, @wetlandscapes, @wlandau, @x1o, @XWeiZhou, @yenzichun, @yonicd, and @zachary-foster
