Select a subset of columns: pick (dplyr)

pick() selects a subset of columns from your data using select() semantics while inside a "data-masking" function like mutate() or summarise(). It returns a data frame containing the selected columns for the current group.

pick() is complementary to across():

With pick(), you typically apply a function to the full data frame.
With across(), you typically apply a function to each column.
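
The contrast above can be sketched with a small example. The data frame and its column names (g, x, y) are made up for illustration; only pick(), across(), mutate(), and summarise() come from dplyr.

```r
library(dplyr)

# Illustrative data; the column names are assumptions for this sketch
df <- tibble(
  g = c("a", "a", "b"),
  x = c(1, 2, 3),
  y = c(10, 20, 30)
)

# pick(): rowSums() receives the selected columns as one data frame
with_pick <- df %>% mutate(total = rowSums(pick(x, y)))

# across(): mean() is applied to each selected column separately
with_across <- df %>% summarise(across(c(x, y), mean))

with_pick
with_across
```

Here rowSums() needs all selected columns at once, so pick() is the natural fit, while mean() works one column at a time, which is what across() does.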

Usage

pick(...)

Arguments

...

<tidy-select> Columns to pick.

You can't pick grouping columns because they are already automatically handled by the verb (i.e. summarise() or mutate()).
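
As a minimal sketch of this behavior, the following uses pick(everything()) on a grouped data frame and records which column names it returns. The data and the names g, x, and y are assumptions made for the example.

```r
library(dplyr)

# Illustrative data; `g` is used as the grouping column
df <- tibble(
  g = c("a", "a", "b"),
  x = c(1, 2, 3),
  y = c(10, 20, 30)
)

# For each group, collect the names of the columns that pick() returns.
# The grouping column `g` is handled by summarise() and is not among them.
picked_names <- df %>%
  group_by(g) %>%
  summarise(cols = paste(names(pick(everything())), collapse = ", "))

picked_names
```

The cols column should list only x and y for every group, since g is supplied by the verb itself.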

Value

A tibble containing the selected columns for the current group.

Details

Theoretically, pick() is intended to be replaceable with an equivalent call to tibble(). For example, pick(a, c) could be replaced with tibble(a = a, c = c), and pick(everything()) on a data frame with columns a, b, and c could be replaced with tibble(a = a, b = b, c = c). pick() specially handles the case of an empty selection by returning a 1-row, 0-column tibble, so an exact replacement is more like:

size <- vctrs::vec_size_common(..., .absent = 1L)
out <- vctrs::vec_recycle_common(..., .size = size)
tibble::new_tibble(out, nrow = size)
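
To try this replacement outside of a dplyr verb, the three lines can be wrapped in a function. The name pick_like() and the vectors col_a and col_c are hypothetical and exist only for this sketch; they are not part of dplyr.

```r
# Hypothetical helper wrapping the replacement code; not a dplyr function
pick_like <- function(...) {
  # Common size of the inputs, or 1 when nothing is supplied
  size <- vctrs::vec_size_common(..., .absent = 1L)
  # Recycle every input to that common size
  out <- vctrs::vec_recycle_common(..., .size = size)
  tibble::new_tibble(out, nrow = size)
}

col_a <- 1:3
col_c <- c("u", "v", "w")

# Two inputs of size 3 give a 3-row, 2-column tibble
two_cols <- pick_like(a = col_a, c = col_c)
dim(two_cols)

# No inputs give the 1-row, 0-column tibble described above
empty <- pick_like()
dim(empty)
```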

See also

across()

Examples

library(dplyr)

df <- tibble(
  x = c(3, 2, 2, 2, 1),
  y = c(0, 2, 1, 1, 4),
  z1 = c("a", "a", "a", "b", "a"),
  z2 = c("c", "d", "d", "a", "c")
)
df
#> # A tibble: 5 × 4
#>       x     y z1    z2   
#>   <dbl> <dbl> <chr> <chr>
#> 1     3     0 a     c    
#> 2     2     2 a     d    
#> 3     2     1 a     d    
#> 4     2     1 b     a    
#> 5     1     4 a     c    

# `pick()` provides a way to select a subset of your columns using
# tidyselect. It returns a data frame.
df %>% mutate(cols = pick(x, y))
#> # A tibble: 5 × 5
#>       x     y z1    z2    cols$x    $y
#>   <dbl> <dbl> <chr> <chr>  <dbl> <dbl>
#> 1     3     0 a     c          3     0
#> 2     2     2 a     d          2     2
#> 3     2     1 a     d          2     1
#> 4     2     1 b     a          2     1
#> 5     1     4 a     c          1     4

# This is useful for functions that take data frames as inputs.
# For example, you can compute a joint rank between `x` and `y`.
df %>% mutate(rank = dense_rank(pick(x, y)))
#> # A tibble: 5 × 5
#>       x     y z1    z2     rank
#>   <dbl> <dbl> <chr> <chr> <int>
#> 1     3     0 a     c         4
#> 2     2     2 a     d         3
#> 3     2     1 a     d         2
#> 4     2     1 b     a         2
#> 5     1     4 a     c         1

# `pick()` is also useful as a bridge between data-masking functions (like
# `mutate()` or `group_by()`) and functions with tidy-select behavior (like
# `select()`). For example, you can use `pick()` to create a wrapper around
# `group_by()` that takes a tidy-selection of columns to group on. For more
# bridge patterns, see
# https://rlang.r-lib.org/reference/topic-data-mask-programming.html#bridge-patterns.
my_group_by <- function(data, cols) {
  group_by(data, pick({{ cols }}))
}

df %>% my_group_by(c(x, starts_with("z")))
#> # A tibble: 5 × 4
#> # Groups:   x, z1, z2 [4]
#>       x     y z1    z2   
#>   <dbl> <dbl> <chr> <chr>
#> 1     3     0 a     c    
#> 2     2     2 a     d    
#> 3     2     1 a     d    
#> 4     2     1 b     a    
#> 5     1     4 a     c    

# Or you can use it to dynamically select columns to `count()` by
df %>% count(pick(starts_with("z")))
#> # A tibble: 3 × 3
#>   z1    z2        n
#>   <chr> <chr> <int>
#> 1 a     c         2
#> 2 a     d         2
#> 3 b     a         1
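
The same bridge pattern used for group_by() can also be applied to count(). The wrapper name my_count() is hypothetical and the example is only a sketch; it rebuilds the example data so that it runs on its own.

```r
library(dplyr)

df <- tibble(
  x = c(3, 2, 2, 2, 1),
  y = c(0, 2, 1, 1, 4),
  z1 = c("a", "a", "a", "b", "a"),
  z2 = c("c", "d", "d", "a", "c")
)

# Hypothetical wrapper: takes a tidy-selection and counts by those columns
my_count <- function(data, cols) {
  count(data, pick({{ cols }}))
}

z_counts <- df %>% my_count(starts_with("z"))
z_counts
```

The result should match the direct count(pick(starts_with("z"))) call shown earlier in the examples.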