Apply a function (or functions) across multiple columns

across() makes it easy to apply the same transformation to multiple columns, allowing you to use select() semantics inside "data-masking" functions like summarise() and mutate(). See vignette("colwise") for more details.

if_any() and if_all() apply the same predicate function to a selection of columns and combine the results into a single logical vector: if_any() is TRUE when the predicate is TRUE for any of the selected columns, and if_all() is TRUE when the predicate is TRUE for all of the selected columns.
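
As a minimal sketch of how the two predicates combine results, the following compares each helper with its hand-written equivalent. The data frame df and its columns a and b are hypothetical and exist only for this illustration.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(a = c(1, -1, -1), b = c(1, 1, -1))

# if_any(): keep rows where the predicate holds for a OR b
any_pos <- df %>% filter(if_any(c(a, b), ~ .x > 0))
any_explicit <- df %>% filter(a > 0 | b > 0)

# if_all(): keep rows where the predicate holds for a AND b
all_pos <- df %>% filter(if_all(c(a, b), ~ .x > 0))
all_explicit <- df %>% filter(a > 0 & b > 0)

nrow(any_pos)  # 2 (rows 1 and 2)
nrow(all_pos)  # 1 (row 1 only)
```

The helpers pay off once the selection is too long, or too dynamic, to spell out column by column.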

If you just need to select columns without applying a transformation to each of them, then you probably want to use pick() instead.
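
A short sketch of the distinction, assuming a dplyr version that provides pick() (1.1.0 or later). The data frame df and its columns are hypothetical.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(id = 1:3, x = c(1, 2, 3), y = c(10, 20, 30))

# pick() hands the selected columns to another function as a tibble,
# without transforming them individually
with_total <- df %>% mutate(total = rowSums(pick(x, y)))

# across() transforms each selected column separately
doubled <- df %>% mutate(across(c(x, y), ~ .x * 2))

unname(with_total$total)  # 11 22 33
doubled$x                 # 2 4 6
```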

across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().

Usage

across(.cols, .fns, ..., .names = NULL, .unpack = FALSE)

if_any(.cols, .fns, ..., .names = NULL)

if_all(.cols, .fns, ..., .names = NULL)

Arguments

.cols

<tidy-select> Columns to transform. You can't select grouping columns because they are already automatically handled by the verb (i.e. summarise() or mutate()).
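
To illustrate the point about grouping columns, here is a small sketch with a hypothetical grouped data frame. Even a selection as broad as everything() leaves the grouping column g untouched; it is carried through as the group key.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(g = c("a", "a", "b"), x = 1:3, y = 4:6)

out <- df %>%
  group_by(g) %>%
  summarise(across(everything(), sum))

names(out)  # "g" "x" "y"
out$x       # 3 3
out$y       # 9 6
```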

.fns

Functions to apply to each of the selected columns. Possible values are:

A function, e.g. mean.
A purrr-style lambda, e.g. ~ mean(.x, na.rm = TRUE)
A named list of functions or lambdas, e.g. list(mean = mean, n_miss = ~ sum(is.na(.x))). Each function is applied to each column, and the output is named by combining the function name and the column name using the glue specification in .names.

Within these functions you can use cur_column() and cur_group() to access the current column and grouping keys respectively.
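
One possible use of cur_column(), sketched under the assumption that you keep per-column settings in a named vector. The vector scale_by and the data frame df are hypothetical names introduced for this example.

```r
library(dplyr)

# Hypothetical data and per-column multipliers, for illustration only
df <- tibble(x = c(1, 2), y = c(3, 4))
scale_by <- c(x = 10, y = 100)

# cur_column() returns the name of the column currently being processed,
# so it can be used to look up the matching multiplier
scaled <- df %>%
  mutate(across(c(x, y), ~ .x * scale_by[[cur_column()]]))

scaled$x  # 10 20
scaled$y  # 300 400
```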

...

Additional arguments for the function calls in .fns are no longer accepted in ... because it's not clear when they should be evaluated: once per across() or once per group? Instead supply additional arguments directly in .fns by using a lambda. For example, instead of across(a:b, mean, na.rm = TRUE) write across(a:b, ~ mean(.x, na.rm = TRUE)).
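
The replacement pattern can be sketched as follows, using a hypothetical data frame with missing values. A base R anonymous function (the \(x) shorthand requires R 4.1 or later) works the same way as the purrr-style lambda.

```r
library(dplyr)

# Hypothetical data, for illustration only
df <- tibble(a = c(1, NA, 3), b = c(4, 5, NA))

# purrr-style lambda: extra arguments are written inside the lambda
m1 <- df %>% summarise(across(a:b, ~ mean(.x, na.rm = TRUE)))

# Equivalent base R anonymous function
m2 <- df %>% summarise(across(a:b, \(x) mean(x, na.rm = TRUE)))

m1$a  # 2
m1$b  # 4.5
```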

.names

A glue specification that describes how to name the output columns. This can use {.col} to stand for the selected column name, and {.fn} to stand for the name of the function being applied. The default (NULL) is equivalent to "{.col}" for the single function case and "{.col}_{.fn}" for the case where a list is used for .fns.

.unpack

Optionally unpack data frames returned by functions in .fns, which expands the df-columns out into individual columns, retaining the number of rows in the data frame.

If FALSE, the default, no unpacking is done.
If TRUE, unpacking is done with a default glue specification of "{outer}_{inner}".
Otherwise, a single glue specification can be supplied to describe how to name the unpacked columns. This can use {outer} to refer to the name originally generated by .names, and {inner} to refer to the names of the data frame you are unpacking.

Value

across() typically returns a tibble with one column for each column in .cols and each function in .fns. If .unpack is used, more columns may be returned depending on how the results of .fns are unpacked.

if_any() and if_all() return a logical vector.

Details

When there are no selected columns:

if_any() will return FALSE, consistent with the behavior of any() when called without inputs.
if_all() will return TRUE, consistent with the behavior of all() when called without inputs.
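
The base R behavior referred to above can be checked directly. This sketch uses only base R and shows the analogy; it is not a description of how dplyr implements these helpers.

```r
# With no inputs, any() has found no TRUE value and all() has found
# no FALSE value
none_any <- any()              # FALSE
none_all <- all()              # TRUE

# The same holds for a zero-length logical vector
empty_any <- any(logical(0))   # FALSE
empty_all <- all(logical(0))   # TRUE
```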

Timing of evaluation

R code in dplyr verbs is generally evaluated once per group. Inside across(), however, code is evaluated once for each combination of columns and groups. If the evaluation timing is important, for example if you're generating random variables, think about when it should happen and place your code accordingly.

gdf <-
  tibble(g = c(1, 1, 2, 3), v1 = 10:13, v2 = 20:23) %>%
  group_by(g)

set.seed(1)

# Outside: 1 normal variate
n <- rnorm(1)
gdf %>% mutate(across(v1:v2, ~ .x + n))
#> # A tibble: 4 × 3
#> # Groups:   g [3]
#>       g    v1    v2
#>   <dbl> <dbl> <dbl>
#> 1     1  9.37  19.4
#> 2     1 10.4   20.4
#> 3     2 11.4   21.4
#> 4     3 12.4   22.4

# Inside a verb: 3 normal variates (ngroup)
gdf %>% mutate(n = rnorm(1), across(v1:v2, ~ .x + n))
#> # A tibble: 4 × 4
#> # Groups:   g [3]
#>       g    v1    v2      n
#>   <dbl> <dbl> <dbl>  <dbl>
#> 1     1  10.2  20.2  0.184
#> 2     1  11.2  21.2  0.184
#> 3     2  11.2  21.2 -0.836
#> 4     3  14.6  24.6  1.60

# Inside `across()`: 6 normal variates (ncol * ngroup)
gdf %>% mutate(across(v1:v2, ~ .x + rnorm(1)))
#> # A tibble: 4 × 3
#> # Groups:   g [3]
#>       g    v1    v2
#>   <dbl> <dbl> <dbl>
#> 1     1  10.3  20.7
#> 2     1  11.3  21.7
#> 3     2  11.2  22.6
#> 4     3  13.5  22.7

Examples

# For better printing
iris <- as_tibble(iris)

# across() -----------------------------------------------------------------
# Different ways to select the same set of columns
# See <https://tidyselect.r-lib.org/articles/syntax.html> for details
iris %>%
  mutate(across(c(Sepal.Length, Sepal.Width), round))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1            5           4          1.4         0.2 setosa 
#>  2            5           3          1.4         0.2 setosa 
#>  3            5           3          1.3         0.2 setosa 
#>  4            5           3          1.5         0.2 setosa 
#>  5            5           4          1.4         0.2 setosa 
#>  6            5           4          1.7         0.4 setosa 
#>  7            5           3          1.4         0.3 setosa 
#>  8            5           3          1.5         0.2 setosa 
#>  9            4           3          1.4         0.2 setosa 
#> 10            5           3          1.5         0.1 setosa 
#> # ℹ 140 more rows
iris %>%
  mutate(across(c(1, 2), round))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1            5           4          1.4         0.2 setosa 
#>  2            5           3          1.4         0.2 setosa 
#>  3            5           3          1.3         0.2 setosa 
#>  4            5           3          1.5         0.2 setosa 
#>  5            5           4          1.4         0.2 setosa 
#>  6            5           4          1.7         0.4 setosa 
#>  7            5           3          1.4         0.3 setosa 
#>  8            5           3          1.5         0.2 setosa 
#>  9            4           3          1.4         0.2 setosa 
#> 10            5           3          1.5         0.1 setosa 
#> # ℹ 140 more rows
iris %>%
  mutate(across(1:Sepal.Width, round))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1            5           4          1.4         0.2 setosa 
#>  2            5           3          1.4         0.2 setosa 
#>  3            5           3          1.3         0.2 setosa 
#>  4            5           3          1.5         0.2 setosa 
#>  5            5           4          1.4         0.2 setosa 
#>  6            5           4          1.7         0.4 setosa 
#>  7            5           3          1.4         0.3 setosa 
#>  8            5           3          1.5         0.2 setosa 
#>  9            4           3          1.4         0.2 setosa 
#> 10            5           3          1.5         0.1 setosa 
#> # ℹ 140 more rows
iris %>%
  mutate(across(where(is.double) & !c(Petal.Length, Petal.Width), round))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1            5           4          1.4         0.2 setosa 
#>  2            5           3          1.4         0.2 setosa 
#>  3            5           3          1.3         0.2 setosa 
#>  4            5           3          1.5         0.2 setosa 
#>  5            5           4          1.4         0.2 setosa 
#>  6            5           4          1.7         0.4 setosa 
#>  7            5           3          1.4         0.3 setosa 
#>  8            5           3          1.5         0.2 setosa 
#>  9            4           3          1.4         0.2 setosa 
#> 10            5           3          1.5         0.1 setosa 
#> # ℹ 140 more rows

# Using an external vector of names
cols <- c("Sepal.Length", "Petal.Width")
iris %>%
  mutate(across(all_of(cols), round))
#> # A tibble: 150 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1            5         3.5          1.4           0 setosa 
#>  2            5         3            1.4           0 setosa 
#>  3            5         3.2          1.3           0 setosa 
#>  4            5         3.1          1.5           0 setosa 
#>  5            5         3.6          1.4           0 setosa 
#>  6            5         3.9          1.7           0 setosa 
#>  7            5         3.4          1.4           0 setosa 
#>  8            5         3.4          1.5           0 setosa 
#>  9            4         2.9          1.4           0 setosa 
#> 10            5         3.1          1.5           0 setosa 
#> # ℹ 140 more rows

# If the external vector is named, the output columns will be named according
# to those names
names(cols) <- tolower(cols)
iris %>%
  mutate(across(all_of(cols), round))
#> # A tibble: 150 × 7
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#>  1          5.1         3.5          1.4         0.2 setosa 
#>  2          4.9         3            1.4         0.2 setosa 
#>  3          4.7         3.2          1.3         0.2 setosa 
#>  4          4.6         3.1          1.5         0.2 setosa 
#>  5          5           3.6          1.4         0.2 setosa 
#>  6          5.4         3.9          1.7         0.4 setosa 
#>  7          4.6         3.4          1.4         0.3 setosa 
#>  8          5           3.4          1.5         0.2 setosa 
#>  9          4.4         2.9          1.4         0.2 setosa 
#> 10          4.9         3.1          1.5         0.1 setosa 
#> # ℹ 140 more rows
#> # ℹ 2 more variables: sepal.length <dbl>, petal.width <dbl>

# A purrr-style formula
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), ~ mean(.x, na.rm = TRUE)))
#> # A tibble: 3 × 3
#>   Species    Sepal.Length Sepal.Width
#>   <fct>             <dbl>       <dbl>
#> 1 setosa             5.01        3.43
#> 2 versicolor         5.94        2.77
#> 3 virginica          6.59        2.97

# A named list of functions
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), list(mean = mean, sd = sd)))
#> # A tibble: 3 × 5
#>   Species    Sepal.Length_mean Sepal.Length_sd Sepal.Width_mean
#>   <fct>                  <dbl>           <dbl>            <dbl>
#> 1 setosa                  5.01           0.352             3.43
#> 2 versicolor              5.94           0.516             2.77
#> 3 virginica               6.59           0.636             2.97
#> # ℹ 1 more variable: Sepal.Width_sd <dbl>

# Use the .names argument to control the output names
iris %>%
  group_by(Species) %>%
  summarise(across(starts_with("Sepal"), mean, .names = "mean_{.col}"))
#> # A tibble: 3 × 3
#>   Species    mean_Sepal.Length mean_Sepal.Width
#>   <fct>                  <dbl>            <dbl>
#> 1 setosa                  5.01             3.43
#> 2 versicolor              5.94             2.77
#> 3 virginica               6.59             2.97

iris %>%
  group_by(Species) %>%
  summarise(
    across(
      starts_with("Sepal"),
      list(mean = mean, sd = sd),
      .names = "{.col}.{.fn}"
    )
  )
#> # A tibble: 3 × 5
#>   Species    Sepal.Length.mean Sepal.Length.sd Sepal.Width.mean
#>   <fct>                  <dbl>           <dbl>            <dbl>
#> 1 setosa                  5.01           0.352             3.43
#> 2 versicolor              5.94           0.516             2.77
#> 3 virginica               6.59           0.636             2.97
#> # ℹ 1 more variable: Sepal.Width.sd <dbl>

# If a named external vector is used for column selection, .names will use
# those names when constructing the output names
iris %>%
  group_by(Species) %>%
  summarise(across(all_of(cols), mean, .names = "mean_{.col}"))
#> # A tibble: 3 × 3
#>   Species    mean_sepal.length mean_petal.width
#>   <fct>                  <dbl>            <dbl>
#> 1 setosa                  5.01            0.246
#> 2 versicolor              5.94            1.33 
#> 3 virginica               6.59            2.03 

# When the list is not named, .fn is replaced by the function's position
iris %>%
  group_by(Species) %>%
  summarise(
    across(starts_with("Sepal"), list(mean, sd), .names = "{.col}.fn{.fn}")
  )
#> # A tibble: 3 × 5
#>   Species    Sepal.Length.fn1 Sepal.Length.fn2 Sepal.Width.fn1
#>   <fct>                 <dbl>            <dbl>           <dbl>
#> 1 setosa                 5.01            0.352            3.43
#> 2 versicolor             5.94            0.516            2.77
#> 3 virginica              6.59            0.636            2.97
#> # ℹ 1 more variable: Sepal.Width.fn2 <dbl>

# When the functions in .fns return a data frame, you typically get a
# "packed" data frame back
quantile_df <- function(x, probs = c(0.25, 0.5, 0.75)) {
  tibble(quantile = probs, value = quantile(x, probs))
}

iris %>%
  reframe(across(starts_with("Sepal"), quantile_df))
#> # A tibble: 3 × 2
#>   Sepal.Length$quantile $value Sepal.Width$quantile $value
#>                   <dbl>  <dbl>                <dbl>  <dbl>
#> 1                  0.25    5.1                 0.25    2.8
#> 2                  0.5     5.8                 0.5     3  
#> 3                  0.75    6.4                 0.75    3.3

# Use .unpack to automatically expand these packed data frames into their
# individual columns
iris %>%
  reframe(across(starts_with("Sepal"), quantile_df, .unpack = TRUE))
#> # A tibble: 3 × 4
#>   Sepal.Length_quantile Sepal.Length_value Sepal.Width_quantile
#>                   <dbl>              <dbl>                <dbl>
#> 1                  0.25                5.1                 0.25
#> 2                  0.5                 5.8                 0.5 
#> 3                  0.75                6.4                 0.75
#> # ℹ 1 more variable: Sepal.Width_value <dbl>

# .unpack can utilize a glue specification if you don't like the defaults
iris %>%
  reframe(
    across(starts_with("Sepal"), quantile_df, .unpack = "{outer}.{inner}")
  )
#> # A tibble: 3 × 4
#>   Sepal.Length.quantile Sepal.Length.value Sepal.Width.quantile
#>                   <dbl>              <dbl>                <dbl>
#> 1                  0.25                5.1                 0.25
#> 2                  0.5                 5.8                 0.5 
#> 3                  0.75                6.4                 0.75
#> # ℹ 1 more variable: Sepal.Width.value <dbl>

# This is also useful inside mutate(), for example, with a multi-lag helper
multilag <- function(x, lags = 1:3) {
  names(lags) <- as.character(lags)
  purrr::map_dfr(lags, lag, x = x)
}

iris %>%
  group_by(Species) %>%
  mutate(across(starts_with("Sepal"), multilag, .unpack = TRUE)) %>%
  select(Species, starts_with("Sepal"))
#> # A tibble: 150 × 9
#> # Groups:   Species [3]
#>    Species Sepal.Length Sepal.Width Sepal.Length_1 Sepal.Length_2
#>    <fct>          <dbl>       <dbl>          <dbl>          <dbl>
#>  1 setosa           5.1         3.5           NA             NA  
#>  2 setosa           4.9         3              5.1           NA  
#>  3 setosa           4.7         3.2            4.9            5.1
#>  4 setosa           4.6         3.1            4.7            4.9
#>  5 setosa           5           3.6            4.6            4.7
#>  6 setosa           5.4         3.9            5              4.6
#>  7 setosa           4.6         3.4            5.4            5  
#>  8 setosa           5           3.4            4.6            5.4
#>  9 setosa           4.4         2.9            5              4.6
#> 10 setosa           4.9         3.1            4.4            5  
#> # ℹ 140 more rows
#> # ℹ 4 more variables: Sepal.Length_3 <dbl>, Sepal.Width_1 <dbl>,
#> #   Sepal.Width_2 <dbl>, Sepal.Width_3 <dbl>

# if_any() and if_all() ----------------------------------------------------
iris %>%
  filter(if_any(ends_with("Width"), ~ .x > 4))
#> # A tibble: 3 × 5
#>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#>          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#> 1          5.7         4.4          1.5         0.4 setosa 
#> 2          5.2         4.1          1.5         0.1 setosa 
#> 3          5.5         4.2          1.4         0.2 setosa 
iris %>%
  filter(if_all(ends_with("Width"), ~ .x > 2))
#> # A tibble: 23 × 5
#>    Sepal.Length Sepal.Width Petal.Length Petal.Width Species  
#>           <dbl>       <dbl>        <dbl>       <dbl> <fct>    
#>  1          6.3         3.3          6           2.5 virginica
#>  2          7.1         3            5.9         2.1 virginica
#>  3          6.5         3            5.8         2.2 virginica
#>  4          7.6         3            6.6         2.1 virginica
#>  5          7.2         3.6          6.1         2.5 virginica
#>  6          6.8         3            5.5         2.1 virginica
#>  7          5.8         2.8          5.1         2.4 virginica
#>  8          6.4         3.2          5.3         2.3 virginica
#>  9          7.7         3.8          6.7         2.2 virginica
#> 10          7.7         2.6          6.9         2.3 virginica
#> # ℹ 13 more rows