mutate() adds new variables and preserves existing ones; transmute() adds new variables and drops existing ones. New variables overwrite existing variables of the same name. Variables can be removed by setting their value to NULL.

mutate(.data, ...)

# S3 method for data.frame
mutate(
  .data,
  ...,
  .keep = c("all", "used", "unused", "none"),
  .before = NULL,
  .after = NULL
)

transmute(.data, ...)

Arguments

.data

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr). See Methods, below, for more details.

...

<data-masking> Name-value pairs. The name gives the name of the column in the output.

The value can be:

  • A vector of length 1, which will be recycled to the correct length.

  • A vector the same length as the current group (or the whole data frame if ungrouped).

  • NULL, to remove the column.

  • A data frame or tibble, to create multiple columns in the output.

.keep

Experimental lifecycle This is an experimental argument that allows you to control which columns from .data are retained in the output:

  • "all", the default, retains all variables.

  • "used" keeps any variables used to make new variables; it's useful for checking your work as it displays inputs and outputs side-by-side.

  • "unused" keeps only existing variables not used to make new variables.

  • "none", only keeps grouping keys (like transmute()).

.before, .after

Experimental lifecycle <tidy-select> Optionally, control where new columns should appear (the default is to add to the right hand side). See relocate() for more details.

Value

An object of the same type as .data. The output has the following properties:

  • Rows are not affected.

  • Existing columns will be preserved according to the .keep argument. New columns will be placed according to the .before and .after arguments. If .keep = "none" (as in transmute()), the output order is determined only by ..., not the order of existing columns.

  • Columns given value NULL will be removed

  • Groups will be recomputed if a grouping variable is mutated.

  • Data frame attributes are preserved.

Useful mutate functions

Grouped tibbles

Because mutating expressions are computed within groups, they may yield different results on grouped tibbles. This will be the case as soon as an aggregating, lagging, or ranking function is involved. Compare this ungrouped mutate:

starwars %>%
  select(name, mass, species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))

With the grouped equivalent:

starwars %>%
  select(name, mass, species) %>%
  group_by(species) %>%
  mutate(mass_norm = mass / mean(mass, na.rm = TRUE))

The former normalises mass by the global average whereas the latter normalises by the averages within species levels.

Methods

These function are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.

Methods available in currently loaded packages:

  • mutate(): dbplyr (tbl_lazy), dplyr (data.frame, default) .

  • transmute(): dbplyr (tbl_lazy), dplyr (data.frame) .

See also

Other single table verbs: arrange(), filter(), rename(), select(), slice(), summarise()

Examples

# Newly created variables are available immediately starwars %>% select(name, mass) %>% mutate( mass2 = mass * 2, mass2_squared = mass2 * mass2 )
#> # A tibble: 87 x 4 #> name mass mass2 mass2_squared #> <chr> <dbl> <dbl> <dbl> #> 1 Luke Skywalker 77 154 23716 #> 2 C-3PO 75 150 22500 #> 3 R2-D2 32 64 4096 #> 4 Darth Vader 136 272 73984 #> 5 Leia Organa 49 98 9604 #> 6 Owen Lars 120 240 57600 #> 7 Beru Whitesun lars 75 150 22500 #> 8 R5-D4 32 64 4096 #> 9 Biggs Darklighter 84 168 28224 #> 10 Obi-Wan Kenobi 77 154 23716 #> # … with 77 more rows
# As well as adding new variables, you can use mutate() to # remove variables and modify existing variables. starwars %>% select(name, height, mass, homeworld) %>% mutate( mass = NULL, height = height * 0.0328084 # convert to feet )
#> # A tibble: 87 x 3 #> name height homeworld #> <chr> <dbl> <chr> #> 1 Luke Skywalker 5.64 Tatooine #> 2 C-3PO 5.48 Tatooine #> 3 R2-D2 3.15 Naboo #> 4 Darth Vader 6.63 Tatooine #> 5 Leia Organa 4.92 Alderaan #> 6 Owen Lars 5.84 Tatooine #> 7 Beru Whitesun lars 5.41 Tatooine #> 8 R5-D4 3.18 Tatooine #> 9 Biggs Darklighter 6.00 Tatooine #> 10 Obi-Wan Kenobi 5.97 Stewjon #> # … with 77 more rows
# Use across() with mutate() to apply a transformation # to multiple columns in a tibble. starwars %>% select(name, homeworld, species) %>% mutate(across(!name, as.factor))
#> # A tibble: 87 x 3 #> name homeworld species #> <chr> <fct> <fct> #> 1 Luke Skywalker Tatooine Human #> 2 C-3PO Tatooine Droid #> 3 R2-D2 Naboo Droid #> 4 Darth Vader Tatooine Human #> 5 Leia Organa Alderaan Human #> 6 Owen Lars Tatooine Human #> 7 Beru Whitesun lars Tatooine Human #> 8 R5-D4 Tatooine Droid #> 9 Biggs Darklighter Tatooine Human #> 10 Obi-Wan Kenobi Stewjon Human #> # … with 77 more rows
# see more in ?across # Window functions are useful for grouped mutates: starwars %>% select(name, mass, homeworld) %>% group_by(homeworld) %>% mutate(rank = min_rank(desc(mass)))
#> # A tibble: 87 x 4 #> # Groups: homeworld [49] #> name mass homeworld rank #> <chr> <dbl> <chr> <int> #> 1 Luke Skywalker 77 Tatooine 5 #> 2 C-3PO 75 Tatooine 6 #> 3 R2-D2 32 Naboo 6 #> 4 Darth Vader 136 Tatooine 1 #> 5 Leia Organa 49 Alderaan 2 #> 6 Owen Lars 120 Tatooine 2 #> 7 Beru Whitesun lars 75 Tatooine 6 #> 8 R5-D4 32 Tatooine 8 #> 9 Biggs Darklighter 84 Tatooine 3 #> 10 Obi-Wan Kenobi 77 Stewjon 1 #> # … with 77 more rows
# see `vignette("window-functions")` for more details # By default, new columns are placed on the far right. # Experimental: you can override with `.before` or `.after` df <- tibble(x = 1, y = 2) df %>% mutate(z = x + y)
#> # A tibble: 1 x 3 #> x y z #> <dbl> <dbl> <dbl> #> 1 1 2 3
df %>% mutate(z = x + y, .before = 1)
#> # A tibble: 1 x 3 #> z x y #> <dbl> <dbl> <dbl> #> 1 3 1 2
df %>% mutate(z = x + y, .after = x)
#> # A tibble: 1 x 3 #> x z y #> <dbl> <dbl> <dbl> #> 1 1 3 2
# By default, mutate() keeps all columns from the input data. # Experimental: You can override with `.keep` df <- tibble(x = 1, y = 2, a = "a", b = "b") df %>% mutate(z = x + y, .keep = "all") # the default
#> # A tibble: 1 x 5 #> x y a b z #> <dbl> <dbl> <chr> <chr> <dbl> #> 1 1 2 a b 3
df %>% mutate(z = x + y, .keep = "used")
#> # A tibble: 1 x 3 #> x y z #> <dbl> <dbl> <dbl> #> 1 1 2 3
df %>% mutate(z = x + y, .keep = "unused")
#> # A tibble: 1 x 3 #> a b z #> <chr> <chr> <dbl> #> 1 a b 3
df %>% mutate(z = x + y, .keep = "none") # same as transmute()
#> # A tibble: 1 x 1 #> z #> <dbl> #> 1 3
# Grouping ---------------------------------------- # The mutate operation may yield different results on grouped # tibbles because the expressions are computed within groups. # The following normalises `mass` by the global average: starwars %>% select(name, mass, species) %>% mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
#> # A tibble: 87 x 4 #> name mass species mass_norm #> <chr> <dbl> <chr> <dbl> #> 1 Luke Skywalker 77 Human 0.791 #> 2 C-3PO 75 Droid 0.771 #> 3 R2-D2 32 Droid 0.329 #> 4 Darth Vader 136 Human 1.40 #> 5 Leia Organa 49 Human 0.504 #> 6 Owen Lars 120 Human 1.23 #> 7 Beru Whitesun lars 75 Human 0.771 #> 8 R5-D4 32 Droid 0.329 #> 9 Biggs Darklighter 84 Human 0.863 #> 10 Obi-Wan Kenobi 77 Human 0.791 #> # … with 77 more rows
# Whereas this normalises `mass` by the averages within species # levels: starwars %>% select(name, mass, species) %>% group_by(species) %>% mutate(mass_norm = mass / mean(mass, na.rm = TRUE))
#> # A tibble: 87 x 4 #> # Groups: species [38] #> name mass species mass_norm #> <chr> <dbl> <chr> <dbl> #> 1 Luke Skywalker 77 Human 0.930 #> 2 C-3PO 75 Droid 1.08 #> 3 R2-D2 32 Droid 0.459 #> 4 Darth Vader 136 Human 1.64 #> 5 Leia Organa 49 Human 0.592 #> 6 Owen Lars 120 Human 1.45 #> 7 Beru Whitesun lars 75 Human 0.906 #> 8 R5-D4 32 Droid 0.459 #> 9 Biggs Darklighter 84 Human 1.01 #> 10 Obi-Wan Kenobi 77 Human 0.930 #> # … with 77 more rows
# Indirection ---------------------------------------- # Refer to column names stored as strings with the `.data` pronoun: vars <- c("mass", "height") mutate(starwars, prod = .data[[vars[[1]]]] * .data[[vars[[2]]]])
#> # A tibble: 87 x 15 #> name height mass hair_color skin_color eye_color birth_year sex gender #> <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr> #> 1 Luke… 172 77 blond fair blue 19 male mascu… #> 2 C-3PO 167 75 NA gold yellow 112 none mascu… #> 3 R2-D2 96 32 NA white, bl… red 33 none mascu… #> 4 Dart… 202 136 none white yellow 41.9 male mascu… #> 5 Leia… 150 49 brown light brown 19 fema… femin… #> 6 Owen… 178 120 brown, gr… light blue 52 male mascu… #> 7 Beru… 165 75 brown light blue 47 fema… femin… #> 8 R5-D4 97 32 NA white, red red NA none mascu… #> 9 Bigg… 183 84 black light brown 24 male mascu… #> 10 Obi-… 182 77 auburn, w… fair blue-gray 57 male mascu… #> # … with 77 more rows, and 6 more variables: homeworld <chr>, species <chr>, #> # films <list>, vehicles <list>, starships <list>, prod <dbl>
# Learn more in ?dplyr_data_masking