count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()). count() is paired with tally(), a lower-level helper that is equivalent to df %>% summarise(n = n()). Supply wt to perform weighted counts, switching the summary from n = n() to n = sum(wt).
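The equivalences above can be checked directly. This is only an illustrative sketch. It assumes dplyr 1.0.0 or later (for the .groups argument of summarise()), and the data frame and its columns a, b and w are made up. The results are compared column by column because count() and a grouped summarise() are not guaranteed to return objects with identical attributes.

```r
library(dplyr)

# Hypothetical data: a and b are the counted variables, w is a weight column
df <- tibble(
  a = c("x", "x", "y", "y", "y"),
  b = c(1, 1, 1, 2, 2),
  w = c(2, 3, 1, 4, 5)
)

# Unweighted: count() versus the explicit group_by() + summarise(n = n())
counted <- df %>% count(a, b)
by_hand <- df %>% group_by(a, b) %>% summarise(n = n(), .groups = "drop")

# Weighted: wt = w switches the summary to n = sum(w)
counted_w <- df %>% count(a, b, wt = w)
by_hand_w <- df %>% group_by(a, b) %>% summarise(n = sum(w), .groups = "drop")

print(counted)
print(counted_w)
```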

add_count() and add_tally() are equivalents to count() and tally(), but use mutate() instead of summarise() so that they add a new column with group-wise counts.
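A minimal sketch of the mutate()-style behaviour, assuming dplyr is installed. The data frame df and its columns g and v are hypothetical. add_count() keeps every input row and attaches the size of each row's group, which roughly matches a grouped mutate() followed by ungroup().

```r
library(dplyr)

# Hypothetical data: three rows in two groups
df <- tibble(g = c("a", "a", "b"), v = c(10, 20, 30))

# add_count() keeps all rows and appends the per-group count as column n
with_counts <- df %>% add_count(g)

# The same counts spelled out with group_by() + mutate()
by_hand <- df %>%
  group_by(g) %>%
  mutate(n = n()) %>%
  ungroup()

print(with_counts)
```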

count(
  x,
  ...,
  wt = NULL,
  sort = FALSE,
  name = NULL,
  .drop = group_by_drop_default(x)
)

tally(x, wt = NULL, sort = FALSE, name = NULL)

add_count(x, ..., wt = NULL, sort = FALSE, name = NULL, .drop = deprecated())

add_tally(x, wt = NULL, sort = FALSE, name = NULL)

Arguments

x

A data frame, data frame extension (e.g. a tibble), or a lazy data frame (e.g. from dbplyr or dtplyr).

...

<data-masking> Variables to group by.

wt

<data-masking> Frequency weights. Can be a variable (or combination of variables) or NULL. wt is computed once for each unique combination of the counted variables.

  • If a variable, count() will compute sum(wt) for each unique combination.

  • If NULL, the default, the computation depends on whether a column of frequency counts n exists in the data frame. If it exists, the counts are computed with sum(n) for each unique combination. Otherwise, n() is used to compute the counts. Supply wt = n() to force this behaviour even if you have an n column in the data frame.
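For data that is already a frequency table, the two explicit forms of wt give different answers. The sketch below uses a made-up table freq whose column n stores pre-aggregated counts. It passes wt explicitly rather than relying on the NULL default, and supplies name so the output column does not clash with the existing n. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical pre-aggregated table: each row already carries a count in n
freq <- tibble(colour = c("red", "red", "blue"), n = c(3L, 4L, 5L))

# wt = n sums the stored counts within each colour
weighted <- freq %>% count(colour, wt = n, name = "total")

# wt = n() ignores the stored values and counts rows instead
rows <- freq %>% count(colour, wt = n(), name = "rows")

print(weighted)
print(rows)
```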

sort

If TRUE, will show the largest groups at the top.
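As a quick illustration, with a hypothetical fruit column and dplyr assumed to be installed, sort = TRUE gives the same ordering as counting first and then arranging by descending n:

```r
library(dplyr)

# Hypothetical data with uneven group sizes
df <- tibble(fruit = c("apple", "kiwi", "kiwi", "pear", "kiwi", "pear"))

# sort = TRUE puts the largest groups first
sorted <- df %>% count(fruit, sort = TRUE)

# The same ordering spelled out: count, then sort by descending n
manual <- df %>% count(fruit) %>% arrange(desc(n))

print(sorted)
```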

name

The name of the new column in the output.

If omitted, it will default to n. If there's already a column called n, it will error and require you to specify the name.
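A small sketch of supplying name, using a hypothetical team column and assuming dplyr is installed; the counts are stored in a column called players instead of n:

```r
library(dplyr)

# Hypothetical data
df <- tibble(team = c("red", "blue", "red"))

# Store the counts in "players" rather than the default "n"
out <- df %>% count(team, name = "players")
print(out)
```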

.drop

For count(): if FALSE will include counts for empty groups (i.e. for levels of factors that don't exist in the data). Deprecated in add_count() since it didn't actually affect the output.
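To see the effect of .drop, consider a hypothetical factor with a level that never occurs in the data (dplyr assumed to be installed). For an ungrouped data frame the default drops the empty level; .drop = FALSE keeps it with a count of zero.

```r
library(dplyr)

# Hypothetical factor: level "c" is declared but never observed
df <- tibble(f = factor(c("a", "a", "b"), levels = c("a", "b", "c")))

# Default .drop: only levels that occur in the data appear
kept <- df %>% count(f)

# .drop = FALSE: the empty level "c" is included with n = 0
with_empty <- df %>% count(f, .drop = FALSE)

print(with_empty)
```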

Value

An object of the same type as x. count() and add_count() group transiently, so the output has the same groups as the input.
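A minimal sketch of the transient grouping, assuming dplyr is installed and using a hypothetical data frame already grouped by g. Counting by h temporarily groups by both g and h, but both outputs come back grouped by g alone.

```r
library(dplyr)

# Hypothetical data, grouped by g before counting by h
df <- tibble(g = c("x", "x", "y"), h = c("p", "q", "p")) %>% group_by(g)

counted <- df %>% count(h)      # one row per (g, h) combination
added   <- df %>% add_count(h)  # same counts attached to every input row

# Both outputs keep the input's grouping by g
print(group_vars(counted))
print(group_vars(added))
```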

Examples

# count() is a convenient way to get a sense of the distribution of
# values in a dataset
starwars %>% count(species)
#> # A tibble: 38 x 2
#>    species       n
#>    <chr>     <int>
#>  1 Aleena        1
#>  2 Besalisk      1
#>  3 Cerean        1
#>  4 Chagrian      1
#>  5 Clawdite      1
#>  6 Droid         6
#>  7 Dug           1
#>  8 Ewok          1
#>  9 Geonosian     1
#> 10 Gungan        3
#> # … with 28 more rows
starwars %>% count(species, sort = TRUE)
#> # A tibble: 38 x 2
#>    species      n
#>    <chr>    <int>
#>  1 Human       35
#>  2 Droid        6
#>  3 NA           4
#>  4 Gungan       3
#>  5 Kaminoan     2
#>  6 Mirialan     2
#>  7 Twi'lek      2
#>  8 Wookiee      2
#>  9 Zabrak       2
#> 10 Aleena       1
#> # … with 28 more rows
starwars %>% count(sex, gender, sort = TRUE)
#> # A tibble: 6 x 3
#>   sex            gender        n
#>   <chr>          <chr>     <int>
#> 1 male           masculine    60
#> 2 female         feminine     16
#> 3 none           masculine     5
#> 4 NA             NA            4
#> 5 hermaphroditic masculine     1
#> 6 none           feminine      1
starwars %>% count(birth_decade = round(birth_year, -1))
#> # A tibble: 15 x 2
#>    birth_decade     n
#>           <dbl> <int>
#>  1           10     1
#>  2           20     6
#>  3           30     4
#>  4           40     6
#>  5           50     8
#>  6           60     4
#>  7           70     4
#>  8           80     2
#>  9           90     3
#> 10          100     1
#> 11          110     1
#> 12          200     1
#> 13          600     1
#> 14          900     1
#> 15           NA    44
# use the `wt` argument to perform a weighted count. This is useful
# when the data has already been aggregated once
df <- tribble(
  ~name,    ~gender,   ~runs,
  "Max",    "male",       10,
  "Sandra", "female",      1,
  "Susan",  "female",      4
)
# counts rows:
df %>% count(gender)
#> # A tibble: 2 x 2
#>   gender     n
#>   <chr>  <int>
#> 1 female     2
#> 2 male       1
# counts runs:
df %>% count(gender, wt = runs)
#> # A tibble: 2 x 2
#>   gender     n
#>   <chr>  <dbl>
#> 1 female     5
#> 2 male      10
# tally() is a lower-level function that assumes you've done the grouping
starwars %>% tally()
#> # A tibble: 1 x 1
#>       n
#>   <int>
#> 1    87
starwars %>% group_by(species) %>% tally()
#> # A tibble: 38 x 2
#>    species       n
#>    <chr>     <int>
#>  1 Aleena        1
#>  2 Besalisk      1
#>  3 Cerean        1
#>  4 Chagrian      1
#>  5 Clawdite      1
#>  6 Droid         6
#>  7 Dug           1
#>  8 Ewok          1
#>  9 Geonosian     1
#> 10 Gungan        3
#> # … with 28 more rows
# both count() and tally() have add_ variants that work like
# mutate() instead of summarise()
df %>% add_count(gender, wt = runs)
#> # A tibble: 3 x 4
#>   name   gender  runs     n
#>   <chr>  <chr>  <dbl> <dbl>
#> 1 Max    male      10    10
#> 2 Sandra female     1     5
#> 3 Susan  female     4     5
df %>% add_tally(wt = runs)
#> # A tibble: 3 x 4
#>   name   gender  runs     n
#>   <chr>  <chr>  <dbl> <dbl>
#> 1 Max    male      10    15
#> 2 Sandra female     1    15
#> 3 Susan  female     4    15