Retain only unique/distinct rows from an input tbl. This is similar to unique.data.frame(), but considerably faster.

distinct(.data, ..., .keep_all = FALSE)

Arguments

.data

a tbl

...

Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables.

.keep_all

If TRUE, keep all variables in .data. If FALSE (the default), keep only the variables named in .... If a combination of ... is not distinct, this keeps the first row of values.
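The interaction between ... and .keep_all described above can be sketched with a small deterministic data frame. The `orders` table and its columns are hypothetical and exist only for illustration; the sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical data: two rows share id == 1, so only the first is kept.
orders <- tibble(
  id    = c(1, 1, 2),
  value = c("a", "b", "c")
)

# By default only the variables named in ... are returned.
ids_only <- distinct(orders, id)

# With .keep_all = TRUE, the remaining columns are taken from the
# first row found for each distinct id.
first_rows <- distinct(orders, id, .keep_all = TRUE)
```

Here `ids_only` has a single column, while `first_rows` retains `value` from the first occurrence of each `id`.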

Examples

df <- tibble(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
nrow(df)
#> [1] 100
nrow(distinct(df))
#> [1] 67
nrow(distinct(df, x, y))
#> [1] 67
distinct(df, x)
#> # A tibble: 10 x 1
#>        x
#>    <int>
#>  1     6
#>  2     8
#>  3     4
#>  4     9
#>  5     2
#>  6     1
#>  7    10
#>  8     7
#>  9     5
#> 10     3
distinct(df, y)
#> # A tibble: 10 x 1
#>        y
#>    <int>
#>  1     7
#>  2     2
#>  3     3
#>  4     9
#>  5     6
#>  6     4
#>  7     8
#>  8     1
#>  9     5
#> 10    10
# Can choose to keep all other variables as well
distinct(df, x, .keep_all = TRUE)
#> # A tibble: 10 x 2
#>        x     y
#>    <int> <int>
#>  1     6     7
#>  2     8     2
#>  3     4     3
#>  4     9     9
#>  5     2     3
#>  6     1     1
#>  7    10     1
#>  8     7     3
#>  9     5     2
#> 10     3     9
distinct(df, y, .keep_all = TRUE)
#> # A tibble: 10 x 2
#>        x     y
#>    <int> <int>
#>  1     6     7
#>  2     8     2
#>  3     4     3
#>  4     9     9
#>  5     8     6
#>  6     2     4
#>  7     4     8
#>  8     1     1
#>  9     2     5
#> 10     9    10
# You can also use distinct on computed variables
distinct(df, diff = abs(x - y))
#> # A tibble: 10 x 1
#>     diff
#>    <int>
#>  1     1
#>  2     6
#>  3     0
#>  4     2
#>  5     3
#>  6     5
#>  7     4
#>  8     9
#>  9     7
#> 10     8
# The same behaviour applies for grouped data frames
# except that the grouping variables are always included
df <- tibble(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
) %>% group_by(g)
df %>% distinct()
#> # A tibble: 3 x 2
#> # Groups:   g [2]
#>       g     x
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     2
#> 3     2     1
df %>% distinct(x)
#> # A tibble: 3 x 2
#> # Groups:   g [2]
#>       g     x
#>   <dbl> <dbl>
#> 1     1     1
#> 2     2     2
#> 3     2     1
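
The description compares distinct() to unique.data.frame(). As a rough sketch of that correspondence, the snippet below runs both on the same plain data frame; `dat` is a hypothetical example, and the comparison looks only at column values, not at row names or other attributes.

```r
library(dplyr)

# Hypothetical data frame with one fully duplicated row.
dat <- data.frame(
  a = c(1, 1, 2),
  b = c("x", "x", "y")
)

# Base R: dispatches to unique.data.frame().
base_result <- unique(dat)

# dplyr: with no variables in ..., all columns determine uniqueness.
dplyr_result <- distinct(dat)

# Both keep the first occurrence of each distinct row.
same_values <- identical(base_result$a, dplyr_result$a) &&
  identical(base_result$b, dplyr_result$b)
```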