See join for a description of the general purpose of the functions.

# S3 method for tbl_df
inner_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
left_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
right_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
full_join(x, y, by = NULL, copy = FALSE,
  suffix = c(".x", ".y"), ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
semi_join(x, y, by = NULL, copy = FALSE, ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

# S3 method for tbl_df
anti_join(x, y, by = NULL, copy = FALSE, ...,
  na_matches = pkgconfig::get_config("dplyr::na_matches"))

Arguments

x

tbls to join

y

tbls to join

by

A character vector of variables to join by. If NULL, the default, *_join() will do a natural join, using all variables with common names across the two tables. A message lists the variables so that you can check they're right; to suppress the message, explicitly list the variables that you want to join by.

To join by different variables on x and y use a named vector. For example, by = c("a" = "b") will match x.a to y.b.
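As a minimal sketch of the named-vector form, the example below joins two small made-up tibbles (the tables and the column names `a`, `b`, `x_val`, `y_val` are hypothetical and chosen only for illustration). It assumes dplyr and tibble are installed.

```r
library(dplyr)

# Hypothetical toy tables, not part of the package documentation
x <- tibble::tibble(a = c(1, 2, 3), x_val = c("p", "q", "r"))
y <- tibble::tibble(b = c(2, 3, 4), y_val = c("s", "t", "u"))

# Match x$a against y$b; the key column in the result keeps the name from x
joined <- inner_join(x, y, by = c("a" = "b"))
joined
```

Because `by` is given explicitly, no "Joining, by = ..." message is printed.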

copy

If x and y are not from the same data source, and copy is TRUE, then y will be copied into the same src as x. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.

suffix

If there are non-joined duplicate variables in x and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
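To illustrate how suffixes disambiguate non-joined duplicate columns, here is a small sketch using two made-up tibbles that share a `value` column (the data and the custom suffixes `_left` and `_right` are assumptions chosen for the example).

```r
library(dplyr)

# Hypothetical toy tables that both contain a non-key column named "value"
x <- tibble::tibble(id = c(1, 2), value = c("a", "b"))
y <- tibble::tibble(id = c(1, 2), value = c("c", "d"))

# Default suffixes: the duplicated column becomes value.x and value.y
default_names <- names(inner_join(x, y, by = "id"))

# Custom suffixes: a character vector of length 2, applied to x then y
custom_names <- names(
  inner_join(x, y, by = "id", suffix = c("_left", "_right"))
)

default_names
custom_names
```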

...

included for compatibility with the generic; otherwise ignored.

na_matches

Use "never" to always treat two NA or NaN values as different, as joins for database sources do, similarly to merge(incomparables = FALSE). The default, "na", always treats two NA or NaN values as equal, like merge(). Users and package authors can change the default behavior by calling pkgconfig::set_config("dplyr::na_matches" = "never").
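The difference between the two settings can be sketched with a pair of made-up tibbles whose key columns each contain an NA (the data are hypothetical). The argument is passed explicitly here, so the result does not depend on any pkgconfig setting.

```r
library(dplyr)

# Hypothetical toy tables; each key column contains one NA
x <- tibble::tibble(key = c(1, NA), x_val = c("a", "b"))
y <- tibble::tibble(key = c(NA, 2), y_val = c("c", "d"))

# "na": the NA key in x is treated as equal to the NA key in y
matched <- inner_join(x, y, by = "key", na_matches = "na")

# "never": NA keys never match, so no rows survive the inner join
unmatched <- inner_join(x, y, by = "key", na_matches = "never")

nrow(matched)
nrow(unmatched)
```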

Examples

if (require("Lahman")) {
  batting_df <- tbl_df(Batting)
  person_df <- tbl_df(Master)

  uperson_df <- tbl_df(Master[!duplicated(Master$playerID), ])

  # Inner join: match batting and person data
  inner_join(batting_df, person_df)
  inner_join(batting_df, uperson_df)

  # Left join: match, but preserve batting data
  left_join(batting_df, uperson_df)

  # Anti join: find batters without person data
  anti_join(batting_df, person_df)
  # or people who didn't bat
  anti_join(person_df, batting_df)
}
#> Loading required package: Lahman
#> Joining, by = "playerID"
#> Joining, by = "playerID"
#> Joining, by = "playerID"
#> Joining, by = "playerID"
#> Joining, by = "playerID"
#> # A tibble: 190 x 26
#>    playerID  birthYear birthMonth birthDay birthCountry  birthState  birthCity
#>    <chr>         <int>      <int>    <int> <chr>         <chr>       <chr>
#>  1 actama99       1969          1       11 D.R.          San Pedro … San Pedro…
#>  2 adairbi99      1913          2       10 USA           AL          Mobile
#>  3 armoubi99      1869          9        3 USA           PA          Homestead
#>  4 bancrfr99      1846          5        9 USA           MA          Lancaster
#>  5 barlial99      1915          4        2 USA           IL          Springfie…
#>  6 barroed99      1868          5       10 USA           IL          Springfie…
#>  7 bellco99       1903          5       17 USA           MS          Starkville
#>  8 bevinte99      1956          7        7 USA           OH          Akron
#>  9 bezdehu99      1883          4        1 Czech Republ… <NA>        Prague
#> 10 bicke99        1848         NA       NA USA           DC          Washington
#> # ... with 180 more rows, and 19 more variables: deathYear <int>,
#> #   deathMonth <int>, deathDay <int>, deathCountry <chr>, deathState <chr>,
#> #   deathCity <chr>, nameFirst <chr>, nameLast <chr>, nameGiven <chr>,
#> #   weight <int>, height <int>, bats <fct>, throws <fct>, debut <chr>,
#> #   finalGame <chr>, retroID <chr>, bbrefID <chr>, deathDate <date>,
#> #   birthDate <date>