These functions provide a framework for modifying rows in a table using a
second table of data. The two tables are matched by
a set of key variables
whose values typically uniquely identify each row. The functions are inspired
by SQL's INSERT
, UPDATE
, and DELETE
, and can optionally modify
in_place
for selected backends.
rows_insert()
adds new rows (likeINSERT
). By default, key values iny
must not exist inx
.rows_append()
works likerows_insert()
but ignores keys.rows_update()
modifies existing rows (likeUPDATE
). Key values iny
must be unique, and, by default, key values iny
must exist inx
.rows_patch()
works likerows_update()
but only overwritesNA
values.rows_upsert()
inserts or updates depending on whether or not the key value iny
already exists inx
. Key values iny
must be unique.rows_delete()
deletes rows (likeDELETE
). By default, key values iny
must exist inx
.
Usage
rows_insert(
x,
y,
by = NULL,
...,
conflict = c("error", "ignore"),
copy = FALSE,
in_place = FALSE
)
rows_append(x, y, ..., copy = FALSE, in_place = FALSE)
rows_update(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = FALSE,
in_place = FALSE
)
rows_patch(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = FALSE,
in_place = FALSE
)
rows_upsert(x, y, by = NULL, ..., copy = FALSE, in_place = FALSE)
rows_delete(
x,
y,
by = NULL,
...,
unmatched = c("error", "ignore"),
copy = FALSE,
in_place = FALSE
)
Arguments
- x, y
A pair of data frames or data frame extensions (e.g. a tibble).
y
must have the same columns ofx
or a subset.- by
An unnamed character vector giving the key columns. The key columns must exist in both
x
andy
. Keys typically uniquely identify each row, but this is only enforced for the key values ofy
whenrows_update()
,rows_patch()
, orrows_upsert()
are used.By default, we use the first column in
y
, since the first column is a reasonable place to put an identifier variable.- ...
Other parameters passed onto methods.
- conflict
For
rows_insert()
, how should keys iny
that conflict with keys inx
be handled? A conflict arises if there is a key iny
that already exists inx
.One of:
"error"
, the default, will error if there are any keys iny
that conflict with keys inx
."ignore"
will ignore rows iny
with keys that conflict with keys inx
.
- copy
If
x
andy
are not from the same data source, andcopy
isTRUE
, theny
will be copied into the same src asx
. This allows you to join tables across srcs, but it is a potentially expensive operation so you must opt into it.- in_place
Should
x
be modified in place? This argument is only relevant for mutable backends (e.g. databases, data.tables).When
TRUE
, a modified version ofx
is returned invisibly; whenFALSE
, a new object representing the resulting changes is returned.- unmatched
For
rows_update()
,rows_patch()
, androws_delete()
, how should keys iny
that are unmatched by the keys inx
be handled?One of:
"error"
, the default, will error if there are any keys iny
that are unmatched by the keys inx
."ignore"
will ignore rows iny
with keys that are unmatched by the keys inx
.
Value
An object of the same type as x
. The order of the rows and columns of x
is preserved as much as possible. The output has the following properties:
rows_update()
androws_patch()
preserve the number of rows;rows_insert()
,rows_append()
, androws_upsert()
return all existing rows and potentially new rows;rows_delete()
returns a subset of the rows.Columns are not added, removed, or relocated, though the data may be updated.
Groups are taken from
x
.Data frame attributes are taken from
x
.
If in_place = TRUE
, the result will be returned invisibly.
Methods
These function are generics, which means that packages can provide implementations (methods) for other classes. See the documentation of individual methods for extra arguments and differences in behaviour.
Methods available in currently loaded packages:
rows_insert()
: dbplyr (tbl_lazy
), dplyr (data.frame
) .rows_append()
: dbplyr (tbl_lazy
), dplyr (data.frame
) .rows_update()
: dbplyr (tbl_lazy
), dplyr (data.frame
) .rows_patch()
: dbplyr (tbl_lazy
), dplyr (data.frame
) .rows_upsert()
: dbplyr (tbl_lazy
), dplyr (data.frame
) .rows_delete()
: dbplyr (tbl_lazy
), dplyr (data.frame
) .
Examples
data <- tibble(a = 1:3, b = letters[c(1:2, NA)], c = 0.5 + 0:2)
data
#> # A tibble: 3 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 b 1.5
#> 3 3 NA 2.5
# Insert
rows_insert(data, tibble(a = 4, b = "z"))
#> Matching, by = "a"
#> # A tibble: 4 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 b 1.5
#> 3 3 NA 2.5
#> 4 4 z NA
# By default, if a key in `y` matches a key in `x`, then it can't be inserted
# and will throw an error. Alternatively, you can ignore rows in `y`
# containing keys that conflict with keys in `x` with `conflict = "ignore"`,
# or you can use `rows_append()` to ignore keys entirely.
try(rows_insert(data, tibble(a = 3, b = "z")))
#> Matching, by = "a"
#> Error in rows_insert(data, tibble(a = 3, b = "z")) :
#> `y` can't contain keys that already exist in `x`.
#> ℹ The following rows in `y` have keys that already exist in `x`: `c(1)`.
#> ℹ Use `conflict = "ignore"` if you want to ignore these `y` rows.
rows_insert(data, tibble(a = 3, b = "z"), conflict = "ignore")
#> Matching, by = "a"
#> # A tibble: 3 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 b 1.5
#> 3 3 NA 2.5
rows_append(data, tibble(a = 3, b = "z"))
#> # A tibble: 4 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 b 1.5
#> 3 3 NA 2.5
#> 4 3 z NA
# Update
rows_update(data, tibble(a = 2:3, b = "z"))
#> Matching, by = "a"
#> # A tibble: 3 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 z 1.5
#> 3 3 z 2.5
rows_update(data, tibble(b = "z", a = 2:3), by = "a")
#> # A tibble: 3 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 z 1.5
#> 3 3 z 2.5
# Variants: patch and upsert
rows_patch(data, tibble(a = 2:3, b = "z"))
#> Matching, by = "a"
#> # A tibble: 3 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 b 1.5
#> 3 3 z 2.5
rows_upsert(data, tibble(a = 2:4, b = "z"))
#> Matching, by = "a"
#> # A tibble: 4 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 z 1.5
#> 3 3 z 2.5
#> 4 4 z NA
# Delete and truncate
rows_delete(data, tibble(a = 2:3))
#> Matching, by = "a"
#> # A tibble: 1 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
rows_delete(data, tibble(a = 2:3, b = "b"))
#> Matching, by = "a"
#> Ignoring extra `y` columns: b
#> # A tibble: 1 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
# By default, for update, patch, and delete it is an error if a key in `y`
# doesn't exist in `x`. You can ignore rows in `y` that have unmatched keys
# with `unmatched = "ignore"`.
y <- tibble(a = 3:4, b = "z")
try(rows_update(data, y, by = "a"))
#> Error in rows_update(data, y, by = "a") :
#> `y` must contain keys that already exist in `x`.
#> ℹ The following rows in `y` have keys that don't exist in `x`: `c(2)`.
#> ℹ Use `unmatched = "ignore"` if you want to ignore these `y` rows.
rows_update(data, y, by = "a", unmatched = "ignore")
#> # A tibble: 3 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 b 1.5
#> 3 3 z 2.5
rows_patch(data, y, by = "a", unmatched = "ignore")
#> # A tibble: 3 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 b 1.5
#> 3 3 z 2.5
rows_delete(data, y, by = "a", unmatched = "ignore")
#> Ignoring extra `y` columns: b
#> # A tibble: 2 × 3
#> a b c
#> <int> <chr> <dbl>
#> 1 1 a 0.5
#> 2 2 b 1.5