Upsert and Update Data Frames — UpdateInsert • c2z

Combines two data frames by updating rows in x with values from y based on a common key, and inserting new rows from y that are not present in x. The function first harmonizes the column structures of both data frames by adding missing columns and coercing types as necessary.
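The sketch below illustrates the update-then-insert idea in base R. It assumes that x and y already share the same columns and that key values are unique in each. `upsert_sketch` is a hypothetical name, not part of c2z. It leaves out both the column harmonization and the dplyr::rows_upsert call that UpdateInsert itself relies on.

```r
# Hypothetical illustration (not part of c2z): upsert two data frames that
# already share the same columns, assuming key values are unique in each.
upsert_sketch <- function(x, y, key = "key") {
  y <- y[names(x)]                          # align y's columns with x's order
  pos <- match(y[[key]], x[[key]])          # row of x for each row of y, NA if new
  hit <- !is.na(pos)
  x[pos[hit], ] <- y[hit, , drop = FALSE]   # update: overwrite matched rows of x
  out <- rbind(x, y[!hit, , drop = FALSE])  # insert: append rows with new keys
  rownames(out) <- NULL
  out
}

x <- data.frame(key = 1:3, a = c(NA, 2, NA))
y <- data.frame(key = c(2L, 4L), a = c(5, 7))
res <- upsert_sketch(x, y)
res  # keys 1, 2, 3, 4 with a = NA, 5, NA, 7
```

Because `match()` looks up each key of y in x, rows of x stay in their original positions and only genuinely new keys are added at the bottom.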

Usage

UpdateInsert(x, y, key = "key", check.missing = FALSE)

Arguments

x: A data frame to be updated.
y: A data frame containing the values used to update x, plus any new rows to insert. Must include the column specified by key.
key: A character string specifying the unique key column used for matching rows. Defaults to "key".
check.missing: Logical; if TRUE, x is updated cell by cell, and a cell is overwritten only when the corresponding value in y is not missing. Missing values are NA for atomic columns and an empty list for list columns. If FALSE, a standard upsert is performed with dplyr::rows_upsert, in which matched rows take the values from y even where those values are NA. Defaults to FALSE.
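The check.missing = TRUE rule can be pictured with a small base-R sketch. The helpers `is_missing_cell` and `update_non_missing` are hypothetical and are not c2z functions. The sketch assumes unique keys and treats any zero-length element of a list column as an empty list. It also leaves out the step that appends new rows.

```r
# Hypothetical helper (not c2z's code): is the i-th cell of `column` missing?
# NA counts as missing in atomic columns; a zero-length element (such as
# list()) counts as missing in list columns.
is_missing_cell <- function(column, i) {
  if (is.list(column)) {
    length(column[[i]]) == 0
  } else {
    is.na(column[[i]])
  }
}

# Hypothetical sketch of the check.missing = TRUE update: for every key found
# in both x and y, copy each non-key cell from y into x unless it is missing.
# Appending rows that exist only in y is a separate step and is omitted here.
update_non_missing <- function(x, y, key = "key") {
  cols <- setdiff(intersect(names(x), names(y)), key)
  for (k in intersect(x[[key]], y[[key]])) {
    i <- match(k, x[[key]])  # row of this key in x
    j <- match(k, y[[key]])  # row of this key in y
    for (col in cols) {
      if (!is_missing_cell(y[[col]], j)) {
        x[[col]][i] <- y[[col]][j]
      }
    }
  }
  x
}

x <- data.frame(key = 1:3, a = c(10, 20, 30), b = c("x", "y", "z"),
                stringsAsFactors = FALSE)
x$tags <- list("p", "q", "r")          # a list column
y <- data.frame(key = 2:3, a = c(NA, 99), b = c("Y", NA),
                stringsAsFactors = FALSE)
y$tags <- list(list(), "Q")            # list() marks a missing list cell
res <- update_non_missing(x, y)
res$a     # 10 20 99: the NA for key 2 was skipped
res$b     # "x" "Y" "z": the NA for key 3 was skipped
```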

Value

A data frame containing the rows of x, with matching rows updated from y, followed by the rows of y whose key is not present in x.

Details

The function works in several steps:

1. It computes the union of the column names of x and y and adds any missing columns to both data frames using the internal helper AddColumns. Each added column is filled with an NA value of the appropriate type.
2. Both x and y are reordered so that they share the same column order.
3. For each common column other than the key, types are coerced to make x and y compatible when x's column is entirely NA or when the two columns have different types.
4. When check.missing is TRUE, the function iterates over the keys present in both x and y and updates a cell in x only if the corresponding cell in y is not missing. Otherwise, it calls dplyr::rows_upsert to perform a standard upsert.
5. Rows of y whose key is not present in x are appended.
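The first two steps, adding missing columns and aligning the column order, could look roughly like this in base R. `na_like` and `harmonize_columns` are hypothetical stand-ins for the internal AddColumns helper, whose actual behaviour may differ. In particular, filling list columns with empty lists is an assumption borrowed from the definition of a missing value under check.missing.

```r
# Hypothetical stand-in for the internal AddColumns helper; its real
# signature and fill values are not documented here.
na_like <- function(template, n) {
  if (is.list(template)) {
    rep(list(list()), n)            # empty-list placeholders for list columns
  } else {
    rep(template[NA_integer_], n)   # NA of the same type (and levels) as template
  }
}

harmonize_columns <- function(x, y) {
  all_cols <- union(names(x), names(y))
  # Step 1: add columns that exist only in the other data frame.
  for (col in setdiff(all_cols, names(x))) x[[col]] <- na_like(y[[col]], nrow(x))
  for (col in setdiff(all_cols, names(y))) y[[col]] <- na_like(x[[col]], nrow(y))
  # Step 2: put both data frames in the same column order.
  list(x = x[all_cols], y = y[all_cols])
}

x <- data.frame(key = 1:2, a = c(1.5, 2.5))
y <- data.frame(key = 3L, b = "new", stringsAsFactors = FALSE)
h <- harmonize_columns(x, y)
str(h$x)  # key, a, and a new character column b filled with NA
str(h$y)  # key, a new numeric column a filled with NA, then b
```

Indexing a column with `NA_integer_` yields an NA of that column's own type, so a new column inherits the type from the data frame that already has it. This is one simple way to get the "appropriate NA value" described in step 1.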

Examples

if (FALSE) { # \dontrun{
  # Example data frames:
  df1 <- data.frame(
    key = 1:3,
    a = c(NA, 2, NA),
    b = c("x", NA, "z"),
    stringsAsFactors = FALSE
  )

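  df2 <- data.frame(
    key = c(2, 3, 4),
    a = c(NA, 6, 7),
    b = c("y", NA, "v"),
    stringsAsFactors = FALSE
  )

  # Standard upsert (check.missing = FALSE): matched rows are overwritten,
  # including with NA, so key 2 gets a = NA and key 3 gets b = NA.
  result_upsert <- UpdateInsert(df1, df2, key = "key", check.missing = FALSE)

  # Cell-by-cell update (check.missing = TRUE): NA values in df2 are skipped,
  # so key 2 keeps a = 2 and key 3 keeps b = "z".
  result_cells <- UpdateInsert(df1, df2, key = "key", check.missing = TRUE)

  # In both cases key 4, which exists only in df2, is appended as a new row.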
} # }