MERGE INTO (Delta Lake on Databricks)

Merges a set of updates, insertions, and deletions based on a source table into a target Delta table.

Syntax

MERGE INTO target_table_identifier [AS target_alias]
USING source_table_identifier [<time_travel_version>] [AS source_alias]
ON <merge_condition>
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN MATCHED [ AND <condition> ] THEN <matched_action> ]
[ WHEN NOT MATCHED [ AND <condition> ]  THEN <not_matched_action> ]

where

  • table_identifier

    • [database_name.] table_name: A table name, optionally qualified with a database name.
    • delta.`<path-to-table>` : The location of an existing Delta table.
  • AS alias

    Define a table alias.

<merge_condition> =
  How the rows from one relation are combined with the rows of another relation. An expression with a return type of Boolean.

<matched_action>  =
  DELETE  |
  UPDATE SET *  |
  UPDATE SET column1 = value1 [, column2 = value2 ...]

<not_matched_action>  =
  INSERT *  |
  INSERT (column1 [, column2 ...]) VALUES (value1 [, value2 ...])

<time_travel_version>  =
  TIMESTAMP AS OF timestamp_expression |
  VERSION AS OF version
  • In Databricks Runtime 7.3 LTS and above:

    • There can be any number of WHEN MATCHED and WHEN NOT MATCHED clauses.
    • Multiple matches are allowed when matches are unconditionally deleted (since unconditional delete is not ambiguous even if there are multiple matches).
  • WHEN MATCHED clauses are executed when a source row matches a target table row based on the match condition. These clauses have the following semantics.

    • WHEN MATCHED clauses can have at most on UPDATE and one DELETE action. The UPDATE action in merge only updates the specified columns of the matched target row. The DELETE action will delete the matched row.
    • Each WHEN MATCHED clause can have an optional condition. If this clause condition exists, the UPDATE or DELETE action is executed for any matching source-target row pair row only when when the clause condition is true.
    • If there are multiple WHEN MATCHED clauses, then they are evaluated in order they are specified (that is, the order of the clauses matter). All WHEN MATCHED clauses, except the last one, must have conditions.
    • If both WHEN MATCHED clauses have conditions and neither of the conditions are true for a matching source-target row pair, then the matched target row is left unchanged.
    • To update all the columns of the target Delta table with the corresponding columns of the source dataset, use UPDATE SET *. This is equivalent to UPDATE SET col1 = source.col1 [, col2 = source.col2 ...] for all the columns of the target Delta table. Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.
  • WHEN NOT MATCHED clauses are executed when a source row does not match any target row based on the match condition. These clauses have the following semantics.

    • WHEN NOT MATCHED clauses can only have the INSERT action. The new row is generated based on the specified column and corresponding expressions. All the columns in the target table do not need to be specified. For unspecified target columns, NULL is inserted.

    • Each WHEN NOT MATCHED clause can have an optional condition. If the clause condition is present, a source row is inserted only if that condition is true for that row. Otherwise, the source column is ignored.

    • If there are multiple WHEN NOT MATCHED clauses, then they are evaluated in order they are specified (that is, the order of the clauses matter). All WHEN NOT  MATCHED clauses, except the last one, must have conditions.

    • To insert all the columns of the target Delta table with the corresponding columns of the source dataset, use INSERT *. This is equivalent to INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...]) for all the columns of the target Delta table. Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.

      Note

      This behavior changes when automatic schema migration is enabled. See Automatic schema evolution for details.

Important

A MERGE operation can fail if multiple rows of the source dataset match and attempt to update the same rows of the target Delta table. According to the SQL semantics of merge, such an update operation is ambiguous as it is unclear which source row should be used to update the matched target row. You can preprocess the source table to eliminate the possibility of multiple matches. See the Change data capture example—it preprocesses the change dataset (that is, the source dataset) to retain only the latest change for each key before applying that change into the target Delta table.

Examples

You can use MERGE INTO for complex operations like deduplicating data, upserting change data, applying SCD Type 2 operations, etc. See Merge examples for a few examples.