MERGE INTO
Applies to: Databricks SQL Databricks Runtime
Merges a set of updates, insertions, and deletions based on a source table into a target Delta table.
This statement is supported only for Delta Lake tables.
This page contains details for using the correct syntax with the MERGE
command. See Upsert into a Delta Lake table using merge for more guidance on how to use MERGE
operations to manage your data.
Syntax
MERGE [ WITH SCHEMA EVOLUTION ] INTO target_table_name [target_alias]
USING source_table_reference [source_alias]
ON merge_condition
{ WHEN MATCHED [ AND matched_condition ] THEN matched_action |
WHEN NOT MATCHED [BY TARGET] [ AND not_matched_condition ] THEN not_matched_action |
WHEN NOT MATCHED BY SOURCE [ AND not_matched_by_source_condition ] THEN not_matched_by_source_action } [...]
matched_action
{ DELETE |
UPDATE SET * |
UPDATE SET { column = { expr | DEFAULT } } [, ...] }
not_matched_action
{ INSERT * |
INSERT (column1 [, ...] ) VALUES ( expr | DEFAULT ] [, ...] )
not_matched_by_source_action
{ DELETE |
UPDATE SET { column = { expr | DEFAULT } } [, ...] }
Parameters
WITH SCHEMA EVOLUTION
Applies to: Databricks Runtime 15.2 and above
Enables automatic schema evolution for this
MERGE
operation. When enabled, the schema of the target Delta table is automatically updated to match the schema of the source table.-
A Table name identifying the table being modified. The table referenced must be a Delta table.
The table must not be a foreign table.
-
A Table alias for the target table. The alias must not include a column list.
-
A Table name identifying the source table to be merged into the target table.
-
A Table alias for the source table. The alias must not include a column list.
-
How the rows from one relation are combined with the rows of another relation. An expression with a return type of BOOLEAN.
WHEN MATCHED [ AND
matched_condition]
WHEN MATCHED
clauses are executed when a source row matches a target table row based on themerge_condition
and the optionalmatch_condition
.matched_action
DELETE
Deletes the matching target table row.
Multiple matches are allowed when matches are unconditionally deleted. An unconditional delete is not ambiguous, even if there are multiple matches.
UPDATE
Updates the matched target table row.
To update all the columns of the target Delta table with the corresponding columns of the source dataset, use
UPDATE SET *
. This is equivalent toUPDATE SET col1 = source.col1 [, col2 = source.col2 ...]
for all the columns of the target Delta table. Therefore, this action assumes that the source table has the same columns as those in the target table, otherwise the query will throw an analysis error.Note
This behavior changes when automatic schema evolution is enabled. See Automatic schema evolution for Delta Lake merge for details.
Applies to: Databricks SQL Databricks Runtime 11.3 LTS and above
You can specify
DEFAULT
asexpr
to explicitly update the column to its default value.
If there are multiple
WHEN MATCHED
clauses, then they are evaluated in the order they are specified. EachWHEN MATCHED
clause, except the last one, must have amatched_condition
. Otherwise, the query returns a NON_LAST_MATCHED_CLAUSE_OMIT_CONDITION error.If none of the
WHEN MATCHED
conditions evaluate to true for a source and target row pair that matches themerge_condition
, then the target row is left unchanged.WHEN NOT MATCHED [BY TARGET] [ AND
not_matched_condition]
WHEN NOT MATCHED
clauses insert a row when a source row does not match any target row based on themerge_condition
and the optionalnot_matched_condition
.Applies to: Databricks SQL Databricks Runtime 12.2 LTS and above
WHEN NOT MATCHED BY TARGET
can be used as an alias forWHEN NOT MATCHED
.not_matched_condition
must be a Boolean expression.INSERT *
Inserts all the columns of the target Delta table with the corresponding columns of the source dataset. This is equivalent to
INSERT (col1 [, col2 ...]) VALUES (source.col1 [, source.col2 ...])
for all the columns of the target Delta table. This action requires that the source table has the same columns as those in the target table.Note
This behavior changes when automatic schema evolution is enabled. See Automatic schema evolution for Delta Lake merge for details.
INSERT ( ... ) VALUES ( ... )
The new row is generated based on the specified column and corresponding expressions. All the columns in the target table do not need to be specified. For unspecified target columns, the column default is inserted, or
NULL
if none exists.Applies to: Databricks SQL Databricks Runtime 11.3 LTS and above
You can specify
DEFAULT
as an expression to explicitly insert the column default for a target column.
If there are multiple
WHEN NOT MATCHED
clauses, then they are evaluated in the order they are specified. AllWHEN NOT MATCHED
clauses, except the last one, must havenot_matched_condition
s. Otherwise, the query returns a NON_LAST_NOT_MATCHED_CLAUSE_OMIT_CONDITION error.WHEN NOT MATCHED BY SOURCE [ AND
not_matched_by_source_condition]
Applies to: Databricks SQL Databricks Runtime 12.2 LTS and above
WHEN NOT MATCHED BY SOURCE
clauses are executed when a target row does not match any rows in the source table based on themerge_condition
and the optionalnot_match_by_source_condition
evaluates to true.not_matched_by_source_condition
must be a Boolean expression that only references columns from the target table.not_matched_by_source_action
DELETE
Deletes the target table row.
UPDATE
Updates the target table row.
expr
may only reference columns from the target table, otherwise the query will throw an analysis error.Applies to: Databricks SQL Databricks Runtime 11.3 LTS and above
You can specify
DEFAULT
asexpr
to explicitly update the column to its default value.
Important
Adding a
WHEN NOT MATCHED BY SOURCE
clause to update or delete target rows when themerge_condition
evaluates to false can lead to a large number of target rows being modified. For best performance, applynot_matched_by_source_condition
s to limit the number of target rows updated or deleted.If there are multiple
WHEN NOT MATCHED BY SOURCE clauses
, then they are evaluated in the order they are specified. EachWHEN NOT MATCHED BY SOURCE
clause, except the last one, must have anot_matched_by_source_condition
. Otherwise, the query returns a NON_LAST_NOT_MATCHED_BY_SOURCE_CLAUSE_OMIT_CONDITION error.If none of the
WHEN NOT MATCHED BY SOURCE
conditions evaluate to true for a target row that doesn’t match any rows in the source table based on themerge_condition
, then the target row is left unchanged.
Important
MERGE
operations fail with a DELTA_MULTIPLE_SOURCE_ROW_MATCHING_TARGET_ROW_IN_MERGE error if more than one row in the source table matches the same row in the target table based on the conditions specified in the ON
and WHEN MATCHED
clauses. According to the SQL semantics of merge, this type of update operation is ambiguous because it is unclear which source row should be used to update the matched target row. You can preprocess the source table to eliminate the possibility of multiple matches. See the change data capture example. This example preprocesses the change dataset (the source dataset) to retain only the latest change for each key before applying that change into the target Delta table. In Databricks Runtime 15.4 LTS and below, MERGE
only considers conditions in the ON
clause before evaluating multiple matches.
Examples
You can use MERGE INTO
for complex operations like deduplicating data, upserting change data, applying SCD Type 2 operations, etc. See Upsert into a Delta Lake table using merge for a few examples.
WHEN MATCHED
-- Delete all target rows that have a match in the source table.
> MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED THEN DELETE
-- Conditionally update target rows that have a match in the source table using the source value.
> MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED AND target.updated_at < source.updated_at THEN UPDATE SET *
-- Multiple MATCHED clauses conditionally deleting matched target rows and updating two columns for all other matched rows.
> MERGE INTO target USING source
ON target.key = source.key
WHEN MATCHED AND target.marked_for_deletion THEN DELETE
WHEN MATCHED THEN UPDATE SET target.updated_at = source.updated_at, target.value = DEFAULT
WHEN NOT MATCHED [BY TARGET]
-- Insert all rows from the source that are not already in the target table.
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED THEN INSERT *
-- Conditionally insert new rows in the target table using unmatched rows from the source table.
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY TARGET AND source.created_at > now() - INTERVAL “1” DAY THEN INSERT (created_at, value) VALUES (source.created_at, DEFAULT)
WHEN NOT MATCHED BY SOURCE
-- Delete all target rows that have no matches in the source table.
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY SOURCE THEN DELETE
-- Multiple NOT MATCHED BY SOURCE clauses conditionally deleting unmatched target rows and updating two columns for all other matched rows.
> MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY SOURCE AND target.marked_for_deletion THEN DELETE
WHEN NOT MATCHED BY SOURCE THEN UPDATE SET target.value = DEFAULT