Schema enforcement

Databricks validates data quality by enforcing schema on write.

Note

This article describes the default behavior for tables on Databricks, which are backed by Delta Lake. Schema enforcement does not apply to tables backed by external data.
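
For example, appending a DataFrame whose schema does not match the target table is rejected. The following is a minimal sketch, assuming a Delta Lake-enabled Spark session (in Databricks notebooks, `spark` is predefined) and a hypothetical table named `events`:

```python
from pyspark.sql import SparkSession

# Returns the active SparkSession (predefined in Databricks notebooks).
spark = SparkSession.builder.getOrCreate()

# Create a Delta table with a fixed two-column schema.
spark.sql("CREATE TABLE IF NOT EXISTS events (id BIGINT, name STRING) USING DELTA")

# This DataFrame carries an extra `source` column the table does not have.
df = spark.createDataFrame([(1, "click", "web")], ["id", "name", "source"])

try:
    # Schema enforcement rejects the append because `source` is not in the
    # target table's schema.
    df.write.format("delta").mode("append").saveAsTable("events")
except Exception as e:
    print(f"Write rejected: {e}")
```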

Schema enforcement for insert operations

Databricks enforces the following rules when inserting data into a table:

  • All inserted columns must exist in the target table.

  • The data type of each inserted column must match the data type of the corresponding column in the target table.

Note

Databricks attempts to safely cast column data types to match the target table.
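
A sketch of these rules in SQL, run through the same Spark session; the `people` table and its columns are hypothetical:

```python
# Target table: `id` is BIGINT, `age` is INT.
spark.sql("CREATE TABLE IF NOT EXISTS people (id BIGINT, age INT) USING DELTA")

try:
    # Fails: `nickname` does not exist in the target table.
    spark.sql("INSERT INTO people (id, nickname) VALUES (1, 'sam')")
except Exception as e:
    print(f"Insert rejected: {e}")

# Succeeds: the INT literal 2 is safely cast (widened) to BIGINT for `id`.
spark.sql("INSERT INTO people (id, age) VALUES (2, 35)")
```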

Schema validation during MERGE operations

Databricks enforces the following rules when inserting or updating data as part of a MERGE operation:

  • If a data type in the source statement does not match the target column, MERGE attempts to safely cast it to the data type of the target column.

  • The columns that are the target of an UPDATE or INSERT action must exist in the target table.

  • When using INSERT * or UPDATE SET * syntax (see the sketch after this list):

    • Columns in the source dataset not present in the target table are ignored.

    • The source dataset must have all the columns present in the target table.
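
The following sketch shows these rules with the Delta Lake Python API (delta-spark); the `target` table and `updates` DataFrame are hypothetical:

```python
from delta.tables import DeltaTable

# `spark` is the active SparkSession (predefined in Databricks notebooks).
spark.sql("CREATE TABLE IF NOT EXISTS target (id BIGINT, name STRING) USING DELTA")

# The source has every target column (`id`, `name`) plus an `extra` column,
# which INSERT * / UPDATE SET * ignores by default.
updates = spark.createDataFrame([(1, "alice", "ignored")], ["id", "name", "extra"])

(
    DeltaTable.forName(spark, "target").alias("t")
    .merge(updates.alias("s"), "t.id = s.id")
    .whenMatchedUpdateAll()      # UPDATE SET *
    .whenNotMatchedInsertAll()   # INSERT *
    .execute()
)
```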

Modify a table schema

You can update the schema of a table using either explicit ALTER TABLE statements or automatic schema evolution. See Update Delta Lake table schema.
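
As a sketch of the two approaches, continuing the hypothetical `events` table above:

```python
# Explicit: add a column with an ALTER TABLE statement.
spark.sql("ALTER TABLE events ADD COLUMNS (source STRING)")

# Automatic: the `mergeSchema` write option evolves the table schema for a
# single write, adding any new source columns (here, `country`).
df = spark.createDataFrame(
    [(2, "view", "mobile", "US")], ["id", "name", "source", "country"]
)
df.write.format("delta").mode("append").option("mergeSchema", "true").saveAsTable("events")
```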

Schema evolution has special semantics for MERGE operations. See Automatic schema evolution for Delta Lake merge.
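
One way to opt in for a session is the configuration below (a minimal sketch; see the linked article for the full semantics):

```python
# With this set, MERGE with INSERT * / UPDATE SET * adds new source columns
# (such as `extra` in the earlier sketch) to the target table's schema
# instead of ignoring them.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
```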