Skip to main content

Schema enforcement

Databricks validates data quality by enforcing schema on write for Delta Lake tables. Schema enforcement doesn't apply to tables using non-Delta formats, such as CSV or JSON files in cloud storage.

Schema enforcement for INSERT operations

Databricks enforces the following rules when inserting data into a table:

  • All inserted columns must exist in the target table.
  • All column data types must match the column data types in the target table.
note

Databricks attempts to safely cast column data types to match the target table.

INSERT Examples

For example, inserting a row with a column that doesn't exist in the target table fails:

SQL
-- Fails: unknown_column does not exist in target_table
INSERT INTO catalog.schema.target_table (id, unknown_column) VALUES (1, 'value');

Inserting with a compatible type cast succeeds:

SQL
-- Succeeds: integer 42 is safely cast to BIGINT
INSERT INTO catalog.schema.target_table (id, bigint_column) VALUES (1, 42);

Schema enforcement for MERGE operations

Databricks enforces the following rules when inserting or updating data as part of a MERGE operation:

  • If the data type in the source statement does not match the target column, MERGE tries to safely cast column data types to match the target table.
  • The target columns of an UPDATE or INSERT action must exist in the target table.
  • When using INSERT * or UPDATE SET *:
    • The source dataset must have all the columns present in the target table.
    • Enforcement ignores columns in the source dataset that aren't present in the target table.

MERGE Examples

For example, the following MERGE fails because the query tries to insert a value into unknown_column, which doesn't exist in target_table:

SQL
MERGE INTO catalog.schema.target_table AS t
USING catalog.schema.source_table AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.unknown_column = s.value
WHEN NOT MATCHED THEN INSERT (id, unknown_column) VALUES (s.id, s.value);

In another example, assume target_table has columns id and name, and source_table has id, name, and extra_col. The following MERGE using INSERT * ignores extra_col in the source and succeeds because all target columns are present in the source:

SQL
MERGE INTO catalog.schema.target_table AS t
USING catalog.schema.source_table AS s
ON t.id = s.id
WHEN NOT MATCHED THEN INSERT *;

Modify a table schema

You can update the schema of a table using explicit ALTER TABLE statements or automatic schema evolution. See Update table schemas with schema evolution.

For example, to add a column explicitly:

SQL
ALTER TABLE catalog.schema.table_name ADD COLUMN new_column STRING;

To enable automatic schema evolution for a write operation, set the mergeSchema option:

SQL
SET spark.databricks.delta.schema.autoMerge.enabled = true;
INSERT INTO catalog.schema.table_name SELECT * FROM source_table;

Schema evolution has special semantics for INSERT and MERGE operations. See Enable schema evolution.

External tables

If you modify an external table's metadata directly with external clients outside Databricks or by using path-based access, Unity Catalog doesn't automatically sync updates to the schema. This might prevent schema enforcement from applying correctly.

Run MSCK REPAIR TABLE <table-name> SYNC METADATA to sync the schema with Unity Catalog. See REPAIR TABLE.