Constraints on Databricks
Databricks supports standard SQL constraint management clauses:
- Enforced constraints verify data integrity before adding rows to a table.
- Informational primary key and foreign key constraints define relationships between fields in tables and aren't enforced.
All constraints on Databricks require Delta Lake.
For a related concept in Lakeflow Spark Declarative Pipelines, see Manage data quality with pipeline expectations.
Enforced constraints on Databricks
When a constraint is violated, the transaction fails with an error. Databricks supports two types of constraints:
NOT NULL: indicates that values in specific columns cannot be null.CHECK: indicates that a specified boolean expression must be true for each input row.
When you add a constraint, Databricks automatically upgrades the table writer protocol version if the previous writer version was less than 3. See Delta Lake feature compatibility and protocols to understand table protocol versioning and what it means to upgrade the protocol version.
Set a NOT NULL constraint in Databricks
When you create a table, specify NOT NULL constraints in the schema. To drop or add NOT NULL constraints, use the ALTER TABLE ... ALTER COLUMN command:
CREATE TABLE people10m (
id INT NOT NULL,
firstName STRING,
middleName STRING NOT NULL,
lastName STRING,
gender STRING,
birthDate TIMESTAMP,
ssn STRING,
salary INT
);
ALTER TABLE people10m ALTER COLUMN middleName DROP NOT NULL;
ALTER TABLE people10m ALTER COLUMN ssn SET NOT NULL;
Databricks verifies that all existing rows satisfy the constraint before adding a NOT NULL constraint to a table.
If you specify a NOT NULL constraint on a column nested within a struct, the parent struct must also be not null. Columns nested within array or map types don't accept NOT NULL constraints.
See CREATE TABLE [USING] and ALTER TABLE ALTER COLUMN.
Set a CHECK constraint in Databricks
Manage CHECK constraints with the ALTER TABLE ADD CONSTRAINT and ALTER TABLE DROP CONSTRAINT commands. ALTER TABLE ADD CONSTRAINT verifies that all existing rows satisfy the constraint before adding the constraint to the table.
The following restrictions apply to check constraints:
- A
CHECKconstraint expression can use any SQL functions in Spark that always return the same result when given the same argument values, except the following types of functions:- User-defined functions.
- Aggregate functions.
- Window functions.
- Functions returning multiple rows.
Add to an existing table
CREATE TABLE people10m (
id INT,
firstName STRING,
middleName STRING,
lastName STRING,
gender STRING,
birthDate TIMESTAMP,
ssn STRING,
salary INT
);
ALTER TABLE people10m ADD CONSTRAINT dateWithinRange CHECK (birthDate > '1900-01-01');
ALTER TABLE people10m DROP CONSTRAINT dateWithinRange;
See ALTER TABLE ADD CONSTRAINT and ALTER TABLE DROP CONSTRAINT.
View check constraint table properties
Use the DESCRIBE DETAIL and SHOW TBLPROPERTIES commands to see a table's CHECK constraints.
ALTER TABLE people10m ADD CONSTRAINT validIds CHECK (id > 1 and id < 99999999);
DESCRIBE DETAIL people10m;
SHOW TBLPROPERTIES people10m;
Remove check constraints
In Databricks Runtime 15.4 LTS and above, use the DROP FEATURE command to remove check constraints from a table and downgrade the table protocol.
See Drop a Delta Lake table feature and downgrade table protocol.
Declare primary key and foreign key relationships
Primary key and foreign key constraints are available for Unity Catalog and Delta Lake tables in Databricks Runtime 13.3 LTS and above, and are GA in Databricks Runtime 15.2 and above.
Primary and foreign keys are informational only and aren't enforced. Foreign keys must reference a primary key in another table. Informational key constraints might improve performance with query optimizations.
Query the information_schema or use DESCRIBE to get details about how constraints are applied across a given catalog.
Add to new tables
Declare primary keys and foreign keys as part of the table specification clause during table creation:
CREATE TABLE T(pk1 INTEGER NOT NULL, pk2 INTEGER NOT NULL,
CONSTRAINT t_pk PRIMARY KEY(pk1, pk2));
CREATE TABLE S(pk INTEGER NOT NULL PRIMARY KEY,
fk1 INTEGER, fk2 INTEGER,
CONSTRAINT s_t_fk FOREIGN KEY(fk1, fk2) REFERENCES T);
CTAS statements don't support this constraint clause.
Add to existing tables
To add constraints to existing tables:
ALTER TABLE T ADD CONSTRAINT t_pk PRIMARY KEY(pk1, pk2);
ALTER TABLE S ADD CONSTRAINT s_t_fk FOREIGN KEY(fk1, fk2) REFERENCES T;