Delta Live Tables SQL language reference
This article provides details and examples for the Delta Live Tables SQL programming interface. For the complete API specification, see SQL API specification.
For information on the Python API, see the Delta Live Tables Python language reference.
Limitations
The PIVOT clause is not supported. The pivot operation in Spark requires eager loading of input data to compute the schema of the output. This capability is not supported in Delta Live Tables.
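Where a pivoted result is needed, a common workaround is conditional aggregation, which does not require eager schema inference. The following is a minimal sketch; the sales_raw dataset and its store, quarter, and amount columns are hypothetical:
CREATE OR REFRESH LIVE TABLE sales_by_quarter
AS SELECT
  store,
  -- Pivot quarter values into columns without the PIVOT clause
  SUM(CASE WHEN quarter = 'Q1' THEN amount END) AS q1_amount,
  SUM(CASE WHEN quarter = 'Q2' THEN amount END) AS q2_amount,
  SUM(CASE WHEN quarter = 'Q3' THEN amount END) AS q3_amount,
  SUM(CASE WHEN quarter = 'Q4' THEN amount END) AS q4_amount
FROM LIVE.sales_raw
GROUP BY store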
SQL datasets
Use the CREATE LIVE VIEW or CREATE OR REFRESH LIVE TABLE syntax to create a view or table with SQL. You can create a dataset by reading from an external data source or from datasets defined in a pipeline. To read from an internal dataset, prepend the LIVE keyword to the dataset name. The following example defines two datasets: a table called taxi_raw that takes a JSON file as the input source and a table called filtered_data that takes the taxi_raw table as input:
CREATE OR REFRESH LIVE TABLE taxi_raw
AS SELECT * FROM json.`/databricks-datasets/nyctaxi/sample/json/`
CREATE OR REFRESH LIVE TABLE filtered_data
AS SELECT
...
FROM LIVE.taxi_raw
Delta Live Tables automatically captures the dependencies between datasets defined in your pipeline. It uses this dependency information to determine the execution order when performing an update and to record lineage information in the pipeline's event log.
Both views and tables have the following optional properties:
- COMMENT: A human-readable description of this dataset.
- Data quality constraints enforced with expectations.
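For example, the following sketch attaches both a comment and an expectation to a table; the orders_raw dataset and order_id column are hypothetical:
CREATE OR REFRESH LIVE TABLE orders_clean (
  -- Drop any row whose order_id is missing
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
COMMENT "Orders with a non-null order_id."
AS SELECT * FROM LIVE.orders_raw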
Tables also offer additional control of their materialization:
- Specify how tables are partitioned using PARTITIONED BY. You can use partitioning to speed up queries.
- You can set table properties using TBLPROPERTIES. See Table properties for more detail.
- Set a storage location using the LOCATION setting. By default, table data is stored in the pipeline storage location if LOCATION isn’t set.
- You can use generated columns in your schema definition.
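The following sketch combines these options in one table definition; the sales_raw dataset, its columns, and the storage path are hypothetical:
CREATE OR REFRESH LIVE TABLE sales_partitioned (
  sale_date DATE,
  amount DOUBLE,
  -- Generated column derived from sale_date
  sale_year INT GENERATED ALWAYS AS (YEAR(sale_date)) COMMENT 'Year of the sale'
)
PARTITIONED BY (sale_year)
LOCATION '/mnt/pipeline-storage/sales_partitioned'
TBLPROPERTIES ('pipelines.reset.allowed' = 'false')
AS SELECT sale_date, amount FROM LIVE.sales_raw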
See SQL API specification for more information about table and view properties.
Use SET to specify a configuration value for a table or view, including Spark configurations. Any table or view you define in a notebook after the SET statement has access to the defined value. Any Spark configurations specified using the SET statement are used when executing the Spark query for any table or view following the SET statement. To read a configuration value in a query, use the string interpolation syntax ${}. The following example sets a Spark configuration value named startDate and uses that value in a query:
SET startDate='2020-01-01';
CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM src
WHERE date > ${startDate}
To specify multiple configuration values, use a separate SET statement for each value.
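For example, a sketch that sets two values and uses both in one query; the src table and the endDate value are hypothetical:
SET startDate='2020-01-01';
SET endDate='2020-12-31';

CREATE OR REFRESH LIVE TABLE filtered_range
AS SELECT * FROM src
WHERE date BETWEEN ${startDate} AND ${endDate}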
To read data from a streaming source, for example, Auto Loader or an internal dataset, define a STREAMING LIVE table:
CREATE OR REFRESH STREAMING LIVE TABLE customers_bronze
AS SELECT * FROM cloud_files("/databricks-datasets/retail-org/customers/", "csv")
CREATE OR REFRESH STREAMING LIVE TABLE customers_silver
AS SELECT * FROM STREAM(LIVE.customers_bronze)
For more information on streaming data, see Process streaming data with Delta Live Tables.
SQL API specification
Create table
CREATE OR REFRESH [TEMPORARY] { STREAMING LIVE TABLE | LIVE TABLE } table_name
[(
[
col_name1 col_type1 [ GENERATED ALWAYS AS generation_expression1 ] [ COMMENT col_comment1 ],
col_name2 col_type2 [ GENERATED ALWAYS AS generation_expression2 ] [ COMMENT col_comment2 ],
...
]
[
CONSTRAINT expectation_name_1 EXPECT (expectation_expr1) [ON VIOLATION { FAIL UPDATE | DROP ROW }],
CONSTRAINT expectation_name_2 EXPECT (expectation_expr2) [ON VIOLATION { FAIL UPDATE | DROP ROW }],
...
]
)]
[USING DELTA]
[PARTITIONED BY (col_name1, col_name2, ... )]
[LOCATION path]
[COMMENT table_comment]
[TBLPROPERTIES (key1 [ = ] val1, key2 [ = ] val2, ... )]
AS select_statement
Create view
CREATE TEMPORARY [STREAMING] LIVE VIEW view_name
[(
[
col_name1 [ COMMENT col_comment1 ],
col_name2 [ COMMENT col_comment2 ],
...
]
[
CONSTRAINT expectation_name_1 EXPECT (expectation_expr1) [ON VIOLATION { FAIL UPDATE | DROP ROW }],
CONSTRAINT expectation_name_2 EXPECT (expectation_expr2) [ON VIOLATION { FAIL UPDATE | DROP ROW }],
...
]
)]
[COMMENT view_comment]
AS select_statement
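For example, a minimal sketch of a temporary view over the taxi_raw table defined earlier; the view name, constraint, and trip_date column are hypothetical:
CREATE TEMPORARY LIVE VIEW recent_taxi_trips (
  -- Flag trips recorded before 2020 as expectation violations
  CONSTRAINT recent_trip EXPECT (trip_date >= '2020-01-01')
)
COMMENT "Taxi trips recorded since the start of 2020."
AS SELECT * FROM LIVE.taxi_raw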
SQL properties
CREATE TABLE or VIEW |
---|
TEMPORARY: Create a temporary table. No metadata is persisted for this table. |
STREAMING: Create a table that reads an input dataset as a stream. The input dataset must be a streaming data source, for example, Auto Loader or a STREAMING LIVE table. |
PARTITIONED BY: An optional list of one or more columns to use for partitioning the table. |
LOCATION: An optional storage location for table data. If not set, the system defaults to the pipeline storage location. |
COMMENT: An optional description for the table. |
TBLPROPERTIES: An optional list of table properties for the table. |
select_statement: A Delta Live Tables query that defines the dataset for the table. |
CONSTRAINT clause |
---|
EXPECT expectation_name: Define a data quality constraint identified by expectation_name. |
ON VIOLATION: Optional action to take for failed rows: DROP ROW (drop the record and continue processing) or FAIL UPDATE (immediately stop pipeline execution). |
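A sketch that applies both actions in one table; the payments_raw dataset and its columns are hypothetical:
CREATE OR REFRESH LIVE TABLE payments_validated (
  -- Silently drop rows with non-positive amounts
  CONSTRAINT valid_amount EXPECT (amount > 0) ON VIOLATION DROP ROW,
  -- Stop the update entirely if any currency is missing
  CONSTRAINT valid_currency EXPECT (currency IS NOT NULL) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM LIVE.payments_raw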
Table properties
In addition to the table properties supported by Delta Lake, you can set the following table properties.
Table properties |
---|
pipelines.autoOptimize.managed (default: true): Enables or disables automatic scheduled optimization of this table. |
pipelines.autoOptimize.zOrderCols (default: none): An optional string containing a comma-separated list of column names to z-order this table by. For example, pipelines.autoOptimize.zOrderCols = "year,month". |
pipelines.reset.allowed (default: true): Controls whether a full refresh is allowed for this table. |
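For example, a sketch that disables full refresh for a table and z-orders it by two hypothetical columns:
CREATE OR REFRESH LIVE TABLE events
TBLPROPERTIES (
  'pipelines.autoOptimize.zOrderCols' = 'event_date,event_type',
  'pipelines.reset.allowed' = 'false'
)
AS SELECT * FROM LIVE.events_raw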