Delta Live Tables SQL language reference

This article provides details and examples for the Delta Live Tables SQL programming interface. For the complete API specification, see SQL API specification.

For information on the Python API, see the Delta Live Tables Python language reference.

SQL datasets

Use the CREATE LIVE VIEW or CREATE OR REFRESH LIVE TABLE syntax to create a view or table with SQL. You can create a dataset by reading from an external data source or from datasets defined in a pipeline. To read from an internal dataset, prepend the LIVE keyword to the dataset name. The following example defines two datasets: a table called taxi_raw that reads a directory of JSON files as its input source, and a table called filtered_data that takes the taxi_raw table as its input:

CREATE OR REFRESH LIVE TABLE taxi_raw
AS SELECT * FROM json.`/databricks-datasets/nyctaxi/sample/json/`

CREATE OR REFRESH LIVE TABLE filtered_data
AS SELECT
  ...
FROM LIVE.taxi_raw

Delta Live Tables automatically captures the dependencies between datasets defined in your pipeline. It uses this dependency information to determine the execution order when performing an update and to record lineage information in the pipeline's event log.

Both views and tables have the following optional properties:

  • COMMENT: A human-readable description of this dataset.

  • Data quality constraints enforced with expectations.

Tables also offer additional control of their materialization (see the example after this list):

  • Specify how tables are partitioned using PARTITIONED BY. You can use partitioning to speed up queries.

  • You can set table properties using TBLPROPERTIES. See Table properties for more detail.

  • Set a storage location using LOCATION. If LOCATION isn’t set, table data is stored in the pipeline storage location.
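
For example, a minimal sketch combining these options; the table names (sales_summary, sales_raw) and columns are hypothetical:

-- All table and column names here are hypothetical.
CREATE OR REFRESH LIVE TABLE sales_summary
PARTITIONED BY (order_date)
COMMENT "Daily sales totals by region."
TBLPROPERTIES ("pipelines.reset.allowed" = "false")
AS SELECT order_date, region, SUM(amount) AS total_amount
FROM LIVE.sales_raw
GROUP BY order_date, region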

See SQL API specification for more information about table and view properties.

Use SET to specify a configuration value for a table or view, including Spark configurations. Any table or view you define in a notebook after a SET statement has access to the defined value, and any Spark configurations set this way are used when executing the Spark queries for those tables and views. To read a configuration value in a query, use the string interpolation syntax ${}. The following example sets a Spark configuration value named startDate and uses that value in a query:

SET startDate='2020-01-01';

CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM src
WHERE date > ${startDate}

To specify multiple configuration values, use a separate SET statement for each value.
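
For example, a minimal sketch that adds a second, hypothetical endDate value to the query above:

-- endDate is a hypothetical second configuration value.
SET startDate='2020-01-01';
SET endDate='2020-12-31';

CREATE OR REFRESH LIVE TABLE filtered
AS SELECT * FROM src
WHERE date BETWEEN ${startDate} AND ${endDate}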

To read data from a streaming source, for example, Auto Loader or an internal dataset, define a STREAMING LIVE table:

CREATE OR REFRESH STREAMING LIVE TABLE customers_bronze
AS SELECT * FROM cloud_files("/databricks-datasets/retail-org/customers/", "csv")

CREATE OR REFRESH STREAMING LIVE TABLE customers_silver
AS SELECT * FROM STREAM(LIVE.customers_bronze)

For more information on streaming data, see Streaming data processing.

SQL API specification

Note

The Delta Live Tables SQL interface has the following limitations:

  • Identity and generated columns are not supported.

  • The PIVOT clause is not supported. Spark's pivot operation requires eagerly loading the input data to compute the output schema, which Delta Live Tables does not support.

Create table

CREATE OR REFRESH [TEMPORARY] { STREAMING LIVE TABLE | LIVE TABLE } table_name
  [(
    [
    col_name1 col_type1 [ COMMENT col_comment1 ],
    col_name2 col_type2 [ COMMENT col_comment2 ],
    ...
    ]
    [
    CONSTRAINT expectation_name_1 EXPECT (expectation_expr1) [ON VIOLATION { FAIL UPDATE | DROP ROW }],
    CONSTRAINT expectation_name_2 EXPECT (expectation_expr2) [ON VIOLATION { FAIL UPDATE | DROP ROW }],
    ...
    ]
  )]
  [USING DELTA]
  [PARTITIONED BY (col_name1, col_name2, ... )]
  [LOCATION path]
  [COMMENT table_comment]
  [TBLPROPERTIES (key1 [ = ] val1, key2 [ = ] val2, ... )]
  AS select_statement
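
The following sketch exercises most of these clauses; the table name, columns, and input path (/data/orders/) are hypothetical:

-- All names and the input path are hypothetical.
CREATE OR REFRESH STREAMING LIVE TABLE orders_bronze (
  order_id STRING COMMENT "Unique order identifier",
  order_date DATE,
  amount DOUBLE,
  -- Drop records without an order ID instead of failing the update.
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
PARTITIONED BY (order_date)
COMMENT "Raw orders ingested incrementally with Auto Loader."
TBLPROPERTIES ("quality" = "bronze")
AS SELECT order_id, order_date, amount
FROM cloud_files("/data/orders/", "json")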

Create view

CREATE TEMPORARY [STREAMING] LIVE VIEW view_name
  [(
    [
    col_name1 [ COMMENT col_comment1 ],
    col_name2 [ COMMENT col_comment2 ],
    ...
    ]
    [
    CONSTRAINT expectation_name_1 EXPECT (expectation_expr1) [ON VIOLATION { FAIL UPDATE | DROP ROW }],
    CONSTRAINT expectation_name_2 EXPECT (expectation_expr2) [ON VIOLATION { FAIL UPDATE | DROP ROW }],
    ...
    ]
  )]
  [COMMENT view_comment]
  AS select_statement
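
A corresponding sketch for a view; valid_customers is a hypothetical name, while customers_bronze is the streaming table defined earlier in this article:

-- valid_customers is hypothetical; customers_bronze is defined above.
CREATE TEMPORARY LIVE VIEW valid_customers (
  customer_id COMMENT "Customer key",
  CONSTRAINT valid_customer_id EXPECT (customer_id IS NOT NULL) ON VIOLATION DROP ROW
)
COMMENT "Customers with a non-null key."
AS SELECT customer_id FROM LIVE.customers_bronze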

SQL properties

CREATE TABLE or VIEW

TEMPORARY

Create a temporary table. No metadata is persisted for this table.

STREAMING

Create a table that reads an input dataset as a stream. The input dataset must be a streaming data source, for example, Auto Loader or a STREAMING LIVE table.

PARTITIONED BY

An optional list of one or more columns to use for partitioning the table.

LOCATION

An optional storage location for table data. If not set, the system will default to the pipeline storage location.

COMMENT

An optional description for the table.

TBLPROPERTIES

An optional list of table properties for the table.

select_statement

A Delta Live Tables query that defines the dataset for the table.

CONSTRAINT clause

EXPECT expectation_name

Define the data quality constraint expectation_name. If no ON VIOLATION clause is defined, rows that violate the constraint are added to the target dataset.

ON VIOLATION

Optional action to take for failed rows (see the example after this list):

  • FAIL UPDATE: Immediately stop pipeline execution.

  • DROP ROW: Drop the record and continue processing.
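
For example, a minimal sketch with one constraint per action; all names (validated_events, events_raw, and the columns) are hypothetical:

-- All names are hypothetical.
CREATE OR REFRESH LIVE TABLE validated_events (
  -- Bad timestamps are dropped; a missing payload stops the update.
  CONSTRAINT valid_timestamp EXPECT (event_ts IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT valid_payload EXPECT (payload IS NOT NULL) ON VIOLATION FAIL UPDATE
)
AS SELECT * FROM LIVE.events_raw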

Table properties

In addition to the table properties supported by Delta Lake, you can set the following table properties.

pipelines.autoOptimize.managed

Default: true

Enables or disables automatic scheduled optimization of this table.

pipelines.autoOptimize.zOrderCols

Default: None

An optional comma-separated list of column names to z-order this table by.

pipelines.reset.allowed

Default: true

Controls whether a full refresh is allowed for this table.
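
For example, a sketch setting all three properties; optimized_events and events_raw are hypothetical names:

-- optimized_events and events_raw are hypothetical names.
CREATE OR REFRESH LIVE TABLE optimized_events
TBLPROPERTIES (
  "pipelines.autoOptimize.managed" = "true",
  "pipelines.autoOptimize.zOrderCols" = "event_date,user_id",
  "pipelines.reset.allowed" = "false"
)
AS SELECT * FROM LIVE.events_raw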