Use Unity Catalog with your Delta Live Tables pipelines
Preview
Delta Live Tables support for Unity Catalog is in Public Preview.
In addition to the existing support for persisting tables to the Hive metastore, you can use Unity Catalog with your Delta Live Tables pipelines to:
Define a catalog in Unity Catalog where your pipeline will persist tables.
Read data from Unity Catalog tables.
Your workspace can contain pipelines that use Unity Catalog or the Hive metastore. However, a single pipeline cannot write to both the Hive metastore and Unity Catalog, and existing pipelines cannot be upgraded to use Unity Catalog. Existing pipelines that do not use Unity Catalog are not affected by this preview and continue to persist data to the Hive metastore using the configured storage location.
Unless specified otherwise in this document, all existing data sources and Delta Live Tables functionality are supported with pipelines that use Unity Catalog. Both the Python and SQL interfaces are supported with pipelines that use Unity Catalog.
The tables created in your pipeline can also be queried from shared Unity Catalog clusters using Databricks Runtime 12.2 and above or from a SQL warehouse. Tables cannot be queried from assigned or no-isolation clusters.
To manage permissions on the tables created by a Unity Catalog pipeline, use GRANT and REVOKE.
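For example, assuming the pipeline publishes a table named sales to my_catalog.my_schema and a group named analysts exists (all placeholder names), read access could be granted and later revoked as follows:
-- sales and analysts are placeholders for illustration.
GRANT SELECT ON TABLE my_catalog.my_schema.sales TO `analysts`;
REVOKE SELECT ON TABLE my_catalog.my_schema.sales FROM `analysts`;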
Requirements
The following are required to create tables in Unity Catalog from a Delta Live Tables pipeline:
Your pipeline must be configured to use the preview channel.
You must have USE CATALOG privileges on the target catalog.
You must have CREATE MATERIALIZED VIEW and USE SCHEMA privileges in the target schema if your pipeline creates live tables.
You must have CREATE TABLE and USE SCHEMA privileges in the target schema if your pipeline creates streaming live tables.
If a target schema is not specified in the pipeline settings, you must have CREATE MATERIALIZED VIEW or CREATE TABLE privileges on at least one schema in the target catalog (see the example grants after this list).
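As a sketch, assuming the pipeline targets my_catalog.my_schema and is run by a hypothetical user some_user@example.com, the corresponding grants might look like this:
-- Placeholder catalog, schema, and principal; adjust for your environment.
GRANT USE CATALOG ON CATALOG my_catalog TO `some_user@example.com`;
GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `some_user@example.com`;
GRANT CREATE MATERIALIZED VIEW ON SCHEMA my_catalog.my_schema TO `some_user@example.com`;
GRANT CREATE TABLE ON SCHEMA my_catalog.my_schema TO `some_user@example.com`;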
Limitations
The following are limitations when using Unity Catalog with Delta Live Tables:
Existing pipelines that use the Hive metastore cannot be upgraded to use Unity Catalog. To migrate an existing pipeline that writes to the Hive metastore, you must create a new pipeline and re-ingest data from the data sources.
Init scripts, third-party libraries and JARs are not supported.
Running the following data manipulation language (DML) queries from external clients, for example, Databricks SQL, to modify streaming tables written by a pipeline is not supported:
Any DML queries against APPLY CHANGES streaming tables.
Any DML queries that modify the schema of a streaming table.
INSERT OVERWRITE and MERGE.
Any DML queries submitted by a user that does not own the streaming table.
A materialized view created in a Delta Live Tables pipeline cannot be used as a streaming source outside of that pipeline, for example, in another pipeline or in a downstream notebook.
You cannot change the owner of a pipeline that uses Unity Catalog.
Publishing to catalogs or schemas that specify a managed storage location is not supported. All tables are stored in the metastore root storage location.
The History tab in Data Explorer does not show history for streaming tables and materialized views.
The LOCATION property is not supported when defining a table.
Unity Catalog enabled pipelines cannot publish to the Hive metastore.
Python UDF support is in Private Preview. To have this feature enabled, contact your Databricks field engineering representative. When UDF support is enabled, to use Python UDFs in a pipeline, you must add the "PythonUDF.enabled": "true" custom cluster tag to both the default and maintenance clusters for the pipeline.
You cannot use Delta Sharing with a Delta Live Tables materialized view or streaming table published to Unity Catalog.
You cannot use the event_log table valued function in a pipeline or query to access the event logs of multiple pipelines.
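The event log of a single pipeline can still be queried. As a sketch, assuming the pipeline created a streaming table or materialized view named my_catalog.my_schema.table1, a query like the following returns that pipeline's event log entries:
-- table1 is a placeholder for a table created by the pipeline.
SELECT * FROM event_log(TABLE(my_catalog.my_schema.table1));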
Changes to existing functionality
When a Delta Live Tables pipeline is configured to persist data to Unity Catalog, the pipeline manages the lifecycle of the tables it creates. Because the pipeline manages the table lifecycle:
When a table is removed from the Delta Live Tables pipeline definition, the corresponding materialized view or streaming table entry is removed from Unity Catalog on the next pipeline update. The actual data is retained for a period of time so that it can be recovered if it was deleted by mistake. The data can be recovered by adding the materialized view or streaming table back into the pipeline definition.
Deleting the Delta Live Tables pipeline results in deletion of all tables defined in that pipeline. Because of this change, the Delta Live Tables UI is updated to prompt you to confirm deletion of a pipeline.
Write tables to Unity Catalog from a Delta Live Tables pipeline
To write your tables to Unity Catalog, when you create a pipeline, select Unity Catalog under Storage options, select a catalog in the Catalog dropdown menu, and provide a database name in the Target schema field.
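For example, if the pipeline is configured with catalog my_catalog and target schema my_schema (placeholder names), a dataset named sales defined in the pipeline is published as my_catalog.my_schema.sales and can be queried from a SQL warehouse or a shared cluster:
-- Query a table published by the pipeline; names are placeholders.
SELECT * FROM my_catalog.my_schema.sales;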
Ingest data into a Unity Catalog pipeline
Your pipeline configured to use Unity Catalog can read data from:
Unity Catalog managed and external tables, views, materialized views and streaming tables.
Hive metastore tables and views.
Auto Loader, using the cloud_files() function to read from Unity Catalog external locations.
Apache Kafka and Amazon Kinesis.
The following are examples of reading from Unity Catalog and Hive metastore tables.
Batch ingestion from a Unity Catalog table
CREATE OR REFRESH LIVE TABLE
  table_name
AS SELECT
  *
FROM
  my_catalog.my_schema.table1;
import dlt

@dlt.table
def table_name():
    return spark.table("my_catalog.my_schema.table")
Stream changes from a Unity Catalog table
CREATE OR REFRESH STREAMING TABLE
  table_name
AS SELECT
  *
FROM
  STREAM(my_catalog.my_schema.table1);
import dlt

@dlt.table
def table_name():
    return spark.readStream.table("my_catalog.my_schema.table")
Ingest data from Hive metastore
A pipeline that uses Unity Catalog can read data from Hive metastore tables using the hive_metastore catalog:
CREATE OR REFRESH LIVE TABLE
  table_name
AS SELECT
  *
FROM
  hive_metastore.some_schema.table;
import dlt

@dlt.table
def table3():
    return spark.table("hive_metastore.some_schema.table")
Ingest data from Auto Loader
CREATE OR REFRESH STREAMING LIVE TABLE
  table_name
AS SELECT
  *
FROM
  cloud_files(
    "<path_to_uc_external_location>",
    "json"
  )
import dlt

# Placeholder path to a Unity Catalog external location.
path_to_uc_external_location = "<path_to_uc_external_location>"

@dlt.table(table_properties={"quality": "bronze"})
def table_name():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load(f"{path_to_uc_external_location}")
    )
Grant create table or create materialized view privileges
To allow users other than the pipeline owner to create tables or materialized views in the target schema, grant the appropriate privilege on the schema:
GRANT CREATE { MATERIALIZED VIEW | TABLE } ON SCHEMA
  my_catalog.my_schema
TO
  { principal | user }
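For example, to allow a hypothetical group named data_engineers to create materialized views in the target schema:
-- data_engineers is a placeholder group name.
GRANT CREATE MATERIALIZED VIEW ON SCHEMA my_catalog.my_schema TO `data_engineers`;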
View lineage for a pipeline
Lineage for tables in a Delta Live Tables pipeline is visible in Data Explorer. For materialized views or streaming tables in a Unity Catalog enabled pipeline, the Data Explorer lineage UI shows the upstream and downstream tables. Lineage is only displayed between tables defined in the pipeline; tables that are defined outside of the pipeline and read in the pipeline are not shown in the Data Explorer lineage UI. To learn more about Unity Catalog lineage, see Capture and view data lineage with Unity Catalog.
For a materialized view or streaming table in a Unity Catalog enabled Delta Live Tables pipeline, the Data Explorer lineage UI will also link to the pipeline that produced the materialized view or streaming table if the pipeline is accessible from the current workspace.