Using Unity Catalog with Structured Streaming

Use Structured Streaming with Unity Catalog to manage data governance for your incremental and streaming workloads on Databricks. This document outlines supported functionality and limitations, and suggests best practices for using Unity Catalog and Structured Streaming together.

What Structured Streaming functionality does Unity Catalog support?

Unity Catalog does not add any explicit limits for Structured Streaming sources and sinks available on Databricks. The Unity Catalog data governance model allows you to stream data from managed and external tables in Unity Catalog. You can also use external locations managed by Unity Catalog to interact with data using object storage URIs. How you address a table depends on its type: external tables can be written to using either a table name or a file path, while managed tables can only be accessed using the table name.
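The difference between addressing a table by name and by path can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the helper names are hypothetical, `spark` is assumed to be the SparkSession that Databricks provides in notebooks and jobs, and the path-based reader assumes the external table is stored in Delta format.

```python
def full_table_name(catalog, schema, table):
    """Build a three-level Unity Catalog table name (hypothetical helper)."""
    return f"{catalog}.{schema}.{table}"


def read_stream_by_name(spark, table_name):
    """Stream from a managed or external table using its table name."""
    return spark.readStream.table(table_name)


def read_stream_by_path(spark, path):
    """Stream from an external table using its object storage URI.

    Assumes the data is in Delta format and that the URI falls under an
    external location managed by Unity Catalog. Not applicable to managed
    tables, which can only be accessed by name.
    """
    return spark.readStream.format("delta").load(path)
```

For example, `read_stream_by_name(spark, full_table_name("main", "default", "events"))` would stream from a table named `main.default.events`; the catalog, schema, and table names here are placeholders.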

Use external locations managed by Unity Catalog when specifying paths for Structured Streaming checkpoints. To learn more about securely connecting storage with Unity Catalog, see Manage external locations and storage credentials.
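As a rough sketch of that recommendation, the function below reads from one table and writes to another, storing the checkpoint at a caller-supplied path. The function name and its parameters are hypothetical, `spark` is assumed to be the session Databricks provides, and the checkpoint path is assumed to sit under an external location that Unity Catalog manages and that you have permission to write to.

```python
def start_stream_to_table(spark, source_table, target_table, checkpoint_path):
    """Start a streaming query from one Unity Catalog table to another.

    checkpoint_path is assumed to be an object storage URI inside an
    external location managed by Unity Catalog.
    """
    return (
        spark.readStream.table(source_table)
        .writeStream
        # The checkpoint location tracks the progress of the streaming query.
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )
```

A call might look like `start_stream_to_table(spark, "main.default.events", "main.default.events_copy", "s3://example-bucket/checkpoints/events_copy")`, where every name and the bucket URI are placeholders for your own objects.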

For both interactive notebooks and scheduled jobs, you must use single user clusters for Structured Streaming on Unity Catalog. Python and Scala are supported.

For an end-to-end demo using Structured Streaming on Unity Catalog, see Tutorial: Run an end-to-end lakehouse analytics pipeline.

What Structured Streaming functionality is disabled on Unity Catalog?

The following Structured Streaming limitations apply when using Unity Catalog:

  • Continuous streaming mode is not supported.

  • StreamingQueryListener cannot use credentials or interact with objects managed by Unity Catalog.

  • On Databricks Runtime 11.3 and below, asynchronous checkpointing is not supported.

  • On Databricks Runtime 11.2 and below, using display() with Structured Streaming queries is not supported.
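Because continuous mode is unavailable, queries against Unity Catalog run with micro-batch triggers. One way to catch an unsupported trigger before a query starts is a small guard like the sketch below. The `apply_trigger` helper is hypothetical and not part of any Databricks API; it only wraps the standard `DataStreamWriter.trigger` method, whose `continuous` keyword selects continuous mode in PySpark.

```python
def apply_trigger(writer, **trigger_kwargs):
    """Apply a trigger to a DataStreamWriter, rejecting continuous mode.

    Hypothetical guard: it raises early instead of letting an unsupported
    continuous trigger reach the cluster.
    """
    if "continuous" in trigger_kwargs:
        raise ValueError(
            "Continuous streaming mode is not supported with Unity Catalog; "
            "use a micro-batch trigger such as processingTime instead."
        )
    return writer.trigger(**trigger_kwargs)
```

For instance, `apply_trigger(df.writeStream, processingTime="1 minute")` passes the micro-batch trigger through unchanged, while `apply_trigger(df.writeStream, continuous="1 second")` raises a `ValueError`.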