Using Unity Catalog with Structured Streaming

This page shows how to use Structured Streaming with Unity Catalog to manage data governance for your incremental and streaming workloads on Databricks.

What Structured Streaming functionality does Unity Catalog support?

Unity Catalog doesn't add any explicit limits for Structured Streaming sources and sinks available on Databricks.

With Unity Catalog and Structured Streaming you can:

Stream data from both managed and external tables. See Unity Catalog managed tables for Delta Lake and Apache Iceberg.
Use external locations managed by Unity Catalog to interact with data using object storage URIs.
Write to external tables using either table names or file paths. To interact with managed tables, you must use the table name.

For Structured Streaming checkpoints, you must use paths in external locations managed by Unity Catalog. To learn more about securely connecting storage with Unity Catalog, see Connect to cloud object storage using Unity Catalog.
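As a minimal sketch of the points above, the following function reads a table by name, writes to a managed table by name, and keeps the checkpoint at a path in an external location. The function name, the table names, and the checkpoint URI in the usage comment are hypothetical placeholders. The sketch assumes `spark` is an active SparkSession on Unity Catalog-enabled compute and that you have the privileges needed on both tables and on the external location.

```python
def start_copy_stream(spark, source_table, target_table, checkpoint_path):
    """Start an incremental copy from source_table to target_table.

    Hypothetical helper for illustration. checkpoint_path is assumed to be
    a URI inside an external location managed by Unity Catalog.
    """
    return (
        spark.readStream
        .table(source_table)                            # read the source by table name
        .writeStream
        .option("checkpointLocation", checkpoint_path)  # checkpoint in an external location
        .trigger(availableNow=True)                     # process the available data, then stop
        .toTable(target_table)                          # managed tables are addressed by name
    )


# Example call (all names and the URI are placeholders):
# query = start_copy_stream(
#     spark,
#     "main.default.events_raw",
#     "main.default.events_clean",
#     "s3://example-bucket/checkpoints/events_clean",
# )
```

The target is passed to `toTable()` rather than written by path because managed tables can only be reached through their table name.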

Read a Unity Catalog view as a stream

In Databricks Runtime 14.3 LTS and above, you can use Structured Streaming to read from views registered with Unity Catalog. The underlying tables must use the Delta Lake format. For other limitations, see Limitations.

To read a view with Structured Streaming, use the .table() method with the view's identifier:

Python
df = (spark.readStream
  .table("demoView")
)

Users must have SELECT privileges on the target view.

If you modify the view definition to add or change the tables referenced in the view, you can't use the same streaming checkpoint.

Supported streaming options

The streaming reader applies options to the files and metadata of the underlying Delta Lake tables for the specified view.

The following options are supported:

maxFilesPerTrigger
maxBytesPerTrigger
ignoreDeletes
skipChangeCommits
withEventTimeOrder
startingTimestamp
startingVersion

Reads on views with UNION ALL don't support the withEventTimeOrder and startingVersion options.

If you provide unsupported options, such as readChangeFeed, Spark raises this exception:

Console
AnalysisException: [UNSUPPORTED_STREAMING_OPTIONS_FOR_VIEW.UNSUPPORTED_OPTION] Unsupported for streaming a view. Reason: option <option> is not supported.
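If you prefer to catch an unsupported option before the stream starts, a client-side check along the following lines is one possibility. This is a sketch, not part of any Databricks API: the helper names are hypothetical, the option set is copied from the list on this page, and names are compared exactly as written there. Whether a view uses UNION ALL is passed in by the caller, because the sketch does not inspect the view definition.

```python
# Options this page lists as supported when streaming from a view.
SUPPORTED_VIEW_OPTIONS = frozenset({
    "maxFilesPerTrigger",
    "maxBytesPerTrigger",
    "ignoreDeletes",
    "skipChangeCommits",
    "withEventTimeOrder",
    "startingTimestamp",
    "startingVersion",
})

# Options this page lists as unsupported for views with UNION ALL.
UNION_ALL_UNSUPPORTED = frozenset({"withEventTimeOrder", "startingVersion"})


def check_view_stream_options(options, view_uses_union_all=False):
    """Return a list of problems; an empty list means no known issue.

    Hypothetical helper: option names are matched exactly as listed above.
    """
    problems = []
    for name in options:
        if name not in SUPPORTED_VIEW_OPTIONS:
            problems.append(f"option {name} is not supported")
        elif view_uses_union_all and name in UNION_ALL_UNSUPPORTED:
            problems.append(
                f"option {name} is not supported on views with UNION ALL"
            )
    return problems


def read_view_stream(spark, view_name, options, view_uses_union_all=False):
    """Validate the options, then open a streaming read on the view."""
    problems = check_view_stream_options(options, view_uses_union_all)
    if problems:
        raise ValueError("; ".join(problems))
    return spark.readStream.options(**options).table(view_name)
```

For example, `check_view_stream_options({"readChangeFeed": "true"})` reports one problem, while `check_view_stream_options({"maxFilesPerTrigger": 100})` reports none.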

Supported streaming operations

Supported operations include:

Operation	Description	Operator	Example
Project	Controls column-level permissions	`SELECT... FROM...`	`CREATE VIEW project_view AS SELECT id, value FROM source_table`
Filter	Controls row-level permissions	`WHERE...`	`CREATE VIEW filter_view AS SELECT * FROM source_table WHERE value > 100`
Union all	Results from multiple tables	`UNION ALL`	`CREATE VIEW union_view AS SELECT id, value FROM source_table1 UNION ALL SELECT id, value FROM source_table2`

Unsupported operations include aggregations, sorting, and table-valued functions such as table_changes(). For details on table-valued functions, see Table-valued function (TVF) invocation.

If you stream from a view with an unsupported operation, Spark raises this exception:

Console
UnsupportedOperationException: [UNEXPECTED_OPERATOR_IN_STREAMING_VIEW] Unexpected operator <operator> in the CREATE VIEW statement as a streaming source. A streaming view query must consist only of SELECT, WHERE, and UNION ALL operations.
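To make the supported shape concrete, the sketch below defines a view whose body contains only a projection and a filter, then opens a streaming read on it. The function name and all catalog, schema, table, and view names are hypothetical. The sketch assumes `spark` is an active SparkSession, that the source is a Delta Lake table with `id` and `value` columns, and that you have privileges to create the view and SELECT from it.

```python
def create_and_stream_filtered_view(spark, view_name, source_table, min_value):
    """Create a project-and-filter view if it is missing, then stream from it.

    Hypothetical helper for illustration only.
    """
    # The view body uses only SELECT and WHERE, both listed as supported.
    # IF NOT EXISTS avoids silently redefining a view that a running
    # stream already depends on.
    spark.sql(
        f"CREATE VIEW IF NOT EXISTS {view_name} AS "
        f"SELECT id, value FROM {source_table} WHERE value > {int(min_value)}"
    )
    return spark.readStream.table(view_name)


# Example call (names are placeholders):
# df = create_and_stream_filtered_view(
#     spark, "main.default.filter_view", "main.default.source_table", 100
# )
```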

Limitations

Apache Spark continuous processing mode is not supported. See Continuous Processing in the Spark Structured Streaming Programming Guide.
For a list of Structured Streaming features that are not supported on Unity Catalog based on the compute access mode, see Streaming limitations and Streaming and materialized view requirements on dedicated compute.
Views as a streaming source have additional limitations:
- You can only stream from views that query Delta Lake tables. Other data sources are not supported.
- You must register views with Unity Catalog. See Create a view.
- Streaming reads on views don't support all operations or options. See Supported streaming operations and Supported streaming options.
- If you add a new column to a view before a stream updates the schema of the underlying table, the stream fails with a missing column error. You must add the new column to the underlying table, wait for the stream to process that table version, and then add the column to the view.
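The ordering in the last limitation can be sketched as a small helper. Everything here is illustrative: the function name is hypothetical, and `stream_caught_up` is a callable that you supply to report whether the stream has processed the table version that contains the new column. This page does not prescribe how to detect that, so the sketch leaves the check to the caller.

```python
import time


def add_column_in_safe_order(spark, table, column_ddl, alter_view_sql,
                             stream_caught_up, poll_seconds=30,
                             sleep=time.sleep):
    """Add a column to the table first, and to the view only after the
    stream has processed the table change.

    Hypothetical helper; stream_caught_up is a caller-supplied check.
    """
    # 1. Add the new column to the underlying table.
    spark.sql(f"ALTER TABLE {table} ADD COLUMNS ({column_ddl})")

    # 2. Wait until the caller's check says the stream has processed
    #    that table version.
    while not stream_caught_up():
        sleep(poll_seconds)

    # 3. Only now expose the column through the view.
    spark.sql(alter_view_sql)


# Example call (names and the check are placeholders):
# add_column_in_safe_order(
#     spark,
#     "main.default.source_table",
#     "category STRING",
#     "ALTER VIEW main.default.project_view AS "
#     "SELECT id, value, category FROM main.default.source_table",
#     stream_caught_up=my_check,
# )
```

The view statement in the example keeps the same source table, so it does not change the set of tables the view references.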
