
Using Unity Catalog with Structured Streaming

This page shows how to use Structured Streaming with Unity Catalog to manage data governance for your incremental and streaming workloads on Databricks.

What Structured Streaming functionality does Unity Catalog support?

Unity Catalog doesn't add any explicit limits for Structured Streaming sources and sinks available on Databricks.

With Unity Catalog and Structured Streaming you can:

For Structured Streaming checkpoints, you must use paths in external locations managed by Unity Catalog. To learn more about securely connecting storage with Unity Catalog, see Connect to cloud object storage using Unity Catalog.

Read a Unity Catalog view as a stream

In Databricks Runtime 14.3 LTS and above, you can use Structured Streaming to read from views registered with Unity Catalog. The underlying tables must use the Delta Lake format. For other limitations, see Limitations.

To read a view with Structured Streaming, use the .table() method with the view's identifier:

Python
df = (spark.readStream
    .table("demoView")
)

Users must have SELECT privileges on the target view.

If you modify the view definition to add or change the tables referenced in the view, you can't use the same streaming checkpoint.
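As a minimal sketch of how the read and the checkpoint requirement fit together, the function below reads a view as a stream and appends it to a table. It assumes `spark` is an active SparkSession on a runtime that supports streaming from views. The function name, the target table, and the checkpoint path are hypothetical placeholders, not names from this page.

```python
def stream_view_to_table(spark, view_name, target_table, checkpoint_path):
    """Read a Unity Catalog view as a stream and append it to a table."""
    # The user running this needs SELECT on the view.
    df = spark.readStream.table(view_name)

    return (
        df.writeStream
        # Assumption: checkpoint_path points into a Unity Catalog external location.
        .option("checkpointLocation", checkpoint_path)
        .toTable(target_table)
    )


# Hypothetical usage (placeholder names, not executed here):
# query = stream_view_to_table(
#     spark, "demoView", "demo_sink_table", "<external-location-path>/checkpoints/demoView"
# )
```

If the view definition later changes to reference different tables, pass a new checkpoint path instead of reusing the old one.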

Supported streaming options

The streaming reader applies options to the files and metadata of the underlying Delta tables for the specified view.

The following options are supported:

  • maxFilesPerTrigger
  • maxBytesPerTrigger
  • ignoreDeletes
  • skipChangeCommits
  • withEventTimeOrder
  • startingTimestamp
  • startingVersion

Reads on views with UNION ALL don't support the withEventTimeOrder and startingVersion options.
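The option rules above can be pre-checked before a stream starts. The sketch below is one possible approach, not part of any Databricks API: the helper names are hypothetical, the option sets are copied from the list above, and the comparison is case-sensitive as a simplification. `spark` is assumed to be an active SparkSession.

```python
# Options listed above as supported when streaming from a view.
SUPPORTED_VIEW_OPTIONS = {
    "maxFilesPerTrigger",
    "maxBytesPerTrigger",
    "ignoreDeletes",
    "skipChangeCommits",
    "withEventTimeOrder",
    "startingTimestamp",
    "startingVersion",
}

# Options that are not supported when the view uses UNION ALL.
UNION_ALL_UNSUPPORTED = {"withEventTimeOrder", "startingVersion"}


def unsupported_options(options, view_uses_union_all=False):
    """Return a sorted list of option names outside the documented set."""
    allowed = SUPPORTED_VIEW_OPTIONS
    if view_uses_union_all:
        allowed = allowed - UNION_ALL_UNSUPPORTED
    return sorted(name for name in options if name not in allowed)


def read_view_stream(spark, view_name, options, view_uses_union_all=False):
    """Check the options locally, then read the view as a stream."""
    rejected = unsupported_options(options, view_uses_union_all)
    if rejected:
        raise ValueError(f"Unsupported options for streaming a view: {rejected}")
    return spark.readStream.options(**options).table(view_name)
```

For example, `unsupported_options({"readChangeFeed": "true"})` flags `readChangeFeed`, and passing `view_uses_union_all=True` additionally flags `startingVersion` and `withEventTimeOrder`.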

If you provide unsupported options, such as readChangeFeed, Spark raises this exception:

Console
AnalysisException: [UNSUPPORTED_STREAMING_OPTIONS_FOR_VIEW.UNSUPPORTED_OPTION] Unsupported for streaming a view. Reason: option <option> is not supported.

Supported streaming operations

Supported operations include:

| Operation | Description | Operator | Example |
| --- | --- | --- | --- |
| Project | Controls column-level permissions | SELECT... FROM... | CREATE VIEW project_view AS SELECT id, value FROM source_table |
| Filter | Controls row-level permissions | WHERE... | CREATE VIEW filter_view AS SELECT * FROM source_table WHERE value > 100 |
| Union all | Results from multiple tables | UNION ALL | CREATE VIEW union_view AS SELECT id, value FROM source_table1 UNION ALL SELECT * FROM source_table2 |
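To illustrate the supported operations in use, the sketch below creates a view that combines a projection with a filter and then reads it as a stream. Combining the two in one view is an assumption based on the operators listed here. The function name and the view name are hypothetical, `source_table` and its `id` and `value` columns are borrowed from the examples above, and `spark` is assumed to be an active SparkSession.

```python
def create_and_stream_view(spark, view_name, source_table, min_value):
    """Create a SELECT/WHERE-only view, then read it as a stream."""
    # Illustration only: identifiers are interpolated directly into the SQL,
    # so pass trusted names. The threshold is coerced to an integer.
    spark.sql(
        f"CREATE VIEW IF NOT EXISTS {view_name} AS "
        f"SELECT id, value FROM {source_table} WHERE value > {int(min_value)}"
    )
    return spark.readStream.table(view_name)


# Hypothetical usage (placeholder names, not executed here):
# df = create_and_stream_view(spark, "filter_project_view", "source_table", 100)
```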

Unsupported operations include aggregations, sorting, and table-valued functions such as table_changes(). For details on table-valued functions, see Table-valued function (TVF) invocation.

If you stream from a view with an unsupported operation, Spark raises this exception:

Console
UnsupportedOperationException: [UNEXPECTED_OPERATOR_IN_STREAMING_VIEW] Unexpected operator <operator> in the CREATE VIEW statement as a streaming source. A streaming view query must consist only of SELECT, WHERE, and UNION ALL operations.
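When a job streams from views, it can help to tell the two exceptions shown on this page apart from unrelated failures. The following is a rough sketch that matches on the error class text. It assumes the error class appears in the exception's string form, as in the messages quoted here. The helper name and the hint strings are hypothetical.

```python
# Error classes quoted in the exception messages on this page.
UNSUPPORTED_OPTION = "UNSUPPORTED_STREAMING_OPTIONS_FOR_VIEW.UNSUPPORTED_OPTION"
UNEXPECTED_OPERATOR = "UNEXPECTED_OPERATOR_IN_STREAMING_VIEW"


def classify_view_stream_error(exc):
    """Return a short hint for a view-streaming error, or None if unrelated."""
    message = str(exc)
    if UNSUPPORTED_OPTION in message:
        return "Remove the unsupported reader option."
    if UNEXPECTED_OPERATOR in message:
        return "Restrict the view to SELECT, WHERE, and UNION ALL."
    return None
```

A caller might wrap the stream setup in `try`/`except Exception as exc`, log the hint when `classify_view_stream_error(exc)` returns one, and re-raise the exception otherwise.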

Limitations