Using Unity Catalog with Structured Streaming

You can use Structured Streaming with Unity Catalog to manage data governance for your incremental and streaming workloads on Databricks. This document outlines supported functionality and limitations, and provides recommended best practices for using Unity Catalog and Structured Streaming together.

What Structured Streaming functionality does Unity Catalog support?

Unity Catalog does not add any explicit limits for the Structured Streaming sources and sinks available on Databricks. The Unity Catalog data governance model allows you to stream data from managed and external tables in Unity Catalog. You can also use external locations managed by Unity Catalog to interact with data using object storage URIs. You can write to external tables using either table names or file paths; you can interact with managed tables in Unity Catalog only by table name.
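
For example, the following sketch (Python) reads a managed table by name and an external table by path. The table name main.default.events and the bucket path are illustrative placeholders, not names from this document, and spark refers to the SparkSession that Databricks notebooks provide.

# Managed tables must be read by table name.
managed_stream = spark.readStream.table("main.default.events")

# External tables can also be read by path through a Unity Catalog external location.
external_stream = (
    spark.readStream
    .format("delta")
    .load("s3://my-bucket/table1/")
)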

You can use external locations managed by Unity Catalog when specifying paths for Structured Streaming checkpoints. To learn more about securely connecting storage with Unity Catalog, see Manage external locations and storage credentials.
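
For example, a streaming write to a managed table might keep its checkpoint under a path governed by an external location. This is a minimal sketch; the table names and checkpoint URI below are placeholders.

(spark.readStream
    .table("main.default.raw_events")
    .writeStream
    .option("checkpointLocation", "s3://my-bucket/checkpoints/clean_events/")  # path under an external location
    .trigger(availableNow=True)
    .toTable("main.default.clean_events")
)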

For both interactive notebooks and scheduled jobs, you must use single user clusters for Structured Streaming on Unity Catalog. Python and Scala are supported.

To walk through an end-to-end demo using Structured Streaming on Unity Catalog, see Run your first end-to-end analytics pipeline in the Databricks Lakehouse.

What Structured Streaming functionality is disabled on Unity Catalog?

Unity Catalog does not support some Structured Streaming features, including:

  • Continuous streaming mode

  • Asynchronous checkpointing

  • Using display() with Structured Streaming queries

How to structure your checkpoint and table files for Unity Catalog

Unity Catalog does not allow you to nest checkpoint files under the table directory. Databricks recommends storing Structured Streaming checkpoints in Unity Catalog external locations. Because Unity Catalog always checks permissions to access external tables using the cloud URI, you can safely store checkpoint data adjacent to table data in external locations. For example:

s3://my-bucket/table1/
s3://my-bucket/table1.checkpoint/
s3://my-bucket/table2/
s3://my-bucket/table2.checkpoint/
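
A streaming write that follows this layout might look like the following sketch. The source table name is a placeholder; the target and checkpoint paths reuse the example URIs above.

(spark.readStream
    .table("main.default.raw_events")  # placeholder source table
    .writeStream
    .format("delta")
    .option("checkpointLocation", "s3://my-bucket/table1.checkpoint/")  # adjacent to the table data
    .start("s3://my-bucket/table1/")  # external table path
)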

Caveats and recommendations for long-running streams

Unity Catalog introduces new models for managing temporary tokens to access data sources and sinks. Because all tokens are ephemeral, long-running Structured Streaming workloads will fail after a set period of time.

Streams run on all-purpose or job clusters will fail after 30 days of continuous execution. For long-running workloads, Databricks recommends running Structured Streaming queries in jobs configured to retry automatically on failure. For more details, see Configure Structured Streaming jobs to restart streaming queries on failure.
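
As a hedged sketch of that recommendation, a job with automatic retries could be created with the Databricks SDK for Python. The notebook path, cluster ID, and retry values below are illustrative placeholders, not settings from this document.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="long-running-stream",
    max_concurrent_runs=1,  # avoid concurrent runs competing for the same checkpoint
    tasks=[
        jobs.Task(
            task_key="run_stream",
            notebook_task=jobs.NotebookTask(notebook_path="/path/to/streaming_notebook"),  # placeholder
            existing_cluster_id="<single-user-cluster-id>",  # placeholder
            max_retries=-1,  # retry indefinitely on failure
            min_retry_interval_millis=60000,
            retry_on_timeout=True,
        )
    ],
)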