Optimize stateless streaming queries

This page describes optimization features available for stateless streaming queries in Databricks Runtime 18.0 and above.

Stateless Structured Streaming queries process data without maintaining intermediate state. These queries do not use stateful operators such as streaming aggregations, dropDuplicates, or stream-stream joins. Examples include queries that use stream-static joins, MERGE INTO with Delta tables, and other operations that only track which rows have been processed from source to sink.

Adaptive Query Execution and Auto Optimized Shuffle

Databricks supports Adaptive Query Execution (AQE) and Auto Optimized Shuffle (AOS) for stateless streaming queries. These features help optimize streaming workloads that use stream-static joins, MERGE INTO with Delta tables, and similar operations.

AQE for stateless streaming queries is enabled by default. To set it explicitly, set the following configuration to true:

ini
spark.sql.adaptive.streaming.stateless.enabled true

To enable AOS for stateless streaming queries, enable AQE and set the following configuration:

ini
spark.sql.shuffle.partitions auto
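
As a minimal sketch, the two settings above can be collected and applied from Python. The configuration keys and values come from this page; the helper names `stateless_streaming_conf` and `apply_conf` are hypothetical. The sketch also assumes that the settings can be applied at the session level before the query starts, which this page does not confirm.

```python
# Config keys and values are taken from this page; the helper names are hypothetical.
AQE_STATELESS_KEY = "spark.sql.adaptive.streaming.stateless.enabled"
SHUFFLE_PARTITIONS_KEY = "spark.sql.shuffle.partitions"


def stateless_streaming_conf(enable_aos=True):
    """Return the settings that enable AQE and, optionally, AOS."""
    conf = {AQE_STATELESS_KEY: "true"}
    if enable_aos:
        # AOS requires AQE, so "auto" is only ever added alongside the AQE flag.
        conf[SHUFFLE_PARTITIONS_KEY] = "auto"
    return conf


def apply_conf(spark, conf):
    """Apply each setting to a session object that exposes conf.set(key, value)."""
    for key, value in conf.items():
        spark.conf.set(key, value)


# Assumed usage in a notebook where `spark` is the predefined SparkSession,
# run before the streaming query is started:
# apply_conf(spark, stateless_streaming_conf())
```

Keeping the settings in one dictionary makes it easy to review exactly what is set before the query starts.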

Change shuffle partitions during query restart

Stateless streaming queries support changing the number of shuffle partitions when you restart a query. This allows you to adjust parallelism to accommodate varying input volumes.

This feature is especially useful for historical backfills. For example, you can process a backfill with higher parallelism, and then restart the query with lower parallelism for real-time input.

To change the number of shuffle partitions, set the following configuration to your desired value and restart the query:

ini
spark.sql.shuffle.partitions <number>
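
The backfill scenario can be sketched as a small helper that picks a partition count per run mode. The partition counts (400 and 50), the mode names, and the function names are illustrative assumptions, not recommended values. Only the configuration key comes from this page.

```python
SHUFFLE_PARTITIONS_KEY = "spark.sql.shuffle.partitions"

# Illustrative values only; choose counts that fit your cluster and input volume.
PARTITIONS_BY_MODE = {"backfill": 400, "realtime": 50}


def restart_settings(mode):
    """Return the configuration to set before restarting the query in `mode`."""
    if mode not in PARTITIONS_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    return {SHUFFLE_PARTITIONS_KEY: str(PARTITIONS_BY_MODE[mode])}


# Assumed usage in a notebook where `spark` is the SparkSession and `query`
# is the running StreamingQuery that has finished the backfill:
# query.stop()
# for key, value in restart_settings("realtime").items():
#     spark.conf.set(key, value)
# ...then start the query again so that the new value takes effect.
```

Because the change applies on restart, the sketch sets the new value only after the running query has stopped.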
