Real-time mode reference

This page provides reference information for real-time mode in Structured Streaming, including supported environments, languages, sources, sinks, and operators. For known limitations, see Real-time mode limitations.

Supported languages

Real-time mode supports Scala, Java, and Python.

Compute types

Real-time mode supports the following compute types:

| Compute type | Supported |
| --- | --- |
| Dedicated (formerly: single user) | ✓ |
| Standard (formerly: shared) | ✓ (only Python) |
| Lakeflow Spark Declarative Pipelines Classic | Not supported |
| Lakeflow Spark Declarative Pipelines Serverless | Not supported |
| Serverless | Not supported |

Execution modes

Real-time mode supports update mode only:

| Execution mode | Supported |
| --- | --- |
| Update mode | ✓ |
| Append mode | Not supported |
| Complete mode | Not supported |

Sources and sinks

Real-time mode supports the following sources and sinks:

| Source or sink | As source | As sink |
| --- | --- | --- |
| Apache Kafka | ✓ | ✓ |
| Event Hubs (using Kafka connector) | ✓ | ✓ |
| Kinesis | ✓ (only EFO mode) | Not supported |
| AWS MSK | ✓ | Not supported |
| Delta | Not supported | Not supported |
| Google Pub/Sub | Not supported | Not supported |
| Apache Pulsar | Not supported | Not supported |
| Arbitrary sinks (using forEachWriter) | Not applicable | ✓ |

Operators

Real-time mode supports most Structured Streaming operators:

Stateless operations

| Operator | Supported |
| --- | --- |
| Selection | ✓ |
| Projection | ✓ |

UDFs

| Operator | Supported |
| --- | --- |
| Scala UDF | ✓ (with some limitations) |
| Python UDF | ✓ (with some limitations) |

Aggregation

| Operator | Supported |
| --- | --- |
| sum | ✓ |
| count | ✓ |
| max | ✓ |
| min | ✓ |
| avg | ✓ |
| Aggregation functions | ✓ |

Windowing

| Operator | Supported |
| --- | --- |
| Tumbling | ✓ |
| Sliding | ✓ |
| Session | Not supported |

Deduplication

| Operator | Supported |
| --- | --- |
| dropDuplicates | ✓ (the state is unbounded) |
| dropDuplicatesWithinWatermark | Not supported |

Stream to table join

| Operator | Supported |
| --- | --- |
| Broadcast table join (table should be small) | ✓ |

Other operators

| Operator | Supported |
| --- | --- |
| Stream to stream join | Not supported |
| (flat)MapGroupsWithState | Not supported |
| transformWithState | ✓ (with some differences) |
| union | ✓ (with some limitations) |
| forEach | ✓ |
| forEachBatch | Not supported |
| mapPartitions | Not supported (see limitation) |

Special considerations

Some operators and features have specific considerations or differences when used in real-time mode.

transformWithState in real-time mode

For building custom stateful applications, Databricks supports transformWithState, an API in Apache Spark Structured Streaming. See Build a custom stateful application for more information about the API and code snippets.

However, the API behaves differently in real-time mode than in traditional streaming queries, which use the micro-batch architecture.

  • Real-time mode calls the handleInputRows(key: String, inputRows: Iterator[T], timerValues: TimerValues) method for each row.
    • In real-time mode, the inputRows iterator returns a single value per call. In micro-batch mode, the method is called once for each key, and the inputRows iterator returns all of that key's values in the micro-batch.
    • Account for this difference when writing your code.
  • Event time timers are not supported in real-time mode.
  • In real-time mode, timer firing is delayed depending on data arrival:
    • If a timer is scheduled for 10:00:00 but no data arrives, the timer doesn't fire immediately.
    • If data arrives at 10:00:10, the timer fires with a 10-second delay.
    • If no data arrives and the long-running batch is terminating, the timer fires before the batch terminates.
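The per-row versus per-key difference above can be sketched in plain Python. This is an illustrative sketch only, not the real transformWithState API: the class, method, and state store below are hypothetical stand-ins. The point is that a handler must keep its per-key result in state across calls, because in real-time mode each call's iterator yields a single row.

```python
# Hypothetical stand-in for a stateful processor; not the Spark API.
class RunningCountProcessor:
    """Counts rows per key, keeping the count in state between calls."""

    def __init__(self):
        self.state = {}  # stand-in for a per-key state store

    def handle_input_rows(self, key, input_rows):
        # Correct in both modes: never assume the iterator holds *all*
        # rows for the key -- in real-time mode it yields one row.
        count = self.state.get(key, 0)
        for _ in input_rows:
            count += 1
        self.state[key] = count
        return count


# Micro-batch mode: one call per key; iterator holds all rows for the key.
processor_mb = RunningCountProcessor()
micro_batch_total = processor_mb.handle_input_rows("user-1", iter(["r1", "r2", "r3"]))

# Real-time mode: one call per row; iterator yields a single row each time.
processor_rt = RunningCountProcessor()
for row in ["r1", "r2", "r3"]:
    real_time_total = processor_rt.handle_input_rows("user-1", iter([row]))

# Both arrive at the same running count because the count lives in state.
print(micro_batch_total, real_time_total)  # 3 3
```

A handler that instead computed its result from a single iterator pass and discarded it would be correct only in micro-batch mode.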

Python UDFs in real-time mode

Databricks supports the majority of Python user-defined functions (UDFs) in real-time mode:

Stateless

| UDF type | Supported |
| --- | --- |
| Python scalar UDF (User-defined scalar functions - Python) | ✓ |
| Arrow scalar UDF | ✓ |
| Pandas scalar UDF (pandas user-defined functions) | ✓ |
| Arrow function (mapInArrow) | ✓ |
| Pandas function (Map) | ✓ |

Stateful grouping (UDAF)

| UDF type | Supported |
| --- | --- |
| transformWithState (only Row interface) | ✓ |
| applyInPandasWithState | Not supported |

Non-stateful grouping (UDAF)

| UDF type | Supported |
| --- | --- |
| apply | Not supported |
| applyInArrow | Not supported |
| applyInPandas | Not supported |

Table functions

| UDF type | Supported |
| --- | --- |
| UDTF (Python user-defined table functions (UDTFs)) | Not supported |
| UC UDF | Not supported |

There are several points to consider when using Python UDFs in real-time mode:

  • To minimize latency, set the Arrow batch size (spark.sql.execution.arrow.maxRecordsPerBatch) to 1.
    • Trade-off: this optimizes for latency at the expense of throughput. This setting is recommended for most workloads.
    • Increase the batch size only if higher throughput is required to accommodate the input volume, accepting the potential increase in latency.
  • Pandas UDFs and functions do not perform well with an Arrow batch size of 1.
    • If you use pandas UDFs or functions, set the Arrow batch size to a higher value (for example, 100 or higher). This implies higher latency, so Databricks recommends using an Arrow UDF or function instead when possible.
  • Because of this pandas performance issue, transformWithState is only supported with the Row interface.
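As a sketch of the configuration described above, the setting can be applied on an active session. This assumes a Databricks notebook where `spark` (a SparkSession) is already defined; the configuration key is the standard Spark Arrow batch-size setting.

```python
# Latency-first: one record per Arrow batch (the recommendation above).
spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "1")

# Throughput-first alternative, e.g. when pandas UDFs are unavoidable:
# spark.conf.set("spark.sql.execution.arrow.maxRecordsPerBatch", "100")
```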