Get started with real-time mode
This feature is in Public Preview.
Real-time mode enables ultra-low latency streaming with end-to-end latency as low as five milliseconds, making it ideal for operational workloads like fraud detection and real-time personalization. This tutorial guides you through setting up your first real-time streaming query using a simple example.
For conceptual information about real-time mode, when to use it, and supported features, see Real-time mode in Structured Streaming.
Requirements
- You have permission to create classic compute.
- Databricks Runtime 17.1 or above (required for using the `display` function with real-time mode).
If you don't have classic compute creation privileges, contact your workspace administrator to create a real-time mode cluster for you using the configuration in Step 1.
Step 1: Create classic compute for real-time mode
Real-time mode requires a specific classic compute configuration to achieve ultra-low latency. These settings ensure that tasks run simultaneously across all stages and data is processed continuously as it arrives, rather than in batches.
To create a properly configured classic compute:
- In your Databricks workspace, click Compute in the sidebar.
- Click Create compute.
- Enter a name.
- Select Databricks Runtime 17.1 or above.
- Clear Photon acceleration (real-time mode doesn't support Photon).
- Clear Enable autoscaling (real-time mode requires a fixed cluster size).
- Under Advanced performance, clear Use spot instances (spot instances can cause interruptions).
- Click Advanced options to expand additional settings.
- Under Access mode, select Dedicated (formerly: Single user).
- Under Spark config, add the following configuration:

  ```
  spark.databricks.streaming.realTimeMode.enabled true
  ```

- Click Create compute.
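If you prefer to script cluster creation, the same settings can be expressed as a Clusters API payload (for example, passed to `databricks clusters create --json`). The sketch below is illustrative only: the cluster name, Spark version string, node type, and worker count are placeholders to adapt to your workspace, and the `aws_attributes` block applies only on AWS (other clouds use their own attribute blocks to avoid spot instances).

```json
{
  "cluster_name": "realtime-mode-tutorial",
  "spark_version": "17.1.x-scala2.13",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "data_security_mode": "SINGLE_USER",
  "runtime_engine": "STANDARD",
  "spark_conf": {
    "spark.databricks.streaming.realTimeMode.enabled": "true"
  },
  "aws_attributes": {
    "availability": "ON_DEMAND"
  }
}
```

Note how each UI step maps to a field: a fixed `num_workers` instead of an `autoscale` block, `runtime_engine` set to `STANDARD` (Photon off), and `data_security_mode` set to `SINGLE_USER` for Dedicated access mode.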
Step 2: Create a notebook
Notebooks provide an interactive environment for developing and testing streaming queries. You use this notebook to write your real-time query and see the results update continuously.
To create a notebook:
- Click New in the sidebar, then click Notebook.
- In the compute drop-down menu, select the compute you created in Step 1.
- Select Python or Scala as the default language.
Step 3: Run a real-time mode query
Copy and paste the following code into a notebook cell and run it. This example uses a rate source, which generates rows at a specified rate, and displays the results in real time.
The display function with realTime trigger is available in Databricks Runtime 17.1 and above.
Python:

```python
inputDF = (
    spark
        .readStream
        .format("rate")
        .option("numPartitions", 2)
        .option("rowsPerSecond", 1)
        .load()
)

display(inputDF, realTime="5 minutes", outputMode="update")
```

Scala:

```scala
import org.apache.spark.sql.streaming.Trigger
import org.apache.spark.sql.streaming.OutputMode

val inputDF = spark
  .readStream
  .format("rate")
  .option("numPartitions", 2)
  .option("rowsPerSecond", 1)
  .load()

display(inputDF, trigger=Trigger.RealTime(), outputMode=OutputMode.Update())
```
After running the code, you see a table that updates in real time as new rows are generated. The table displays a timestamp column and a value column that increments with each row.
Understanding the code
The code above demonstrates the essential components of a real-time streaming query. The following tables explain the key parameters and what they control:
Python:

| Parameter | Description |
|---|---|
| `format("rate")` | Uses the rate source, a built-in source that generates rows at a configurable rate. This is useful for testing without external dependencies. |
| `numPartitions` | Sets the number of partitions for the generated data. |
| `rowsPerSecond` | Controls how many rows are generated per second. |
| `realTime="5 minutes"` | Enables real-time mode. The interval specifies how often the query checkpoints progress. Longer intervals mean less frequent checkpointing but potentially longer recovery times after failures. |
| `outputMode="update"` | Real-time mode requires update output mode. |

Scala:

| Parameter | Description |
|---|---|
| `format("rate")` | Uses the rate source, a built-in source that generates rows at a configurable rate. This is useful for testing without external dependencies. |
| `numPartitions` | Sets the number of partitions for the generated data. |
| `rowsPerSecond` | Controls how many rows are generated per second. |
| `Trigger.RealTime()` | Enables real-time mode with the default checkpoint interval. You can also specify an interval, for example `Trigger.RealTime("5 minutes")`. |
| `OutputMode.Update()` | Real-time mode requires update output mode. |
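Because the cluster settings are easy to miss, it can help to fail fast when the Spark config from Step 1 is absent. The helper below is an illustrative sketch, not a Databricks API; in a notebook you would pass `spark.conf.get` for the `conf_get` argument.

```python
def check_realtime_mode(conf_get):
    """Raise if the real-time mode Spark config is not set to true.

    conf_get: a callable like spark.conf.get(key, default) on the cluster.
    """
    key = "spark.databricks.streaming.realTimeMode.enabled"
    value = conf_get(key, "false")
    if str(value).lower() != "true":
        raise RuntimeError(
            f"{key} is not set; add '{key} true' to the cluster's "
            "Spark config (see Step 1)."
        )
    return True

# On a real-time mode cluster: check_realtime_mode(spark.conf.get)
```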
What you're seeing
When you run the query, the display function creates a table that updates in real time as the rate source generates new rows. Each row contains:
- timestamp: The time when the row was generated by the rate source
- value: A monotonically increasing counter that increments with each new row
The table updates continuously with minimal latency, demonstrating how real-time mode processes data as soon as it becomes available. This is the core benefit of real-time mode: the ability to see and act on data immediately rather than waiting for batch processing.
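To build intuition for the row shape the rate source emits, here is a plain-Python sketch (no Spark required) that produces the same two fields, a generation timestamp paired with an incrementing counter. The generator name and structure are illustrative only, not how the rate source is implemented.

```python
import itertools
import time
from datetime import datetime, timezone

def rate_rows(rows_per_second=1, limit=5):
    """Yield (timestamp, value) pairs shaped like rate-source rows:
    each row carries its generation time and a monotonically
    increasing counter starting at 0."""
    interval = 1.0 / rows_per_second
    for value in itertools.islice(itertools.count(), limit):
        yield datetime.now(timezone.utc), value
        time.sleep(interval)

# Sped up for illustration; the tutorial's query uses rowsPerSecond=1.
for ts, value in rate_rows(rows_per_second=100, limit=3):
    print(ts.isoformat(), value)
```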
What you've learned
You've successfully set up and run your first real-time streaming query. You now know how to:
- Configure classic compute with the required settings for real-time mode (dedicated cluster, Photon disabled, autoscaling disabled, Spark config)
- Enable real-time processing using the `realTime` trigger
- Use the `display` function for interactive development and testing
- Verify that your query is running in real-time mode by observing continuous updates
You're ready to build production real-time pipelines with Kafka, Kinesis, and other supported sources. To learn more about Structured Streaming, see Structured Streaming concepts.
Next steps
Now that you've run your first real-time query, explore these resources to build production streaming applications:
- Real-time mode examples - Working code examples for Kafka sources and sinks, stateful queries, aggregations, and custom sinks
- Real-time mode reference - Learn about cluster sizing, supported operators, monitoring, and feature limitations
- Stateful streaming applications - Add state management to your streaming queries for deduplication, aggregations, and windowing
- Advanced state management - Use `transformWithState` for custom stateful processing with time-to-live (TTL) and complex logic