
Get started with real-time mode

Preview

This feature is in Public Preview.

Real-time mode enables ultra-low latency streaming with end-to-end latency as low as five milliseconds, making it ideal for operational workloads like fraud detection and real-time personalization. This tutorial guides you through setting up your first real-time streaming query using a simple example.

For conceptual information about real-time mode, when to use it, and supported features, see Real-time mode in Structured Streaming.

Requirements

note

If you don't have classic compute creation privileges, contact your workspace administrator to create a real-time mode cluster for you using the configuration in Step 1.

Step 1: Create classic compute for real-time mode

Real-time mode requires a specific classic compute configuration to achieve ultra-low latency. These settings ensure that tasks run simultaneously across all stages and data is processed continuously as it arrives, rather than in batches.

To create a properly configured classic compute:

  1. In your Databricks workspace, click Compute in the sidebar.

  2. Click Create compute.

  3. Enter a name.

  4. Select Databricks Runtime 17.1 or above.

  5. Clear Photon acceleration (real-time mode doesn't support Photon).

  6. Clear Enable autoscaling (real-time mode requires a fixed cluster size).

  7. Under Advanced performance, clear Use spot instances (spot instances can cause interruptions).

  8. Click Advanced options to expand additional settings.

  9. Under Access mode, select Dedicated (formerly: Single user).

  10. Under Spark config, add the following configuration:

    Text
    spark.databricks.streaming.realTimeMode.enabled true

  11. Click Create compute.
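The same configuration can also be created programmatically. The following is a minimal sketch of a request payload for the Databricks Clusters API (`clusters/create`); the cluster name, node type, and runtime version label here are placeholder assumptions you should replace with values valid for your workspace:

```python
import json

# Sketch of a clusters/create payload mirroring the UI steps above.
# cluster_name, spark_version, and node_type_id are placeholders.
payload = {
    "cluster_name": "realtime-mode-cluster",   # placeholder name
    "spark_version": "17.1.x-scala2.13",       # assumed DBR 17.1 label; check your workspace
    "node_type_id": "i3.xlarge",               # placeholder node type
    "num_workers": 2,                          # fixed size: no autoscale block (step 6)
    "data_security_mode": "SINGLE_USER",       # Dedicated access mode (step 9)
    "runtime_engine": "STANDARD",              # Photon disabled (step 5)
    "spark_conf": {
        # Required real-time mode setting (step 10)
        "spark.databricks.streaming.realTimeMode.enabled": "true",
    },
    "aws_attributes": {"availability": "ON_DEMAND"},  # no spot instances (step 7)
}

print(json.dumps(payload, indent=2))
```

Submitting this payload to the Clusters API (for example, with the Databricks CLI or SDK) creates the same cluster as the UI steps above.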

Step 2: Create a notebook

Notebooks provide an interactive environment for developing and testing streaming queries. You use this notebook to write your real-time query and see the results update continuously.

To create a notebook:

  1. Click New in the sidebar, then click Notebook.
  2. In the compute drop-down menu, select the compute you created in Step 1.
  3. Select Python or Scala as the default language.

Step 3: Run a real-time mode query

Copy and paste the following code into a notebook cell and run it. This example uses a rate source, which generates rows at a specified rate, and displays the results in real time.

note

The display function with realTime trigger is available in Databricks Runtime 17.1 and above.

Python
inputDF = (
    spark.readStream
    .format("rate")
    .option("numPartitions", 2)
    .option("rowsPerSecond", 1)
    .load()
)

display(inputDF, realTime="5 minutes", outputMode="update")

After running the code, you see a table that updates in real time as new rows are generated. The table displays a timestamp column and a value column that increments with each row.

Understanding the code

The code above demonstrates the essential components of a real-time streaming query. The key parameters and what they control:

  • format("rate"): Uses the rate source, a built-in source that generates rows at a configurable rate. This is useful for testing without external dependencies.
  • numPartitions: Sets the number of partitions for the generated data.
  • rowsPerSecond: Controls how many rows are generated per second.
  • realTime="5 minutes": Enables real-time mode. The interval specifies how often the query checkpoints progress. Longer intervals mean less frequent checkpointing but potentially longer recovery times after failures.
  • outputMode="update": Real-time mode requires update output mode.
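In update output mode, each trigger emits only the rows whose values changed since the last emit, rather than the full result. A plain-Python sketch of that semantics (no Spark involved), assuming a running count per key:

```python
from collections import Counter

def update_mode_emit(state: Counter, batch: list[str]) -> dict[str, int]:
    """Apply a micro-batch of keys to running counts and return only
    the rows that changed -- the essence of 'update' output mode."""
    changed = {}
    for key in batch:
        state[key] += 1
        changed[key] = state[key]
    return changed

state = Counter()
print(update_mode_emit(state, ["a", "b", "a"]))  # {'a': 2, 'b': 1}
print(update_mode_emit(state, ["b"]))            # {'b': 2} ('a' unchanged, so not emitted)
```

This is why the output table refreshes individual rows in place instead of appending a full snapshot on every update.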

What you're seeing

When you run the query, the display function creates a table that updates in real time as the rate source generates new rows. Each row contains:

  • timestamp: The time when the row was generated by the rate source
  • value: A monotonically increasing counter that increments with each new row
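To see the shape of this output without a cluster, here is a small pure-Python imitation of the rate source's two columns. The real source is implemented inside Spark; this sketch is only illustrative:

```python
from datetime import datetime, timedelta

def rate_rows(rows_per_second: int, seconds: int, start: datetime):
    """Yield (timestamp, value) pairs the way the rate source does:
    evenly spaced timestamps and a monotonically increasing counter."""
    value = 0
    step = timedelta(seconds=1 / rows_per_second)
    for _ in range(rows_per_second * seconds):
        yield (start + value * step, value)
        value += 1

rows = list(rate_rows(rows_per_second=2, seconds=2, start=datetime(2025, 1, 1)))
for ts, value in rows:
    print(ts.isoformat(), value)  # timestamps 0.5s apart, values 0, 1, 2, 3
```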

The table updates continuously with minimal latency, demonstrating how real-time mode processes data as soon as it becomes available. This is the core benefit of real-time mode: the ability to see and act on data immediately rather than waiting for batch processing.

What you've learned

You've successfully set up and run your first real-time streaming query. You now know how to:

  • Configure classic compute with the required settings for real-time mode (dedicated cluster, Photon disabled, autoscaling disabled, Spark config)
  • Enable real-time processing using the realTime trigger
  • Use the display function for interactive development and testing
  • Verify that your query is running in real-time mode by observing continuous updates

You're ready to build production real-time pipelines with Kafka, Kinesis, and other supported sources. To learn more about Structured Streaming, see Structured Streaming concepts.

Next steps

Now that you've run your first real-time query, explore these resources to build production streaming applications: