Streaming on Databricks

You can use Databricks for near real-time data ingestion, processing, machine learning, and AI for streaming data.

Delta Lake provides the storage layer for these integrations. See Delta table streaming reads and writes.

For real-time model serving, see Model serving with Databricks.

Databricks offers numerous optimizations for streaming and incremental processing, including the following:

  • Tutorial

    Learn the basics of near real-time and incremental processing with Structured Streaming on Databricks.

  • Concepts

    Learn core concepts for configuring incremental and near real-time workloads with Structured Streaming.

  • Stateful streaming

    Managing the intermediate state information of stateful Structured Streaming queries can help prevent unexpected latency and production problems.

  • Production considerations

    This article contains recommendations for configuring production incremental processing workloads with Structured Streaming on Databricks to meet latency and cost requirements for real-time or batch applications.

  • Monitor streams

    Learn how to monitor Structured Streaming applications on Databricks.

  • Unity Catalog integration

    Learn how to use Unity Catalog together with Structured Streaming on Databricks.

  • Streaming with Delta

    Learn how to use Delta Lake tables as streaming sources and sinks.

  • Examples

    See examples of using Spark Structured Streaming with Cassandra, Azure Synapse Analytics, Python notebooks, and Scala notebooks in Databricks.

Databricks has specific features for working with semi-structured data fields contained in Avro, protocol buffers, and JSON data payloads. To learn more, see the documentation for each of these formats.

Additional resources

The Apache Spark Structured Streaming Programming Guide provides additional information about Structured Streaming.

For reference information about Structured Streaming, Databricks recommends the following Apache Spark API references: