Streaming on Databricks
You can use Databricks for near real-time data ingestion, processing, machine learning, and AI on streaming data.
Databricks offers numerous optimizations for streaming and incremental processing, including the following:
- Delta Live Tables provides declarative syntax for incremental processing. See What is Delta Live Tables?
- Auto Loader simplifies incremental ingestion from cloud object storage. See What is Auto Loader?
- Unity Catalog adds data governance to streaming workloads. See Using Unity Catalog with Structured Streaming.
- Delta Lake provides the storage layer for these integrations. See Delta table streaming reads and writes.
For real-time model serving, see Deploy models using Mosaic AI Model Serving.
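As a rough illustration of how Auto Loader and Delta Lake fit together, the following minimal sketch reads files incrementally from cloud object storage and writes them to a Delta table. The function names, paths, and table name are hypothetical, and reusing the checkpoint path as the schema location is an assumption made for brevity. The `cloudFiles` source is only available on Databricks, so `spark` is assumed to be an active session there.

```python
from typing import Any, Dict


def auto_loader_options(file_format: str, schema_location: str) -> Dict[str, str]:
    """Build the Auto Loader reader options used in this sketch."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
    }


def ingest_to_delta(spark: Any, source_path: str, checkpoint_path: str, target_table: str):
    """Start an incremental ingest from cloud object storage into a table.

    `spark` is assumed to be an active SparkSession on Databricks. The
    "cloudFiles" source is Auto Loader and is not part of open source Spark.
    """
    # Assumption: inferred schema is tracked alongside the checkpoint.
    options = auto_loader_options("json", checkpoint_path)
    return (
        spark.readStream.format("cloudFiles")
        .options(**options)
        .load(source_path)
        .writeStream
        .option("checkpointLocation", checkpoint_path)
        .trigger(availableNow=True)  # process the files available now, then stop
        .toTable(target_table)       # Delta is the default table format on Databricks
    )
```

A call might look like `ingest_to_delta(spark, "/Volumes/main/raw/events", "/Volumes/main/raw/_checkpoints/events", "main.bronze.events")`, where every path and name is a placeholder. Removing the `availableNow` trigger would leave the query running continuously instead of stopping after the backlog is processed.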
- Tutorial
Learn the basics of near real-time and incremental processing with Structured Streaming on Databricks.
- Concepts
Learn core concepts for configuring incremental and near real-time workloads with Structured Streaming.
- Stateful streaming
Managing the intermediate state information of stateful Structured Streaming queries can help prevent unexpected latency and production problems.
- Production considerations
Review recommendations for configuring production incremental processing workloads with Structured Streaming on Databricks to meet latency and cost requirements for real-time or batch applications.
- Monitor streams
Learn how to monitor Structured Streaming applications on Databricks.
- Unity Catalog integration
Learn how to use Unity Catalog with Structured Streaming on Databricks.
- Streaming with Delta
Learn how to use Delta Lake tables as streaming sources and sinks.
- Examples
See examples of using Spark Structured Streaming with Cassandra, Azure Synapse Analytics, Python notebooks, and Scala notebooks in Databricks.
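To give a flavor of the monitoring topic listed above, the following is a small sketch that condenses one Structured Streaming progress report into a single log line. The helper name is hypothetical. It assumes the report is dictionary-like, as returned by `StreamingQuery.lastProgress` in PySpark 3.x, and it reads only the `batchId`, `numInputRows`, `inputRowsPerSecond`, and `processedRowsPerSecond` fields.

```python
from typing import Any, Mapping, Optional


def summarize_progress(progress: Optional[Mapping[str, Any]]) -> str:
    """Format one streaming progress report as a single line.

    `progress` is expected to look like `StreamingQuery.lastProgress`,
    which is None until the first micro-batch has completed.
    """
    if progress is None:
        return "no progress reported yet"
    # Rates can be missing from a report, so fall back to 0.0.
    input_rate = float(progress.get("inputRowsPerSecond") or 0.0)
    processed_rate = float(progress.get("processedRowsPerSecond") or 0.0)
    return (
        f"batch={progress.get('batchId')} "
        f"rows={progress.get('numInputRows', 0)} "
        f"in/s={input_rate:.1f} "
        f"out/s={processed_rate:.1f}"
    )
```

With a running query held in a hypothetical variable `query`, this could be called as `print(summarize_progress(query.lastProgress))`. A processed rate that stays below the input rate over many batches is one sign that the stream is falling behind.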
Databricks has specific features for working with semi-structured data fields contained in Avro, protocol buffers, and JSON data payloads. To learn more, see:
Additional resources
For more information about Structured Streaming, see the Apache Spark Structured Streaming Programming Guide.
For reference information about Structured Streaming, Databricks recommends the following Apache Spark API references: