Sinks in Lakeflow Spark Declarative Pipelines
By default, pipeline flows write results to Delta tables managed by Unity Catalog, typically streaming tables or materialized views. Sinks are an alternative output target that lets you write transformed data to destinations outside Databricks-managed storage, such as event streaming services or custom data stores.
Sinks are used with append flows. You define a sink using one of the sink APIs, then reference it as the target in your append_flow definition.
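As a minimal sketch of that define-then-reference pattern, the following registers a Kafka sink and an append flow that targets it. It assumes `dp` is the pipelines module (for example, `from pyspark import pipelines as dp`) and `spark` is the active session; both are passed in as parameters so the sketch can be read and exercised outside a pipeline. The helper names, the `kafka.bootstrap.servers` and `topic` option keys, and the JSON serialization into a `value` column are illustrative assumptions, not taken from this page.

```python
def kafka_sink_options(bootstrap_servers: str, topic: str) -> dict:
    """Build the options dict for a Kafka sink (hypothetical helper)."""
    return {"kafka.bootstrap.servers": bootstrap_servers, "topic": topic}


def define_kafka_flow(dp, spark, source_table: str, sink_name: str, options: dict):
    """Register a Kafka sink, then an append flow that writes to it.

    Assumption: `dp` exposes create_sink(name, format, options) and an
    append_flow(name, target) decorator, as the pipelines module does.
    """
    # Step 1: define the sink under a name that flows can reference.
    dp.create_sink(name=sink_name, format="kafka", options=options)

    # Step 2: reference the sink by name as the target of an append flow.
    @dp.append_flow(name=f"{sink_name}_flow", target=sink_name)
    def write_to_sink():
        # Kafka writes expect a `value` column, so serialize each row as JSON.
        return (
            spark.readStream.table(source_table)
            .selectExpr("to_json(struct(*)) AS value")
        )

    return write_to_sink
```

In a pipeline source file, you would call `define_kafka_flow(dp, spark, ...)` once at module level, or inline the two steps directly.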
When to use sinks
Databricks recommends using sinks when you need to:
- Build operational use cases with low latency, such as fraud detection, real-time analytics, or customer recommendations, where data must flow to a message bus rather than cloud storage. For workloads that require millisecond latency, see Use real-time mode in Lakeflow Spark Declarative Pipelines.
- Write transformed data to tables managed by an external Delta instance, including Unity Catalog managed and external tables.
- Perform reverse ETL into external systems, such as writing processed data back to Apache Kafka topics for consumption outside Databricks.
- Write to a format not natively supported by Databricks, using Python custom data sources.
Sink types
Pipelines support the following sink types:
| Sink type | Description |
|---|---|
| Delta table sinks | Write to Unity Catalog managed or external Delta tables. Specify either a file path or a fully qualified table name. |
| Apache Kafka sinks | Write to Apache Kafka topics using the Kafka connector included in the pipeline runtime. |
| Azure Event Hubs sinks | Write to Azure Event Hubs using the Kafka interface. Uses the same options as Kafka sinks. |
| Python custom sinks | Write to any data store using a registered Python custom data source. |
| ForEachBatch sinks | Apply custom Python logic to each micro-batch of streaming data. Use when you need to write to multiple destinations, perform upserts, or use targets that don't support streaming writes natively. |
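The "either a file path or a fully qualified table name" rule for Delta table sinks can be sketched as a small options builder. The helper name is hypothetical, and the `path` and `tableName` option keys are an assumption about what the Delta sink format accepts; the three-part name check is a simplification that does not handle backtick-quoted identifiers.

```python
from typing import Optional


def delta_sink_options(path: Optional[str] = None,
                       table_name: Optional[str] = None) -> dict:
    """Return options for a Delta sink, given exactly one kind of target."""
    # Exactly one of the two targets must be provided.
    if (path is None) == (table_name is None):
        raise ValueError("Specify exactly one of path or table_name")
    if path is not None:
        return {"path": path}
    # A fully qualified name has three non-empty parts: catalog.schema.table.
    parts = table_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("table_name must look like catalog.schema.table")
    return {"tableName": table_name}
```

The resulting dict would then be passed as the `options` argument when creating a sink with the Delta format.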
Sink APIs
Pipelines provide two APIs for creating sinks:
- create_sink(): Creates a named sink of a supported type (Delta, Kafka, Azure Event Hubs, or Python custom data source). Only available in Python. See Using sinks in pipelines.
- foreach_batch_sink(): Decorates a Python function that runs for each micro-batch of streaming data. Provides maximum flexibility for custom write logic. See Use ForEachBatch to write to arbitrary data sinks in pipelines.
Sinks created with either API are referenced as the target of an append_flow.
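To illustrate the decorator form, here is a minimal sketch of a ForEachBatch sink that appends each micro-batch to two tables. It assumes `dp` is the pipelines module and that the decorated function receives the micro-batch DataFrame and a batch ID; the helper name, the table names, and the choice of two Delta targets are hypothetical.

```python
def define_batch_sink(dp, sink_name: str, table_a: str, table_b: str):
    """Register a ForEachBatch sink that writes each micro-batch twice.

    Assumption: `dp` exposes a foreach_batch_sink(name) decorator whose
    function is called with (df, batch_id) for every micro-batch.
    """

    @dp.foreach_batch_sink(name=sink_name)
    def write_batch(df, batch_id):
        # Inside the handler, df is handled with ordinary batch write calls,
        # which is what allows writing to more than one destination.
        df.write.format("delta").mode("append").saveAsTable(table_a)
        df.write.format("delta").mode("append").saveAsTable(table_b)

    return write_batch
```

The sink name given to the decorator is what an append flow would then use as its target, in the same way as a sink created with create_sink().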
Limitations
- Sinks are only available in Python. SQL is not supported.
- Only streaming queries are supported. Batch queries are not supported.
- Only append_flow can write to sinks; create_auto_cdc_flow and other flow types are not supported.
- Pipeline expectations are not supported for sinks.
- Running a full refresh does not clean up previously written data in sinks.