Sinks in Lakeflow pipelines

By default, pipeline flows write results to Delta tables managed by Unity Catalog, typically streaming tables or materialized views. Sinks are an alternative output target that let you write transformed data to destinations outside Databricks-managed storage, such as event streaming services or custom data stores.

Sinks are used with append flows. You define a sink using one of the sink APIs, then reference it as the target in your append_flow definition.

When to use sinks

Databricks recommends using sinks when you need to:

Build operational use cases with low latency, such as fraud detection, real-time analytics, or customer recommendations, where data must flow to a message bus rather than cloud storage. For workloads that require millisecond latency, see Use real-time mode in Lakeflow pipelines.
Write transformed data to tables managed by an external Delta instance, including Unity Catalog managed and external tables.
Perform reverse ETL into external systems, such as writing processed data back to Apache Kafka topics for consumption outside Databricks.
Write to a format not natively supported by Databricks, using Python custom data sources.

Sink types

Pipelines support the following sink types:

Sink type	Description
Delta table sinks	Write to Unity Catalog managed or external Delta tables. Specify either a file path or a fully qualified table name.
Apache Kafka sinks	Write to Apache Kafka topics using the Kafka connector included in the pipeline runtime.
Azure Event Hubs sinks	Write to Azure Event Hubs using the Kafka interface. Uses the same options as Kafka sinks.
Python custom sinks	Write to any data store using a Python custom data source registered with `spark.dataSource.register`.
ForEachBatch sinks	Apply custom Python logic to each micro-batch of streaming data. Use when you need to write to multiple destinations, perform upserts, or use targets that don't support streaming writes natively.

Sink type	Description
Delta table sinks	Write to Unity Catalog managed or external Delta tables. Specify either a file path or a fully qualified table name.
Apache Kafka sinks	Write to Apache Kafka topics using the Kafka connector included in the pipeline runtime.
Azure Event Hubs sinks	Write to Azure Event Hubs using the Kafka interface. Uses the same options as Kafka sinks.
Python custom sinks	Write to any data store using a Python custom data source registered with `spark.dataSource.register`.
ForEachBatch sinks	Apply custom Python logic to each micro-batch of streaming data. Use when you need to write to multiple destinations, perform upserts, or use targets that don't support streaming writes natively.

Sink APIs

Pipelines provide two APIs for creating sinks:

create_sink(): Creates a named sink of a supported type (Delta, Kafka, AEH, or Python custom data source). Only available in Python. See Use sinks in pipelines.
foreach_batch_sink(): Decorates a Python function that runs for each micro-batch of streaming data. Provides maximum flexibility for custom write logic. See Use ForEachBatch to write to arbitrary data sinks in pipelines.

Both sink types are referenced as the target of an append_flow.

Limitations

Sinks are only available in Python. SQL is not supported.
Only streaming queries are supported. Batch queries are not supported.
Only append_flow can write to sinks; create_auto_cdc_flow and other flow types are not supported.
Pipeline expectations are not supported for sinks.
Running a full refresh does not clean up previously written data in sinks.

When to use sinks​

Sink types​

Sink APIs​

Limitations​

Additional resources​

When to use sinks

Sink types

Sink APIs

Limitations

Additional resources