Skip to main content

Sinks in Lakeflow Spark Declarative Pipelines

By default, pipeline flows write results to Delta tables managed by Unity Catalog—typically streaming tables or materialized views. Sinks are an alternative output target that let you write transformed data to destinations outside Databricks-managed storage, such as event streaming services or custom data stores.

Sinks are used with append flows. You define a sink using one of the sink APIs, then reference it as the target in your append_flow definition.

When to use sinks

Databricks recommends using sinks when you need to:

  • Build operational use cases with low latency, such as fraud detection, real-time analytics, or customer recommendations, where data must flow to a message bus rather than cloud storage. For workloads that require millisecond latency, see Use real-time mode in Lakeflow Spark Declarative Pipelines.
  • Write transformed data to tables managed by an external Delta instance, including Unity Catalog managed and external tables.
  • Perform reverse ETL into external systems, such as writing processed data back to Apache Kafka topics for consumption outside Databricks.
  • Write to a format not natively supported by Databricks, using Python custom data sources.

Sink types

Pipelines support the following sink types:

Sink type

Description

Delta table sinks

Write to Unity Catalog managed or external Delta tables. Specify either a file path or a fully qualified table name.

Apache Kafka sinks

Write to Apache Kafka topics using the Kafka connector included in the pipeline runtime.

Azure Event Hubs sinks

Write to Azure Event Hubs using the Kafka interface. Uses the same options as Kafka sinks.

Python custom sinks

Write to any data store using a Python custom data source registered with spark.dataSource.register.

ForEachBatch sinks

Apply custom Python logic to each micro-batch of streaming data. Use when you need to write to multiple destinations, perform upserts, or use targets that don't support streaming writes natively.

Sink APIs

Pipelines provide two APIs for creating sinks:

Both sink types are referenced as the target of an append_flow.

Limitations

  • Sinks are only available in Python. SQL is not supported.
  • Only streaming queries are supported. Batch queries are not supported.
  • Only append_flow can write to sinks; create_auto_cdc_flow and other flow types are not supported.
  • Pipeline expectations are not supported for sinks.
  • Running a full refresh does not clean up previously written data in sinks.

Additional resources