Lakeflow Spark Declarative Pipelines

Lakeflow Spark Declarative Pipelines (SDP) is a framework for creating batch and streaming data pipelines in SQL and Python. Lakeflow SDP extends and is interoperable with Apache Spark Declarative Pipelines, while running on the performance-optimized Databricks Runtime. Common use cases for pipelines include incremental batch and streaming transformations, and data ingestion from sources such as cloud storage (for example, Amazon S3, Azure ADLS Gen2, and Google Cloud Storage) and message buses (for example, Apache Kafka, Amazon Kinesis, Google Pub/Sub, Azure Event Hubs, and Apache Pulsar).
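To illustrate the declarative style, the following is a minimal sketch of a Python pipeline definition, not the canonical API reference. It assumes the `dlt` Python module available to Databricks pipelines and Auto Loader (`cloudFiles`) for incremental ingestion; the table names and storage path are hypothetical.

```python
# Minimal sketch of a declarative pipeline in Python.
# Assumes the dlt module provided to Databricks pipelines; the `spark` session
# is supplied by the pipeline runtime, and table names/paths are hypothetical.
import dlt
from pyspark.sql.functions import col


@dlt.table(comment="Raw orders ingested incrementally from cloud storage.")
def orders_raw():
    # Auto Loader (cloudFiles) picks up new files as they arrive.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://my-bucket/orders/")  # hypothetical source path
    )


@dlt.table(comment="Orders with invalid rows filtered out.")
def orders_clean():
    # A streaming transformation that builds on the table defined above;
    # the framework infers the dependency between the two datasets.
    return dlt.read_stream("orders_raw").where(col("order_id").isNotNull())
```

The pipeline runtime resolves the dependency graph between datasets and manages orchestration, so the definitions above describe what each table should contain rather than how to update it.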

This section provides detailed information about using pipelines. The following topics will help you get started.

| Topic | Description |
| --- | --- |
| Lakeflow Spark Declarative Pipelines concepts | Learn about the high-level concepts of SDP, including pipelines, flows, streaming tables, and materialized views. |
| Tutorials | Follow tutorials to gain hands-on experience with using pipelines. |
| Develop pipelines | Learn how to develop and test pipelines that create flows for ingesting and transforming data. |
| Configure pipelines | Learn how to schedule and configure pipelines. |
| Monitor pipelines | Learn how to monitor your pipelines and troubleshoot pipeline queries. |
| Developers | Learn how to use Python and SQL when developing pipelines. |
| Pipelines in Databricks SQL | Learn about using streaming tables and materialized views in Databricks SQL. |

More information