Spark Declarative Pipelines

Apache Spark™ Declarative Pipelines is a declarative framework for building batch and streaming data pipelines in SQL and Python. Lakeflow pipelines extend Spark Declarative Pipelines and are interoperable with them, while running on the performance-optimized Databricks Runtime. Common use cases for pipelines include ingesting data from cloud storage (for example, Amazon S3, Azure Data Lake Storage Gen2, and Google Cloud Storage) and from message buses (for example, Apache Kafka, Amazon Kinesis, Google Pub/Sub, Azure Event Hubs, and Apache Pulsar), as well as incremental batch and streaming transformations.
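To make the term "declarative" concrete, the following is a minimal, library-free sketch of the idea: you declare each dataset and what it depends on, and a runner works out the execution order. This is not the Spark Declarative Pipelines API. The `dataset` decorator, the `_DATASETS` registry, `run_pipeline`, and the sample rows are all hypothetical names invented for this illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical registry for this sketch: dataset name -> (upstream names, function).
_DATASETS = {}


def dataset(*, depends_on=()):
    """Register a function as a named dataset (illustrative only, not a real API)."""
    def register(fn):
        _DATASETS[fn.__name__] = (tuple(depends_on), fn)
        return fn
    return register


# Datasets are declared here in "wrong" order on purpose: the runner,
# not the author, decides what executes first.
@dataset(depends_on=["valid_orders"])
def order_total(valid_orders):
    return sum(row["amount"] for row in valid_orders)


@dataset(depends_on=["raw_orders"])
def valid_orders(raw_orders):
    # A simple transformation: keep only rows with a positive amount.
    return [row for row in raw_orders if row["amount"] > 0]


@dataset()
def raw_orders():
    # Stands in for ingestion from cloud storage or a message bus.
    return [
        {"id": 1, "amount": 20.0},
        {"id": 2, "amount": -5.0},
        {"id": 3, "amount": 12.5},
    ]


def run_pipeline():
    """Evaluate every declared dataset in dependency order."""
    graph = {name: set(deps) for name, (deps, _) in _DATASETS.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        deps, fn = _DATASETS[name]
        results[name] = fn(*(results[d] for d in deps))
    return results


if __name__ == "__main__":
    print(run_pipeline()["order_total"])  # prints 32.5
```

In a real pipeline, the framework plays the role of `run_pipeline`: it resolves dependencies between the flows, streaming tables, and materialized views you declare. See the topics below for the actual SQL and Python syntax.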

This section provides detailed information about using pipelines. The following topics help you get started.

Topic	Description
Pipeline concepts	Learn about the high-level concepts of pipelines, including flows, streaming tables, and materialized views.
Tutorials	Follow tutorials to get hands-on experience using pipelines.
Develop pipelines	Learn how to develop and test pipelines that create flows for ingesting and transforming data.
Configure pipelines	Learn how to schedule and configure pipelines.
Monitor pipelines	Learn how to monitor your pipelines and troubleshoot pipeline queries.
Developers	Learn how to use Python and SQL when developing pipelines.
Standalone pipelines	Learn about creating standalone streaming tables and materialized views in Databricks SQL or Python.
Best practices	Learn recommended patterns for building reliable, efficient, and maintainable pipelines.

Additional resources