What are pipelines?
A pipeline is the main unit of development and execution in Lakeflow Spark Declarative Pipelines (SDP). It consists of a collection of source code files and a configuration. The source files declare datasets (streaming tables, materialized views, and views) along with the queries and flows that produce them. The configuration specifies how the pipeline runs and where data is stored.
A pipeline is the container for the flows, streaming tables, materialized views, and sinks that you define. While the pipeline runs, it analyzes the dependencies between these objects and orchestrates their order of execution and parallelization automatically. For details on the objects that a pipeline contains, see What is Lakeflow Spark Declarative Pipelines.
Pipeline source code
Pipeline source code is written in Python or SQL. A single pipeline can mix Python and SQL source files, but each file can contain only one language. Because the pipeline analyzes dataset dependencies across all of its source files, you can organize source code across files in any order.
For language-specific development guidance, see Develop pipeline code with Python and Develop Lakeflow Spark Declarative Pipelines code with SQL.
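To illustrate why the order of source files does not matter, the following is a minimal toy sketch in plain Python. It is not the Lakeflow SDP API: `REGISTRY`, `dataset`, `unresolved_reads`, and the dataset names are hypothetical helpers invented for this example. The sketch assumes that declarations are only recorded while files load, and that references between datasets are checked after every file has been read.

```python
# Toy model only: REGISTRY and dataset() are hypothetical helpers written for
# this sketch. They are not part of the Lakeflow SDP Python API.
REGISTRY = {}


def dataset(name, reads=()):
    """Record a dataset declaration; the query function is not run here."""
    def register(query):
        REGISTRY[name] = {"reads": tuple(reads), "query": query}
        return query
    return register


# --- Contents of a hypothetical file that is loaded first ---
# It reads from clean_orders, which has not been declared yet.
@dataset("daily_totals", reads=["clean_orders"])
def daily_totals(inputs):
    return sum(inputs["clean_orders"])


# --- Contents of a hypothetical file that is loaded second ---
@dataset("clean_orders")
def clean_orders(inputs):
    return [10, 20, 30]


def unresolved_reads(registry):
    """Return names that are read but never declared, checked after all files load."""
    declared = set(registry)
    return sorted({
        dep
        for spec in registry.values()
        for dep in spec["reads"]
        if dep not in declared
    })


print(unresolved_reads(REGISTRY))  # [] because every read matches a declaration
```

Because nothing is executed at declaration time, a downstream dataset can appear before the dataset it reads from without causing an error.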
Pipeline graph
Pipelines automatically infer dependencies between datasets and arrange them in a directed acyclic graph (DAG). The graph determines evaluation order: upstream datasets are computed before downstream ones. You can view and interact with the pipeline graph in the Lakeflow Pipelines Editor.
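As a conceptual illustration of how a dependency graph determines evaluation order, the sketch below uses Python's standard `graphlib` module. It is not how Lakeflow SDP is implemented, and the dataset names in `reads_from` are hypothetical. Each stage lists datasets whose upstream inputs are already computed, so the datasets within a stage do not depend on each other.

```python
from graphlib import TopologicalSorter

# Hypothetical datasets: each key maps to the set of datasets it reads from.
reads_from = {
    "raw_orders": set(),
    "clean_orders": {"raw_orders"},
    "customers": set(),
    "orders_by_customer": {"clean_orders", "customers"},
}

sorter = TopologicalSorter(reads_from)
sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle

stages = []
while sorter.is_active():
    # Datasets whose upstream inputs have all been marked as done.
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

print(stages)
# [['customers', 'raw_orders'], ['clean_orders'], ['orders_by_customer']]
```

In this toy graph, `customers` and `raw_orders` have no upstream inputs, so they land in the same stage and could be computed in parallel.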
Pipeline updates
A pipeline update computes the current state of each dataset by:
- Starting a cluster with the correct configuration.
- Analyzing source files and building the dependency graph.
- Computing or incrementally updating each dataset in dependency order.
A pipeline runs in one of two modes:
- Triggered: The pipeline runs once and stops when all datasets are up to date.
- Continuous: The pipeline runs indefinitely and processes new data as it arrives.
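The difference between the two modes can be sketched with a toy generator. This is a simplified model under stated assumptions, not the product's scheduler: `run_pipeline`, the `batches` iterator, and the uppercase transformation are all hypothetical stand-ins. A real continuous source never ends; the finite list here exists only so that the demonstration terminates.

```python
def run_pipeline(batches, mode):
    """Yield each processed record (toy model of the two run modes).

    batches: iterator of lists; each list is the new data visible at one poll.
    mode: "triggered" processes the data available at the first poll and stops;
          "continuous" keeps polling for as long as the source yields batches.
    """
    if mode not in ("triggered", "continuous"):
        raise ValueError(f"unknown mode: {mode!r}")
    for batch in batches:
        for record in batch:
            yield record.upper()  # stand-in for the real dataset computation
        if mode == "triggered":
            return  # datasets are up to date with the available data, so stop


# Hypothetical arrivals: two records are available at start, one arrives later.
arrivals = [["a", "b"], ["c"]]

print(list(run_pipeline(iter(arrivals), "triggered")))   # ['A', 'B']
print(list(run_pipeline(iter(arrivals), "continuous")))  # ['A', 'B', 'C']
```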
Updates that you trigger interactively from the editor are optimized for fast iteration: they reuse the cluster and disable automatic retries. See Update run behavior.
Pipeline types
The Jobs & Pipelines list includes more than pipelines defined in Lakeflow Spark Declarative Pipelines. Databricks runs several types of pipelines, and both the Jobs & Pipelines list and the pipeline monitoring page label each one with its type so that you can tell them apart. The following table maps each pipeline type to the pipeline_type value recorded in the event log:
| Type in Jobs & Pipelines | pipeline_type value | Description |
|---|---|---|
| ETL | | A pipeline defined in Lakeflow Spark Declarative Pipelines. See Lakeflow Spark Declarative Pipelines. |
| Ingestion | | A managed ingestion pipeline created with Lakeflow Connect. See Managed connectors in Lakeflow Connect. |
| MV/ST | | A standalone pipeline. See Standalone pipelines. |
Standalone pipelines
You can create and manage streaming tables and materialized views outside Lakeflow Spark Declarative Pipelines as standalone pipelines, using Databricks SQL or Python to create and refresh them. Standalone streaming tables and materialized views run on the same Databricks infrastructure and have the same processing semantics as they do in Lakeflow Spark Declarative Pipelines. When you define a standalone streaming table or materialized view, its flows are defined implicitly as part of that definition.
For details, see Standalone pipelines.
Lakeflow Pipelines Editor
The Lakeflow Pipelines Editor is an IDE built for pipeline development. It provides:
- A multi-file code editor for Python and SQL source files
- A pipeline assets browser for organizing files and folders
- An interactive pipeline graph showing dataset dependencies and state
- Data previews for streaming tables and materialized views
- Execution insights and an issues pane showing results from the latest run
- Selective execution to refresh individual files or tables without running the full pipeline
The editor integrates with the Databricks platform and supports version control via Git folders. For step-by-step guidance, see Develop and debug ETL pipelines with the Lakeflow Pipelines Editor.