Standalone pipelines vs. Lakeflow Spark Declarative Pipelines

Databricks offers two ways to build materialized views and streaming tables: standalone pipelines, or full pipelines created with Lakeflow Spark Declarative Pipelines. Both run on the same declarative engine and produce Unity Catalog managed tables. The difference is how much of the pipeline you author and operate.

  • A standalone materialized view or streaming table is a single dataset defined with SQL syntax. Databricks creates and manages a pipeline behind the scenes to refresh it. You create and refresh standalone datasets from a Databricks SQL warehouse, or from a notebook on serverless general compute using spark.sql(). See Standalone pipelines.
  • A Lakeflow Spark Declarative Pipelines pipeline is a pipeline that you author and operate as a unit. It can contain many datasets, in SQL and Python, with dependency orchestration, lineage, and pipeline-wide operational features. See What are pipelines?

When you create a standalone materialized view or streaming table, the managed pipeline appears on the Jobs & Pipelines page with a pipeline type of MV/ST. A pipeline that you author with Lakeflow Spark Declarative Pipelines appears with a pipeline type of ETL.

When to use a standalone pipeline

Use standalone materialized views and streaming tables when:

  • You accelerate queries or transform data with a single materialized view or streaming table.
  • You work from a Databricks SQL warehouse, the SQL editor, or a notebook on serverless general compute, and schedule refreshes with SCHEDULE, TRIGGER ON UPDATE, or a SQL task in a job.
  • You don't need sinks, multi-stage orchestration, or other pipeline-only features.
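
As a rough illustration of the standalone case, the sketch below assembles a CREATE MATERIALIZED VIEW statement with an optional refresh clause. The helper name standalone_mv_sql and the table names under main.sales are hypothetical and exist only for this example. The clause text ("SCHEDULE EVERY 1 HOUR", "TRIGGER ON UPDATE") follows the keywords named in the list above, but check the exact syntax against the SQL reference for your workspace.

```python
from typing import Optional


def standalone_mv_sql(name: str, query: str, refresh_clause: Optional[str] = None) -> str:
    """Build a CREATE MATERIALIZED VIEW statement for one standalone dataset.

    refresh_clause is passed through verbatim, for example
    "SCHEDULE EVERY 1 HOUR" or "TRIGGER ON UPDATE". When it is omitted,
    the statement carries no refresh clause at all.
    """
    lines = [f"CREATE MATERIALIZED VIEW {name}"]
    if refresh_clause:
        lines.append(refresh_clause)
    lines.append(f"AS {query}")
    return "\n".join(lines)


# Hypothetical source query and target name, for illustration only.
QUERY = (
    "SELECT order_date, SUM(amount) AS revenue "
    "FROM main.sales.orders GROUP BY order_date"
)

hourly = standalone_mv_sql("main.sales.daily_revenue", QUERY, "SCHEDULE EVERY 1 HOUR")
on_update = standalone_mv_sql("main.sales.daily_revenue", QUERY, "TRIGGER ON UPDATE")
unscheduled = standalone_mv_sql("main.sales.daily_revenue", QUERY)

# In a notebook on serverless general compute, where `spark` is predefined,
# the statement could then be submitted with:
#   spark.sql(hourly)
```

The same statement text can be pasted into the SQL editor or run from a SQL warehouse. The unscheduled variant would fit the third option above, where a SQL task in a job drives the refresh.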

When to use a Lakeflow Spark Declarative Pipelines pipeline

Use a Lakeflow Spark Declarative Pipelines pipeline when:

  • You build a multi-stage pipeline with intermediate datasets, where Databricks manages dependencies and lineage across the datasets. Intermediate datasets can be published to the catalog or kept private to the pipeline.
  • You author tables and flows in Python.
  • You write to external Delta tables or event streaming destinations using sinks (create_sink() or foreach_batch_sink()).
  • You apply change data capture from a database snapshot using create_auto_cdc_from_snapshot_flow().
  • You want triggered or continuous execution across the whole pipeline.
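
The two checklists above can be condensed into a small rule-of-thumb helper. This is only a sketch of the decision logic as this page describes it, not a Databricks API: the Requirements class, its field names, and the recommend function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """What a workload needs; every field is a hypothetical flag for this sketch."""
    multi_stage: bool = False              # intermediate datasets with managed dependencies
    python_authoring: bool = False         # tables and flows authored in Python
    sinks: bool = False                    # create_sink() or foreach_batch_sink()
    cdc_from_snapshot: bool = False        # create_auto_cdc_from_snapshot_flow()
    continuous_execution: bool = False     # continuous rather than triggered runs


def recommend(req: Requirements) -> str:
    """Return "pipeline" if any pipeline-level need is present, else "standalone"."""
    needs_pipeline = any(
        [
            req.multi_stage,
            req.python_authoring,
            req.sinks,
            req.cdc_from_snapshot,
            req.continuous_execution,
        ]
    )
    return "pipeline" if needs_pipeline else "standalone"


# A single scheduled materialized view has none of the pipeline-level needs.
single_mv = recommend(Requirements())

# Writing to an event streaming destination requires a sink, so a pipeline fits.
with_sink = recommend(Requirements(sinks=True))
```

Any single pipeline-level requirement is enough to tip the choice, which mirrors how the lists are written: the standalone option fits only when none of those features is needed.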

Comparison

| Property | Standalone streaming table or materialized view | Pipeline streaming table or materialized view |
| --- | --- | --- |
| Authoring interface | SQL syntax, from a Databricks SQL warehouse or with spark.sql() in a notebook on serverless general compute | SQL and Python |
| Scope | One dataset, in a pipeline that Databricks manages for you | Many datasets in one pipeline, with dependency orchestration and lineage |
| Execution | Triggered, with SCHEDULE, TRIGGER ON UPDATE, or a SQL task | Triggered or continuous |
| Pipeline-only features | None | Sinks, create_auto_cdc_from_snapshot_flow(), private datasets |
| Pipeline type label | MV/ST | ETL |
| Move between pipelines | Not supported; recreate the table in the target pipeline | Supported |