
Pipeline Limitations

The following are limitations of Lakeflow Spark Declarative Pipelines that are important to know as you develop your pipelines:

  • A Databricks workspace is limited to 200 concurrent pipeline updates. The number of datasets that a single pipeline can contain is determined by the pipeline configuration and workload complexity.
  • Pipeline datasets can be defined only once. Because of this, they can be the target of only a single operation across all pipelines. The exception is streaming tables with append flow processing, which allows you to write to the streaming table from multiple streaming sources, as sketched in the first example after this list. See Using multiple flows to write to a single target.
  • Identity columns have the following limitations. To learn more about identity columns in Delta tables, see Use identity columns in Delta Lake.
    • Identity columns are not supported with tables that are the target of AUTO CDC processing.
    • Identity columns might be recomputed during updates to materialized views. Because of this, Databricks recommends using identity columns in pipelines only with streaming tables, as shown in the identity column example after this list.
  • Materialized views and streaming tables published from pipelines, including those created by Databricks SQL, can be accessed only by Databricks clients and applications. However, to make your materialized views and streaming tables accessible externally, you can use the sink API to write to tables in an external Delta instance, as sketched in the sink example after this list. See Use sinks to stream records to external services with Lakeflow Spark Declarative Pipelines.
  • There are limitations for the Databricks compute required to run and query Unity Catalog pipelines. See the Requirements for pipelines that publish to Unity Catalog.
  • Delta Lake time travel queries are supported only with streaming tables and are not supported with materialized views. See Work with Delta Lake table history.
  • You cannot enable Iceberg reads on materialized views and streaming tables.
  • The pivot() function is not supported. The pivot operation in Spark requires eager loading of input data to compute the output schema, which is not supported in pipelines.
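
The append flow exception above can be sketched in Python. This is a minimal sketch, assuming the dlt Python module used by declarative pipelines (newer releases may expose the same API under a different module name); the names all_orders, east.orders, and west.orders are hypothetical, and spark is the session provided by the pipeline runtime.

```python
import dlt

# Define the target streaming table exactly once.
dlt.create_streaming_table("all_orders")

# First streaming source appended to the shared target.
@dlt.append_flow(target="all_orders")
def orders_east():
    # `spark` is provided by the pipeline runtime; `east.orders` is a hypothetical source.
    return spark.readStream.table("east.orders")

# Second streaming source appended to the same target.
@dlt.append_flow(target="all_orders")
def orders_west():
    return spark.readStream.table("west.orders")
```

Because both flows append into the same streaming table, this is the one pattern in which a single dataset receives writes from more than one operation.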
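For the identity column recommendation, a hedged sketch follows, again assuming the dlt module; events_with_id and raw_events are hypothetical names, and GENERATED ALWAYS AS IDENTITY is the Delta Lake identity column clause declared in the table schema.

```python
import dlt

@dlt.table(
    name="events_with_id",
    schema="""
        event_id BIGINT GENERATED ALWAYS AS IDENTITY,
        payload STRING,
        ingest_time TIMESTAMP
    """,
)
def events_with_id():
    # Reading a stream makes this a streaming table. The identity column is
    # not selected from the source, so Delta Lake assigns its values.
    # `raw_events` is a hypothetical streaming source.
    return spark.readStream.table("raw_events").select("payload", "ingest_time")
```

Declaring the same identity column on a materialized view risks the values being recomputed on refresh, which is why the recommendation is limited to streaming tables.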
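The sink API mentioned above might look like the following sketch, which assumes dlt.create_sink with the Delta format and a path option; the path /Volumes/main/default/external/orders and the source table orders are hypothetical, so check the sink documentation linked above for the exact options your runtime supports.

```python
import dlt

# Declare a Delta sink that writes outside the pipeline's catalog so that
# external (non-Databricks) Delta readers can consume the records.
dlt.create_sink(
    name="external_orders_sink",
    format="delta",
    options={"path": "/Volumes/main/default/external/orders"},  # hypothetical path
)

# Stream records from a pipeline dataset into the external sink.
@dlt.append_flow(target="external_orders_sink")
def export_orders():
    # `orders` is a hypothetical streaming table defined elsewhere in the pipeline.
    return spark.readStream.table("orders")
```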