Lakeflow Declarative Pipelines properties reference

This article provides a reference for the Lakeflow Declarative Pipelines JSON settings specification and table properties in Databricks. For more details on using these various properties and configurations, see the following articles:

Lakeflow Declarative Pipelines configurations

Fields
`id` Type: `string` A globally unique identifier for this pipeline. The identifier is assigned by the system and cannot be changed.
`name` Type: `string` A user-friendly name for this pipeline. The name can be used to identify pipeline jobs in the UI.
`configuration` Type: `object` An optional list of settings to add to the Spark configuration of the cluster that will run the pipeline. These settings are read by the Lakeflow Declarative Pipelines runtime and available to pipeline queries through the Spark configuration. Elements must be formatted as `key:value` pairs.
`libraries` Type: `array of objects` An array of notebooks containing the pipeline code and required artifacts.
`clusters` Type: `array of objects` An array of specifications for the clusters to run the pipeline. If this is not specified, a default cluster configuration is automatically selected for the pipeline.
`development` Type: `boolean` A flag indicating whether to run the pipeline in `development` or `production` mode. The default value is `true`.
`notifications` Type: `array of objects` An optional array of specifications for email notifications when a pipeline update completes, fails with a retryable error, fails with a non-retryable error, or a flow fails.
`continuous` Type: `boolean` A flag indicating whether to run the pipeline continuously. The default value is `false`.
`catalog` Type: `string` The name of the default catalog for the pipeline, where all datasets and metadata for the pipeline are published. Setting this value enables Unity Catalog for the pipeline. If left unset, the pipeline publishes to the legacy Hive metastore using the location specified in `storage`. In legacy publishing mode, specifies the catalog containing the target schema where all datasets from the current pipeline are published. See LIVE schema (legacy).
`schema` Type: `string` The name of the default schema for the pipeline, where all datasets and metadata for the pipeline are published by default. See Set the target catalog and schema.
`target` (legacy) Type: `string` The name of the target schema where all datasets defined in the current pipeline are published. Setting `target` instead of `schema` configures the pipeline to use legacy publishing mode. See LIVE schema (legacy).
`storage` (legacy) Type: `string` A location on DBFS or cloud storage where output data and metadata required for pipeline execution are stored. Tables and metadata are stored in subdirectories of this location. When the `storage` setting is not specified, the system will default to a location in `dbfs:/pipelines/`. The `storage` setting cannot be changed after a pipeline is created.
`channel` Type: `string` The version of the Lakeflow Declarative Pipelines runtime to use. The supported values are: `preview` to test your pipeline with upcoming changes to the runtime version. `current` to use the current runtime version. The `channel` field is optional. The default value is `current`. Databricks recommends using the current runtime version for production workloads.
`edition` Type: `string` The Lakeflow Declarative Pipelines product edition to run the pipeline. This setting allows you to choose the best product edition based on the requirements of your pipeline: `CORE` to run streaming ingest workloads. `PRO` to run streaming ingest and change data capture (CDC) workloads. `ADVANCED` to run streaming ingest workloads, CDC workloads, and workloads that require Lakeflow Declarative Pipelines expectations to enforce data quality constraints. The `edition` field is optional. The default value is `ADVANCED`.
`photon` Type: `boolean` A flag indicating whether to use Photon to run the pipeline. Photon is the Databricks high-performance Spark engine. Photon-enabled pipelines are billed at a different rate than non-Photon pipelines. The `photon` field is optional. The default value is `false`.
`pipelines.maxFlowRetryAttempts` Type: `int` If a retryable failure occurs during a pipeline update, this is the maximum number of times to retry a flow before failing the pipeline update. Default: Two retry attempts. When a retryable failure occurs, the Lakeflow Declarative Pipelines runtime attempts to run the flow three times, including the original attempt.
`pipelines.numUpdateRetryAttempts` Type: `int` If a retryable failure occurs during an update, this is the maximum number of times to retry the update before permanently failing the update. The retry is run as a full update. This parameter applies only to pipelines running in production mode. Retries are not attempted if your pipeline runs in development mode or when you run a `Validate` update. Default: Five for triggered pipelines. Unlimited for continuous pipelines.
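
As a rough illustration of how the fields above fit together, the following Python sketch assembles a settings object and serializes it to JSON. Every value (pipeline name, catalog, schema) is a hypothetical placeholder, and placing `pipelines.maxFlowRetryAttempts` under `configuration` as a string value is an assumption rather than something this reference states.

```python
import json

# Hypothetical pipeline settings built only from fields described above.
# All values are placeholders chosen for illustration.
settings = {
    "name": "example-pipeline",  # hypothetical pipeline name
    "catalog": "main",           # hypothetical catalog; setting it enables Unity Catalog
    "schema": "sales",           # hypothetical default schema
    "continuous": False,         # documented default
    "development": True,         # documented default
    "channel": "current",        # documented default
    "edition": "ADVANCED",       # documented default
    "photon": False,             # documented default
    "configuration": {
        # Assumption: retry settings are supplied as key:value pairs here.
        "pipelines.maxFlowRetryAttempts": "2",
    },
}

settings_json = json.dumps(settings, indent=2)
print(settings_json)
```

The `id` field is omitted because it is assigned by the system, and `libraries` and `clusters` are left out to keep the sketch short.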

Lakeflow Declarative Pipelines table properties

In addition to the table properties supported by Delta Lake, you can set the following table properties.

Table properties
`pipelines.autoOptimize.zOrderCols` Default: None An optional string containing a comma-separated list of column names to z-order this table by. For example, `pipelines.autoOptimize.zOrderCols = "year,month"`.
`pipelines.reset.allowed` Default: `true` Controls whether a full refresh is allowed for this table.
`pipelines.autoOptimize.managed` Default: `true` Enables or disables automatically scheduled optimization of this table. For pipelines managed by predictive optimization, this property is not used.
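
A minimal sketch of assembling these properties as a Python dictionary follows. The helper name `build_table_properties` is hypothetical, and encoding booleans as the strings `"true"` and `"false"` is an assumption. The result is the kind of mapping you might pass wherever your dataset definition accepts table properties.

```python
def build_table_properties(z_order_cols=None, allow_reset=True, auto_optimize=True):
    """Return a dict of the pipeline table properties described above.

    Hypothetical helper: boolean values are rendered as lowercase strings,
    which is an assumption about how the properties are expected to be encoded.
    """
    props = {
        "pipelines.reset.allowed": str(allow_reset).lower(),
        "pipelines.autoOptimize.managed": str(auto_optimize).lower(),
    }
    if z_order_cols:
        # The property expects a single comma-separated string of column names.
        props["pipelines.autoOptimize.zOrderCols"] = ",".join(z_order_cols)
    return props


# Example: z-order by year and month, and disallow full refresh for this table.
props = build_table_properties(z_order_cols=["year", "month"], allow_reset=False)
print(props)
```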

Pipelines trigger interval

You can specify a pipeline trigger interval for the entire pipeline or as part of a dataset declaration. See Set trigger interval for continuous pipelines.

`pipelines.trigger.interval`
The default is based on flow type:

Five seconds for streaming queries.
One minute for complete queries when all input data is from Delta sources.
Ten minutes for complete queries when some data sources may be non-Delta.

The value is a number plus the time unit. The following are the valid time units:

`second`, `seconds`
`minute`, `minutes`
`hour`, `hours`
`day`, `days`

You can use the singular or plural unit when defining the value, for example:

`{"pipelines.trigger.interval" : "1 hour"}`
`{"pipelines.trigger.interval" : "10 seconds"}`
`{"pipelines.trigger.interval" : "30 second"}`
`{"pipelines.trigger.interval" : "1 minute"}`
`{"pipelines.trigger.interval" : "10 minutes"}`
`{"pipelines.trigger.interval" : "10 minute"}`
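
The "number plus time unit" format can be checked on the client side before a value is placed in a pipeline configuration. The following sketch is one way to do that. The function name `parse_trigger_interval` is hypothetical, and the regular expression reflects only the format and units listed above, not the parser that the runtime actually uses.

```python
import re
from datetime import timedelta

# Seconds per unit, keyed by the singular form of each valid time unit.
_UNIT_SECONDS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}

# A whole number, whitespace, then a singular or plural unit name.
_INTERVAL_PATTERN = re.compile(r"^\s*(\d+)\s+(second|minute|hour|day)s?\s*$")


def parse_trigger_interval(value: str) -> timedelta:
    """Convert a string such as '10 minutes' into a timedelta.

    Raises ValueError when the string does not match the documented format.
    """
    match = _INTERVAL_PATTERN.match(value)
    if match is None:
        raise ValueError(f"Invalid trigger interval: {value!r}")
    amount, unit = match.groups()
    return timedelta(seconds=int(amount) * _UNIT_SECONDS[unit])


# Singular and plural units are both accepted.
print(parse_trigger_interval("1 hour"))
print(parse_trigger_interval("30 second"))
```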

Cluster attributes that are not user settable

Because Lakeflow Declarative Pipelines manages cluster lifecycles, many cluster settings are set by Lakeflow Declarative Pipelines and cannot be manually configured by users, either in a pipeline configuration or in a cluster policy used by a pipeline. The following table lists these settings and why they cannot be manually set.

Fields
`cluster_name` Lakeflow Declarative Pipelines sets the names of the clusters used to run pipeline updates. These names cannot be overridden.
`data_security_mode` `access_mode` These values are automatically set by the system.
`spark_version` Lakeflow Declarative Pipelines clusters run on a custom version of Databricks Runtime that is continually updated to include the latest features. The version of Spark is bundled with the Databricks Runtime version and cannot be overridden.
`autotermination_minutes` Because Lakeflow Declarative Pipelines manages cluster auto-termination and reuse logic, the cluster auto-termination time cannot be overridden.
`runtime_engine` Although you can control this field by enabling Photon for your pipeline, you cannot set this value directly.
`effective_spark_version` This value is automatically set by the system.
`cluster_source` This field is set by the system and is read-only.
`docker_image` Because Lakeflow Declarative Pipelines manages the cluster lifecycle, you cannot use a custom container with pipeline clusters.
`workload_type` This value is set by the system and cannot be overridden.
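
Because these attributes are rejected or overridden, it can help to scan a cluster specification for them before submitting it. The sketch below does that with a hypothetical helper, `find_non_settable_attributes`. The attribute names come from the table above, while the example specification and its values are invented for illustration.

```python
# Attribute names taken from the table above.
NON_SETTABLE_CLUSTER_ATTRIBUTES = frozenset({
    "cluster_name",
    "data_security_mode",
    "access_mode",
    "spark_version",
    "autotermination_minutes",
    "runtime_engine",
    "effective_spark_version",
    "cluster_source",
    "docker_image",
    "workload_type",
})


def find_non_settable_attributes(cluster_spec: dict) -> list:
    """Return the sorted top-level keys of cluster_spec that users cannot set."""
    return sorted(NON_SETTABLE_CLUSTER_ATTRIBUTES.intersection(cluster_spec))


# Hypothetical cluster specification with two offending keys.
example_spec = {
    "label": "default",
    "num_workers": 2,
    "spark_version": "hypothetical-version",
    "docker_image": {},
}
print(find_non_settable_attributes(example_spec))
```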
