Configure pipeline settings for Delta Live Tables

This article provides details on configuring pipeline settings for Delta Live Tables. Delta Live Tables provides a user interface for configuring and editing pipeline settings. The UI also provides an option to display and edit settings in JSON.


You can configure most settings with either the UI or a JSON specification. Some advanced options are only available using the JSON configuration.

Databricks recommends familiarizing yourself with Delta Live Tables settings using the UI. If necessary, you can directly edit the JSON configuration in the workspace. JSON configuration files are also useful when deploying pipelines to new environments or when using the CLI or REST API.

For a full reference to the Delta Live Tables JSON configuration settings, see Delta Live Tables pipeline configurations.

Choose a product edition

Select the Delta Live Tables product edition with the features best suited for your pipeline requirements. The following product editions are available:

  • Core to run streaming ingest workloads. Select the Core edition if your pipeline doesn’t require advanced features such as change data capture (CDC) or Delta Live Tables expectations.

  • Pro to run streaming ingest and CDC workloads. The Pro product edition supports all of the Core features, plus support for workloads that require updating tables based on changes in source data.

  • Advanced to run streaming ingest workloads, CDC workloads, and workloads that require expectations. The Advanced product edition supports the features of the Core and Pro editions, and also supports enforcement of data quality constraints with Delta Live Tables expectations.

You can select the product edition when you create or edit a pipeline. You can select a different edition for each pipeline. See the Delta Live Tables product page.


If your pipeline includes features not supported by the selected product edition, for example, expectations, you will receive an error message with the reason for the error. You can then edit the pipeline to select the appropriate edition.

Choose a pipeline mode

The pipeline mode determines whether your pipeline updates continuously or only when manually triggered. See Continuous vs. triggered pipeline execution.

Select a cluster policy

Users must have permissions to deploy compute to configure and update Delta Live Tables pipelines. Workspace admins can configure cluster policies to provide users with access to compute resources for Delta Live Tables. See Define limits on Delta Live Tables pipeline clusters.


Cluster policies are optional. Check with your workspace administrator if you lack compute privileges required for Delta Live Tables.

Configure source code libraries

You can use the file selector in the Delta Live Tables UI to configure the source code defining your pipeline. Pipeline source code is defined in Databricks notebooks or in SQL or Python scripts stored in workspace files. When you create or edit your pipeline, you can add one or more notebooks or workspace files or a combination of notebooks and workspace files.

Because Delta Live Tables automatically analyzes dataset dependencies to construct the processing graph for your pipeline, you can add source code libraries in any order.

You can also modify the JSON file to include Delta Live Tables source code defined in SQL and Python scripts stored in workspace files. The following example includes notebooks and workspace files from Databricks Repos:

  {
    "name": "Example pipeline 3",
    "storage": "dbfs:/pipeline-examples/storage-location/example3",
    "libraries": [
      { "notebook": { "path": "/example-notebook_1" } },
      { "notebook": { "path": "/example-notebook_2" } },
      { "file": { "path": "/Repos/<user_name>" } },
      { "file": { "path": "/Repos/<user_name>" } }
    ]
  }
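
When settings are generated by a script, for example before deploying with the CLI or REST API, the libraries array can be assembled programmatically. The following is a minimal sketch rather than an official client: the helper name build_libraries and the workspace file path are hypothetical, and only the "notebook" and "file" entry shapes are taken from the example above.

```python
import json


def build_libraries(notebook_paths, file_paths):
    """Return a "libraries" list in the shape used by the pipeline settings JSON."""
    libraries = [{"notebook": {"path": path}} for path in notebook_paths]
    libraries += [{"file": {"path": path}} for path in file_paths]
    return libraries


settings = {
    "name": "Example pipeline 3",
    "libraries": build_libraries(
        notebook_paths=["/example-notebook_1", "/example-notebook_2"],
        # Hypothetical workspace file path, for illustration only.
        file_paths=["/Repos/<user_name>/example/transformations.sql"],
    ),
}

# Order does not matter: dependencies are resolved from the datasets themselves.
print(json.dumps(settings, indent=2))
```

Because the processing graph is built from dataset dependencies, the helper makes no attempt to sort the entries.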

Specify a storage location

You can choose to specify a storage location for a pipeline that publishes to the Hive metastore. The primary motivation for specifying a location is to control the object storage location for data written by your pipeline.

Because all tables, data, checkpoints, and metadata for Delta Live Tables pipelines are fully managed by Delta Live Tables, most interaction with Delta Live Tables datasets happens through tables registered to the Hive metastore or Unity Catalog.

Specify a target schema for pipeline output tables

Specifying a target is optional, but you should specify one to publish tables created by your pipeline any time you move beyond development and testing for a new pipeline. Publishing a pipeline to a target makes datasets available for querying elsewhere in your Databricks environment. See Publish data from Delta Live Tables pipelines to the Hive metastore or Use Unity Catalog with your Delta Live Tables pipelines.

Configure your compute settings

Each Delta Live Tables pipeline has two associated clusters.

  1. The default cluster is used to process pipeline updates.

  2. The maintenance cluster runs daily maintenance tasks.

Compute settings in the Delta Live Tables UI primarily target the default cluster used for pipeline updates. If you specify a storage location requiring data access credentials, you must ensure that the maintenance cluster also has these permissions configured.

Delta Live Tables provides similar options for cluster settings as other compute on Databricks. Like other pipeline settings, you can modify the JSON configuration for clusters to specify options not present in the UI. See Clusters.


  • You cannot set the Spark version in cluster configurations. Delta Live Tables clusters run on a custom version of Databricks Runtime that is continually updated to include the latest features. Manually setting a version may result in pipeline failures.

  • You can configure Delta Live Tables pipelines to leverage Photon. See Photon runtime.

Use autoscaling to increase efficiency and reduce resource usage

Use Enhanced Autoscaling to optimize the cluster utilization of your pipelines. Enhanced Autoscaling adds resources only if the system determines that they will increase pipeline processing speed. Resources are freed when they are no longer needed, and clusters are shut down as soon as all pipeline updates are complete.

Use the following guidelines when configuring Enhanced Autoscaling for production pipelines:

  • Leave the Min workers setting at the default.

  • Set the Max workers setting to a value based on budget and pipeline priority.

Delay compute shutdown

Because a Delta Live Tables cluster automatically shuts down when not in use, referencing a cluster policy that sets autotermination_minutes in your cluster configuration results in an error. To control cluster shutdown behavior, you can use development or production mode or use the pipelines.clusterShutdown.delay setting in the pipeline configuration. The following example sets the pipelines.clusterShutdown.delay value to 60 seconds:

    "configuration": {
        "pipelines.clusterShutdown.delay": "60s"
    }

When production mode is enabled, the default value for pipelines.clusterShutdown.delay is 0 seconds. When development mode is enabled, the default value is 2 hours.
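
The defaults and the override can be summarized in a short Python sketch. The helper names are hypothetical, and the "<n>s" value format is an assumption inferred from the "60s" example above.

```python
import json

# Default shutdown delays stated in this article, in seconds.
DEFAULT_SHUTDOWN_DELAY_SECONDS = {"production": 0, "development": 2 * 60 * 60}


def shutdown_delay_setting(seconds):
    """Return a "configuration" fragment that sets the cluster shutdown delay."""
    if seconds < 0:
        raise ValueError("delay must not be negative")
    # Assumption: the value is written as "<n>s", as in the "60s" example.
    return {"configuration": {"pipelines.clusterShutdown.delay": f"{seconds}s"}}


def effective_delay_seconds(mode, override_seconds=None):
    """Return the delay for a mode, preferring an explicit setting if given."""
    if override_seconds is not None:
        return override_seconds
    return DEFAULT_SHUTDOWN_DELAY_SECONDS[mode]


print(json.dumps(shutdown_delay_setting(60)))
print(effective_delay_seconds("development"))  # 7200
```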

Create a single node cluster

If you set num_workers to 0 in cluster settings, the cluster is created as a Single Node cluster. Configuring an autoscaling cluster and setting min_workers to 0 and max_workers to 0 also creates a Single Node cluster.

If you configure an autoscaling cluster and set only min_workers to 0, then the cluster is not created as a Single Node cluster. The cluster has at least 1 active worker at all times until terminated.

The following example cluster configuration creates a Single Node cluster in Delta Live Tables:

    "clusters": [
        {
            "label": "default",
            "num_workers": 0
        }
    ]

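The Single Node rules described above can be expressed as a small check over one cluster settings object. This is an illustrative sketch: the function name is hypothetical, and it assumes that autoscaling bounds are nested under an "autoscale" key, as in the Databricks cluster settings format.

```python
def is_single_node(cluster):
    """Apply the Single Node rules to one cluster settings dict."""
    autoscale = cluster.get("autoscale")
    if autoscale is not None:
        # Autoscaling cluster: Single Node only when both bounds are 0.
        return autoscale.get("min_workers") == 0 and autoscale.get("max_workers") == 0
    # Fixed-size cluster: Single Node when num_workers is 0.
    return cluster.get("num_workers") == 0


print(is_single_node({"label": "default", "num_workers": 0}))
print(is_single_node({"label": "default", "autoscale": {"min_workers": 0, "max_workers": 0}}))
print(is_single_node({"label": "default", "autoscale": {"min_workers": 0, "max_workers": 4}}))
```

The three calls cover the three cases in the text: the first two describe Single Node clusters, and the third does not because only min_workers is 0.
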
Configure cluster tags

You can use cluster tags to monitor usage for your pipeline clusters. Add cluster tags in the Delta Live Tables UI when you create or edit a pipeline, or by editing the JSON settings for your pipeline clusters.

Cloud storage configuration

Use AWS instance profiles to configure access to S3 storage. To add an instance profile in the Delta Live Tables UI, click Advanced when you create or edit a pipeline and select an instance profile in the Instance profile dropdown menu.

You can also configure an AWS instance profile by editing the JSON settings for your pipeline clusters when you create or edit a pipeline with the Delta Live Tables API or in the Delta Live Tables UI:

  1. On the Pipeline details page for your pipeline, click the Settings button. The Pipeline settings page appears.

  2. Click the JSON button.

  3. Enter the instance profile configuration in the aws_attributes.instance_profile_arn field in the cluster configuration:

  "clusters": [
    {
      "label": "default",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:..."
      }
    },
    {
      "label": "maintenance",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:..."
      }
    }
  ]

When configuring an instance profile in the JSON settings, you must specify the instance profile configuration for the default and maintenance clusters.
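
Because the same instance profile must appear on both clusters, a script that prepares the settings can apply it to both labels in one step. The sketch below is illustrative only: the helper name and the ARN are hypothetical, and the field names follow the JSON example above.

```python
import copy

REQUIRED_LABELS = ("default", "maintenance")


def with_instance_profile(settings, instance_profile_arn):
    """Return a copy of settings where both clusters use the instance profile."""
    updated = copy.deepcopy(settings)
    clusters = updated.setdefault("clusters", [])
    by_label = {cluster.get("label"): cluster for cluster in clusters}
    for label in REQUIRED_LABELS:
        cluster = by_label.get(label)
        if cluster is None:
            # Add a minimal entry when the label is not configured yet.
            cluster = {"label": label}
            clusters.append(cluster)
        cluster.setdefault("aws_attributes", {})["instance_profile_arn"] = instance_profile_arn
    return updated


# Hypothetical ARN, for illustration only.
arn = "arn:aws:iam::123456789012:instance-profile/example-profile"
original = {"clusters": [{"label": "default", "num_workers": 2}]}
result = with_instance_profile(original, arn)
for cluster in result["clusters"]:
    print(cluster["label"], cluster["aws_attributes"]["instance_profile_arn"])
```

The helper works on a deep copy so that the input settings are left unchanged.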

You can also configure instance profiles when you create cluster policies for your Delta Live Tables pipelines. For an example, see the knowledge base.

Parameterize pipelines

The Python and SQL code that defines your datasets can be parameterized by the pipeline’s settings. Parameterization enables the following use cases:

  • Separating long paths and other variables from your code.

  • Reducing the amount of data processed in development or staging environments to speed up testing.

  • Reusing the same transformation logic to process data from multiple sources.

The following example uses the startDate configuration value to limit the development pipeline to a subset of the input data:

  CREATE OR REFRESH LIVE TABLE customer_events
  AS SELECT * FROM sourceTable WHERE date > '${mypipeline.startDate}';

  import dlt
  from pyspark.sql.functions import col
  @dlt.table
  def customer_events():
    start_date = spark.conf.get("mypipeline.startDate")
    return spark.read.table("sourceTable").where(col("date") > start_date)

  {
    "name": "Data Ingest - DEV",
    "configuration": {
      "mypipeline.startDate": "2021-01-02"
    }
  }

  {
    "name": "Data Ingest - PROD",
    "configuration": {
      "mypipeline.startDate": "2010-01-02"
    }
  }

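When the same pipeline is deployed to several environments, the per-environment settings can be derived from one shared base so that only the parameter values differ. The following is a minimal sketch: the names BASE_SETTINGS, ENVIRONMENTS, and settings_for are hypothetical, and the values are the ones from the DEV and PROD examples above.

```python
import copy
import json

# Settings shared by every environment (left minimal here).
BASE_SETTINGS = {"configuration": {}}

# Parameter values per environment, from the DEV and PROD examples.
ENVIRONMENTS = {
    "DEV": {"mypipeline.startDate": "2021-01-02"},
    "PROD": {"mypipeline.startDate": "2010-01-02"},
}


def settings_for(environment):
    """Return pipeline settings for one environment without mutating the base."""
    settings = copy.deepcopy(BASE_SETTINGS)
    settings["name"] = f"Data Ingest - {environment}"
    settings["configuration"].update(ENVIRONMENTS[environment])
    return settings


for environment in ENVIRONMENTS:
    print(json.dumps(settings_for(environment), indent=2))
```
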
Pipelines trigger interval

You can use pipelines.trigger.interval to control the trigger interval for a flow updating a table or an entire pipeline. Because a triggered pipeline processes each table only once, pipelines.trigger.interval applies only to continuous pipelines.

Databricks recommends setting pipelines.trigger.interval on individual tables because of different defaults for streaming versus batch queries. Set the value on a pipeline only when your processing requires controlling updates for the entire pipeline graph.

You set pipelines.trigger.interval on a table using spark_conf in Python, or SET in SQL:

  @dlt.table(
    spark_conf={"pipelines.trigger.interval" : "10 seconds"}
  )
  def <function-name>():
      return (<query>)

  SET pipelines.trigger.interval='10 seconds';


To set pipelines.trigger.interval on a pipeline, add it to the configuration object in the pipeline settings:

  "configuration": {
    "pipelines.trigger.interval": "10 seconds"
  }

Add email notifications for pipeline events

You can configure one or more email addresses to receive notifications when any of the following events occurs:

  • A pipeline update completes successfully.

  • A pipeline update fails with a retryable error. A notification is sent each time this happens.

  • A pipeline update fails with a non-retryable (fatal) error.

  • A single data flow fails.
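
In the JSON settings, these events are typically expressed as a notifications entry that pairs recipients with alert identifiers. The sketch below shows one way to build that entry in Python. The identifiers and the "notifications", "email_recipients", and "alerts" field names are not stated in this article; they are assumed from the pipeline settings reference and should be verified there. The helper name and the email address are hypothetical.

```python
import json

# Assumed alert identifiers for the four events listed above.
ALERTS = {
    "update_success": "on-update-success",
    "update_retryable_failure": "on-update-failure",
    "update_fatal_failure": "on-update-fatal-failure",
    "flow_failure": "on-flow-failure",
}


def notification_settings(recipients, events):
    """Return a "notifications" fragment for the given recipients and events."""
    unknown = [event for event in events if event not in ALERTS]
    if unknown:
        raise ValueError(f"unknown events: {unknown}")
    return {
        "notifications": [
            {
                "email_recipients": list(recipients),
                "alerts": [ALERTS[event] for event in events],
            }
        ]
    }


# Hypothetical recipient, for illustration only.
fragment = notification_settings(
    ["data-team@example.com"],
    ["update_fatal_failure", "flow_failure"],
)
print(json.dumps(fragment, indent=2))
```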