Delta Live Tables settings

Preview

This feature is in Public Preview. Contact your Databricks representative to request access.

Delta Live Tables settings specify one or more notebooks that implement a pipeline and the parameters that control how the pipeline runs in a given environment, for example, development, staging, or production. Delta Live Tables settings are expressed as JSON and can be modified in the Delta Live Tables UI.

Settings

Fields

id

Type: string

A globally unique identifier for this pipeline. The identifier is assigned by the system and cannot be changed.

name

Type: string

A user-friendly name for this pipeline. The name can be used to identify pipeline jobs in the UI.

storage

Type: string

A location on DBFS or cloud storage where output data and metadata required for pipeline execution are stored. Tables and metadata are stored in subdirectories of this location.

When the storage setting is not specified, the system will default to a location in dbfs:/pipelines/.

The storage setting cannot be changed after a pipeline is created.

configuration

Type: object

An optional list of settings to add to the Spark configuration of the cluster that will run the pipeline. These settings are read by the Delta Live Tables runtime and available to pipeline queries through the Spark configuration.

Elements must be formatted as key:value pairs.
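For example, with a hypothetical key and value:

"configuration": {
  "mycompany.pipeline.option": "value"
}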

See Parameterize a pipeline for an example of using the configuration object.

libraries

Type: array of objects

An array of notebooks containing the pipeline code and required artifacts. See Configure multiple notebooks in a pipeline for an example.

clusters

Type: array of objects

An array of specifications for the clusters to run the pipeline. See Cluster configuration for more detail.

If this is not specified, the system automatically selects a default cluster configuration for the pipeline.

continuous

Type: boolean

A flag indicating whether to run the pipeline continuously.

The default value is false.

target

Type: string

The name of a database for persisting pipeline output data. Configuring the target setting allows you to view and query the pipeline output data from the Databricks UI.
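A minimal sketch of a settings file that combines these fields; the pipeline name, paths, configuration key, and database name are placeholders:

{
  "name": "Example pipeline",
  "storage": "dbfs:/pipeline-examples/storage-location/example",
  "configuration": {
    "mycompany.pipeline.inputData": "dbfs:/input/dataset"
  },
  "libraries": [
    { "notebook": { "path": "/Users/user@databricks.com/example_notebook" } }
  ],
  "continuous": false,
  "target": "example_database"
}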

Cluster configuration

You can configure clusters used by managed pipelines with the same JSON format as the create cluster API. You can specify configuration for two different cluster types: a default cluster where all processing is performed and a maintenance cluster where daily maintenance tasks are run. Each cluster is identified using the label field.

Specifying cluster properties is optional, and the system uses defaults for any missing values.

Note

You cannot set the Spark version in cluster configurations. Delta Live Tables clusters run on a custom version of Databricks Runtime that is continually updated to include the latest features.

Note

If you need an instance profile or other configuration to access your storage location, specify it for both the default cluster and the maintenance cluster.

An example configuration for a default cluster and a maintenance cluster:

{
  "clusters": [
    {
      "label": "default",
      "node_type_id": "c5.4xlarge",
      "driver_node_type_id": "c5.4xlarge",
      "num_workers": 20,
      "spark_conf": {
        "spark.databricks.io.parquet.nativeReader.enabled": "false"
      },
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:..."
      }
    },
    {
      "label": "maintenance",
      "aws_attributes": {
        "instance_profile_arn": "arn:aws:..."
      }
    }
  ]
}

Examples

Configure a pipeline and cluster

The following example configures a triggered pipeline implemented in example_notebook_1, using DBFS for storage, and running on a small cluster with a single worker:

{
  "name": "Example pipeline 1",
  "storage": "dbfs:/pipeline-examples/storage-location/example1",
  "clusters": [
    {
      "num_workers": 1,
      "spark_conf": {}
    }
  ],
  "libraries": [
    {
      "notebook": {
         "path": "/Users/user@databricks.com/example_notebook_1"
      }
    }
  ],
  "continuous": false
}

Parameterize a pipeline

The following example uses the configuration field to set the mycompany.pipeline.inputData value based on the environment in which the pipeline runs.

Settings for the staging environment:

"name": "Staging pipeline",
"configuration": {
  "mycompany.pipeline.inputData": "dbfs:/staging/dataset"
}

Settings for the production environment:

"name": "Production pipeline",
"configuration": {
  "mycompany.pipeline.inputData": "dbfs:/prod/dataset"
}

The pipeline notebooks read the input location from the Spark configuration:

inputData = spark.conf.get("mycompany.pipeline.inputData")
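A minimal sketch of how a Python notebook in the pipeline might use this value; the dataset name and the JSON input format are assumptions:

import dlt

@dlt.table
def input_data():
    # Read the environment-specific input location set in the pipeline configuration
    input_path = spark.conf.get("mycompany.pipeline.inputData")
    return spark.read.format("json").load(input_path)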

Configure multiple notebooks in a pipeline

The following example creates a pipeline that includes the datasets defined in example-notebook_1 and example-notebook_2:

{
  "name": "Example pipeline 3",
  "storage": "dbfs:/pipeline-examples/storage-location/example3",
  "libraries": [
      { "notebook": { "path": "/example-notebook_1" } },
      { "notebook": { "path": "/example-notebook_2" } },
    ]
}
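A sketch of what the two notebooks might contain, assuming example-notebook_1 defines a raw dataset that example-notebook_2 refines; the dataset names, source path, and filter are hypothetical:

# example-notebook_1
import dlt

@dlt.table
def raw_events():
    # Hypothetical raw input location
    return spark.read.format("json").load("dbfs:/pipeline-examples/raw-events")

# example-notebook_2
import dlt
from pyspark.sql.functions import col

@dlt.table
def filtered_events():
    # Datasets defined in the same pipeline can be read with dlt.read, even across notebooks
    return dlt.read("raw_events").where(col("event_type") == "click")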