Develop Lakeflow Spark Declarative Pipelines with Databricks Asset Bundles
Databricks Asset Bundles, also known simply as bundles, enable you to programmatically validate, deploy, and run Databricks resources such as Lakeflow Spark Declarative Pipelines. See What are Databricks Asset Bundles?.
This page describes how to create a bundle to programmatically manage a pipeline. See Lakeflow Spark Declarative Pipelines. The bundle is created using the Databricks Asset Bundles default bundle template for Python, which defines an ETL pipeline and job to run it. You then validate, deploy, and run the deployed pipeline in your Databricks workspace.
If you have existing pipelines that were created using the Databricks user interface or API that you want to move to bundles, you must define them in a bundle's configuration files. Databricks recommends that you first create a bundle using the steps below, then add configuration and other sources to the bundle. See Retrieve an existing pipeline definition using the UI.
Requirements
- Databricks CLI version 0.276.0 or above. To check your installed version of the Databricks CLI, run the command databricks -v. To install the Databricks CLI, see Install or update the Databricks CLI.
- uv is required to run tests and to install dependencies for this project from an IDE.
- The remote workspace must have workspace files enabled. See What are workspace files?.
- An existing catalog for tables in the pipeline. See Create catalogs.
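As a rough sketch of the CLI version requirement above, the following Python helpers compare a version string against the 0.276.0 minimum. The sketch assumes that databricks -v prints text containing a MAJOR.MINOR.PATCH version (for example, v0.276.0); that output format and the helper names are assumptions for illustration, not part of the Databricks CLI.

```python
import re
import subprocess

MINIMUM_VERSION = (0, 276, 0)  # minimum CLI version stated in the requirements


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple found in text, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def meets_minimum(version_output, minimum=MINIMUM_VERSION):
    """Return True if the parsed version is at least the minimum."""
    version = parse_version(version_output)
    return version is not None and version >= minimum


def installed_cli_output():
    """Run `databricks -v` and return what it printed (requires the CLI on PATH)."""
    result = subprocess.run(
        ["databricks", "-v"], capture_output=True, text=True, check=True
    )
    return result.stdout + result.stderr
```

With the CLI installed, meets_minimum(installed_cli_output()) reports whether the local installation satisfies the requirement.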
(Optional) Install a Python module to support local pipeline development
Databricks provides a Python module to assist your local development of Lakeflow Spark Declarative Pipelines code by providing syntax checking, autocomplete, and data type checking as you write code in your IDE.
The Python module for local development is available on PyPI. To install the module, see Python stub for DLT.
Step 1: Set up authentication
First, set up authentication between the Databricks CLI on your development machine and your Databricks workspace. This page assumes that you want to use OAuth user-to-machine (U2M) authentication and a corresponding Databricks configuration profile named DEFAULT for authentication.
U2M authentication is appropriate for trying out these steps in real time. For fully automated workflows, Databricks recommends that you use OAuth machine-to-machine (M2M) authentication instead. See the M2M authentication setup instructions in Authorize service principal access to Databricks with OAuth.
- Use the Databricks CLI to initiate OAuth token management locally by running the following command for each target workspace.
  In the following command, replace <workspace-url> with your Databricks workspace instance URL, for example https://dbc-a1b2345c-d6e7.cloud.databricks.com.
    databricks auth login --host <workspace-url>
- The Databricks CLI prompts you to save the information that you entered as a Databricks configuration profile. Press Enter to accept the suggested profile name, or enter the name of a new or existing profile. Any existing profile with the same name is overwritten with the information that you entered. You can use profiles to quickly switch your authentication context across multiple workspaces.
  To get a list of any existing profiles, in a separate terminal or command prompt, use the Databricks CLI to run the command databricks auth profiles. To view a specific profile's existing settings, run the command databricks auth env --profile <profile-name>.
- In your web browser, complete the on-screen instructions to log in to your Databricks workspace.
- To view a profile's current OAuth token value and the token's upcoming expiration timestamp, run one of the following commands:
  - databricks auth token --host <workspace-url>
  - databricks auth token -p <profile-name>
  - databricks auth token --host <workspace-url> -p <profile-name>
  If you have multiple profiles with the same --host value, you might need to specify the --host and -p options together to help the Databricks CLI find the correct matching OAuth token information.
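The three forms of the auth token command differ only in which options are passed. A minimal sketch of that choice in Python follows; the function name auth_token_command is hypothetical, and requiring at least one option is a decision of this sketch that mirrors the three forms listed above, not a documented CLI rule.

```python
def auth_token_command(host=None, profile=None):
    """Build the argument list for `databricks auth token`.

    Passing both host and profile mirrors the combined form that is useful
    when several profiles share the same host value.
    """
    if host is None and profile is None:
        raise ValueError("specify a host, a profile, or both")
    command = ["databricks", "auth", "token"]
    if host is not None:
        command += ["--host", host]
    if profile is not None:
        command += ["-p", profile]
    return command
```

The returned list can be passed to subprocess.run on a machine where the Databricks CLI is installed and authenticated.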
Step 2: Create the bundle
Initialize a bundle using the default Python bundle project template.
- Use your terminal or command prompt to switch to a directory on your local development machine that will contain the template's generated bundle.
- Use the Databricks CLI to run the bundle init command:
    databricks bundle init
- For "Template to use", leave the default value of default-python by pressing Enter.
- For "Unique name for this project", leave the default value of my_project, or type a different value, and then press Enter. This determines the name of the root directory for this bundle. This root directory is created within your current working directory.
- For "Include a job that runs a notebook", select no and press Enter. (The sample notebook that is associated with this option has no pipeline code in it.)
- For "Include an ETL pipeline", leave the default value of yes by pressing Enter. This adds sample pipeline code and a pipeline definition.
- For "Include a stub (sample) Python package", select no and press Enter.
- For "Use serverless", select yes and press Enter. This instructs the Databricks CLI to configure your bundle to run on serverless compute.
- For "Default catalog for any tables created by this project [hive_metastore]", enter the name of an existing Unity Catalog catalog.
- For "Use a personal schema for each user working on this project.", select yes.
Step 3: Explore the bundle
To view the files that the template generated, switch to the root directory of your newly created bundle. Files of particular interest include the following:
- databricks.yml: This file specifies the bundle's programmatic name, includes references to the bundle's files, defines catalog and schema variables, and specifies settings for target workspaces.
- resources/sample_job.yml and resources/<project-name>_etl_pipeline.yml: These files define the job that contains a pipeline refresh task, and the pipeline's settings. For information about pipeline settings, see pipeline.
- src/: This folder contains the sample pipeline's source files, explorations, and transformations.
- tests/ and fixtures/: These folders contain sample unit tests for the pipeline and fixtures for data sets.
- README.md: This file contains additional information about getting started and using this bundle template.
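To confirm that a generated project has the layout described above, a small check such as the following can help. It is only a sketch: the function name is hypothetical, and the list of entries is taken from the files and folders named on this page, so a different template version may generate a different layout.

```python
from pathlib import Path

# Top-level entries this page lists as files and folders of particular interest.
EXPECTED_ENTRIES = [
    "databricks.yml",
    "resources",
    "src",
    "tests",
    "fixtures",
    "README.md",
]


def missing_entries(bundle_root):
    """Return the expected entries that do not exist under bundle_root."""
    root = Path(bundle_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty result from missing_entries("my_project") suggests that the template generated every entry listed here.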
Step 4: Validate the bundle configuration
Now check whether the bundle configuration is valid.
- From the root directory, use the Databricks CLI to run the bundle validate command:
    databricks bundle validate
- If a summary of the bundle configuration is returned, then the validation succeeded. If any errors are returned, fix the errors, and then repeat this step.
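When validation is scripted rather than run by hand, the result can be read from the process exit code instead of the printed summary. The sketch below assumes that the CLI exits with a nonzero code when validation reports errors; the function name and the injectable run parameter are hypothetical conveniences of this example.

```python
import subprocess


def bundle_is_valid(bundle_root, run=subprocess.run):
    """Run `databricks bundle validate` inside bundle_root.

    Returns True when the command exits with code 0. The run parameter
    defaults to subprocess.run and can be swapped for a stand-in in tests.
    """
    result = run(
        ["databricks", "bundle", "validate"],
        cwd=bundle_root,
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Calling bundle_is_valid("my_project") requires the Databricks CLI on the PATH and a configured authentication profile.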
Step 5: Deploy the bundle to the remote workspace
Next, deploy the bundle to your remote Databricks workspace and verify the pipeline in your workspace.
- From the bundle root, use the Databricks CLI to run the bundle deploy command:
    databricks bundle deploy --target dev
  Note: The default template includes a job that runs the pipeline every day, but this is paused for the target dev deployment mode. See Databricks Asset Bundle deployment modes.
- Confirm that the bundle was deployed:
  - In your Databricks workspace's sidebar, click Workspace.
  - Click into the Users > <your-username> > .bundle folder and find your bundle project.
- Check whether your pipeline was created:
  - In your Databricks workspace's sidebar, click Jobs & Pipelines.
  - Optionally, select the Pipelines and Owned by me filters.
  - Click [dev <your-username>] <project-name>_etl.
If you make any changes to your bundle after this step, you should repeat steps 4-5 to check whether your bundle configuration is still valid and then redeploy the project.
Step 6: Run the deployed pipeline
Now trigger a run of the pipeline in your workspace from the command line.
- From the root directory, use the Databricks CLI to run the bundle run command, replacing <project-name> with the name of your project:
    databricks bundle run --target dev <project-name>_etl
- Copy the value of Update URL that appears in your terminal and paste this value into your web browser to open your Databricks workspace.
- In your Databricks workspace, after the pipeline run completes successfully, click the materialized views to see the details of each view.
If you make any changes to your bundle after this step, you should repeat steps 4-6 to check whether your bundle configuration is still valid, redeploy the project, and run the redeployed project.
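The validate, deploy, and run cycle from steps 4-6 can be sketched as an ordered list of commands that stops at the first failure. The function names below are hypothetical; the commands themselves are the ones shown in those steps, with the project name and target filled in.

```python
def redeploy_commands(project_name, target="dev"):
    """Return the CLI invocations for steps 4-6, in order."""
    return [
        ["databricks", "bundle", "validate"],
        ["databricks", "bundle", "deploy", "--target", target],
        ["databricks", "bundle", "run", "--target", target, f"{project_name}_etl"],
    ]


def run_all(commands, run):
    """Run each command in order, stopping after the first failure.

    run is any callable that takes an argument list and returns an exit code.
    Returns the list of commands that were actually attempted.
    """
    attempted = []
    for command in commands:
        attempted.append(command)
        if run(command) != 0:
            break
    return attempted
```

One way to use this from the bundle root is run_all(redeploy_commands("my_project"), lambda c: subprocess.run(c).returncode), after importing subprocess. Stopping early means a bundle that fails validation is never deployed.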
Step 7: Run tests
Next, use pytest to run tests locally:
uv run pytest
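When the test command runs in a script or CI job, its exit code indicates what happened. The mapping below follows the exit codes that pytest documents; it assumes that uv run passes the exit code of pytest through unchanged, and the helper name is hypothetical.

```python
# Exit codes documented by pytest.
PYTEST_EXIT_CODES = {
    0: "all tests passed",
    1: "some tests failed",
    2: "test run was interrupted",
    3: "internal pytest error",
    4: "pytest command-line usage error",
    5: "no tests were collected",
}


def describe_pytest_exit(code):
    """Translate a pytest exit code into a short description."""
    return PYTEST_EXIT_CODES.get(code, f"unexpected exit code {code}")
```

Code 5 is worth checking for in particular, because it usually means the command ran outside the project root or the tests folder was not found.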
Step 8: Clean up
In this step, you delete the deployed bundle and the pipeline from your workspace.
- From the root directory, use the Databricks CLI to run the bundle destroy command:
    databricks bundle destroy --target dev
- When prompted to permanently destroy resources, the pipeline, and tables and views managed by the pipeline, type y and press Enter.
- If you also want to delete the bundle from your development machine, you can now delete the local project directory.
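If the local directory is deleted from a script, a small guard can help avoid removing the wrong folder. The following sketch, with a hypothetical function name, deletes a directory only when it contains a databricks.yml file at its top level. It does not touch anything in the workspace, so run the bundle destroy command first.

```python
import shutil
from pathlib import Path


def delete_local_bundle(project_dir):
    """Delete project_dir only if it looks like a bundle root.

    Returns True if the directory was removed, False if it was left alone.
    """
    root = Path(project_dir)
    if not (root / "databricks.yml").is_file():
        return False  # no databricks.yml here, so refuse to delete
    shutil.rmtree(root)
    return True
```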