
Develop a job with Databricks Asset Bundles

Databricks Asset Bundles, also known simply as bundles, contain the artifacts you want to deploy and the settings for Databricks resources such as jobs that you want to run, and enable you to programmatically validate, deploy, and run them. See What are Databricks Asset Bundles?.

This page describes how to create a bundle to programmatically manage a job. See Lakeflow Jobs. The bundle is created using the Databricks Asset Bundles default bundle template for Python, which consists of a notebook and the definition of a job to run it. You then validate, deploy, and run the deployed job in your Databricks workspace.

tip

If you have existing jobs that were created using the Lakeflow Jobs user interface or API and you want to move them to bundles, you must define them in a bundle's configuration files. Databricks recommends that you first create a bundle using the steps below and then validate that the bundle works. You can then add more job definitions, notebooks, and other sources to the bundle. See Retrieve an existing job definition using the UI.

If you want to create a bundle from scratch, see Create a bundle manually.

Requirements

  • Databricks CLI version 0.218.0 or above. To check your installed version of the Databricks CLI, run the command databricks -v. To install the Databricks CLI, see Install or update the Databricks CLI.
  • uv is required to run tests and to install dependencies for this project from an IDE.
  • The remote Databricks workspace must have workspace files enabled. See What are workspace files?.
  • An existing catalog. To create a catalog, see Create catalogs.
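If you want to script the CLI version requirement, the following Bash sketch is one way to do it. It assumes that databricks -v prints a string containing an X.Y.Z version (for example, Databricks CLI v0.218.0). The names extract_version, version_ge, and check_cli_version are hypothetical helpers, not part of the Databricks CLI.

```shell
#!/usr/bin/env bash
# Sketch: check the installed Databricks CLI against the 0.218.0 minimum.

MIN_CLI_VERSION="0.218.0"

# Print the first X.Y.Z version found in a string such as the output of
# "databricks -v" (assumed format, for example "Databricks CLI v0.218.0").
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Return 0 if version $1 is greater than or equal to version $2,
# comparing the major, minor, and patch numbers as integers.
version_ge() {
  local a1 a2 a3 b1 b2 b3
  IFS=. read -r a1 a2 a3 <<< "$1"
  IFS=. read -r b1 b2 b3 <<< "$2"
  if (( a1 != b1 )); then (( a1 > b1 )); return; fi
  if (( a2 != b2 )); then (( a2 > b2 )); return; fi
  (( a3 >= b3 ))
}

# Run the CLI, parse its version, and report whether it is new enough.
check_cli_version() {
  local raw installed
  if ! raw="$(databricks -v 2>/dev/null)"; then
    echo "Databricks CLI not found or failed to run" >&2
    return 1
  fi
  installed="$(extract_version "$raw")"
  if [[ -z "$installed" ]]; then
    echo "Could not parse a version from: $raw" >&2
    return 1
  fi
  if version_ge "$installed" "$MIN_CLI_VERSION"; then
    echo "Databricks CLI $installed meets the $MIN_CLI_VERSION minimum"
  else
    echo "Databricks CLI $installed is older than $MIN_CLI_VERSION" >&2
    return 1
  fi
}
```

After sourcing the file, run check_cli_version. A nonzero exit status means that the CLI is missing or needs to be upgraded.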

Step 1: Set up authentication

First, set up authentication between the Databricks CLI on your development machine and your Databricks workspace. This page assumes that you want to use OAuth user-to-machine (U2M) authentication and a corresponding Databricks configuration profile named DEFAULT for authentication.

note

U2M authentication is appropriate for trying out these steps in real time. For fully automated workflows, Databricks recommends that you use OAuth machine-to-machine (M2M) authentication instead. See the M2M authentication setup instructions in Authorize service principal access to Databricks with OAuth.

  1. Use the Databricks CLI to initiate OAuth token management locally by running the following command for each target workspace.

    In the following command, replace <workspace-url> with your Databricks workspace instance URL, for example https://dbc-a1b2345c-d6e7.cloud.databricks.com.

    Bash
    databricks auth login --host <workspace-url>
  2. The Databricks CLI prompts you to save the information that you entered as a Databricks configuration profile. Press Enter to accept the suggested profile name, or enter the name of a new or existing profile. Any existing profile with the same name is overwritten with the information that you entered. You can use profiles to quickly switch your authentication context across multiple workspaces.

    To get a list of any existing profiles, in a separate terminal or command prompt, use the Databricks CLI to run the command databricks auth profiles. To view a specific profile's existing settings, run the command databricks auth env --profile <profile-name>.

  3. In your web browser, complete the on-screen instructions to log in to your Databricks workspace.

  4. To view a profile's current OAuth token value and the token's upcoming expiration timestamp, run one of the following commands:

    • databricks auth token --host <workspace-url>
    • databricks auth token -p <profile-name>
    • databricks auth token --host <workspace-url> -p <profile-name>

    If you have multiple profiles with the same --host value, you might need to specify the --host and -p options together to help the Databricks CLI find the correct matching OAuth token information.
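If you script these steps, one option is to reuse an existing login and start the browser flow only when no token is available. This sketch assumes two things: databricks auth token exits with a nonzero status when the profile has no usable OAuth token, and auth login accepts the global --profile flag, which is the long form of the -p option shown above. ensure_login is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: reuse an existing OAuth login for a profile, and start the
# browser-based login only when no usable token can be obtained.

ensure_login() {
  local host="$1" profile="${2:-DEFAULT}"
  if [[ -z "$host" ]]; then
    echo "usage: ensure_login <workspace-url> [profile-name]" >&2
    return 2
  fi
  # Assumption: "auth token" exits nonzero when the profile has no usable token.
  if databricks auth token --host "$host" -p "$profile" >/dev/null 2>&1; then
    echo "Profile '$profile' already has a usable token for $host"
    return 0
  fi
  echo "No usable token for profile '$profile'; starting OAuth login" >&2
  # --profile is the long form of the global -p option.
  databricks auth login --host "$host" --profile "$profile"
}
```

For example, ensure_login https://dbc-a1b2345c-d6e7.cloud.databricks.com DEFAULT opens the browser only if the DEFAULT profile cannot produce a token.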

Step 2: Initialize the bundle

Initialize a bundle using the default Python bundle project template.

  1. Use your terminal or command prompt to switch to a directory on your local development machine that will contain the template's generated bundle.

  2. Use the Databricks CLI to run the bundle init command:

    Bash
    databricks bundle init
  3. For Template to use, leave the default value of default-python by pressing Enter.

  4. For Unique name for this project, leave the default value of my_project, or type a different value, and then press Enter. This determines the name of the root directory for this bundle. This root directory is created in your current working directory.

  5. For Include a job that runs a notebook, select yes and press Enter.

  6. For Include an ETL pipeline, select no and press Enter.

  7. For Include a stub (sample) Python package, select no and press Enter.

  8. For Use serverless, select yes and press Enter. This instructs the Databricks CLI to configure your bundle to run on serverless compute.

  9. For Default catalog for any tables created by this project [hive_metastore], enter the name of an existing Unity Catalog catalog and press Enter.

  10. For Use a personal schema for each user working on this project, select yes and press Enter.

Step 3: Explore the bundle

To view the files that the template generated, switch to the root directory of your newly created bundle. Files of particular interest include the following:

  • databricks.yml: This file specifies the bundle's programmatic name, includes references to the bundle's files, defines catalog and schema variables, and specifies settings for target workspaces.
  • resources/sample_job.job.yml: This file specifies the job's settings, including a default notebook task. For information about job settings, see job.
  • src/: This folder contains the job's source files.
  • src/sample_notebook.ipynb: This notebook reads a sample table.
  • tests/: This folder contains sample unit tests.
  • README.md: This file contains additional information about getting started and using this bundle template.
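To confirm the generated layout from the command line, you can run a small check over the paths listed above. This is a sketch. check_bundle_layout is a hypothetical helper, and other versions of the template might generate different file names.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the files described on this page exist in a bundle
# generated from the default-python template.

check_bundle_layout() {
  local root="${1:-.}" missing=0 path
  local expected=(
    databricks.yml
    resources/sample_job.job.yml
    src/sample_notebook.ipynb
    README.md
  )
  for path in "${expected[@]}"; do
    if [[ ! -f "$root/$path" ]]; then
      echo "missing: $path" >&2
      missing=1
    fi
  done
  # tests/ is a directory, so check it separately.
  if [[ ! -d "$root/tests" ]]; then
    echo "missing: tests/" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, check_bundle_layout my_project returns 0 when every listed path is present and prints each missing path otherwise.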

tip

You can define, combine, and override the settings for new job clusters in bundles by using the techniques described in Override with target settings.

Step 4: Validate the bundle configuration

Now check whether the bundle configuration is valid.

  1. From the root directory, use the Databricks CLI to run the bundle validate command:

    Bash
    databricks bundle validate
  2. If a summary of the bundle configuration is returned, then the validation succeeded. If any errors are returned, fix the errors, and then repeat this step.

Step 5: Deploy the bundle to the remote workspace

Next, deploy the job to your remote Databricks workspace and verify the job within your workspace.

  1. From the bundle root, use the Databricks CLI to run the bundle deploy command:

    Bash
    databricks bundle deploy --target dev
  2. Confirm that the notebook was deployed:

    1. In your Databricks workspace's sidebar, click Workspace.
    2. Click into the Users > <your-username> > .bundle > <project-name> > dev > files > src folder. The notebook should be in this folder.
  3. Check whether the job was created:

    1. In your Databricks workspace's sidebar, click Jobs & Pipelines.
    2. Optionally, select the Jobs and Owned by me filters.
    3. Click [dev <your-username>] sample_job.
    4. Click the Tasks tab. There should be one notebook_task.
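You can also inspect the deployed files from the terminal instead of the UI. This sketch assumes the default development deployment folder that the steps above walk through (Users > <your-username> > .bundle > <project-name> > dev > files > src). It lists that folder with databricks workspace list. bundle_src_path and list_deployed_src are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: build the workspace folder that holds the deployed sources and
# list it with the CLI. Assumes the default ~/.bundle/<project>/<target> root.

bundle_src_path() {
  local user="$1" project="$2" target="${3:-dev}"
  printf '/Users/%s/.bundle/%s/%s/files/src\n' "$user" "$project" "$target"
}

list_deployed_src() {
  local user="$1" project="$2" target="${3:-dev}"
  databricks workspace list "$(bundle_src_path "$user" "$project" "$target")"
}
```

For example, list_deployed_src someone@example.com my_project should include the sample notebook if the deployment succeeded. Replace the placeholder email with your workspace username.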

If you make any changes to your bundle after this step, you should repeat steps 4-5 to check whether your bundle configuration is still valid and then redeploy the project.

Step 6: Run the deployed job

Now trigger a run of the job in your workspace from the command line.

  1. From the root directory, use the Databricks CLI to run the bundle run command:

    Bash
    databricks bundle run --target dev sample_job
  2. Copy the value of Run URL that appears in your terminal and paste this value into your web browser to open your Databricks workspace. See View and run a job created with Databricks Asset Bundles.

  3. In your Databricks workspace, after the job task completes successfully and shows a green title bar, click the job task to see the results.

If you make any changes to your bundle after this step, you should repeat steps 4-6 to check whether your bundle configuration is still valid, redeploy the project, and run the redeployed project.
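You can wrap this validate, deploy, and run cycle in a single function so that a failed validation or deployment stops the cycle before the job runs. This is a minimal sketch. redeploy_and_run is a hypothetical name, and sample_job is the job key from this template. Pass a different key as the second argument if you renamed the job.

```shell
#!/usr/bin/env bash
# Sketch of the edit loop from steps 4-6: validate, deploy, and run,
# stopping at the first command that fails. Run it from the bundle root.

redeploy_and_run() {
  local target="${1:-dev}" job_key="${2:-sample_job}"
  databricks bundle validate --target "$target" || return 1
  databricks bundle deploy --target "$target" || return 1
  databricks bundle run --target "$target" "$job_key"
}
```

Calling redeploy_and_run with no arguments is equivalent to running steps 4 through 6 against the dev target.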

Step 7: Run tests

Finally, from the root directory, use uv to run the tests locally with pytest:

Bash
uv run pytest
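To catch test failures before anything reaches the workspace, you can gate deployment on the local test run. This ordering is one possible workflow, not a requirement of this page. test_then_deploy is a hypothetical helper, and it passes any arguments after the target through to pytest.

```shell
#!/usr/bin/env bash
# Sketch: run the local pytest suite first and deploy only if it passes.
# Usage: test_then_deploy [target] [pytest-args...]

test_then_deploy() {
  local target="${1:-dev}"
  shift || true  # drop the target argument if one was given
  if ! uv run pytest "$@"; then
    echo "Tests failed; skipping deployment to '$target'" >&2
    return 1
  fi
  databricks bundle deploy --target "$target"
}
```

For example, test_then_deploy dev -k sample runs only the tests that match "sample" and deploys to dev if they pass.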

Step 8: Clean up

In this step, you delete the deployed notebook and the job from your workspace.

  1. From the root directory, use the Databricks CLI to run the bundle destroy command:

    Bash
    databricks bundle destroy --target dev
  2. When prompted to permanently delete all workspace files and directories, type y and press Enter.

  3. If you also want to delete the bundle from your development machine, you can now delete the local project directory.
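For scripted cleanup, the following sketch runs bundle destroy from the bundle root and removes the local directory only when you explicitly ask it to. It uses the --auto-approve flag of databricks bundle destroy to skip the confirmation prompt described above, so use it only where skipping that prompt is intended. cleanup_bundle is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch of step 8: destroy the deployed resources for a target and, only
# if the third argument is "yes", delete the local project directory.

cleanup_bundle() {
  local root="$1" target="${2:-dev}" delete_local="${3:-no}"
  if [[ ! -f "$root/databricks.yml" ]]; then
    echo "not a bundle root: $root" >&2
    return 1
  fi
  # Run destroy from the bundle root in a subshell so the caller's directory is unchanged.
  ( cd "$root" && databricks bundle destroy --target "$target" --auto-approve ) || return 1
  if [[ "$delete_local" == "yes" ]]; then
    rm -rf -- "$root"
  fi
}
```

For example, cleanup_bundle my_project dev yes destroys the dev deployment and then deletes the local my_project directory.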