Databricks asset bundle development work tasks

Preview

This feature is in Public Preview.

This article describes the sequence of work tasks for Databricks asset bundle development. See What are Databricks asset bundles?

To create, validate, deploy, and run bundles, complete the following steps.

Step 1: Create a bundle

There are three ways to begin creating a bundle:

  1. Use the default bundle template for Python.

  2. Use a non-default bundle template.

  3. Create a bundle manually.

Use the default bundle template for Python

To use the Databricks default bundle template for Python to create a starter bundle that you can then customize further, use Databricks CLI version 0.205 or above to run the bundle init command, and then answer the on-screen prompts:

databricks bundle init

To view the source code for this bundle template, see the default-python template in the databricks/cli repository in GitHub.

Skip ahead to Step 2: Fill in the bundle configuration files.

Use a non-default bundle template

To use a bundle template other than the Databricks default bundle template for Python, you must know the path on your local development machine or the URL to the remote bundle template location. Use Databricks CLI version 0.205 or above to run the bundle init command as follows:

databricks bundle init <project-template-local-path-or-url>

For more information about this command, see Databricks asset bundle templates. For information about a specific bundle template, see the bundle template provider’s documentation.
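For example, the template location can be either a local path or a Git URL. Both commands below use placeholder locations; substitute your template's actual path or repository URL:

```shell
# Use a bundle template from a local directory (placeholder path).
databricks bundle init /path/to/my-template

# Use a bundle template from a remote Git repository (placeholder URL).
databricks bundle init https://github.com/<org>/<template-repo>
```
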

Skip ahead to Step 2: Fill in the bundle configuration files.

Create a bundle manually

To create a bundle manually instead of by using a bundle template, begin with an empty directory on your development machine or an empty repository with a third-party Git provider. This approach assumes that you are using a local development machine, such as a physical laptop or desktop computer.

In your empty directory or empty repository, create one or more bundle configuration files as input. These files are expressed in YAML format. There must be at minimum one (and only one) bundle configuration file named databricks.yml. If there are multiple bundle configuration files, they must be referenced by the databricks.yml file.
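As a minimal sketch, a databricks.yml file that declares the bundle's name and references additional bundle configuration files might look like the following. The bundle name and the glob pattern are illustrative:

```yaml
# databricks.yml (illustrative): the one required bundle configuration file.
bundle:
  name: my-bundle  # Hypothetical bundle name.

# Reference any additional bundle configuration files here.
include:
  - resources/*.yml
```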

To create YAML files that conform to the Databricks asset bundle configuration syntax more easily and quickly, you can use a tool that provides support for YAML files and JSON schema files, such as Visual Studio Code, PyCharm Professional, or IntelliJ IDEA Ultimate. Follow the steps for your tool:

Visual Studio Code

  1. Add YAML language server support to Visual Studio Code, for example by installing the YAML extension from the Visual Studio Code Marketplace.

  2. Generate the Databricks asset bundle configuration JSON schema file by using Databricks CLI version 0.205 or above to run the bundle schema command and redirect the output to a JSON file. For example, generate a file named bundle_config_schema.json within the current directory, as follows:

    databricks bundle schema > bundle_config_schema.json
    
  3. Use Visual Studio Code to create or open a bundle configuration file within the current directory. This file must be named databricks.yml.

  4. Add the following comment to the beginning of your bundle configuration file:

    # yaml-language-server: $schema=bundle_config_schema.json
    

    Note

    In the preceding comment, if your Databricks asset bundle configuration JSON schema file is in a different path, replace bundle_config_schema.json with the full path to your schema file.

  5. Use the YAML language server features that you added earlier. For more information, see your YAML language server’s documentation.

PyCharm Professional

  1. Generate the Databricks asset bundle configuration JSON schema file by using Databricks CLI version 0.205 or above to run the bundle schema command and redirect the output to a JSON file. For example, generate a file named bundle_config_schema.json within the current directory, as follows:

    databricks bundle schema > bundle_config_schema.json
    
  2. Configure PyCharm to recognize the bundle configuration JSON schema file, and then complete the JSON schema mapping, by following the instructions in Configure a custom JSON schema.

  3. Use PyCharm to create or open a bundle configuration file. This file must be named databricks.yml. As you type, PyCharm checks for JSON schema syntax and formatting and provides code completion hints.

IntelliJ IDEA Ultimate

  1. Generate the Databricks asset bundle configuration JSON schema file by using Databricks CLI version 0.205 or above to run the bundle schema command and redirect the output to a JSON file. For example, generate a file named bundle_config_schema.json within the current directory, as follows:

    databricks bundle schema > bundle_config_schema.json
    
  2. Configure IntelliJ IDEA to recognize the bundle configuration JSON schema file, and then complete the JSON schema mapping, by following the instructions in Configure a custom JSON schema.

  3. Use IntelliJ IDEA to create or open a bundle configuration file. This file must be named databricks.yml. As you type, IntelliJ IDEA checks for JSON schema syntax and formatting and provides code completion hints.

Step 2: Fill in the bundle configuration files

Bundle configuration files declaratively model your Databricks workflows by specifying settings such as workspace details, artifact names, file locations, job details, and pipeline details.

To learn how to fill in your bundle configuration files, see Databricks asset bundle configurations.
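As a sketch of what a filled-in configuration might look like, the following databricks.yml declares one job that runs a notebook task, plus a default target. All names and paths here are hypothetical; consult the configuration reference for the full set of supported mappings:

```yaml
# Illustrative bundle configuration; all names and paths are hypothetical.
bundle:
  name: hello-bundle

resources:
  jobs:
    hello_job:
      name: hello-job
      tasks:
        - task_key: hello-task
          notebook_task:
            notebook_path: ./hello.ipynb

targets:
  dev:
    default: true
```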

Step 3: Validate the bundle configuration files

Before you deploy artifacts or run a job or pipeline, you should make sure that your bundle configuration files are syntactically correct. To do this, run the bundle validate command from the same directory as the bundle configuration file. This directory is also known as the bundle root.

databricks bundle validate

Step 4: Deploy the bundle

Before you deploy the bundle, make sure that the remote workspace has workspace files enabled. See What are workspace files?.

To deploy any specified local artifacts to the remote workspace, run the bundle deploy command from the bundle root. If no command options are specified, the Databricks CLI uses the default target as declared within the bundle configuration files:

databricks bundle deploy

Tip

You can run databricks bundle commands outside of the bundle root. In that case, you can specify the bundle root path by setting the BUNDLE_ROOT environment variable. If this environment variable is not set, databricks bundle commands attempt to find the bundle root by searching within the current working directory.
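For example, with a hypothetical bundle root path, you could set the environment variable once and then run bundle commands from any directory:

```shell
# The path below is hypothetical; replace it with your actual bundle root.
export BUNDLE_ROOT=/path/to/my-bundle

# This deploy now resolves the bundle root from BUNDLE_ROOT,
# regardless of the current working directory.
databricks bundle deploy
```
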

To deploy the artifacts within the context of a specific target, specify the -t (or --target) option along with the target’s name as declared within the bundle configuration files. For example, for a target declared with the name dev:

databricks bundle deploy -t dev
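For context, a target named dev might be declared in the bundle configuration files along these lines. The workspace host URL is a placeholder:

```yaml
# Illustrative target declaration; the host URL is a placeholder.
targets:
  dev:
    default: true
    workspace:
      host: https://<your-workspace-url>
```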

Step 5: Run the bundle

To run a specific job or pipeline, run the bundle run command from the bundle root. You must specify the job or pipeline declared within the bundle configuration files. If the -t option is not specified, the default target as declared within the bundle configuration files is used. For example, to run a job named hello_job within the context of the default target:

databricks bundle run hello_job

To run a job named hello_job within the context of a target declared with the name dev:

databricks bundle run -t dev hello_job

Step 6: Destroy the bundle

To delete jobs, pipelines, and artifacts that were previously deployed, run the bundle destroy command from the bundle root. This command deletes all previously deployed jobs, pipelines, and artifacts that are defined in the bundle configuration files:

databricks bundle destroy

By default, you are prompted to confirm permanent deletion of the previously deployed jobs, pipelines, and artifacts. To skip these prompts and perform automatic permanent deletion, add the --auto-approve option to the bundle destroy command.
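For example, to destroy the bundle without confirmation prompts:

```shell
# Skips the confirmation prompts and permanently deletes all
# previously deployed jobs, pipelines, and artifacts in this bundle.
databricks bundle destroy --auto-approve
```
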