What are Databricks Asset Bundles?

Databricks Asset Bundles are a tool to facilitate the adoption of software engineering best practices, including source control, code review, testing, and continuous integration and delivery (CI/CD), for your data and AI projects. Bundles make it possible to describe Databricks resources such as jobs, pipelines, and notebooks as source files. These source files provide an end-to-end definition of a project, including how it should be structured, tested, and deployed, which makes it easier to collaborate on projects during active development.

Bundles provide a way to include metadata alongside your project’s source files. When you deploy a project using bundles, this metadata is used to provision infrastructure and other resources. Your project’s collection of source files and metadata is then deployed as a single bundle to your target environment. A bundle includes the following parts:

  • Required cloud infrastructure and workspace configurations

  • Source files, such as notebooks and Python files, that include the business logic

  • Definitions and settings for Databricks resources, such as Databricks jobs, Delta Live Tables pipelines, Model Serving endpoints, MLflow Experiments, and MLflow registered models

  • Unit tests and integration tests

The following diagram provides a high-level view of a development and CI/CD pipeline with bundles:

[Figure: Databricks Asset Bundles overview]

When should I use Databricks Asset Bundles?

Databricks Asset Bundles are an infrastructure-as-code (IaC) approach to managing your Databricks projects. Use them when you want to manage complex projects where multiple contributors and automation are essential, and continuous integration and deployment (CI/CD) are a requirement. Because bundles are defined and managed through YAML templates and files that you create and maintain alongside source code, they map well to scenarios where IaC is an appropriate approach.

Some ideal scenarios for bundles include:

  • Develop data, analytics, and ML projects in a team-based environment. Bundles can help you organize and manage various source files efficiently. This ensures smooth collaboration and streamlined processes.

  • Iterate on ML problems faster. Manage ML pipeline resources (such as training and batch inference jobs) by using ML projects that follow production best practices from the beginning.

  • Set organizational standards for new projects by authoring custom bundle templates that include default permissions, service principals, and CI/CD configurations.

  • Support regulatory compliance. In industries where regulatory compliance is a significant concern, bundles can help maintain a versioned history of code and infrastructure work. This history assists in governance and helps demonstrate that the necessary compliance standards are met.

How do Databricks Asset Bundles work?

Bundle metadata is defined using YAML files that specify the artifacts, resources, and configuration of a Databricks project. You can create this YAML file manually or generate one using a bundle template. The Databricks CLI can then be used to validate, deploy, and run bundles using these bundle YAML files. You can run bundle projects from IDEs, terminals, or within Databricks directly. This article uses the Databricks CLI.

The Databricks CLI provides default templates for simple use cases. For more specific or complex jobs, you can create custom bundle templates to implement your team’s best practices and keep common configurations consistent.

For more details on the configuration YAML used to express Databricks Asset Bundles, see Databricks Asset Bundle configurations.
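As a rough illustration of the bundle YAML described above, the following sketch writes a minimal databricks.yml file from the shell. The bundle name, job key, task key, notebook path, and target name are hypothetical placeholders, the host is the example workspace URL used elsewhere in this article, and compute settings are omitted for brevity, so treat it as a starting point rather than a complete configuration.

```shell
# Sketch only: all names and paths below are placeholders, not values
# that Databricks prescribes.
BUNDLE_DIR="${BUNDLE_DIR:-./my_first_bundle}"
mkdir -p "$BUNDLE_DIR"

# The quoted 'EOF' keeps the shell from expanding anything inside the YAML.
cat > "$BUNDLE_DIR/databricks.yml" <<'EOF'
bundle:
  name: my_first_bundle          # hypothetical bundle name

resources:
  jobs:
    hello_job:                   # hypothetical resource key
      name: hello_job
      tasks:
        - task_key: hello_task
          notebook_task:
            notebook_path: ./src/hello.ipynb   # hypothetical notebook

targets:
  dev:                           # hypothetical target name
    mode: development
    default: true
    workspace:
      host: https://dbc-a1b2345c-d6e7.cloud.databricks.com
EOF

echo "Wrote $BUNDLE_DIR/databricks.yml"
```

A file like this is what the validate, deploy, and run commands read when you run them from the bundle's root directory.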

Configure your environment to use bundles

Use the Databricks CLI to deploy bundles from the command line. To check whether the Databricks CLI is installed and which version you are using, run the following command:

databricks --version

Note

Databricks CLI version 0.218.0 or higher is required. To install the Databricks CLI, see Install or update the Databricks CLI.
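If you want to automate the version requirement, for example in a setup script, a check along the following lines might help. It is a sketch that assumes the output of databricks --version contains a dotted X.Y.Z version number; the helper name version_at_least is illustrative and not part of the Databricks CLI.

```shell
REQUIRED_VERSION="0.218.0"

# version_at_least CURRENT REQUIRED
# Succeeds when the dotted version CURRENT (X.Y.Z) is >= REQUIRED (X.Y.Z).
version_at_least() {
  for field in 1 2 3; do
    c=$(printf '%s' "$1" | cut -d. -f"$field")
    r=$(printf '%s' "$2" | cut -d. -f"$field")
    c=${c:-0}
    r=${r:-0}
    if [ "$c" -gt "$r" ]; then return 0; fi
    if [ "$c" -lt "$r" ]; then return 1; fi
  done
  return 0
}

if command -v databricks >/dev/null 2>&1; then
  # Assumption: the first X.Y.Z pattern in the output is the CLI version.
  installed=$(databricks --version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)
  if [ -n "$installed" ] && version_at_least "$installed" "$REQUIRED_VERSION"; then
    echo "Databricks CLI $installed meets the $REQUIRED_VERSION minimum."
  else
    echo "Could not confirm Databricks CLI $REQUIRED_VERSION or higher (found: ${installed:-unknown})." >&2
  fi
else
  echo "Databricks CLI not found on PATH." >&2
fi
```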

After installing the Databricks CLI, verify that your remote Databricks workspaces are configured correctly. Bundles require the workspace files feature to be enabled, because this feature supports working with files other than Databricks notebooks, such as .py and .yml files. If you’re using Databricks Runtime version 11.2 (or later), this feature should be enabled by default.

Authentication

Databricks provides several authentication methods. Databricks recommends that you use one of the following methods to authenticate:

  • For attended authentication scenarios, such as manual workflows where you use your web browser to log in to your target Databricks workspace (when prompted by the Databricks CLI), use OAuth user-to-machine (U2M) authentication. This method is ideal for experimenting with the getting started tutorials for Databricks Asset Bundles or for the rapid development of bundles.

  • For unattended authentication scenarios, such as fully automated workflows in which there is no opportunity for you to use your web browser to log in to your target Databricks workspace at that time, use OAuth machine-to-machine (M2M) authentication. This method requires the use of Databricks service principals and is ideal for using Databricks Asset Bundles with CI/CD systems such as GitHub.

For OAuth U2M authentication, do the following:

  1. Use the Databricks CLI to initiate OAuth token management locally by running the following command for each target workspace.

    In the following command, replace <workspace-url> with your Databricks workspace instance URL, for example https://dbc-a1b2345c-d6e7.cloud.databricks.com.

    databricks auth login --host <workspace-url>
    
  2. The Databricks CLI prompts you to save the information that you entered as a Databricks configuration profile. Press Enter to accept the suggested profile name, or enter the name of a new or existing profile. Any existing profile with the same name is overwritten with the information that you entered. You can use profiles to quickly switch your authentication context across multiple workspaces.

    To get a list of any existing profiles, in a separate terminal or command prompt, use the Databricks CLI to run the command databricks auth profiles. To view a specific profile’s existing settings, run the command databricks auth env --profile <profile-name>.

  3. In your web browser, complete the on-screen instructions to log in to your Databricks workspace.

  4. To view a profile’s current OAuth token value and the token’s upcoming expiration timestamp, run one of the following commands:

    • databricks auth token --host <workspace-url>

    • databricks auth token -p <profile-name>

    • databricks auth token --host <workspace-url> -p <profile-name>

    If you have multiple profiles with the same --host value, you might need to specify the --host and -p options together to help the Databricks CLI find the correct matching OAuth token information.

You can use this configuration profile’s name in one or more of the following ways whenever you validate, deploy, run, or destroy bundles:

  • With the command-line option -p <profile-name>, appended to the commands databricks bundle validate, databricks bundle deploy, databricks bundle run, or databricks bundle destroy. See Databricks Asset Bundles development.

  • As the value of the profile mapping in the bundle configuration file’s top-level workspace mapping (although Databricks recommends that you use the host mapping set to the Databricks workspace’s URL instead of the profile mapping, as it makes your bundle configuration files more portable). See coverage of the profile mapping in workspace.

  • If the configuration profile’s name is DEFAULT, it is used by default when the command-line option -p <profile-name> or the profile (or host) mapping is not specified.
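The first and last options above can be combined in a small wrapper: pass -p only when a profile name is supplied, and otherwise let the CLI fall back to the DEFAULT profile. The following is a minimal sketch; profile_args and PROFILE_NAME are illustrative names, not part of the Databricks CLI.

```shell
# profile_args NAME
# Prints "-p NAME" when NAME is non-empty, and prints nothing otherwise so
# that the CLI falls back to the DEFAULT profile.
profile_args() {
  if [ -n "${1:-}" ]; then
    printf '%s %s' "-p" "$1"
  fi
}

# Hypothetical variable; set it to a name listed by `databricks auth profiles`.
PROFILE_NAME="${PROFILE_NAME:-}"

# Run validation only when the CLI and a bundle configuration are present.
if command -v databricks >/dev/null 2>&1 && [ -f databricks.yml ]; then
  # Left unquoted on purpose so "-p NAME" splits into two arguments;
  # this sketch assumes profile names do not contain spaces.
  databricks bundle validate $(profile_args "$PROFILE_NAME")
fi
```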

For OAuth M2M authentication, do the following:

  1. Complete the OAuth M2M authentication setup instructions. See Use a service principal to authenticate with Databricks (OAuth M2M).

  2. Install the Databricks CLI on the target compute resource in one of the following ways:

    • To manually install the Databricks CLI on the compute resource in real time, see Install or update the Databricks CLI.

    • To use GitHub Actions to automatically install the Databricks CLI on a GitHub virtual machine, see setup-cli in GitHub.

    • To use other CI/CD systems to automatically install the Databricks CLI on a virtual machine, see your CI/CD system provider’s documentation and Install or update the Databricks CLI.

  3. Set the following environment variables on the compute resource as follows:

    • DATABRICKS_HOST, set to the Databricks workspace instance URL, for example https://dbc-a1b2345c-d6e7.cloud.databricks.com.

    • DATABRICKS_CLIENT_ID, set to the Databricks service principal’s Application ID value.

    • DATABRICKS_CLIENT_SECRET, set to the Databricks service principal’s OAuth Secret value.

    To set these environment variables, see the documentation for your target compute resource’s operating system or CI/CD system.
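In an automated workflow it can be useful to fail early when any of the three variables from step 3 is missing. The following sketch shows one way to do that before calling the CLI; check_m2m_env is an illustrative helper name, and in practice the values would come from your CI/CD system's secret store rather than from the script itself.

```shell
# Typical assignments, shown as comments so that no secret is stored here:
#   export DATABRICKS_HOST="https://dbc-a1b2345c-d6e7.cloud.databricks.com"
#   export DATABRICKS_CLIENT_ID="<application-id>"
#   export DATABRICKS_CLIENT_SECRET="<oauth-secret>"

# check_m2m_env
# Succeeds only when all three OAuth M2M variables are set and non-empty.
check_m2m_env() {
  missing=""
  for name in DATABRICKS_HOST DATABRICKS_CLIENT_ID DATABRICKS_CLIENT_SECRET; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing environment variables:$missing" >&2
    return 1
  fi
  return 0
}

if check_m2m_env; then
  echo "OAuth M2M environment variables are set."
fi
```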

Develop your first Databricks Asset Bundle

The fastest way to start bundle development is by using a template. Create your first bundle project using the Databricks CLI bundle init command without any options. This presents a choice of Databricks-provided default bundle templates and asks a series of questions to initialize project variables.

databricks bundle init

Organizations can also create custom bundle templates to define their own standards. These standards might include default permissions, service principals, and custom CI/CD configuration. See Databricks Asset Bundle templates.

After you initialize your project, use the bundle validate command to validate your bundle before deploying it to your workspaces.

databricks bundle validate

You typically create a bundle on a local development machine with an IDE and the Databricks CLI version 0.218.0 or above. These tools enable you to create, validate, deploy, and run a bundle. See Databricks Asset Bundles development.
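The validate, deploy, and run steps can be chained in a short script. The sketch below only prints the commands by default; set DRY_RUN=0 to execute them. The target name dev and the job key hello_job are hypothetical and must match a target and a job resource defined in your own bundle configuration.

```shell
# run_step COMMAND...
# Prints the command, and executes it only when DRY_RUN is set to something
# other than 1 (the default is a dry run).
run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" != "1" ]; then
    "$@"
  fi
}

# bundle_workflow TARGET JOB_KEY
# Validates, deploys, and runs in order, stopping at the first failure.
bundle_workflow() {
  target=$1    # hypothetical target name, for example dev
  job_key=$2   # hypothetical key of a job defined in the bundle
  run_step databricks bundle validate -t "$target" &&
    run_step databricks bundle deploy -t "$target" &&
    run_step databricks bundle run -t "$target" "$job_key"
}

# Run this from the bundle's root directory, where databricks.yml lives.
bundle_workflow dev hello_job
```

Chaining the steps with && means that a validation error prevents the deploy, and a failed deploy prevents the run.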

You can edit a bundle in a Databricks workspace after you add the bundle to Git by using the Databricks Git folder integration. However, you cannot test or deploy a bundle from a workspace. Instead, you can use your local IDE for testing and CI/CD for deployment.

Next steps

Common tasks

Use the following articles to complete common tasks for Databricks Asset Bundles.

Article: Use this article when you want to…

  • Databricks Asset Bundles development: Learn about creating, validating, deploying, and running a bundle by authoring a databricks.yml file and by using the Databricks CLI to run the commands databricks bundle validate, databricks bundle deploy, and databricks bundle run.

  • Databricks Asset Bundle configurations: Create a bundle’s databricks.yml file and other related bundle configuration files that conform to the YAML syntax for bundle configurations.

  • Substitutions and variables in Databricks Asset Bundles: Use substitutions and define custom variables to make your bundle configuration more modular and reusable.

  • Authentication for Databricks Asset Bundles: Set up a bundle project for Databricks authentication.

  • Develop a job on Databricks by using Databricks Asset Bundles: Create, deploy, and run a bundle for a Databricks job.

  • Develop Delta Live Tables pipelines with Databricks Asset Bundles: Create, deploy, and run a bundle for a Delta Live Tables pipeline.

  • Databricks Asset Bundles for MLOps Stacks: Create, deploy, and run a bundle for an MLOps Stack.

  • Databricks Asset Bundles library dependencies: Install libraries that a bundle needs to run on any related Databricks clusters.

  • Databricks Asset Bundle deployment modes: Use bundle deployment modes such as development and production to automatically enable or disable common deployment behaviors such as pausing or unpausing related schedules and triggers.

  • Databricks Asset Bundle templates: Use a template to make creating specific kinds of bundles faster, easier, and with more consistent and repeatable results.

  • Set permissions for resources in Databricks Asset Bundles: Apply granular access permissions levels to users, groups, and service principals for specific bundle resources.

  • Define artifact settings dynamically in Databricks Asset Bundles: Combine or override specific settings for artifacts in a bundle.

  • Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions: Deploy or run a bundle in response to a specific GitHub workflow event such as a pull request or merge.

  • Override cluster settings in Databricks Asset Bundles: Combine or override specific settings for clusters in a bundle.

  • Add tasks to jobs in Databricks Asset Bundles: Add a task to a job in a bundle.

  • Override job tasks settings in Databricks Asset Bundles: Combine or override specific settings for job tasks in a bundle.

  • Develop a Python wheel file using Databricks Asset Bundles: Build, deploy, and call Python wheel files in a bundle.