What are Databricks asset bundles?


This feature is in Public Preview.

Databricks asset bundles make it possible to express complete data, analytics, and ML projects as a collection of source files called a bundle. A bundle’s source files serve as an end-to-end definition of a project. These source files include information about how they are to be tested and deployed. This end-to-end definition makes it simple to apply data engineering best practices such as source control, code review, testing, and CI/CD.

A bundle includes the following parts:

  • Source files, such as notebooks and Python files, that include the business logic.

  • Declarations and settings for Databricks resources, such as Databricks jobs, Delta Live Tables pipelines, Model Serving endpoints, MLflow Experiments, and MLflow registered models.

  • Unit tests and integration tests.

  • Configurations that define the workspace or workspaces to which the bundle is deployed.

What are the benefits of using bundles?

Bundles make it possible to describe end-to-end data, analytics, and ML projects as source code. The related source code format is fully supported by Databricks and enables interoperability with a broad range of ML, data, and software engineering tools and processes. Key benefits of using bundles include:

  • Best-practice tools and processes for working with source code:

    • Source control and history: Better troubleshooting, governance, and disaster recovery through added control and detailed logs.

    • Code review: Peer review of code changes, which fosters knowledge sharing and improves code quality.

    • Testing: Code reliability through systematic identification and fixing of bugs and issues.

    • CI/CD: Streamlined, automated code integration and deployment, which promotes a more efficient development cycle.

    • MLOps: Support for MLOps best practices through MLOps Stacks.

  • Streamlined local development:

    • IDEs and local development: Bundles are typically used with a local IDE. Bundles work well with the Databricks extension for Visual Studio Code.

    • Iteratively develop by using a personal copy of a bundle without affecting collaborators.

    • Run resources such as jobs or pipelines before they are deployed to production.

  • Automation:

    • Eliminate manual deployment and validation processes, which can be labor-intensive and error-prone. A code-based approach replaces them with steps that can be scripted and repeated.

    • Manage configuration for deployments across multiple workspaces, regions, and clouds.

    • Ensure repeatable results across frequent redeployments and reruns of code.

When should I use bundles?

Some ideal scenarios for bundles include:

  • Develop data, analytics, and ML projects in a team-based environment. Bundles can help you organize and manage a project's source files, which makes collaboration smoother and processes more consistent.

  • Iterate on ML problems faster. Manage ML pipeline resources (such as training and batch inference jobs) by using ML projects that follow production best practices from the beginning.

  • Set organizational standards for new projects by authoring custom bundle templates that include default permissions, service principals, and CI/CD configurations.

  • Regulatory compliance: In industries where regulatory compliance is a significant concern, bundles can help maintain a versioned history of code and infrastructure work. This history supports governance and helps you show that compliance standards are met.

  • Disaster recovery planning: Bundles can be a useful part of a disaster recovery strategy. Because bundles enable the tracking of all changes, they make recovery smoother after unplanned incidents.

How do I work with a bundle?

You typically create a bundle on a local development machine with an IDE and the Databricks CLI version 0.205 or above. These tools enable you to create, validate, deploy, and run a bundle. See Databricks asset bundles development work tasks.
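
As a rough sketch of the validate, deploy, and run steps, the Python helper below builds the corresponding `databricks bundle` commands and can run them in order. It assumes that the Databricks CLI is installed and authenticated and that the current directory contains a bundle. The function names, the `dev` target, and the `my_job` resource key are hypothetical placeholders, not names from this article. Creating the bundle itself (for example, from a template with `databricks bundle init`) is left out.

```python
import subprocess
from typing import List, Optional


def bundle_command(action: str, target: Optional[str] = None, *extra: str) -> List[str]:
    """Build the argument list for one `databricks bundle <action>` call."""
    cmd = ["databricks", "bundle", action]
    if target is not None:
        # -t selects one of the deployment targets declared in the bundle.
        cmd += ["-t", target]
    cmd += list(extra)
    return cmd


def workflow(target: str, resource_key: str) -> List[List[str]]:
    """Return the validate, deploy, and run commands in the order they are used."""
    return [
        bundle_command("validate", target),
        bundle_command("deploy", target),
        bundle_command("run", target, resource_key),
    ]


def run_workflow(target: str, resource_key: str) -> None:
    """Run each step with the CLI, stopping at the first step that fails."""
    for cmd in workflow(target, resource_key):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    # "dev" and "my_job" are hypothetical; use names from your own bundle.
    for cmd in workflow("dev", "my_job"):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes it possible to review or log the exact commands before anything is deployed. The same commands can be typed directly in a terminal.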

You can edit a bundle in a Databricks workspace after you add the bundle to Git by using the Git integration with Databricks Repos. However, you cannot test or deploy a bundle from a workspace. Instead, you can use a local IDE for testing and CI/CD for deployment.

Databricks provides a default bundle template to help you get started. Organizations can create custom bundle templates to define their own standards. These standards might include default permissions, service principals, and CI/CD configuration. See Databricks asset bundle templates.

Next steps