What are Databricks asset bundles?
Preview
This feature is in Public Preview.
Databricks asset bundles make it possible to express complete data, analytics, and ML projects as a collection of source files called a bundle. A bundle’s source files serve as an end-to-end definition of a project. These source files include information about how they are to be tested and deployed. This end-to-end definition makes it simple to apply data engineering best practices such as source control, code review, testing, and CI/CD.
A bundle includes the following parts:
Source files, such as notebooks and Python files, that include the business logic.
Declarations and settings for Databricks resources, such as Databricks jobs, Delta Live Tables pipelines, Model Serving endpoints, MLflow Experiments, and MLflow registered models.
Unit tests and integration tests.
Configurations that define the workspace or workspaces to which the bundle is deployed.
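As a sketch, these parts come together in a bundle configuration file (`databricks.yml`) at the project root. The bundle name, job key, notebook path, and workspace URL below are hypothetical placeholders:

```yaml
# databricks.yml -- a hypothetical minimal bundle definition
bundle:
  name: my-project          # hypothetical bundle name

resources:
  jobs:
    my_job:                 # hypothetical job key
      name: my-job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./src/notebook.ipynb  # hypothetical source file

targets:
  dev:
    workspace:
      host: https://my-workspace.cloud.databricks.com  # hypothetical workspace URL
```

The `resources` mapping declares the Databricks resources the bundle manages, while `targets` defines where the bundle is deployed.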
What are the benefits of using bundles?
Bundles make it possible to describe end-to-end data, analytics, and ML projects as source code. The related source code format is fully supported by Databricks and enables interoperability with a broad range of ML, data, and software engineering tools and processes. Key benefits of using bundles include:
Best-practice tools and processes for working with source code:
Source control and history: Better troubleshooting, governance, and disaster recovery through added control and detailed logs.
Code review: Peer review of code changes, which fosters knowledge sharing and improves code quality.
Testing: Improved code reliability through the systematic identification and fixing of bugs and issues.
CI/CD: A streamlined, automated code integration and deployment process, which promotes a more efficient development cycle.
MLOps: Support for MLOps best practices through MLOps Stacks.
Streamlined local development:
IDEs and local development: Bundles are typically authored with a local IDE and work well with the Databricks extension for Visual Studio Code.
Iteratively develop by using a personal copy of a bundle without affecting collaborators.
Run resources such as jobs or pipelines before they are deployed to production.
Automation:
Eliminate manual deployment and validation processes, which can be labor-intensive and error-prone. A code-based approach makes it easier to get more done in less time.
Configuration management for deployments across multiple workspaces, regions, and clouds.
Ensure consistent repeatability during frequent redeployments and reruns of code.
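For example, the `targets` mapping in a bundle's configuration can capture per-environment deployment settings in one place; the target names and workspace URLs here are hypothetical:

```yaml
# Hypothetical targets section of a databricks.yml file
targets:
  dev:
    default: true           # used when no target is specified
    workspace:
      host: https://dev-workspace.cloud.databricks.com   # hypothetical dev workspace
  prod:
    workspace:
      host: https://prod-workspace.cloud.databricks.com  # hypothetical prod workspace
```

Because each target is declared as configuration, deploying the same bundle to a different workspace is a matter of selecting a different target rather than repeating manual setup steps.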
When should I use bundles?
Some ideal scenarios for bundles include:
Develop data, analytics, and ML projects in a team-based environment. Bundles help you organize and manage source files efficiently, which ensures smooth collaboration and streamlined processes.
Iterate on ML problems faster. Manage ML pipeline resources (such as training and batch inference jobs) by using ML projects that follow production best practices from the beginning.
Set organizational standards for new projects by authoring custom bundle templates that include default permissions, service principals, and CI/CD configurations.
Regulatory compliance: In industries where regulatory compliance is a significant concern, bundles can help maintain a versioned history of code and infrastructure work. This assists in governance and ensures that necessary compliance standards are met.
Disaster recovery planning: When you plan for disaster recovery, bundles can be a vital tool for a robust strategy. This is because bundles enable the tracking of all changes, which facilitates smoother recovery processes in case of unplanned incidents.
How do I work with a bundle?
You typically create a bundle on a local development machine with an IDE and the Databricks CLI version 0.205 or above. These tools enable you to create, validate, deploy, and run a bundle. See Databricks asset bundles development work tasks.
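The typical command-line loop looks like the following sketch. These `databricks bundle` subcommands assume the Databricks CLI version 0.205 or above and an authenticated workspace; `my_job` is a hypothetical job key from the bundle's configuration:

```bash
databricks bundle validate           # check the bundle configuration for errors
databricks bundle deploy -t dev      # deploy to the target named "dev"
databricks bundle run my_job -t dev  # run the deployed job (hypothetical key)
```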
You can edit a bundle in a Databricks workspace after you add the bundle to Git by using the Git integration with Databricks Repos. However, you cannot test or deploy a bundle from a workspace. Instead, you can use a local IDE for testing and CI/CD for deployment.
Databricks provides a default bundle template to help you get started. Organizations can create custom bundle templates to define their own standards. These standards might include default permissions, service principals, and CI/CD configuration. See Databricks asset bundle templates.
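As an illustration, a custom template is a directory that contains a `databricks_template_schema.json` file declaring the template's input parameters, alongside templated project files. The property name and default below are hypothetical:

```json
{
  "properties": {
    "project_name": {
      "type": "string",
      "description": "Name of the new project",
      "default": "my_project"
    }
  }
}
```

A template like this can then be instantiated with `databricks bundle init`, which prompts for the declared parameters before generating the project.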
Next steps
Create a bundle that deploys a notebook to a Databricks workspace and then runs that deployed notebook as a Databricks job. See Automate a Databricks job with Databricks asset bundles.
Create a bundle that deploys a notebook to a Databricks workspace and then runs that deployed notebook as a Delta Live Tables pipeline. See Automate a Delta Live Tables pipeline with Databricks asset bundles.
Create a bundle that deploys and runs an MLOps Stack. See Databricks asset bundles for MLOps Stacks.
Add a bundle to a CI/CD (continuous integration/continuous deployment) workflow in GitHub. See Run a CI/CD workflow with a Databricks asset bundle and GitHub Actions.
Create a bundle that builds, deploys, and runs a Python wheel. See Databricks asset bundles for Python wheels.
Create a bundle based on a template, or create a template that you and others can use to create a bundle. See Databricks asset bundle templates.