Tutorial: Create your first custom Databricks Asset Bundle template
In this tutorial, you’ll create a custom Databricks Asset Bundle template and learn how to use it to automate more complex processing tasks.
Preview
This feature is in Public Preview.
The Databricks Asset Bundle workflow supports both manual and templated creation of bundles. Templated bundles come in two flavors: ones that use default bundle templates, and ones that use custom bundle templates.
The default bundle template assumes a very specific configuration for simplicity, while the custom bundle template allows you to specify:
Folder structures
Compute settings, build steps, and tasks
Tests
Other behaviors configurable in a DevOps infrastructure-as-code (IaC) environment
For example, if you routinely run Databricks jobs that require custom packages with a time-consuming compilation step upon installation, you can speed up your development loop by creating a bundle template that supports custom container environments.
Bundle templates define a directory structure that mirrors your intended bundle's structure. They include a databricks_template_schema.json file that defines the user-provided parameters required to create a new bundle. Let's dive in and create a new template that builds custom container environments.
Before you start
If you haven't already, install Databricks CLI version 0.205 or above. If it's already installed, confirm that the version is 0.205 or higher by running the following from a terminal:

```bash
databricks -v
```
Create a bundle template for running a container-based job
To make your first “container-job” bundle template, do the following from a terminal that can run Databricks CLI commands:
Create an empty directory named dab-container-template:

```bash
mkdir dab-container-template
```
In the directory's root, create a file named databricks_template_schema.json. This file contains the variables that must be provided by a user at bundle creation time:

```bash
touch dab-container-template/databricks_template_schema.json
```
Add the following contents to databricks_template_schema.json and save the file. Each variable is translated into a user prompt during bundle creation with the Databricks CLI:

```json
{
  "properties": {
    "project_name": {
      "type": "string",
      "default": "project_name",
      "description": "Project name",
      "order": 1
    }
  }
}
```
In the template directory, create subdirectories named resources and src (the -p flag also creates the template directory itself). The template folder contains the directory structure for your generated bundles, and the names of its subdirectories and files follow Go package template syntax when derived from user values:

```bash
mkdir -p "dab-container-template/template/resources"
mkdir -p "dab-container-template/template/src"
```
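For example, if a user answers the project_name prompt with my_project, templated names render through Go template substitution when the bundle is generated; the template/ prefix disappears because template/ is the root of the generated bundle, and the .tmpl extension is stripped from processed files:

```
template/databricks.yml.tmpl            ->  databricks.yml
template/src/{{.project_name}}/task.py  ->  src/my_project/task.py
```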
In the template directory, create a file named databricks.yml.tmpl and add the following contents:

```yaml
# This is a Databricks asset bundle definition for {{.project_name}}.
# See https://docs.databricks.com/dev-tools/bundles/index.html for documentation.
bundle:
  name: {{.project_name}}

include:
  - resources/*.yml

targets:
  # The 'dev' target, used for development purposes.
  # Whenever a developer deploys using 'dev', they get their own copy.
  dev:
    # We use 'mode: development' to make sure everything deployed to this target gets a prefix
    # like '[dev my_user_name]'. Setting this mode also disables any schedules and
    # automatic triggers for jobs and enables the 'development' mode for Delta Live Tables pipelines.
    mode: development
    default: true
    workspace:
      host: {{workspace_host}}

  # The 'prod' target, used for production deployment.
  prod:
    # For production deployments, we only have a single copy, so we override the
    # workspace.root_path default of
    # /Users/${workspace.current_user.userName}/.bundle/${bundle.target}/${bundle.name}
    # to a path that is not specific to the current user.
    {{- /*
    Explaining 'mode: production' isn't as pressing as explaining 'mode: development'.
    As we already talked about the other mode above, users can just
    look at documentation or ask the assistant about 'mode: production'.
    #
    # By making use of 'mode: production' we enable strict checks
    # to make sure we have correctly configured this target.
    */}}
    mode: production
    workspace:
      host: {{workspace_host}}
      root_path: /Shared/.bundle/prod/${bundle.name}
    {{- if not is_service_principal}}
    run_as:
      # This runs as {{user_name}} in production. Alternatively,
      # a service principal could be used here using service_principal_name
      # (see Databricks documentation).
      user_name: {{user_name}}
    {{end -}}
```
Create another YAML file named {{.project_name}}_job.yml.tmpl and place it in your resources directory. This new YAML file splits your project's job definitions from the rest of your bundle's definition. Add the following YAML code to this file to describe your project's job and runtime:

```yaml
# The main job for {{.project_name}}
resources:
  jobs:
    {{.project_name}}_job:
      name: {{.project_name}}_job

      tasks:
        - task_key: python_task
          job_cluster_key: job_cluster
          spark_python_task:
            python_file: ../src/{{.project_name}}/task.py

      job_clusters:
        - job_cluster_key: job_cluster
          new_cluster:
            docker_image:
              url: databricksruntime/python:10.4-LTS
            node_type_id: i3.xlarge
            spark_version: 13.3.x-scala2.12
```
This is where you include your custom container image. In this step, you’ve specified one of the default Databricks base images, but you can customize this base image by installing packages specific to your project.
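For instance, a minimal sketch of baking project-specific packages into the base image might look like the following; my-custom-package and the registry address are placeholders rather than values from this tutorial, and running custom images requires Databricks Container Services to be enabled for your workspace:

```bash
# Sketch only: extend the Databricks Python base image with an extra package.
cat > Dockerfile <<'EOF'
FROM databricksruntime/python:10.4-LTS
# Install into the Python environment used by the runtime
# (path assumed from the databricksruntime base images).
RUN /databricks/python3/bin/pip install --no-cache-dir my-custom-package
EOF

docker build -t my-registry.example.com/my-project-image:latest .
docker push my-registry.example.com/my-project-image:latest
```

You would then point the docker_image url in {{.project_name}}_job.yml.tmpl at the pushed image.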
Under your src directory, make a placeholder Python task file to run within your containerized environment. The -p flag first creates the {{.project_name}} directory, which doesn't exist yet:

```bash
mkdir -p "dab-container-template/template/src/{{.project_name}}"
touch "dab-container-template/template/src/{{.project_name}}/task.py"
```
Now, add the following placeholder code to that file:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.master('local[*]').appName('example').getOrCreate()
print(f'Spark version: {spark.version}')
```
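If your custom image bakes in a project-specific package, this task file is the natural place to exercise it. A minimal sketch, assuming a hypothetical preinstalled package named my_custom_package with a load_records() helper:

```python
from pyspark.sql import SparkSession

import my_custom_package  # hypothetical: preinstalled in your custom image

spark = SparkSession.builder.getOrCreate()

# Turn the package's (hypothetical) records into a DataFrame to verify that
# both Spark and the custom environment work inside the container.
df = spark.createDataFrame(my_custom_package.load_records(), schema='id INT, value STRING')
df.show()
```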
Review the structure of your bundle template. It should be as follows:

```
.
├── databricks_template_schema.json
└── template
    ├── databricks.yml.tmpl
    ├── resources
    │   └── {{.project_name}}_job.yml.tmpl
    └── src
        └── {{.project_name}}
            └── task.py
```
And your first custom bundle template is complete! To generate a bundle from your new custom template, use the same databricks bundle init command you used before, now pointing it at your template's location:

```bash
databricks bundle init dab-container-template
```
With this new bundle template, you can create bundles for running containerized workflows. This is especially useful for jobs whose package installation at cluster startup takes significant time relative to the job's run time.
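From the generated bundle's directory, you can then validate, deploy, and run the job with the standard bundle commands; a sketch, where my_project_job assumes an answered project name of my_project:

```bash
databricks bundle validate
databricks bundle deploy -t dev
databricks bundle run -t dev my_project_job
```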
Next steps
Create a bundle that deploys a notebook to a Databricks workspace and then runs that deployed notebook as a Databricks job. See Develop a job on Databricks by using Databricks Asset Bundles.
Create a bundle that deploys a notebook to a Databricks workspace and then runs that deployed notebook as a Delta Live Tables pipeline. See Develop a Delta Live Tables pipeline by using Databricks Asset Bundles.
Create a bundle that deploys and runs an MLOps Stack. See Databricks Asset Bundles for MLOps Stacks.
Add a bundle to a CI/CD (continuous integration/continuous deployment) workflow in GitHub. See Run a CI/CD workflow with a Databricks Asset Bundle and GitHub Actions.