Databricks asset bundle templates

Preview

This feature is in Public Preview.

This article describes the syntax for Databricks asset bundle templates, which work with Databricks CLI versions 0.205 and above. Bundles enable programmatic management of Databricks workflows. See What are Databricks asset bundles?

Bundle templates enable users to create bundles in a consistent, repeatable way. Bundle templates also enable users to tailor bundles with custom values for their specific usage requirements.

A Databricks asset bundle must contain exactly one configuration file named databricks.yml. A bundle can also contain any number of child folders and files, such as Python code files and Databricks notebooks.

Create a bundle based on a template

This section describes how to use an existing bundle template. If you do not already have access to a bundle template that you want to use, then you can create one by skipping ahead to Create a bundle template.

Use the default bundle template for Python

To use the Databricks default bundle template for Python to create a starter bundle that you can then customize further, use the Databricks CLI to run the bundle init command, and then answer the on-screen prompts:

databricks bundle init

To view the source code for this bundle template, see the default-python template in the databricks/cli repository in GitHub.

Use a non-default bundle template

To use a bundle template other than the Databricks default bundle template for Python, you must know the path on your local development machine or the URL to the remote bundle template location. Use the Databricks CLI to run the bundle init command as follows:

databricks bundle init <project-template-local-path-or-url> \
--output-dir="</local/path/to/project/template/output>" \
--template-dir="<folder/path/to/project/template>" \
--config-file="</local/path/to/project/template/input/values>"

In the preceding command, replace the following placeholders:

  • <project-template-local-path-or-url> is required. Replace this placeholder with either the path to a local bundle template or the URL of a bundle template hosted with a third-party Git provider such as GitHub. To get the URL, see your bundle template provider’s documentation.

  • --output-dir is optional. Replace the placeholder </local/path/to/project/template/output> with the local path on your machine where you want to output the bundle template’s results. To output the results to the current working directory, specify a single dot (--output-dir="."). If --output-dir is not specified, then the bundle template’s results are output to the current working directory.

  • --template-dir is optional. If <project-template-local-path-or-url> is the URL of a bundle template hosted with a third-party Git provider such as GitHub, you can use this option to specify the directory that contains the bundle template, relative to the root of the Git repository. Replace the placeholder <folder/path/to/project/template> with the relative path. For example, the following command uses the default-python template in the libs/template/templates/default-python directory in the databricks/cli repository in GitHub (which is also the bundle template that is used when you run databricks bundle init without any parameters):

    databricks bundle init https://github.com/databricks/cli --template-dir libs/template/templates/default-python
    

    If --template-dir is not specified, the Databricks CLI looks for the bundle template in the URL’s root.

  • --config-file is optional. If you specify --config-file, then replace the placeholder </local/path/to/project/template/input/values> with the local path on your machine to a JSON-formatted file that contains the input variable names and values for each of the bundle template’s input variables. If --config-file is not specified, the Databricks CLI will prompt you for the value of each of the bundle template’s input variables. The file’s format is as follows:

    {
      "<input-variable-1>": "<input-value-1>",
      "<input-variable-N>": "<input-value-N>"
    }
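
If you prefer to generate the input values file rather than write it by hand, the following Python sketch shows one way to do so. The variable names project_name and owner, their values, and the helper write_input_values are hypothetical and used for illustration only; use the input variable names that your bundle template actually defines.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical input variable names and values, for illustration only.
# Replace them with the names that your bundle template defines.
input_values = {
    "project_name": "my_project",
    "owner": "someone@example.com",
}


def write_input_values(path: Path, values: dict) -> Path:
    """Write the values as a single JSON object, one key per input variable."""
    path.write_text(json.dumps(values, indent=2) + "\n", encoding="utf-8")
    return path


# Write to a temporary folder so that the sketch does not touch your project.
config_path = write_input_values(
    Path(tempfile.mkdtemp()) / "input_values.json", input_values
)
print(config_path.read_text(encoding="utf-8"))
```

You could then pass the path of the resulting file to the --config-file option.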
    

Create a bundle template

This section describes how to create a bundle template. If you already have a bundle template that you want to use, then skip back to Create a bundle based on a template. For examples, see the templates that are created and maintained by Databricks in the bundle examples repository in GitHub.

To create a basic bundle template for you and others to use with the bundle init command, complete the following instructions. When you are finished, you will have a basic bundle template that you can extend and customize for your own kinds of bundles. You can then share these bundle templates for others to use.

This basic bundle template creates a new folder with the given name in the given path, or uses an existing folder with the given name in the given path. This template then creates a file with the given contents within this given folder. If the file already exists in the given folder, the template stops.
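
To make this behavior concrete, the following Python sketch mimics it outside of the Databricks CLI. It is only an illustration of the behavior described above, not how the CLI is implemented, and the function name create_file_in_folder is hypothetical.

```python
import tempfile
from pathlib import Path


def create_file_in_folder(
    base: Path, dir_name: str, file_name: str, file_content: str
) -> Path:
    """Reuse or create the folder, then create the file with the given contents."""
    folder = base / dir_name
    folder.mkdir(parents=True, exist_ok=True)  # an existing folder is reused
    target = folder / file_name
    if target.exists():
        # Mirrors the described behavior: stop if the file already exists.
        raise FileExistsError(f"{target} already exists")
    target.write_text(file_content, encoding="utf-8")
    return target


# Work in a temporary folder so that the sketch has no side effects elsewhere.
base = Path(tempfile.mkdtemp())
created = create_file_in_folder(base, "my-dir", "my-file.txt", "Hello, World!")
print(created.read_text(encoding="utf-8"))
```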

To view the source code for the Databricks default bundle template for Python, which is more complex than this basic bundle template, see the default-python template in the databricks/cli repository in GitHub.

Bundle templates use Go package templating syntax. See the Go package template documentation.
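
As a rough illustration of the simplest kind of action that this article uses, a field reference such as {{.dir_name}}, the following Python sketch substitutes such placeholders from a dictionary. It is only an approximation for building intuition: Go templates support much more than this (for example, conditionals and pipelines), and the function name render_fields is hypothetical.

```python
import re

# Matches only the simplest Go template action: a field reference like {{.name}}.
FIELD = re.compile(r"\{\{\s*\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def render_fields(text: str, values: dict) -> str:
    """Replace each {{.name}} with values[name]; raise KeyError for unknown names."""
    return FIELD.sub(lambda match: str(values[match.group(1)]), text)


values = {"dir_name": "my-dir", "file_name": "my-file.txt"}
print(render_fields("{{.dir_name}}/{{.file_name}}", values))  # my-dir/my-file.txt
```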

To create a basic bundle template, do the following:

  1. Create or identify an empty folder on your development machine.

  2. In the folder’s root, create a file named databricks_template_schema.json. This file contains the variables that users provide input values for. Your template uses these user-provided input values to customize the bundle. This file follows the JSON Schema Specification.

  3. Add the following content to the databricks_template_schema.json file, and then save the file:

    {
      "properties": {
        "dir_name": {
          "type": "string",
          "default": "my_directory",
          "description": "Directory name",
          "order": 1
        },
        "file_name": {
          "type": "string",
          "default": "my_file",
          "description": "File name within this directory",
          "order": 2
        },
        "file_content": {
          "type": "string",
          "description": "File contents",
          "order": 3
        }
      }
    }
    

    In this file:

    • dir_name, file_name, and file_content are the programmatic names for these example input variables.

    • default is an optional default value for the related variable. It is used if the user does not provide a value with --config-file as part of the bundle init command and does not override it at the command prompt.

    • description is the user prompt for the related input variable, if a value is not provided by the user with --config-file as part of the bundle init command.

    • order is an optional order in which each user prompt appears, if a value is not provided by the user with --config-file as part of the bundle init command. This enables you to display user prompts in a non-linear order. If order is not provided, then user prompts display in the order in which they are listed, from beginning to end.

  4. In the folder’s root, create a folder named template. This folder contains the template to be customized by using the user-provided input values.

  5. In the template folder, create a child folder named {{.dir_name}}. The {{.dir_name}} folder name will be replaced by the user-provided input value for the dir_name variable in the databricks_template_schema.json file. The {{ }} denotes an action, which in Go package template syntax is a data evaluation or control structure. The dot (as in .dir_name) denotes a cursor in Go package template syntax, which represents the current location in the structure (in this case, the properties object in the databricks_template_schema.json file).

  6. In the {{.dir_name}} child folder, create a file named {{.file_name}}.tmpl. The {{.file_name}}.tmpl filename will be replaced by the user-provided input value for the file_name variable in the databricks_template_schema.json file. Your bundle template structure should be as follows:

    |-- template
    |     `-- {{.dir_name}}
    |           `-- {{.file_name}}.tmpl
    `-- databricks_template_schema.json
    
  7. Add the following content to the {{.file_name}}.tmpl file:

    {{.file_content}}
    

    In this file, {{.file_content}} will be replaced by the user-provided input value for the file_content variable.

    Note

    You must add .tmpl to the end of a filename (in this example, the {{.file_name}}.tmpl file) only if your bundle template intends to replace the content of the file. For example, if you want to provide a file named databricks.yml with static content, leave this filename unchanged. However, if you want to dynamically change this file’s contents, use the filename databricks.yml.tmpl instead.

  8. In the folder’s root, test the template by using the Databricks CLI to run the following command:

    databricks bundle init . --output-dir="."
    

    In this command, the first dot instructs the Databricks CLI to use the databricks_template_schema.json file in the current working directory. The dot in the --output-dir option instructs the Databricks CLI to write the template’s output to the current working directory as well. The --output-dir option is optional. If --output-dir is not specified, then the bundle template’s results are output to the current working directory.

    Tip

    You can also specify the --template-dir option. If the bundle template is hosted with a third-party Git provider such as GitHub, you can use this option to specify a directory that is relative to the root of the Git repository, with the specified directory containing the bundle template.

    To share this template with others, you could add your template’s databricks_template_schema.json file and any and all related *.tmpl folders and files to any version control system that is compatible with Git.

    To use this template, you would then replace the first dot in the bundle init command with the path to that Git location. You could also replace the dot in the --output-dir option with the local path on your development machine where you want to write the template’s output, instead of writing the template’s output to the current working directory.

  9. For the first prompt, Directory name, override the default value of my_directory with my-dir and press Enter.

  10. For the second prompt, File name within this directory, override the default value of my_file with my-file.txt and press Enter.

  11. For the third prompt, File contents, enter Hello, World! and press Enter.

    The child folder my-dir is created. Within the my-dir child folder, a file named my-file.txt with the contents of Hello, World! is created.

  12. To test the template by using an input values file, in the folder’s root, create a file named input_values.json. Add the following content to this file and then save the file:

    {
      "dir_name": "my-dir-2",
      "file_name": "my-file-2.txt",
      "file_content": "Hello Again, World!"
    }
    
  13. In the folder’s root, use the Databricks CLI to run the following command:

    databricks bundle init . --output-dir="." \
    --config-file="input_values.json"
    

    The child folder my-dir-2 is created. Within the my-dir-2 child folder, a file named my-file-2.txt with the contents of Hello Again, World! is created.

  14. To clean up this bundle template, you can now delete the following generated child folders and files that are not part of the original bundle template:

    |-- my-dir
    |     `-- my-file.txt
    |-- my-dir-2
    |     `-- my-file-2.txt
    `-- input_values.json
    
  15. If you want to share this bundle template with others, you can store it in version control with any provider that Git supports and that your users have access to. To run the bundle init command with a Git URL, make sure that the databricks_template_schema.json file is in the root location relative to that Git URL.

    Tip

    You can put the databricks_template_schema.json file in a different folder, relative to the bundle’s root. You can then use the bundle init command’s --template-dir option to reference that folder, which contains the databricks_template_schema.json file.
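
Before you share a template, you might want to check its layout. The following Python sketch, which is not part of the Databricks CLI, checks that a folder contains databricks_template_schema.json and a template folder, and then lists the prompts in the order described in step 3. The function name check_template_root is hypothetical, and placing properties without an order value last is an assumption of this sketch.

```python
import json
import tempfile
from pathlib import Path

SCHEMA_NAME = "databricks_template_schema.json"


def check_template_root(root: Path) -> list:
    """Return (name, description, default) tuples sorted by the "order" field.

    Raises FileNotFoundError if the schema file or the template folder is
    missing from the given root.
    """
    schema_path = root / SCHEMA_NAME
    if not schema_path.is_file():
        raise FileNotFoundError(f"{SCHEMA_NAME} not found in {root}")
    if not (root / "template").is_dir():
        raise FileNotFoundError(f"template folder not found in {root}")
    properties = json.loads(schema_path.read_text(encoding="utf-8"))["properties"]
    # Stable sort: properties without "order" go last (an assumption of this
    # sketch) and otherwise keep the order in which they are listed.
    ordered = sorted(
        properties.items(), key=lambda item: item[1].get("order", float("inf"))
    )
    return [
        (name, spec.get("description"), spec.get("default"))
        for name, spec in ordered
    ]


# Build a throwaway copy of this article's basic template to check.
root = Path(tempfile.mkdtemp())
(root / "template").mkdir()
schema = {
    "properties": {
        "file_name": {
            "type": "string",
            "default": "my_file",
            "description": "File name within this directory",
            "order": 2,
        },
        "dir_name": {
            "type": "string",
            "default": "my_directory",
            "description": "Directory name",
            "order": 1,
        },
        "file_content": {
            "type": "string",
            "description": "File contents",
            "order": 3,
        },
    }
}
(root / SCHEMA_NAME).write_text(json.dumps(schema, indent=2), encoding="utf-8")

for name, description, default in check_template_root(root):
    print(f"{name}: {description} (default: {default})")
```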

Next steps