bundle command group

Note

This information applies to Databricks CLI versions 0.205 and above, which are in Public Preview. To find your version of the Databricks CLI, run databricks -v.

The bundle command group within the Databricks CLI enables you to programmatically validate, deploy, and run Databricks workflows such as Databricks jobs, Delta Live Tables pipelines, and MLOps Stacks. See What are Databricks Asset Bundles?.

Important

Before you use the Databricks CLI, be sure to set up the Databricks CLI and set up authentication for the Databricks CLI.

You run bundle commands by appending them to databricks bundle. To display help for the bundle command, run databricks bundle -h.

Create a bundle from a project template

To create a Databricks Asset Bundle by using the default Databricks Asset Bundle template for Python, run the bundle init command as follows, and then answer the on-screen prompts:

databricks bundle init

To create a Databricks Asset Bundle by using a non-default Databricks Asset Bundle template, run the bundle init command as follows:

databricks bundle init <project-template-local-path-or-url> \
--project-dir="</local/path/to/project/template/output>"
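
For example, to create a bundle from the MLOps Stacks template in its GitHub repository (using the repository URL as the template path, a hypothetical output path, and the same --project-dir option shown above), you might run:

databricks bundle init https://github.com/databricks/mlops-stacks \
--project-dir="./my-mlops-project"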

Display the bundle configuration schema

To display the Databricks Asset Bundle configuration schema, run the bundle schema command, as follows:

databricks bundle schema

To output the Databricks Asset Bundle configuration schema as a JSON file, run the bundle schema command and redirect the output to a JSON file. For example, you can generate a file named bundle_config_schema.json within the current directory, as follows:

databricks bundle schema > bundle_config_schema.json
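
If your editor supports JSON Schema validation for YAML files (for example, through a YAML extension for Visual Studio Code), you can point it at the generated file to get completion and validation for your bundle configuration, for instance by adding a schema comment such as the following at the top of your databricks.yml file. The file name here matches the one generated in the previous example:

# yaml-language-server: $schema=bundle_config_schema.json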

Validate a bundle

To validate that your bundle configuration files are syntactically correct, run the bundle validate command from the bundle project root, as follows:

databricks bundle validate
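
To validate the configuration for a specific environment, you can also pass the environment's name, assuming that bundle validate accepts the same -e (or --environment) option described for bundle deploy later in this article:

databricks bundle validate -e development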

Sync a bundle’s tree to a workspace

Use the bundle sync command to perform one-way synchronization of a bundle’s file changes from a directory within a local filesystem to a directory within a remote Databricks workspace.

Note

bundle sync commands cannot synchronize file changes from a directory within a remote Databricks workspace back to a directory within a local filesystem.

databricks bundle sync commands work in the same way as databricks sync commands and are provided as a productivity convenience. For command usage information, see sync command group.
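
For example, to perform a one-time, one-way synchronization, run the following command from the bundle project root. To keep synchronizing as local files change, add the --watch option, assuming the same option as the sync command group:

databricks bundle sync
databricks bundle sync --watch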

Generate a bundle configuration file

You can use the bundle generate command to generate resource configuration for a job or pipeline that already exists in your Databricks workspace. This command generates a *.yml file for the job or pipeline in the resources folder of the bundle project and also downloads any notebooks referenced in the job or pipeline configuration. Currently, only jobs with notebook tasks are supported by this command.

Important

The bundle generate command is provided as a convenience to autogenerate resource configuration. However, when this configuration is included in the bundle and deployed, it creates a new resource and does not update the existing resource unless bundle deployment bind has first been used on the resource.

Run the bundle generate command as follows:

databricks bundle generate [job|pipeline] --existing-[job|pipeline]-id [job-id|pipeline-id]

For example, the following command generates a new hello_job.yml file in the bundle project’s resources folder containing the YAML below, and downloads simple_notebook.py to the project’s src folder.

databricks bundle generate job --existing-job-id 6565621249
# This is the contents of the resulting hello_job.yml file.
resources:
  jobs:
    6565621249:
      name: Hello Job
      format: MULTI_TASK
      tasks:
        - task_key: run_notebook
          existing_cluster_id: 0704-xxxxxx-yyyyyyy
          notebook_task:
            notebook_path: ./src/simple_notebook.py
            source: WORKSPACE
          run_if: ALL_SUCCESS
      max_concurrent_runs: 1

Bind bundle resources

The bundle deployment bind command allows you to link bundle-defined jobs and pipelines to existing jobs and pipelines in the Databricks workspace so that they become managed by Databricks Asset Bundles. When you bind a resource, the existing Databricks resource in the workspace is updated to match the configuration defined in the bundle it is bound to after the next bundle deploy.

Tip

It’s a good idea to confirm the workspace that the bundle is deployed to before running bind.

databricks bundle deployment bind [resource-key] [resource-id]

For example, the following command binds the resource hello_job to its remote counterpart in the workspace. The command outputs a diff and allows you to decline the resource binding, but if you confirm it, any updates to the job definition in the bundle are applied to the corresponding remote job when the bundle is next deployed.

databricks bundle deployment bind hello_job 6565621249

Use bundle deployment unbind if you want to remove the link between the job or pipeline in a bundle and its remote counterpart in a workspace.

databricks bundle deployment unbind [resource-key]
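
For example, the following command removes the link between the hello_job resource bound in the previous example and its remote counterpart in the workspace:

databricks bundle deployment unbind hello_job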

Deploy a bundle

To deploy any specified local artifacts to the remote workspace, run the bundle deploy command from the bundle project root. If no command options are specified, the default environment as declared within the bundle configuration files is used, as follows:

databricks bundle deploy

Tip

You can run databricks bundle commands outside of the bundle root. If so, you can specify the bundle root path by setting the BUNDLE_ROOT environment variable. If this environment variable is not set, databricks bundle commands attempt to find the bundle root by searching within the current working directory.
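
For example, assuming a bundle located at /path/to/my-bundle (a hypothetical path), you could set the environment variable and then deploy from any working directory:

export BUNDLE_ROOT=/path/to/my-bundle
databricks bundle deploy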

To deploy the artifacts within the context of a specific environment, specify the -e (or --environment) option along with the environment’s name as declared within the bundle configuration files. For example, for an environment declared with the name development, run the following command:

databricks bundle deploy -e development

Run a bundle

To run a specific job or pipeline, use the bundle run command. You must specify the resource key of the job or pipeline declared within the bundle configuration files. By default, the environment declared within the bundle configuration files is used. For example, to run a job hello_job in the default environment, run the following command:

databricks bundle run hello_job

To specify the environment in which to run a job, use the -e option. The following example runs hello_job in the development environment:

databricks bundle run -e development hello_job

If you want to do a pipeline validation run, use the --validate-only option, as shown in the following example:

databricks bundle run --validate-only my_pipeline

To pass job parameters, use the --params option, followed by comma-separated key-value pairs, where the key is the parameter name. For example, the following command sets the parameter with the name message to HelloWorld for the job hello_job:

databricks bundle run --params message=HelloWorld hello_job
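
For example, assuming the hello_job job also defines a second parameter named greeting_name (a hypothetical parameter name for illustration), you could set both parameters in one comma-separated list:

databricks bundle run --params message=HelloWorld,greeting_name=World hello_job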

Note

You can pass parameters to job tasks using the job task options, but the --params option is the recommended method for passing job parameters. An error occurs if job parameters are specified for a job that doesn’t have job parameters defined or if task parameters are specified for a job that has job parameters defined.

To cancel and restart an existing job run or pipeline update, use the --restart option:

databricks bundle run --restart hello_job

Destroy a bundle

To delete jobs, pipelines, and artifacts that were previously deployed, run the bundle destroy command. The following command deletes all previously-deployed jobs, pipelines, and artifacts that are defined in the bundle configuration files:

databricks bundle destroy

By default, you are prompted to confirm permanent deletion of the previously-deployed jobs, pipelines, and artifacts. To skip these prompts and perform automatic permanent deletion, add the --auto-approve option to the bundle destroy command.
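
For example:

databricks bundle destroy --auto-approve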