Stack CLI

Beta

This feature is in Beta.

Note

The stack CLI requires Databricks CLI 0.8.3 or above.

The stack CLI provides a way to manage a stack of Databricks resources, such as jobs, notebooks, and DBFS files. You can store notebooks and DBFS files locally and create a stack configuration JSON template that defines mappings from your local files to paths in your Databricks workspace, along with configurations of jobs that run the notebooks.

Use the stack CLI with the stack configuration JSON template to deploy and manage your stack.

You run Databricks stack CLI subcommands by appending them to databricks stack.

databricks stack --help
Usage: databricks stack [OPTIONS] COMMAND [ARGS]...

  [Beta] Utility to deploy and download Databricks resource stacks.

Options:
  -v, --version   [VERSION]
  --debug         Debug Mode. Shows full stack trace on error.
  --profile TEXT  CLI connection profile to use. The default profile is
                  "DEFAULT".
  -h, --help      Show this message and exit.

Commands:
  deploy    Deploy a stack of resources given a JSON configuration of the stack
    Usage: databricks stack deploy [OPTIONS] CONFIG_PATH
    Options:
       -o, --overwrite  Include to overwrite existing workspace notebooks and DBFS
                        files  [default: False]
  download  Download workspace notebooks of a stack to the local filesystem
            given a JSON stack configuration template.
    Usage: databricks stack download [OPTIONS] CONFIG_PATH
    Options:
       -o, --overwrite  Include to overwrite existing workspace notebooks in the
                        local filesystem   [default: False]

Deploy a stack to a workspace

This subcommand deploys a stack. See Stack setup to learn how to set up a stack.

databricks stack deploy ./config.json

Stack configuration JSON template gives an example of config.json.

Download stack notebook changes

This subcommand downloads the workspace notebooks of a stack to your local filesystem, as defined in the stack configuration JSON template.

databricks stack download ./config.json
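Both subcommands can also be driven from a script, for example in a CI job. The following minimal Python sketch assembles the command line from the options in the help output above and runs it with subprocess. The helper names stack_command and run_stack are hypothetical, and the sketch assumes the databricks executable is on your PATH and already configured.

```python
import shlex
import subprocess
from typing import List, Optional


def stack_command(subcommand: str, config_path: str,
                  overwrite: bool = False,
                  profile: Optional[str] = None) -> List[str]:
    """Build the argv list for `databricks stack deploy|download`."""
    if subcommand not in ("deploy", "download"):
        raise ValueError(f"unsupported stack subcommand: {subcommand}")
    argv = ["databricks", "stack"]
    if profile is not None:
        # --profile is listed as an option of the `databricks stack` group.
        argv += ["--profile", profile]
    argv.append(subcommand)
    if overwrite:
        # -o/--overwrite replaces existing notebooks/files at the destination.
        argv.append("--overwrite")
    argv.append(config_path)
    return argv


def run_stack(subcommand: str, config_path: str,
              overwrite: bool = False, profile: Optional[str] = None) -> None:
    """Run the stack CLI; raises CalledProcessError on a non-zero exit."""
    argv = stack_command(subcommand, config_path,
                         overwrite=overwrite, profile=profile)
    print("running:", shlex.join(argv))
    subprocess.run(argv, check=True)
```

For example, run_stack("deploy", "./config.json", overwrite=True) runs databricks stack deploy --overwrite ./config.json. Keeping argument assembly in a separate function makes it easy to inspect the exact command before anything touches the workspace.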

Examples

Stack setup

File structure of an example stack

tree
.
├── notebooks
|   ├── common
|   |   └── notebook.scala
|   └── config
|       ├── environment.scala
|       └── setup.sql
├── lib
|   └── library.jar
└── config.json

This example stack contains a main notebook in notebooks/common/notebook.scala and configuration notebooks in the notebooks/config folder. The stack depends on a JAR library in lib/library.jar. config.json is the stack configuration JSON template for the stack; this is the file you pass to the stack CLI to deploy the stack.
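Because a relative source_path is interpreted relative to the stack configuration template file, it can help to confirm that every local source exists before deploying. A minimal Python sketch; load_stack_config and missing_sources are hypothetical helpers, not part of the stack CLI.

```python
import json
import os
from typing import Any, Dict, List


def load_stack_config(config_path: str) -> Dict[str, Any]:
    """Parse a stack configuration JSON template."""
    with open(config_path) as f:
        return json.load(f)


def missing_sources(config: Dict[str, Any], base_dir: str) -> List[str]:
    """Return every source_path that does not exist under base_dir."""
    missing = []
    for resource in config["resources"]:
        source = resource["properties"].get("source_path")
        if source is None:  # jobs resources have no source_path
            continue
        # os.path.join returns `source` unchanged when it is already absolute.
        if not os.path.exists(os.path.join(base_dir, source)):
            missing.append(source)
    return missing
```

For the tree above, running missing_sources(load_stack_config("config.json"), ".") from the stack's root directory should return an empty list when all sources are present.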

Stack configuration JSON template

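The stack configuration template defines every resource in the stack: which local files and directories map to which workspace or DBFS paths, and how each job is configured.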

cat config.json
{
  "name": "example-stack",
  "resources": [
    {
      "id": "example-workspace-notebook",
      "service": "workspace",
      "properties": {
        "source_path": "notebooks/common/notebook.scala",
        "path": "/Users/example@example.com/dev/notebook",
        "object_type": "NOTEBOOK"
      }
    },
    {
      "id": "example-workspace-config-dir",
      "service": "workspace",
      "properties": {
        "source_path": "notebooks/config",
        "path": "/Users/example@example.com/dev/config",
        "object_type": "DIRECTORY"
      }
    },
    {
      "id": "example-dbfs-library",
      "service": "dbfs",
      "properties": {
        "source_path": "lib/library.jar",
        "path": "dbfs:/tmp/lib/library.jar",
        "is_dir": false
      }
    },
    {
      "id": "example-job",
      "service": "jobs",
      "properties": {
        "name": "Example Stack CLI Job",
        "new_cluster": {
          "spark_version": "7.3.x-scala2.12",
          "node_type_id": "i3.xlarge",
          "aws_attributes": {
            "availability": "SPOT"
          },
          "num_workers": 3
        },
        "timeout_seconds": 7200,
        "max_retries": 1,
        "notebook_task": {
          "notebook_path": "/Users/example@example.com/dev/notebook"
        }
      }
    }
  ]
}

Each job, workspace notebook, workspace directory, DBFS file, or DBFS directory is defined as a ResourceConfig. Each ResourceConfig that represents a workspace or DBFS asset maps the local file or directory (source_path) to the location where it is deployed in the workspace or DBFS (path).
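To list the local-to-remote mappings a template defines, you can walk its resources and keep only the workspace and DBFS entries, since jobs have no source_path. A minimal Python sketch operating on the parsed JSON; source_to_remote is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

# Services whose resources carry a source_path -> path mapping.
FILE_SERVICES = ("workspace", "dbfs")


def source_to_remote(config: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (service, source_path, path) for every workspace or DBFS resource."""
    mappings = []
    for resource in config["resources"]:
        if resource["service"] in FILE_SERVICES:
            props = resource["properties"]
            mappings.append((resource["service"], props["source_path"], props["path"]))
    return mappings
```

Applied to the example config.json above, it returns three tuples: the notebook, the config directory, and the JAR mapped to dbfs:/tmp/lib/library.jar.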

Stack configuration template schema outlines the schema for the stack configuration template.

Deploy a stack

You deploy a stack using the databricks stack deploy <configuration-file> command.

databricks stack deploy ./config.json

During stack deployment, the DBFS and workspace assets are uploaded to your Databricks workspace and jobs are created.

When you deploy a stack, the CLI saves a StackStatus JSON file for the deployment in the same directory as the stack configuration template. Its name is the template's name with deployed inserted immediately before the .json extension (for example, ./config.deployed.json). The stack CLI uses this file to keep track of the resources it has previously deployed to your workspace.
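The naming rule for the status file can be written as a small function. This Python sketch, with the hypothetical name status_file_path, assumes the template name ends in .json, the only case described here, and raises an error otherwise.

```python
import os


def status_file_path(config_path: str) -> str:
    """Insert 'deployed' before the .json extension: config.json -> config.deployed.json."""
    root, ext = os.path.splitext(config_path)
    if ext != ".json":
        # Assumption of this sketch: only .json templates are handled.
        raise ValueError(f"expected a .json stack template, got {config_path!r}")
    return f"{root}.deployed{ext}"
```

For example, status_file_path("./config.json") returns "./config.deployed.json".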

Stack status schema outlines the schema of a stack status file.

Important

Do not attempt to edit or move the stack status file. If you get any errors regarding the stack status file, delete the file and try the deployment again.

cat ./config.deployed.json
{
  "cli_version": "0.8.3",
  "deployed_output": [
    {
      "id": "example-workspace-notebook",
      "databricks_id": {
        "path": "/Users/example@example.com/dev/notebook"
      },
      "service": "workspace"
    },
    {
      "id": "example-workspace-config-dir",
      "databricks_id": {
        "path": "/Users/example@example.com/dev/config"
      },
      "service": "workspace"
    },
    {
      "id": "example-dbfs-library",
      "databricks_id": {
        "path": "dbfs:/tmp/lib/library.jar"
      },
      "service": "dbfs"
    },
    {
      "id": "example-job",
      "databricks_id": {
        "job_id": 123456
      },
      "service": "jobs"
    }
  ],
  "name": "example-stack"
}
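Scripts that need the physical ID of a deployed resource, such as the job ID of example-job, can read the status file without modifying it. A minimal Python sketch; the helper names are hypothetical. The field table in Stack status schema calls the resource list deployed_resources, while the example above uses deployed_output, so this sketch accepts either key.

```python
import json
from typing import Any, Dict, List


def load_status(path: str) -> Dict[str, Any]:
    """Read a stack status file (read-only; never rewrite it)."""
    with open(path) as f:
        return json.load(f)


def deployed_entries(status: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the list of ResourceStatus objects under either key name."""
    for key in ("deployed_resources", "deployed_output"):
        if key in status:
            return status[key]
    raise KeyError("no deployed resource list in stack status")


def databricks_id_for(status: Dict[str, Any], resource_id: str) -> Dict[str, Any]:
    """Look up the databricks_id object for a stack resource ID."""
    for entry in deployed_entries(status):
        if entry["id"] == resource_id:
            return entry["databricks_id"]
    raise KeyError(resource_id)
```

With the example status file above, databricks_id_for(load_status("./config.deployed.json"), "example-job") returns {"job_id": 123456}.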

Stack configuration template schema

StackConfig

These are the outer fields of a stack configuration template. All fields are required.

Field name | Type | Description
name | STRING | The name of the stack.
resources | List of ResourceConfig | An asset in Databricks. Resources are related to three services (REST API namespaces): workspace, jobs, and dbfs.

ResourceConfig

The fields for each ResourceConfig. All fields are required.

Field name | Type | Description
id | STRING | A unique ID for the resource. The stack CLI enforces that each ResourceConfig ID is unique.
service | ResourceService | The REST API service that the resource operates on. One of: jobs, workspace, or dbfs.
properties | ResourceProperties | The fields in this object depend on the ResourceConfig service.
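The constraints in the two tables above (a string name, a list of resources, unique IDs, and a service of jobs, workspace, or dbfs) can be checked locally before you deploy. A minimal Python sketch; validate_stack_config is a hypothetical pre-check, not the stack CLI's own validation, and it does not inspect properties.

```python
from typing import Any, Dict, List

SERVICES = {"workspace", "jobs", "dbfs"}


def validate_stack_config(config: Dict[str, Any]) -> List[str]:
    """Return problems found in the StackConfig and ResourceConfig fields."""
    errors = []
    if not isinstance(config.get("name"), str):
        errors.append("name must be a string")
    resources = config.get("resources")
    if not isinstance(resources, list):
        return errors + ["resources must be a list"]
    seen = set()
    for i, resource in enumerate(resources):
        rid = resource.get("id")
        if not isinstance(rid, str):
            errors.append(f"resources[{i}]: id must be a string")
        elif rid in seen:
            errors.append(f"resources[{i}]: duplicate id {rid!r}")
        else:
            seen.add(rid)
        if resource.get("service") not in SERVICES:
            errors.append(f"resources[{i}]: service must be one of {sorted(SERVICES)}")
        if not isinstance(resource.get("properties"), dict):
            errors.append(f"resources[{i}]: properties must be an object")
    return errors
```

Returning a list of messages rather than raising on the first problem lets you report every issue in a template at once.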

ResourceProperties

The properties of a resource, by ResourceService. The fields are grouped into those that come from a Databricks REST API and those used only by the stack CLI. All the fields listed are required.

workspace
  Fields from the REST API used in the stack CLI:
    path (STRING): Remote workspace path of a notebook or directory (for example, /Users/example@example.com/notebook).
    object_type (ObjectType): Workspace object type. Can only be NOTEBOOK or DIRECTORY.
  Fields used only in the stack CLI:
    source_path (STRING): Local source path of workspace notebooks or directories. Either a path relative to the stack configuration template file or an absolute path in your filesystem.

jobs
  Fields from the REST API used in the stack CLI:
    Any field in JobSettings. The only field that is optional in JobSettings but required by the stack CLI is:
    name (STRING): Name of the job to deploy. To avoid creating duplicate jobs, the stack CLI enforces unique names for jobs deployed by a stack.
  Fields used only in the stack CLI:
    None.

dbfs
  Fields from the REST API used in the stack CLI:
    path (STRING): Matching remote DBFS path. Must start with dbfs:/ (for example, dbfs:/this/is/a/sample/path).
    is_dir (BOOL): Whether the DBFS path is a directory or a file.
  Fields used only in the stack CLI:
    source_path (STRING): Local source path of DBFS files or directories. Either a path relative to the stack configuration template file or an absolute path in your filesystem.
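The per-service requirements above can also be checked locally. In this Python sketch, validate_properties and duplicate_job_names are hypothetical helpers. duplicate_job_names only looks for repeated job names inside a single template; it cannot see jobs that already exist in your workspace.

```python
from typing import Any, Dict, List


def validate_properties(resource: Dict[str, Any]) -> List[str]:
    """Check the required ResourceProperties fields for one resource."""
    service = resource["service"]
    props = resource["properties"]
    errors = []
    if service == "workspace":
        for field in ("path", "source_path"):
            if not isinstance(props.get(field), str):
                errors.append(f"workspace: {field} must be a string")
        if props.get("object_type") not in ("NOTEBOOK", "DIRECTORY"):
            errors.append("workspace: object_type must be NOTEBOOK or DIRECTORY")
    elif service == "dbfs":
        path = props.get("path")
        if not isinstance(path, str) or not path.startswith("dbfs:/"):
            errors.append("dbfs: path must be a string starting with dbfs:/")
        if not isinstance(props.get("is_dir"), bool):
            errors.append("dbfs: is_dir must be a boolean")
        if not isinstance(props.get("source_path"), str):
            errors.append("dbfs: source_path must be a string")
    elif service == "jobs":
        if not isinstance(props.get("name"), str):
            errors.append("jobs: name is required by the stack CLI")
    else:
        errors.append(f"unknown service {service!r}")
    return errors


def duplicate_job_names(resources: List[Dict[str, Any]]) -> List[str]:
    """Return job names that appear more than once in one template."""
    seen, dups = set(), []
    for resource in resources:
        if resource["service"] == "jobs":
            name = resource["properties"].get("name")
            if name in seen and name not in dups:
                dups.append(name)
            seen.add(name)
    return dups
```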

ResourceService

Each resource belongs to a specific service that aligns with the Databricks REST API. The stack CLI supports these services:

Service | Description
workspace | A workspace notebook or directory.
jobs | A Databricks job.
dbfs | A DBFS file or directory.

Stack status schema

StackStatus

A stack status file is created after a stack is deployed using the CLI. The top-level fields are:

Field name | Type | Description
name | STRING | The name of the stack. This is the same field as in StackConfig.
cli_version | STRING | The version of the Databricks CLI used to deploy the stack.
deployed_resources | List of ResourceStatus | The status of each deployed resource. For each resource defined in StackConfig, a corresponding ResourceStatus is generated here.
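Because each resource in StackConfig should have a corresponding ResourceStatus, comparing the two sets of IDs shows which resources a status file does not cover, for example after you add a resource to the template but before you redeploy. A minimal Python sketch; undeployed_ids is a hypothetical helper, and it accepts either deployed_resources (as in the table above) or deployed_output (as in the example status file) as the list key.

```python
from typing import Any, Dict, Set


def undeployed_ids(config: Dict[str, Any], status: Dict[str, Any]) -> Set[str]:
    """Return IDs defined in the StackConfig with no matching ResourceStatus."""
    statuses = status.get("deployed_resources", status.get("deployed_output", []))
    deployed = {entry["id"] for entry in statuses}
    return {resource["id"] for resource in config["resources"]} - deployed
```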

ResourceStatus

Field name | Type | Description
id | STRING | A stack-unique ID for the resource.
service | ResourceService | The REST API service that the resource operates on. One of: jobs, workspace, or dbfs.
databricks_id | DatabricksId | The physical ID of the deployed resource. The actual schema depends on the type (service) of the resource.

DatabricksId

A JSON object whose field depends on the service.

Service | Field in JSON | Type | Description
workspace | path | STRING | The absolute path of the notebook or directory in a Databricks workspace. Naming is consistent with the Workspace API.
jobs | job_id | STRING | The job ID as shown in a Databricks workspace. This can be used to update jobs already deployed.
dbfs | path | STRING | The absolute path of the file or directory in DBFS. Naming is consistent with the DBFS API.
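Since the field that holds the physical ID depends on the service, a lookup table keyed by service keeps scripts simple. A minimal Python sketch; physical_id and ID_FIELD are hypothetical names. It converts the value to a string because the example status file stores job_id as a JSON number even though the table lists STRING.

```python
from typing import Any, Dict

# Which DatabricksId field carries the physical ID for each service.
ID_FIELD = {"workspace": "path", "jobs": "job_id", "dbfs": "path"}


def physical_id(resource_status: Dict[str, Any]) -> str:
    """Return the service-specific physical ID of a ResourceStatus as a string."""
    field = ID_FIELD[resource_status["service"]]
    return str(resource_status["databricks_id"][field])
```

For the example-job entry in the example status file, physical_id returns "123456".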