Databricks CLI

The Databricks command-line interface (CLI) provides an easy-to-use interface to the Databricks platform. The open source project is hosted on GitHub. The CLI is built on top of the Databricks REST API and is organized into command groups based on the Workspace API, DBFS API, Jobs API, Clusters API, Libraries API, and Secrets API: workspace, fs, jobs, runs, clusters, libraries, and secrets.

Set up the CLI

This section lists CLI requirements and limitations, and describes how to install and configure your environment to run the CLI.

Requirements and limitations

  • Python 2 - 2.7.9 and above
  • Python 3 - 3.6 and above

Important

On macOS, the default Python 2 installation does not implement the TLSv1_2 protocol, and running the CLI with this Python installation results in the error:

AttributeError: 'module' object has no attribute 'PROTOCOL_TLSv1_2'

You can use Homebrew to install a version of Python that has ssl.PROTOCOL_TLSv1_2:

  1. Run brew install python2 to install Python 2 or brew install python to install Python 3.
  2. Update your path to prefer the newly installed Python.

Install the CLI

Run pip install databricks-cli using the version of pip that matches your Python installation. If you are using Python 3, run pip3 install databricks-cli.
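
For example:

pip install databricks-cli
# Or, if pip on your system points at Python 2 and you want to use Python 3:
pip3 install databricks-cli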

Set up authentication

Before you can run CLI commands, you must set up authentication. The CLI authenticates with a Databricks personal access token. To configure the CLI to use the token, run databricks configure --token. After you follow the prompts, your access credentials are stored in the file ~/.databrickscfg.
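
After you complete the prompts, the CLI writes a configuration file similar to the following (the host and token values shown here are placeholders):

cat ~/.databrickscfg
[DEFAULT]
host = https://<databricks-instance>
token = <personal-access-token>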

You can specify the --insecure flag, which creates a configuration that can connect to a workspace without a valid SSL certificate.

You can also use your username and password pair to authenticate. Run databricks configure and follow the prompts.

Important

Because the CLI is built on top of the REST API, your authentication configuration in your .netrc file takes precedence over your configuration in .databrickscfg.

Connection profiles

The Databricks CLI configuration supports multiple connection profiles, so the same Databricks CLI installation can be used to make API calls against multiple Databricks workspaces.

To add a connection profile:

databricks configure [--profile <profile>]

To use the connection profile:

databricks workspace ls --profile <profile>
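
For example, to configure and use a hypothetical profile named STAGING:

databricks configure --token --profile STAGING
databricks workspace ls /Users/example@databricks.com --profile STAGING

Each profile is stored as its own section (for example, [STAGING]) in ~/.databrickscfg.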

Alias command groups

Sometimes it can be inconvenient to prefix each CLI invocation with the name of a command group, for example databricks workspace ls. To make the CLI easier to use, you can alias command groups to shorter commands. For example, to shorten databricks workspace ls to dw ls in the Bourne Again Shell (bash), you can add alias dw="databricks workspace" to the appropriate bash profile. Typically, this file is located at ~/.bash_profile.
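
For example, in bash (assuming your profile file is ~/.bash_profile):

echo 'alias dw="databricks workspace"' >> ~/.bash_profile
source ~/.bash_profile
dw ls /Users/example@databricks.com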

Tip

Databricks has already aliased databricks fs to dbfs; databricks fs ls and dbfs ls are equivalent.

Use the CLI

This section shows you how to get CLI help, parse CLI output, and invoke commands in each command group.

Display CLI command group help

You list the subcommands for any command group by running databricks <group> -h. For example, you list the DBFS CLI subcommands by running databricks fs -h.

Use jq to parse CLI output

Some Databricks CLI commands output the JSON response from the API endpoint. Sometimes it can be useful to parse out parts of the JSON to pipe into other commands. For example, to copy a job definition, you must take the settings field of /api/2.0/jobs/get and use that as an argument to the databricks jobs create command.

In these cases, we recommend that you use the utility jq. You can install jq on macOS using Homebrew with brew install jq.
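
For example, to print only the job names from databricks jobs list (the .jobs[].settings.name path follows the /api/2.0/jobs/list response shape used in the job examples later in this page):

databricks jobs list --output json | jq '.jobs[].settings.name'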

For more information on jq, see the jq Manual.

JSON string parameters

All JSON string parameters must be enclosed in single quotes. For example:

databricks jobs run-now --job-id 9 --jar-params '["20180505", "alantest"]'

Workspace CLI

You run subcommands by appending them to databricks workspace.

databricks workspace -h
Usage: databricks workspace [OPTIONS] COMMAND [ARGS]...

  Utility to interact with the Databricks workspace. Workspace paths must be
  absolute and be prefixed with `/`.

Common Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  delete      Deletes objects from the Databricks workspace. rm and delete are synonyms.
    Options:
        -r, --recursive
  export      Exports a file from the Databricks workspace.
    Options:
      -f, --format FORMAT      SOURCE, HTML, JUPYTER, or DBC. Set to SOURCE by default.
      -o, --overwrite          Overwrites the local file with the same name as the Workspace file.
  export_dir  Recursively exports a directory from the Databricks workspace.
    Options:
      -o, --overwrite          Overwrites local files with the same names as Workspace files.
  import      Imports a file from the local filesystem to the Databricks workspace.
    Options:
      -l, --language LANGUAGE  SCALA, PYTHON, SQL, R  [required]
      -f, --format FORMAT      SOURCE, HTML, JUPYTER, or DBC. Set to SOURCE by default.
      -o, --overwrite          Overwrites Workspace files with the same names as local files.
  import_dir  Recursively imports a directory to the Databricks workspace.
    Options:
      -o, --overwrite          Overwrites Workspace files with the same names as local files.
      -e, --exclude-hidden-files
  list        Lists objects in the Databricks workspace. ls and list are synonyms.
    Options:
      --absolute               Displays absolute paths.
      -l                       Displays full information including ObjectType, Path, Language
  ls          Lists objects in the Databricks workspace. ls and list are synonyms.
    Options:
      --absolute               Displays absolute paths.
      -l                       Displays full information including ObjectType, Path, Language
  mkdirs      Makes directories in the Databricks workspace.
  rm          Deletes objects from the Databricks workspace. rm and delete are synonyms.
    Options:
        -r, --recursive

List Workspace files

databricks workspace ls /Users/example@databricks.com
Usage Logs ETL
Common Utilities
guava-21.0

Import a local directory of notebooks

The databricks workspace import_dir command recursively imports a directory from the local filesystem to the Workspace. Only directories and files with the extensions of .scala, .py, .sql, .r, .R are imported. When imported, these extensions are stripped from the notebook name.

To overwrite existing notebooks at the target path, add the flag -o.

tree
.
├── a.py
├── b.scala
├── c.sql
├── d.R
└── e
databricks workspace import_dir . /Users/example@databricks.com/example
./a.py -> /Users/example@databricks.com/example/a
./b.scala -> /Users/example@databricks.com/example/b
./c.sql -> /Users/example@databricks.com/example/c
./d.R -> /Users/example@databricks.com/example/d
databricks workspace ls /Users/example@databricks.com/example -l
NOTEBOOK   a  PYTHON
NOTEBOOK   b  SCALA
NOTEBOOK   c  SQL
NOTEBOOK   d  R
DIRECTORY  e
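
If notebooks already exist at the target path, add the -o flag described above to overwrite them:

databricks workspace import_dir -o . /Users/example@databricks.com/example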

Export a Workspace folder to the local filesystem

You can export a folder of notebooks from the Workspace to the local filesystem. To do this, run:

databricks workspace export_dir /Users/example@databricks.com/example .

DBFS CLI

You run commands by appending them to databricks fs (or the alias dbfs), prefixing all DBFS paths with dbfs:/.

databricks fs -h
Usage: databricks fs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with DBFS. DBFS paths are all prefixed
  with dbfs:/. Local paths can be absolute or relative.

Options:
  -v, --version
  -h, --help     Show this message and exit.

Commands:
  configure
  cp         Copies files to and from DBFS.
    Options:
      -r, --recursive
      --overwrite     Overwrites files that exist already.
  ls         Lists files in DBFS.
    Options:
      --absolute      Displays absolute paths.
      -l              Displays full information including size and file type.
  mkdirs     Makes directories in DBFS.
  mv         Moves a file between two DBFS paths.
  rm         Removes files from DBFS.
    Options:
      -r, --recursive

Copy a file to DBFS

dbfs cp test.txt dbfs:/test.txt
# Or recursively
dbfs cp -r test-dir dbfs:/test-dir

Copy a file from DBFS

dbfs cp dbfs:/test.txt ./test.txt
# Or recursively
dbfs cp -r dbfs:/test-dir ./test-dir

Cluster CLI

You run subcommands by appending them to databricks clusters.

databricks clusters -h
Usage: databricks clusters [OPTIONS] COMMAND [ARGS]...

  Utility to interact with Databricks clusters.

Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  create           Creates a Databricks cluster.
    Options:
      --json-file PATH         File containing JSON request to POST to /api/2.0/clusters/create.
      --json JSON              JSON string to POST to /api/2.0/clusters/create.
  delete           Removes a Databricks cluster.
    Options:
      --cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
  get              Retrieves metadata about a cluster.
    Options:
      --cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
  list             Lists active and recently terminated clusters.
    Options:
      --output FORMAT          JSON or TABLE. Set to TABLE by default.
  list-node-types  Lists node types for a cluster.
  list-zones       Lists zones where clusters can be created.
  restart          Restarts a Databricks cluster.
    Options:
      --cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
  spark-versions   Lists possible Databricks Runtime versions.
  start            Starts a terminated Databricks cluster.
    Options:
      --cluster-id CLUSTER_ID  Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.

List runtime versions

databricks clusters spark-versions

List node types

databricks clusters list-node-types
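
Create a cluster

A minimal sketch using the --json option; the field names follow the /api/2.0/clusters/create request, and the cluster name, runtime version, node type, and worker count shown here are placeholder values (use the spark-versions and list-node-types subcommands above to find valid ones):

databricks clusters create --json '{
  "cluster_name": "example-cluster",
  "spark_version": "5.5.x-scala2.11",
  "node_type_id": "i3.xlarge",
  "num_workers": 1
}'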

Job CLI

You run job subcommands by appending them to databricks jobs and job run commands by appending them to databricks runs.

databricks jobs -h
Usage: databricks jobs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with jobs.

  Job runs are handled by ``databricks runs``.

Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  create   Creates a job.
    Options:
      --json-file PATH            File containing JSON request to POST to /api/2.0/jobs/create.
      --json JSON                 JSON string to POST to /api/2.0/jobs/create.
  delete   Deletes a job.
    Options:
      --job-id JOB_ID             Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#job/$JOB_ID.
  get      Describes the metadata for a job.
    Options:
      --job-id JOB_ID             Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#job/$JOB_ID.
  list     Lists the jobs in the Databricks Job Service.
  reset    Resets (edits) the definition of a job.
    Options:
      --job-id JOB_ID             Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#job/$JOB_ID.
      --json-file PATH            File containing JSON request to POST to /api/2.0/jobs/reset.
      --json JSON                 JSON string to POST to /api/2.0/jobs/reset.
  run-now  Runs a job with optional per-run parameters.
    Options:
      --job-id JOB_ID             Can be found in the URL at https://<databricks-instance>/#job/$JOB_ID.
      --jar-params JSON           JSON string specifying an array of parameters. i.e. '["param1", "param2"]'
      --notebook-params JSON      JSON string specifying a map of key-value pairs. i.e. '{"name": "john doe", "age": 35}'
      --python-params JSON        JSON string specifying an array of parameters. i.e. '["param1", "param2"]'
      --spark-submit-params JSON  JSON string specifying an array of parameters. i.e. '["--class", "org.apache.spark.examples.SparkPi"]'
databricks runs -h
Usage: databricks runs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with job runs.

Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  cancel  Cancels a run.
    Options:
        --run-id RUN_ID  [required]
  get     Gets the metadata about a run in JSON form.
    Options:
      --run-id RUN_ID  [required]
  list    Lists job runs.
  submit  Submits a one-time run.
    Options:
      --json-file PATH  File containing JSON request to POST to /api/2.0/jobs/runs/submit.
      --json JSON       JSON string to POST to /api/2.0/jobs/runs/submit.

List and find jobs

The databricks jobs list command has two output formats, JSON and TABLE. The TABLE format is output by default and returns a two-column table (job ID, job name).

To find a job by name, run:

databricks jobs list | grep "JOB_NAME"

Copy a job

Note

This example requires the program jq.

SETTINGS_JSON=$(databricks jobs get --job-id 284907 | jq .settings)
# JQ Explanation:
#   - peek into top level `settings` field.
databricks jobs create --json "$SETTINGS_JSON"

Delete “Untitled” jobs

databricks jobs list --output json | jq '.jobs[] | select(.settings.name == "Untitled") | .job_id' | xargs -n 1 databricks jobs delete --job-id
# Explanation:
#   - List jobs in JSON.
#   - Peek into top level `jobs` field.
#   - Select only jobs with name equal to "Untitled"
#   - Print those job IDs out.
#   - Invoke `databricks jobs delete --job-id` once per row with the $job_id appended as an argument to the end of the command.
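
Submit a one-time run

A sketch of databricks runs submit; the JSON fields follow the /api/2.0/jobs/runs/submit request, and the run name, cluster settings, and notebook path are placeholder values:

databricks runs submit --json '{
  "run_name": "one-time-run",
  "new_cluster": {
    "spark_version": "5.5.x-scala2.11",
    "node_type_id": "i3.xlarge",
    "num_workers": 1
  },
  "notebook_task": {
    "notebook_path": "/Users/example@databricks.com/example/a"
  }
}'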

Library CLI

You run library subcommands by appending them to databricks libraries.

Important

A library installed using the CLI does not appear in the cluster UI.

databricks libraries -h
Usage: databricks libraries [OPTIONS] COMMAND [ARGS]...

  Utility to interact with libraries.

Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  all-cluster-statuses  Get the status of all libraries.
  cluster-status        Get the status of all libraries for a cluster.
    Options:
      --cluster-id CLUSTER_ID   Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
  install               Install a library on a cluster.
    Options:
      --cluster-id CLUSTER_ID   Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
      --jar TEXT                JAR on DBFS or S3 or WASB.
      --egg TEXT                Egg on DBFS or S3 or WASB.
      --maven-coordinates TEXT  Maven coordinates in the form of GroupId:ArtifactId:Version (i.e. org.jsoup:jsoup:1.7.2).
      --maven-repo TEXT         Maven repository to install the Maven package from. If omitted, both Maven Repository and Spark Packages are searched.
      --maven-exclusion TEXT    List of dependencies to exclude. For example: --maven-exclusion "slf4j:slf4j" --maven-exclusion "*:hadoop-client".
      --pypi-package TEXT       The name of the PyPI package to install. An optional exact version specification is also supported. Examples "simplejson" and "simplejson==3.8.0".
      --pypi-repo TEXT          The repository where the package can be found. If not specified, the default pip index is used.
      --cran-package TEXT       The name of the CRAN package to install.
      --cran-repo TEXT          The repository where the package can be found. If not specified, the default CRAN repo is used.
  list                  Shortcut to `all-cluster-statuses` or `cluster-status`.
    Options:
      --cluster-id CLUSTER_ID   Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
  uninstall             Uninstall a library on a cluster.
    Options:
      --all                     Uninstall all libraries.
      --cluster-id CLUSTER_ID   Can be found in the URL at https://<databricks-instance>/?o=<16-digit-number>#/setting/clusters/$CLUSTER_ID/configuration.
      --jar TEXT                JAR on DBFS or S3 or WASB.
      --egg TEXT                Egg on DBFS or S3 or WASB.
      --maven-coordinates TEXT  Maven coordinates in the form of GroupId:ArtifactId:Version (i.e. org.jsoup:jsoup:1.7.2).
      --maven-repo TEXT         Maven repository to install the Maven package from. If omitted, both Maven Repository and Spark Packages are searched.
      --maven-exclusion TEXT    List of dependencies to exclude. For example: --maven-exclusion "slf4j:slf4j" --maven-exclusion "*:hadoop-client".
      --pypi-package TEXT       The name of the PyPI package to install. An optional exact version specification is also supported. Examples "simplejson" and "simplejson==3.8.0".
      --pypi-repo TEXT          The repository where the package can be found. If not specified, the default pip index is used.
      --cran-package TEXT       The name of the CRAN package to install.
      --cran-repo TEXT          The repository where the package can be found. If not specified, the default CRAN repo is used.

Install a JAR from DBFS

databricks libraries install --cluster-id $CLUSTER_ID --jar dbfs:/test-dir/test.jar
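
Install a PyPI package on a cluster

The same install subcommand accepts other library types; for example, using the --pypi-package option shown in the help text above (the package and version are illustrative):

databricks libraries install --cluster-id $CLUSTER_ID --pypi-package simplejson==3.8.0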

List library statuses for a cluster

databricks libraries list --cluster-id $CLUSTER_ID

Secrets CLI

Note

The Secrets CLI requires Databricks CLI 0.7.1 or above.

You run secrets subcommands by appending them to databricks secrets.

databricks secrets --help
Usage: databricks secrets [OPTIONS] COMMAND [ARGS]...

  Utility to interact with secret API.

Options:
  -v, --version   [VERSION]
  --profile TEXT  CLI connection profile to use. The default profile is
                  "DEFAULT".
  -h, --help      Show this message and exit.

Commands:
  create-scope  Creates a secret scope.
    Options:
      --scope SCOPE                  The name of the secret scope.
      --initial-manage-principal     The initial principal that can manage the created secret scope.
                                      If specified, the initial ACL with MANAGE permission applied
                                      to the scope is assigned to the supplied principal (user or group).
                                      The only supported principal is the group
                                      "users", which contains all users in the workspace. If not
                                      specified, the initial ACL with MANAGE permission applied to
                                      the scope is assigned to the request issuer's user identity.
  delete        Deletes a secret.
    Options:
      --scope SCOPE                  The name of the secret scope.
      --key KEY                      The name of the secret key.
  delete-acl    Deletes an access control rule for a principal.
    Options:
      --scope SCOPE                  The name of the scope.
      --principal PRINCIPAL          The name of the principal.
  delete-scope  Deletes a secret scope.
    Options:
      --scope SCOPE                  The name of the secret scope.
  get-acl       Gets the details for an access control rule.
    Options:
      --scope SCOPE                  The name of the secret scope.
      --principal PRINCIPAL          The name of the principal.
      --output FORMAT                JSON or TABLE. Set to TABLE by default.
  list          Lists all the secrets in a scope.
    Options:
      --scope SCOPE                  The name of the secret scope.
      --output FORMAT                JSON or TABLE. Set to TABLE by default.
  list-acls     Lists all access control rules for a given secret scope.
    Options:
      --scope SCOPE                  The name of the secret scope.
      --output FORMAT                JSON or TABLE. Set to TABLE by default.
  list-scopes   Lists all secret scopes.
    Options:
      --output FORMAT                JSON or TABLE. Set to TABLE by default.
  put           Puts a secret in a scope.
    Options:
      --scope SCOPE                  The name of the secret scope.
      --key KEY                      The name of the secret key.  [required]
      --string-value TEXT            Read value from a string and store in UTF-8 (MB4) form.
      --binary-file PATH             Read value from a binary file and store as bytes.
  put-acl       Creates or overwrites an access control rule for a principal
                applied to a given secret scope.
    Options:
      --scope SCOPE                    The name of the secret scope.
      --principal PRINCIPAL            The name of the principal.
      --permission [MANAGE|WRITE|READ] The permission to apply.

Create a secret scope

databricks secrets create-scope --scope my-scope
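
To grant all workspace users MANAGE permission on the new scope, add the --initial-manage-principal option with the group users (the only supported principal, per the option description above):

databricks secrets create-scope --scope my-scope --initial-manage-principal users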

List all secret scopes in workspace

databricks secrets list-scopes

Delete a secret scope

databricks secrets delete-scope --scope my-scope

Create or update a secret in a secret scope

There are three ways to store a secret. The easiest way is to use the --string-value option; the secret is stored in UTF-8 (MB4) form. Be careful with this option, because your secret may be stored in your command-line history in plain text.

databricks secrets put --scope my-scope --key my-key --string-value my-value

You can also use the --binary-file option to provide a secret stored in a file. The file content will be read as is and stored as bytes.

databricks secrets put --scope my-scope --key my-key --binary-file my-secret.txt

If you don’t specify either option, an editor opens for you to enter your secret. Follow the instructions shown in the editor to enter your secret.

databricks secrets put --scope my-scope --key my-key

List secrets stored within the secret scope

databricks secrets list --scope my-scope

Note that there is no interface to read secret values from the CLI. You must use the Databricks Utilities secrets interface within a Databricks notebook to access your secrets.

Delete a secret in a secret scope

databricks secrets delete --scope my-scope --key my-key

Grant or change ACL for a principal

databricks secrets put-acl --scope my-scope --principal principal --permission MANAGE

List ACLs in a secret scope

databricks secrets list-acls --scope my-scope

Get ACL for a principal in a secret scope

databricks secrets get-acl --scope my-scope --principal principal

Revoke ACL for a principal

databricks secrets delete-acl --scope my-scope --principal principal