The Databricks command-line interface (CLI) provides an easy-to-use interface to the Databricks platform. The open source project is hosted on GitHub. The CLI is built on top of the Databricks REST API 2.0 and is organized into command groups based on the Workspace API, Clusters API, Instance Pools API, DBFS API, Groups API, Jobs API, Libraries API, and Secrets API.
This CLI is under active development and is released as an Experimental client. This means that interfaces are still subject to change.
This section lists CLI requirements and describes how to install and configure your environment to run the CLI.
Python 3 - 3.6 and above
Python 2 - 2.7.9 and above
On macOS, the default Python 2 installation does not implement the TLSv1_2 protocol, and running the CLI with this Python installation results in the error:
AttributeError: 'module' object has no attribute 'PROTOCOL_TLSv1_2'. Use Homebrew to install a version of Python that has
Run pip install databricks-cli using the appropriate version of
pip for your Python installation.
Before you can run CLI commands, you must set up authentication. To authenticate to the CLI you use a personal access token. To configure the CLI to use the access token, run
databricks configure --token. The command issues the prompts:
Databricks Host (should begin with https://):
Token:
After you complete the prompts, your access credentials are stored in the file
~/.databrickscfg. The file should contain entries like:
host = https://<databricks-instance>
token = <personal-access-token>
For CLI 0.8.1 and above, you can change the path of this file by setting the environment variable
You can also use your username and password to authenticate. Run
databricks configure and follow the prompts.
Because the CLI is built on top of the REST API, your authentication configuration in your .netrc file takes precedence over your configuration in
CLI 0.8.0 and above supports the following environment variables:
An environment variable setting takes precedence over the setting in the configuration file.
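The precedence rule above can be sketched as a small shell function. This is an illustration of the rule only, not the CLI's own code. It assumes DATABRICKS_HOST is the environment variable that carries the host, and resolve_host is a hypothetical helper name.

```shell
# Sketch of the precedence rule only; this is not how the CLI is implemented.
# resolve_host prints the host that wins for a given configuration file:
# the DATABRICKS_HOST environment variable (assumed name) if it is set and
# non-empty, otherwise the first host = entry in the file.
resolve_host() {
    config_file="$1"
    if [ -n "${DATABRICKS_HOST:-}" ]; then
        printf '%s\n' "$DATABRICKS_HOST"
    else
        # Strip the leading "host =" and print what remains.
        sed -n 's/^host[[:space:]]*=[[:space:]]*//p' "$config_file" | head -n 1
    fi
}
```

Calling resolve_host with the path to a configuration file prints the file's host unless the environment variable is set, in which case the variable's value is printed instead.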
The Databricks CLI configuration supports multiple connection profiles. The same installation of Databricks CLI can be used to make API calls on multiple Databricks workspaces.
To add a connection profile:
databricks configure [--profile <profile>]
To use the connection profile:
databricks workspace ls --profile <profile>
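As a rough illustration, a configuration file with two profiles might look like the one written below. The sketch writes to a scratch file rather than to the real configuration file, so nothing is overwritten. The profile name staging, the placeholder values, and the list_profiles helper are hypothetical.

```shell
# Write a sample two-profile configuration to a scratch file.
# Each profile is an INI-style section; [DEFAULT] is used when no
# --profile option is given. "staging" is a made-up profile name.
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
[DEFAULT]
host = https://<databricks-instance>
token = <personal-access-token>

[staging]
host = https://<staging-databricks-instance>
token = <staging-personal-access-token>
EOF

# Print the profile names defined in a configuration file by
# extracting the text between the square brackets of each section header.
list_profiles() {
    sed -n 's/^\[\(.*\)\]$/\1/p' "$1"
}

list_profiles "$cfg"
```

With a file like this in place, databricks workspace ls --profile staging would use the second section.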
Sometimes it can be inconvenient to prefix each CLI invocation with the name of a command group, for example
databricks workspace ls. To make the CLI easier to use, you can alias command groups to shorter commands.
For example, to shorten
databricks workspace ls to
dw ls in the
Bourne again shell (Bash), you can add
alias dw="databricks workspace" to the appropriate Bash profile. Typically,
this file is located at
Databricks has already aliased
databricks fs to
dbfs;
databricks fs ls and
dbfs ls are equivalent.
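Bash does not expand aliases in non-interactive scripts unless the expand_aliases option is enabled, so a shell function can be a more portable way to get the same shortcut. The following is a minimal sketch of that alternative; the function name dw simply mirrors the alias above.

```shell
# Function equivalent of: alias dw="databricks workspace"
# "$@" forwards every argument unchanged, so `dw ls /Users`
# runs `databricks workspace ls /Users`.
dw() {
    databricks workspace "$@"
}
```

Unlike the alias, the function also works inside scripts that source the file defining it.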
This section shows you how to get CLI help, parse CLI output, and invoke commands in each command group.
To list the subcommands for any command group, run
databricks <group> -h. For example, to list the DBFS CLI subcommands, run
databricks fs -h.
Some Databricks CLI commands output the JSON response from the API endpoint. Sometimes it can be
useful to parse out parts of the JSON to pipe into other commands. For example, to copy a job
definition, you must take the
settings field of
/api/2.0/jobs/get and use that as an argument to the
databricks jobs create command.
In these cases, we recommend that you use the utility
jq. You can install
jq on macOS using
brew install jq.
For more information on
jq, see the jq Manual.
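The job-copy case described above could be sketched as follows. The helper name extract_settings and the file name settings.json are hypothetical, and the commented usage assumes that databricks jobs get accepts --job-id and that databricks jobs create accepts --json-file; check databricks jobs -h for the options your CLI version supports.

```shell
# Read a jobs/get-style JSON document on stdin and print only its
# "settings" object. Requires jq to be installed.
extract_settings() {
    jq '.settings'
}

# Intended use (assumed flags, not run here):
#   databricks jobs get --job-id 9 | extract_settings > settings.json
#   databricks jobs create --json-file settings.json
```

Keeping the jq filter in one small function makes it easy to try on a saved response before pointing it at a live workspace.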
String parameters are handled differently depending on your operating system:
Unix: You must enclose JSON string parameters in single quotes. For example:
databricks jobs run-now --job-id 9 --jar-params '["20180505", "alantest"]'
Windows: You must enclose JSON string parameters in double quotes, and the quote characters inside the string must be preceded by
\. For example:
databricks jobs run-now --job-id 9 --jar-params "[\"20180505\", \"alantest\"]"
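On Unix, the effect of the single quotes can be checked without calling the CLI at all. The sketch below stores the JSON array from the example in a variable and confirms that, when the variable is expanded inside double quotes, the shell hands it on as one argument. The count_args helper is hypothetical and exists only for this demonstration.

```shell
# Single quotes keep the inner double quotes and the space literal,
# so the whole JSON array is stored as one string.
jar_params='["20180505", "alantest"]'

# count_args prints how many arguments it received.
count_args() {
    echo "$#"
}

# Expanding the variable inside double quotes passes it as a single argument.
count_args "$jar_params"   # prints 1
```

The same variable can then be passed to the CLI as --jar-params "$jar_params".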