Databricks CLI (legacy)
Important
This documentation has been retired and might not be updated.
Databricks recommends that you use Databricks CLI version 0.205 or above instead of the legacy Databricks CLI version 0.18 or below. Databricks CLI version 0.18 or below is not supported by Databricks. For information about Databricks CLI versions 0.205 and above, see What is the Databricks CLI?.
To migrate from Databricks CLI version 0.18 or below to Databricks CLI version 0.205 or above, see Databricks CLI migration.
The legacy Databricks CLI is in an Experimental state. Databricks plans no new feature work for the legacy Databricks CLI at this time.
The legacy Databricks CLI is not supported through Databricks Support channels. To provide feedback, ask questions, and report issues, use the Issues tab in the Command Line Interface for Databricks repository in GitHub.
The legacy Databricks command-line interface (also known as the legacy Databricks CLI) is a utility that provides an easy-to-use interface to automate the Databricks platform from your terminal, command prompt, or automation scripts.
Requirements
- Python 3 - 3.6 and above
- Python 2 - 2.7.9 and above
Important
On macOS, the default Python 2 installation does not implement the TLSv1_2 protocol, and running the legacy Databricks CLI with this Python installation results in the error: AttributeError: 'module' object has no attribute 'PROTOCOL_TLSv1_2'. Use Homebrew to install a version of Python that has ssl.PROTOCOL_TLSv1_2.
Set up the CLI
This section describes how to set up the legacy Databricks CLI.
Install or update the CLI
This section describes how to install or update your development machine to run the legacy Databricks CLI.
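The legacy CLI is published on PyPI as the databricks-cli package, so a typical pip-based setup looks like the following sketch:

```shell
# Install the legacy Databricks CLI from PyPI (package name: databricks-cli).
pip install databricks-cli

# Or update an existing installation in place.
pip install --upgrade databricks-cli

# Confirm the installed version.
databricks --version
```

Use the pip that belongs to the Python installation you intend to run the CLI with; on systems with both Python 2 and Python 3, that may be pip3.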
Set up authentication
Before you can run legacy Databricks CLI commands, you must set up authentication between the legacy Databricks CLI and Databricks. This section describes how to set up authentication for the legacy Databricks CLI.
To authenticate with the legacy Databricks CLI, you can use a Databricks personal access token.
Note
As a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use OAuth tokens.
If you use personal access token authentication, Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
Set up authentication using a Databricks personal access token
To configure the legacy Databricks CLI to use a personal access token, run the following command:
databricks configure --token
The command begins by issuing the prompt:
Databricks Host (should begin with https://):
Enter your workspace URL, with the format https://<instance-name>.cloud.databricks.com. To get your workspace URL, see Workspace instance names, URLs, and IDs.
The command continues by issuing the prompt to enter your personal access token:
Token:
After you complete the prompts, your access credentials are stored in the file ~/.databrickscfg on Linux or macOS, or %USERPROFILE%\.databrickscfg on Windows. The file contains a default profile entry:
[DEFAULT]
host = <workspace-URL>
token = <personal-access-token>
If the .databrickscfg file already exists, that file’s DEFAULT configuration profile is overwritten with the new data. To create a configuration profile with a different name instead, see Connection profiles.
For CLI 0.8.1 and above, you can change the path of this file by setting the environment variable DATABRICKS_CONFIG_FILE.
Linux or macOS:
export DATABRICKS_CONFIG_FILE=<path-to-file>
Windows:
setx DATABRICKS_CONFIG_FILE "<path-to-file>" /M
Important
Beginning with CLI 0.17.2, the CLI does not work with a .netrc file. You can have a .netrc file in your environment for other purposes, but the CLI will not use that .netrc file.
CLI 0.8.0 and above supports the following Databricks environment variables:
DATABRICKS_HOST
DATABRICKS_USERNAME
DATABRICKS_PASSWORD
DATABRICKS_TOKEN
An environment variable setting takes precedence over the setting in the configuration file.
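For example, a script can pin the workspace and token for the duration of a shell session by exporting these variables (the values below are placeholders), overriding whatever is in the configuration file:

```shell
# Placeholder values; substitute your workspace URL and token.
export DATABRICKS_HOST="https://example.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi-example-token"

# Any legacy CLI command run from this shell now uses these values,
# even if ~/.databrickscfg defines a different DEFAULT profile.
```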
Connection profiles
The legacy Databricks CLI configuration supports multiple connection profiles. The same installation of the legacy Databricks CLI can be used to make API calls to multiple Databricks workspaces.
To add a connection profile, specify a unique name for the profile:
databricks configure --token --profile <profile-name>
The .databrickscfg file contains a corresponding profile entry:
[<profile-name>]
host = <workspace-URL>
token = <token>
To use the connection profile:
databricks <group> <command> --profile <profile-name>
If --profile <profile-name> is not specified, the default profile is used. If a default profile is not found, you are prompted to configure the CLI with a default profile.
Test your connection profiles
To check whether you set up any connection profiles correctly, you can run a command such as the following with one of your connection profile names:
databricks fs ls dbfs:/ --profile <profile-name>
If successful, this command lists the files and directories in the DBFS root of the workspace for the specified connection profile. Run this command for each connection profile that you want to test.
To view your available profiles, see your .databrickscfg file.
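Because .databrickscfg is a plain INI-style file, the profile names can also be listed from the command line. The following sketch runs against a sample file it creates; in practice, point it at ~/.databrickscfg:

```shell
# Create a sample config file for illustration.
cat > /tmp/sample-databrickscfg <<'CFG'
[DEFAULT]
host = https://example.cloud.databricks.com
token = dapi-example

[STAGING]
host = https://staging.example.cloud.databricks.com
token = dapi-staging
CFG

# Extract the section headers, which are the profile names.
profiles=$(grep -o '^\[[^]]*\]' /tmp/sample-databrickscfg | tr -d '[]')
echo "$profiles"
```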
Use the CLI
This section shows you how to get legacy Databricks CLI help, parse legacy Databricks CLI output, and invoke commands in each command group.
Display CLI command group help
You list the subcommands for any command group by using the --help or -h option. For example, to list the DBFS CLI subcommands:
databricks fs -h
Display CLI subcommand help
You list the help for a subcommand by using the --help or -h option. For example, to list the help for the DBFS copy files subcommand:
databricks fs cp -h
Alias command groups
Sometimes it can be inconvenient to prefix each legacy Databricks CLI invocation with the name of a command group, for example databricks workspace ls. To make the legacy Databricks CLI easier to use, you can alias command groups to shorter commands.
For example, to shorten databricks workspace ls to dw ls in the Bourne again shell, you can add alias dw="databricks workspace" to the appropriate bash profile. Typically, this file is located at ~/.bash_profile.
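As a quick sketch, the alias can be defined and checked in the current shell before you add it to your bash profile:

```shell
# Define the shorthand for the workspace command group.
alias dw="databricks workspace"

# Show the definition to confirm it took effect.
alias dw
```

After adding the line to ~/.bash_profile, open a new shell (or source the file) for the alias to become available.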
Tip
The legacy Databricks CLI already aliases databricks fs to dbfs; databricks fs ls and dbfs ls are equivalent.
Use jq to parse CLI output
Some legacy Databricks CLI commands output the JSON response from the API endpoint. Sometimes it can be useful to parse out parts of the JSON to pipe into other commands. For example, to copy a job definition, you must take the settings field of a get job command and use that as an argument to the create job command. In these cases, we recommend that you use the utility jq.
For example, the following command prints the settings of the job with the ID of 233.
databricks jobs list --output JSON | jq '.jobs[] | select(.job_id == 233) | .settings'
Output:
{
  "name": "Quickstart",
  "new_cluster": {
    "spark_version": "7.5.x-scala2.12",
    "spark_env_vars": {
      "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "num_workers": 8,
    ...
  },
  "email_notifications": {},
  "timeout_seconds": 0,
  "notebook_task": {
    "notebook_path": "/Quickstart"
  },
  "max_concurrent_runs": 1
}
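The copy-a-job flow described above can be sketched as follows. The response from the get job command is stubbed with a small sample payload here so that the jq step is shown end to end; with a live workspace you would pipe databricks jobs get output instead:

```shell
# Stub of a `databricks jobs get --job-id 233` response, for illustration.
cat > /tmp/job-233.json <<'JSON'
{"job_id": 233, "settings": {"name": "Quickstart", "max_concurrent_runs": 1}}
JSON

# Pull out only the settings object...
jq '.settings' /tmp/job-233.json > /tmp/job-settings.json

# ...which can then be passed to the create subcommand:
#   databricks jobs create --json-file /tmp/job-settings.json
cat /tmp/job-settings.json
```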
As another example, the following command prints only the names and IDs of all available clusters in the workspace:
databricks clusters list --output JSON | jq '[ .clusters[] | { name: .cluster_name, id: .cluster_id } ]'
Output:
[
  {
    "name": "My Cluster 1",
    "id": "1234-567890-grip123"
  },
  {
    "name": "My Cluster 2",
    "id": "2345-678901-patch234"
  }
]
You can install jq, for example, on macOS by using Homebrew with brew install jq, or on Windows by using Chocolatey with choco install jq. For more information on jq, see the jq Manual.
JSON string parameters
String parameters are handled differently depending on your operating system:
Linux or macOS: You must enclose JSON string parameters in single quotes. For example:
'["20180505", "alantest"]'
Windows: You must enclose JSON string parameters in double quotes, and the quote characters inside the string must be preceded by \. For example:
"[\"20180505\", \"alantest\"]"
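The difference is purely a matter of shell quoting; the string the CLI ultimately receives is identical in both cases, as this quick check shows:

```shell
# Single-quote style: no escaping needed inside the string.
unix_style='["20180505", "alantest"]'

# Double-quote style: inner quote characters escaped with backslashes.
windows_style="[\"20180505\", \"alantest\"]"

# Both spellings produce the same JSON string.
echo "$unix_style"
echo "$windows_style"
```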
Troubleshooting
The following sections provide tips for troubleshooting common issues with the legacy Databricks CLI.
Using EOF with databricks configure does not work
For Databricks CLI 0.12.0 and above, using the end of file (EOF) sequence in a script to pass parameters to the databricks configure command does not work. For example, the following script causes the Databricks CLI to ignore the parameters, and no error message is thrown:
# Do not do this.
databricksUrl=<workspace-url>
databricksToken=<personal-access-token>
databricks configure --token << EOF
$databricksUrl
$databricksToken
EOF
To fix this issue, do one of the following:
Use one of the other programmatic configuration options as described in Set up authentication.
Manually add the host and token values to the .databrickscfg file as described in Set up authentication.
Downgrade your installation of the Databricks CLI to 0.11.0 or below, and run your script again.
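A minimal sketch of the manual-file workaround, writing the DEFAULT profile directly instead of piping a heredoc into databricks configure. The values are placeholders, and DATABRICKS_CONFIG_FILE points at a temporary path here; by default the CLI reads ~/.databrickscfg:

```shell
databricksUrl="https://example.cloud.databricks.com"   # placeholder
databricksToken="dapi-example-token"                   # placeholder

# Write to a temporary location for this demonstration.
export DATABRICKS_CONFIG_FILE=/tmp/databrickscfg-demo

# Write the DEFAULT profile directly; the heredoc expands the variables.
cat > "$DATABRICKS_CONFIG_FILE" <<CFG
[DEFAULT]
host = $databricksUrl
token = $databricksToken
CFG
```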
CLI commands
- Cluster Policies CLI (legacy)
- Clusters CLI (legacy)
- DBFS CLI (legacy)
- Delta Live Tables CLI (legacy)
- Groups CLI (legacy)
- Instance Pools CLI (legacy)
- Jobs CLI (legacy)
- Libraries CLI (legacy)
- Repos CLI (legacy)
- Runs CLI (legacy)
- Secrets CLI (legacy)
- Stack CLI (legacy)
- Tokens CLI (legacy)
- Unity Catalog CLI (legacy)
- Workspace CLI (legacy)