The Databricks Command Line Interface (CLI) is an open source tool that provides an easy-to-use interface to the Databricks platform. The CLI is built on top of the Databricks REST APIs. Currently, the CLI fully implements the DBFS API, Workspace API, Jobs API, and Clusters API.
The project is currently hosted on GitHub.
Requirements¶
- Python version > 2.7.9
- Python 3 is not supported
The default Python installation on macOS does not implement the TLSv1.2 protocol. Running the CLI on such a Python installation will result in the error
AttributeError: 'module' object has no attribute 'PROTOCOL_TLSv1_2'.
To use databricks-cli you should install a version of Python that supports TLSv1.2. For macOS, the easiest way may be to install Python with Homebrew.
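To check whether your Python installation supports TLSv1.2 before installing the CLI, you can run a quick probe (a minimal sketch; it simply checks for the ssl attribute the CLI relies on):

# Prints the protocol constant if TLSv1.2 is available;
# raises the AttributeError above otherwise.
python -c "import ssl; print(ssl.PROTOCOL_TLSv1_2)"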
To install, simply run
pip install databricks-cli
To upgrade your databricks-cli installation, run
pip install --upgrade databricks-cli
Setting Up Authentication¶
The recommended way to authenticate is with a Databricks access token.
To configure the CLI to use the access token, run
databricks configure --token. After you follow the prompts,
your access credentials will be stored in the file ~/.databrickscfg.
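After configuration, ~/.databrickscfg contains a profile similar to the following sketch (the host and token values below are placeholders, not real credentials):

[DEFAULT]
host = https://<your-workspace-url>
token = <your-personal-access-token>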
You can also authenticate with your username and password pair. To do so, run
databricks configure and follow the prompts.
Workspace CLI Examples¶
The implemented commands for the Workspace CLI can be listed by running
databricks workspace -h. Commands are run by appending them to
databricks workspace. To make the Workspace CLI easier to use, feel free to alias
databricks workspace to something shorter. For more information,
see Aliasing Command Groups and the Workspace API.
$ databricks workspace -h
Usage: databricks workspace [OPTIONS] COMMAND [ARGS]...

  Utility to interact with the Databricks workspace. Workspace paths must be
  absolute and be prefixed with `/`.

Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  delete      Deletes objects from the Databricks workspace. rm and delete are synonyms.
  export      Exports a file from the Databricks workspace.
  export_dir  Recursively exports a directory from the Databricks workspace.
  import      Imports a file from local to the Databricks workspace.
  import_dir  Recursively imports a directory to the Databricks workspace.
  list        List objects in the Databricks Workspace. ls and list are synonyms.
  ls          List objects in the Databricks Workspace. ls and list are synonyms.
  mkdirs      Make directories in the Databricks Workspace.
  rm          Deletes objects from the Databricks workspace. rm and delete are synonyms.
Listing Workspace Files¶
$ databricks workspace ls /Users/email@example.com
Usage Logs
ETL
Common Utilities
guava-21.0
Importing a local directory of notebooks¶
The databricks workspace import_dir command recursively imports a directory
from the local filesystem to the Databricks workspace. Only directories and
files with the extensions .scala, .py, .sql, and
.R are imported.
When imported, these extensions are stripped from the notebook name.
To overwrite existing notebooks at the target path, add the flag -o.
$ tree
.
├── a.py
├── b.scala
├── c.sql
├── d.R
└── e
$ databricks workspace import_dir . /Users/email@example.com/example
./a.py -> /Users/email@example.com/example/a
./b.scala -> /Users/email@example.com/example/b
./c.sql -> /Users/email@example.com/example/c
./d.R -> /Users/email@example.com/example/d
$ databricks workspace ls /Users/email@example.com/example -l
NOTEBOOK   a  PYTHON
NOTEBOOK   b  SCALA
NOTEBOOK   c  SQL
NOTEBOOK   d  R
DIRECTORY  e
Exporting a workspace directory to the local filesystem¶
Similarly, it is possible to export a directory of notebooks from the Databricks workspace to the local filesystem. To do this, the command is simply
$ databricks workspace export_dir /Users/email@example.com/example .
DBFS CLI Examples¶
The implemented commands for the DBFS CLI can be listed by running
databricks fs -h.
Commands are run by appending them to
databricks fs, and all DBFS paths should be prefixed with
dbfs:/. To make the command less verbose, we've
gone ahead and aliased dbfs to
databricks fs. For more information, see the DBFS API.
$ databricks fs -h
Usage: databricks fs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with DBFS. DBFS paths are all prefixed
  with dbfs:/. Local paths can be absolute or local.

Options:
  -v, --version
  -h, --help     Show this message and exit.

Commands:
  configure
  cp      Copy files to and from DBFS.
  ls      List files in DBFS.
  mkdirs  Make directories in DBFS.
  mv      Moves a file between two DBFS paths.
  rm      Remove files from dbfs.
Copying a file to DBFS¶
dbfs cp test.txt dbfs:/test.txt
# Or recursively
dbfs cp -r test-dir dbfs:/test-dir
Copying a file from DBFS¶
dbfs cp dbfs:/test.txt ./test.txt
# Or recursively
dbfs cp -r dbfs:/test-dir ./test-dir
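The same dbfs alias applies to the other commands listed in the help output. For example (the paths below are placeholders):

# List the DBFS root, create a directory, move a file, and clean up.
dbfs ls dbfs:/
dbfs mkdirs dbfs:/new-dir
dbfs mv dbfs:/test.txt dbfs:/new-dir/test.txt
dbfs rm -r dbfs:/test-dir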
Jobs CLI Examples¶
The implemented commands for the jobs CLI can be listed by running
databricks jobs -h.
Job run commands are handled by
databricks runs; its implemented commands can be listed by running
databricks runs -h.
$ databricks jobs -h
Usage: databricks jobs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with jobs. This is a wrapper around the jobs API
  (https://docs.databricks.com/api/latest/jobs.html). Job runs are handled
  by ``databricks runs``.

Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  create   Creates a job.
  delete   Deletes the specified job.
  get      Describes the metadata for a job.
  list     Lists the jobs in the Databricks Job Service.
  reset    Resets (edits) the definition of a job.
  run-now  Runs a job with optional per-run parameters.
$ databricks runs -h
Usage: databricks runs [OPTIONS] COMMAND [ARGS]...

  Utility to interact with job runs.

Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  cancel  Cancels the run specified.
  get     Gets the metadata about a run in json form.
  list    Lists job runs.
  submit  Submits a one-time run.
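As a sketch of the run workflow (the job ID and run ID below are placeholders), you can trigger a job with run-now and then inspect the run it creates:

# Trigger a run of an existing job; the JSON response includes a run_id.
databricks jobs run-now --job-id 246
# Fetch the metadata for that run in JSON form.
databricks runs get --run-id 111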
Listing and finding jobs¶
The databricks jobs list command has two output formats, JSON and TABLE.
The TABLE format is output by default and returns a two-column table (job ID, job name).
To find a job by name, run
databricks jobs list | grep "JOB_NAME"
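If you want the matching job ID rather than the whole table row, a jq filter over the JSON output can extract it (a sketch requiring jq; "JOB_NAME" is a placeholder):

# Emit the job_id of every job whose name matches exactly.
databricks jobs list --output json | jq '.jobs[] | select(.settings.name == "JOB_NAME") | .job_id'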
Copying a job¶
This example requires the program jq.
SETTINGS_JSON=$(databricks jobs get --job-id 284907 | jq .settings)
# JQ Explanation:
# - peek into top level `settings` field.
databricks jobs create --json "$SETTINGS_JSON"
Deleting “Untitled” Jobs¶
databricks jobs list --output json | jq '.jobs[] | select(.settings.name == "Untitled") | .job_id' | xargs -n 1 databricks jobs delete --job-id
# Explanation:
# - List jobs in JSON.
# - Peek into top level `jobs` field and iterate over its entries.
# - Select only jobs with name equal to "Untitled".
# - Print those job IDs out.
# - Invoke `databricks jobs delete --job-id` once per row, with the job ID
#   appended as an argument to the end of the command.
Clusters CLI Examples¶
The implemented commands for the clusters CLI can be listed by running
databricks clusters -h.
$ databricks clusters -h
Usage: databricks clusters [OPTIONS] COMMAND [ARGS]...

  Utility to interact with Databricks clusters.

Options:
  -v, --version  [VERSION]
  -h, --help     Show this message and exit.

Commands:
  create           Creates a Databricks cluster.
  delete           Removes a Databricks cluster given its ID.
  get              Retrieves metadata about a cluster.
  list             Lists active and recently terminated clusters.
  list-node-types  Lists possible node types for a cluster.
  list-zones       Lists zones where clusters can be created.
  restart          Restarts a Databricks cluster given its ID.
  spark-versions   Lists possible Databricks Runtime versions...
  start            Starts a terminated Databricks cluster given its ID.
Listing runtime versions¶
databricks clusters spark-versions
Listing node types¶
databricks clusters list-node-types
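The values returned by the two listings above are what cluster creation expects. Here is a minimal create sketch (every field value below is a placeholder; substitute output from spark-versions and list-node-types):

# Create a small cluster from an inline JSON spec.
databricks clusters create --json '{
  "cluster_name": "my-cluster",
  "spark_version": "<runtime-version>",
  "node_type_id": "<node-type-id>",
  "num_workers": 2
}'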
Aliasing Command Groups¶
Sometimes it can be inconvenient to prefix each CLI invocation with the name of a command group. Writing
databricks workspace ls can be quite verbose! To make the CLI easier to use, you can alias different
command groups to shorter commands. For example, to shorten
databricks workspace ls to
dw ls in the
Bourne-again shell, you can add
alias dw="databricks workspace" to the appropriate bash profile. Typically,
this file is located at ~/.bash_profile.
By default, we have already aliased
databricks fs to
dbfs; databricks fs ls and dbfs ls are equivalent.
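With the alias in place, the earlier listing example shortens accordingly (reusing the example workspace path from above):

# Equivalent to: databricks workspace ls /Users/email@example.com
dw ls /Users/email@example.com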
Parsing JSON Output¶
Some Databricks CLI commands output the JSON response from the API endpoint. Sometimes it can be
useful to parse out parts of the JSON to pipe into other commands. For example, to copy a job
definition, we must take the
settings field of the response from
/api/2.0/jobs/get and use that as an argument to the
databricks jobs create command.
In these cases, we recommend that you use the utility
jq. macOS users can install
jq with Homebrew by running brew install jq.
For more information on
jq, see its documentation.
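For instance, to inspect a job definition interactively (reusing the job ID from the copy example above; jq . pretty-prints, and a path expression extracts one field):

# Pretty-print the full JSON response for a job.
databricks jobs get --job-id 284907 | jq .
# Extract just the job name.
databricks jobs get --job-id 284907 | jq .settings.name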