REST API 2.0

The Databricks REST API 2.0 supports services to manage your Databricks account, clusters, cluster policies, DBFS, global init scripts, groups, pools, instance profiles, IP access lists, jobs, libraries, MLflow experiments and models, permissions, SCIM settings, secrets, tokens, and workspaces.

This article provides an overview of how to use the REST API. Links to each API reference, authentication options, and examples are listed at the end of the article.

For information about authenticating to the REST API, see Authentication using Databricks personal access tokens. For API examples, see API examples.

Rate limits

To ensure high quality of service under heavy load, Databricks enforces rate limits for all REST API calls. Limits are set per endpoint and per workspace to ensure fair usage and high availability. To request a limit increase, contact your Databricks representative.

Requests that exceed the rate limit return a 429 Too Many Requests response status code.
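
If your workflow can hit these limits, one option is to retry with a backoff when a 429 is returned. The following is a minimal Python sketch; the endpoint, workspace URL, and retry parameters are illustrative, and credentials are assumed to come from a .netrc file as in the examples later in this article.

import time
import requests

url = 'https://abc-d1e2345f-a6b2.cloud.databricks.com/api/2.0/clusters/list'

# Illustrative retry loop: back off and retry a few times when the
# workspace returns 429 (rate limit exceeded).
for attempt in range(5):
  response = requests.get(url)
  if response.status_code != 429:
    break
  # Honor Retry-After if the response provides it; otherwise back off exponentially.
  wait = int(response.headers.get('Retry-After', 2 ** attempt))
  time.sleep(wait)

response.raise_for_status()
print(response.json())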

Parse output

It can be useful to parse out parts of the JSON output. Databricks recommends the utility jq for parsing JSON. You can install jq on Linux through jq Releases, on macOS using Homebrew with brew install jq, or on Windows using Chocolatey with choco install jq. For more information on jq, see the jq Manual.

This example lists the names and IDs of available clusters in the specified workspace. It uses a .netrc file for authentication.

curl --netrc -X GET https://abc-d1e2345f-a6b2.cloud.databricks.com/api/2.0/clusters/list \
| jq '[ .clusters[] | { id: .cluster_id, name: .cluster_name } ]'
[
  {
    "id": "1234-567890-batch123",
    "name": "My Cluster 1"
  },
  {
    "id": "2345-678901-rigs234",
    "name": "My Cluster 2"
  }
]

Compatibility

For a given API version, fields are never removed from the JSON output of a response. However, the API might add new fields to the JSON output without incrementing the API version. Your programmatic workflows must tolerate these additions and ignore unknown fields.

Some STRING fields (which contain error and descriptive messaging intended to be consumed by the UI) are unstructured, and you should not depend on the format of these fields in programmatic workflows.
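One way to stay compatible is to read only the fields your workflow depends on and ignore everything else. The following Python sketch is a minimal illustration, reusing the clusters/list endpoint and field names shown earlier in this article.

import requests

# Credentials are read from a .netrc file, as in the other examples in this article.
response = requests.get(
  'https://abc-d1e2345f-a6b2.cloud.databricks.com/api/2.0/clusters/list'
)
response.raise_for_status()

# Read only the fields this workflow depends on; any fields the API adds in a
# later release are ignored.
clusters = [
  { 'id': c['cluster_id'], 'name': c['cluster_name'] }
  for c in response.json().get('clusters', [])
]

print(clusters)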

Use curl to invoke the Databricks REST API

curl is a popular tool for transferring data to and from servers. This section provides specific information about using curl to invoke the Databricks REST API.

Invoke a GET using a query string

While most API calls require that you specify a JSON body, GET calls accept a query string: append it to the URL after ? and surround the URL in quotes. Alternatively, with curl you can specify --get (or -G) and --data (or -d) to pass the query string; in that case you do not need to surround the URL or the query string in quotes.

In the following examples, replace abc-d1e2345f-a6b2.cloud.databricks.com with the workspace URL of your Databricks deployment.

This example prints information about the specified cluster. It uses a .netrc file for authentication.

Using ?:

curl --netrc 'https://abc-d1e2345f-a6b2.cloud.databricks.com/api/2.0/clusters/get?cluster_id=1234-567890-patch123'

Using --get and --data:

curl --netrc --get \
https://abc-d1e2345f-a6b2.cloud.databricks.com/api/2.0/clusters/get \
--data cluster_id=1234-567890-patch123
{
  "cluster_id": "1234-567890-patch123",
  "spark_context_id": 123456789012345678,
  "cluster_name": "job-239-run-1",
  "spark_version": "8.1.x-scala2.12",
  ...
}

This example lists the contents of the DBFS root. It uses a .netrc file for authentication.

curl --netrc --get \
https://abc-d1e2345f-a6b2.cloud.databricks.com/api/2.0/dbfs/list \
--data path=/
"files": [
  {
    "path": "/tmp",
    "is_dir": true,
    "file_size": 0,
    "modification_time": 1547078156000
  },
  {
    "path": "/my_file.txt",
    "is_dir": false,
    "file_size": 40,
    "modification_time": 1541374426000
  },
  ...
]

Use Python to invoke the Databricks REST API

requests is a popular library for making HTTP requests in Python. This example uses the requests library to list information about the specified Databricks cluster, with credentials read from a .netrc file.

import requests
import json

# Replace with the workspace URL of your Databricks deployment.
instance_id = 'dbc-d1e2345f-a6b2.cloud.databricks.com'

api_version = '/api/2.0'
api_command = '/clusters/get'
url = f"https://{instance_id}{api_version}{api_command}"

params = {
  'cluster_id': '1234-567890-batch123'
}

# With no explicit auth argument, requests reads credentials from the .netrc file.
response = requests.get(
  url = url,
  params = params
)

print(json.dumps(json.loads(response.text), indent = 2))
{
  "cluster_id": "1234-567890-batch123",
  "spark_context_id": 1234567890123456789,
  ...
}

Use PowerShell to invoke the Databricks REST API

This example uses the Invoke-RestMethod cmdlet in PowerShell to list information about the specified Databricks cluster.

$Token = 'dapia1b2345678901c23456defa7bcde8fa9'
$ConvertedToken = $Token | ConvertTo-SecureString -AsPlainText -Force

$InstanceID = 'dbc-d1e2345f-a6b2.cloud.databricks.com'
$APIVersion = '/api/2.0'
$APICommand = '/clusters/get'
$Uri = "https://$InstanceID$APIVersion$APICommand"

$Body = @{
  'cluster_id' = '1234-567890-batch123'
}

$Response = Invoke-RestMethod `
  -Authentication Bearer `
  -Token $ConvertedToken `
  -Method Get `
  -Uri $Uri `
  -Body $Body

Write-Output $Response
cluster_id       : 1234-567890-batch123
spark_context_id : 1234567890123456789
...

Runtime version strings

Many API calls require you to specify a Databricks runtime version string. This section describes the structure of a version string in the Databricks REST API.

<M>.<F>.x[-cpu][-esr][-gpu][-ml][-photon][-hls]-scala<scala-version>

where

  • M: Databricks Runtime major release
  • F: Databricks Runtime feature release
  • cpu: CPU version (with -ml only)
  • esr: Extended Support
  • gpu: GPU-enabled
  • ml: Machine learning
  • photon: Photon
  • hls: Genomics (deprecated)
  • scala-version: version of Scala used to compile Spark: 2.10, 2.11, or 2.12

For example:

  • 7.6.x-gpu-ml-scala2.12 represents Databricks Runtime 7.6 for Machine Learning, which is GPU-enabled and uses Scala version 2.12 to compile Spark version 3.0.1
  • 6.4.x-esr-scala2.11 represents Databricks Runtime 6.4 Extended Support, which uses Scala version 2.11 to compile Spark version 2.4.5
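
As an illustration only, the following Python sketch parses a runtime version string with a regular expression derived from the structure described above; the pattern is an assumption based on this article, not an official parser.

import re

# Pattern derived from the version string structure above (an assumption,
# not an official format specification).
pattern = re.compile(
  r'^(?P<major>\d+)\.(?P<feature>\d+)\.x'
  r'(?P<modifiers>(?:-(?:cpu|esr|gpu|ml|photon|hls))*)'
  r'-scala(?P<scala>\d+\.\d+)$'
)

match = pattern.match('7.6.x-gpu-ml-scala2.12')
if match:
  print(match.group('major'))      # 7
  print(match.group('feature'))    # 6
  print(match.group('modifiers'))  # -gpu-ml
  print(match.group('scala'))      # 2.12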

The tables in Supported Databricks runtime releases and support schedule and in Unsupported releases map Databricks Runtime versions to the Spark version contained in the runtime.

You can get a list of available Databricks runtime version strings by calling the Runtime versions API.
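
For example, the following Python sketch calls that API (the clusters/spark-versions endpoint) and prints each entry's key and name; credentials are assumed to come from a .netrc file, as in the earlier Python example.

import requests

# Replace with the workspace URL of your Databricks deployment.
instance_id = 'abc-d1e2345f-a6b2.cloud.databricks.com'
url = f"https://{instance_id}/api/2.0/clusters/spark-versions"

# With no explicit auth argument, requests reads credentials from the .netrc file.
response = requests.get(url)
response.raise_for_status()

for version in response.json().get('versions', []):
  print(version['key'], '-', version['name'])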

Databricks Light

apache-spark-<M>.<F>.x-scala<scala-version>

where

  • M: Apache Spark major release
  • F: Apache Spark feature release
  • scala-version: version of Scala used to compile Spark: 2.10 or 2.11

For example, apache-spark-2.4.x-scala2.11.