API examples

This article contains examples that demonstrate how to use the Databricks REST API 2.0.

In the following examples, replace <databricks-instance> with the workspace URL of your Databricks deployment.

Authentication

To learn how to authenticate to the REST API, review Authentication using Databricks personal access tokens.

The examples in this article assume you are using Databricks personal access tokens. In the following examples, replace <your-token> with your personal access token. The curl examples assume that you store Databricks API credentials under .netrc. The Python examples use Bearer authentication. Although the examples show storing the token in the code, to use credentials safely in Databricks we recommend that you follow the Secret management user guide.

Get a gzipped list of clusters

curl -n -H "Accept-Encoding: gzip" https://<databricks-instance>/api/2.0/clusters/list > clusters.gz
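
If you prefer Python, here is a minimal requests sketch of the same call. The requests library asks for gzip by default and transparently decompresses the response, so you work with the JSON directly:

import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

# requests sends Accept-Encoding: gzip by default and decompresses the
# response automatically, so the cluster list arrives as plain JSON.
response = requests.get(
  'https://%s/api/2.0/clusters/list' % DOMAIN,
  headers={'Authorization': 'Bearer %s' % TOKEN}
)
for cluster in response.json().get('clusters', []):
  print(cluster['cluster_id'], cluster['cluster_name'])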

Upload a big file into DBFS

The amount of data uploaded by a single API call cannot exceed 1MB. To upload a file larger than 1MB to DBFS, use the streaming API, which is a combination of create, add-block, and close.

Here is an example of how to perform this action using Python.

import json
import requests
import base64

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
BASE_URL = 'https://%s/api/2.0/dbfs/' % (DOMAIN)

def dbfs_rpc(action, body):
  """ A helper function to make the DBFS API request, request/response is encoded/decoded as JSON """
  response = requests.post(
    BASE_URL + action,
    headers={'Authorization': 'Bearer %s' % TOKEN },
    json=body
  )
  return response.json()

# Create a handle that will be used to add blocks
handle = dbfs_rpc("create", {"path": "/temp/upload_large_file", "overwrite": "true"})['handle']
with open('/a/local/file', 'rb') as f:
  while True:
    # A block can be at most 1MB
    block = f.read(1 << 20)
    if not block:
      break
    # The API expects each block as a base64-encoded string
    data = base64.standard_b64encode(block).decode()
    dbfs_rpc("add-block", {"handle": handle, "data": data})
# close the handle to finish uploading
dbfs_rpc("close", {"handle": handle})

Create a Python 3 cluster (Databricks Runtime 5.5 LTS)

Note

Python 3 is the default version of Python in Databricks Runtime 6.0 and above.

The following example shows how to launch a Python 3 cluster using the Databricks REST API and the requests Python HTTP library:

import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

response = requests.post(
  'https://%s/api/2.0/clusters/create' % (DOMAIN),
  headers={'Authorization': 'Bearer %s' % TOKEN},
  json={
    "cluster_name": "my-cluster",
    "spark_version": "5.5.x-scala2.11",
    "node_type_id": "i3.xlarge",
    "spark_env_vars": {
      "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "num_workers": 25
  }
)

if response.status_code == 200:
  print(response.json()['cluster_id'])
else:
  print("Error launching cluster: %s: %s" % (response.json()["error_code"], response.json()["message"]))

Create a High Concurrency cluster

The following example shows how to launch a High Concurrency mode cluster using the Databricks REST API:

curl -n -X POST -H 'Content-Type: application/json' -d '{
  "cluster_name": "high-concurrency-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "spark_conf":{
        "spark.databricks.cluster.profile":"serverless",
        "spark.databricks.repl.allowedLanguages":"sql,python,r"
     },
     "aws_attributes":{
        "zone_id":"us-west-2c",
        "first_on_demand":1,
        "availability":"SPOT_WITH_FALLBACK",
        "spot_bid_price_percent":100
     },
   "custom_tags":{
        "ResourceClass":"Serverless"
     },
       "autoscale":{
        "min_workers":1,
        "max_workers":2
     },
  "autotermination_minutes":10
}' https://<databricks-instance>/api/2.0/clusters/create

Jobs API examples

This section shows how to create Python, spark-submit, and JAR jobs, and how to run the JAR job and view its output.

Create a Python job

This example shows how to create a Python job. It uses the Apache Spark Pi estimation example written in Python.

  1. Download the Python file containing the example and upload it to Databricks File System (DBFS) using the Databricks CLI.

    dbfs cp pi.py dbfs:/docs/pi.py
    
  2. Create the job. The following examples demonstrate how to create a job using Databricks Runtime and Databricks Light. A requests-based sketch of the same call appears after this list.

    Databricks Runtime

    curl -n -X POST -H 'Content-Type: application/json' -d \
    '{
      "name": "SparkPi Python job",
      "new_cluster": {
        "spark_version": "7.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      },
      "spark_python_task": {
        "python_file": "dbfs:/pi.py",
        "parameters": [
          "10"
        ]
      }
    }' https://<databricks-instance>/api/2.0/jobs/create
    

    Databricks Light

    curl -n -X POST -H 'Content-Type: application/json' -d \
    '{
      "name": "SparkPi Python job",
      "new_cluster": {
        "spark_version": "apache-spark-2.4.x-scala2.11",
        "node_type_id": "i3.xlarge",
        "num_workers": 2
      },
      "spark_python_task": {
        "python_file": "dbfs:/pi.py",
        "parameters": [
          "10"
        ]
      }
     }' https://<databricks-instance>/api/2.0/jobs/create
    

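If you are scripting the Jobs API in Python, here is a minimal requests sketch of the Databricks Runtime variant of the same jobs/create call:

import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

# Create the job; a successful call returns the new job_id.
response = requests.post(
  'https://%s/api/2.0/jobs/create' % DOMAIN,
  headers={'Authorization': 'Bearer %s' % TOKEN},
  json={
    "name": "SparkPi Python job",
    "new_cluster": {
      "spark_version": "7.3.x-scala2.12",
      "node_type_id": "i3.xlarge",
      "num_workers": 2
    },
    "spark_python_task": {
      "python_file": "dbfs:/docs/pi.py",
      "parameters": ["10"]
    }
  }
)
print(response.json())
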
Create a spark-submit job

This example shows how to create a spark-submit job. It uses the Apache Spark SparkPi example.

  1. Download the JAR containing the example and upload the JAR to Databricks File System (DBFS) using the Databricks CLI.

    dbfs cp SparkPi-assembly-0.1.jar dbfs:/docs/sparkpi.jar
    
  2. Create the job.

    curl -n \
    -X POST -H 'Content-Type: application/json' -d \
    '{
         "name": "SparkPi spark-submit job",
         "new_cluster": {
           "spark_version": "7.3.x-scala2.12",
           "node_type_id": "r3.xlarge",
           "aws_attributes": {"availability": "ON_DEMAND"},
           "num_workers": 2
           },
        "spark_submit_task": {
           "parameters": [
             "--class",
             "org.apache.spark.examples.SparkPi",
             "dbfs:/docs/sparkpi.jar",
             "10"
             ]
           }
    }' https://<databricks-instance>/api/2.0/jobs/create
    

Create and run a spark-submit job for R scripts

This example shows how to create a spark-submit job to run R scripts.

  1. Upload the R file to Databricks File System (DBFS) using the Databricks CLI.

    dbfs cp your_code.R dbfs:/path/to/your_code.R
    

    If your code uses SparkR, you must first install the package. Databricks Runtime contains the SparkR source code. Install the SparkR package from its local directory as shown in the following example:

    install.packages("/databricks/spark/R/pkg", repos = NULL)
    library(SparkR)
    
    sparkR.session()
    n <- nrow(createDataFrame(iris))
    write.csv(n, "/dbfs/path/to/num_rows.csv")
    

    Databricks Runtime installs the latest version of sparklyr from CRAN. If the code uses sparklyr, you must specify the Spark master URL in spark_connect. To form the Spark master URL, use the SPARK_LOCAL_IP environment variable to get the IP, and use the default port 7077. For example:

    library(sparklyr)
    
    master <- paste("spark://", Sys.getenv("SPARK_LOCAL_IP"), ":7077", sep="")
    sc <- spark_connect(master)
    iris_tbl <- copy_to(sc, iris)
    write.csv(iris_tbl, "/dbfs/path/to/sparklyr_iris.csv")
    
  2. Create the job.

    curl -n \
    -X POST -H 'Content-Type: application/json' \
    -d '{
         "name": "R script spark-submit job",
         "new_cluster": {
           "spark_version": "7.3.x-scala2.12",
           "node_type_id": "i3.xlarge",
           "aws_attributes": {"availability": "SPOT"},
           "num_workers": 2
           },
        "spark_submit_task": {
           "parameters": [ "dbfs:/path/to/your_code.R" ]
           }
    }' https://<databricks-instance>/api/2.0/jobs/create
    

    This returns a job-id that you can then use to run the job.

  3. Run the job using the job-id.

    curl -n \
    -X POST -H 'Content-Type: application/json' \
    -d '{ "job_id": <job-id> }' https://<databricks-instance>/api/2.0/jobs/run-now
    

Create and run a JAR job

This example shows how to create and run a JAR job. It uses the Apache Spark SparkPi example.

  1. Download the JAR containing the example.

  2. Upload the JAR to your Databricks instance using the API:

    curl -n \
    -F filedata=@"SparkPi-assembly-0.1.jar" \
    -F path="/docs/sparkpi.jar" \
    -F overwrite=true \
    https://<databricks-instance>/api/2.0/dbfs/put
    

    A successful call returns {}. Otherwise you will see an error message.

  3. Get a list of all Spark versions before creating your job.

    curl -n https://<databricks-instance>/api/2.0/clusters/spark-versions
    

    This example uses 7.3.x-scala2.12. See Runtime version strings for more information about Spark cluster versions.

  4. Create the job. The JAR is specified as a library and the main class name is referenced in the Spark JAR task.

    curl -n -X POST -H 'Content-Type: application/json' \
    -d '{
          "name": "SparkPi JAR job",
          "new_cluster": {
            "spark_version": "7.3.x-scala2.12",
            "node_type_id": "r3.xlarge",
            "aws_attributes": {"availability": "ON_DEMAND"},
            "num_workers": 2
            },
         "libraries": [{"jar": "dbfs:/docs/sparkpi.jar"}],
         "spark_jar_task": {
            "main_class_name":"org.apache.spark.examples.SparkPi",
            "parameters": "10"
            }
    }' https://<databricks-instance>/api/2.0/jobs/create
    

    This returns a job-id that you can then use to run the job.

  5. Run the job using run now:

    curl -n \
    -X POST -H 'Content-Type: application/json' \
    -d '{ "job_id": <job-id> }' https://<databricks-instance>/api/2.0/jobs/run-now
    
  6. Navigate to https://<databricks-instance>/#job/<job-id> and you’ll be able to see your job running.

  7. You can also check on the run from the API using the information returned from the previous request. A Python sketch that polls the run until it completes appears after this list.

    curl -n "https://<databricks-instance>/api/2.0/jobs/runs/get?run_id=<run-id>" | jq
    

    This should return something like:

    {
      "job_id": 35,
      "run_id": 30,
      "number_in_job": 1,
      "original_attempt_run_id": 30,
      "state": {
        "life_cycle_state": "TERMINATED",
        "result_state": "SUCCESS",
        "state_message": ""
      },
      "task": {
        "spark_jar_task": {
          "jar_uri": "",
          "main_class_name": "org.apache.spark.examples.SparkPi",
          "parameters": [
            "10"
          ],
          "run_as_repl": true
        }
      },
      "cluster_spec": {
        "new_cluster": {
          "spark_version": "7.3.x-scala2.12",
          "node_type_id": "<node-type>",
          "enable_elastic_disk": false,
          "num_workers": 1
        },
        "libraries": [
          {
            "jar": "dbfs:/docs/sparkpi.jar"
          }
        ]
      },
      "cluster_instance": {
        "cluster_id": "0412-165350-type465",
        "spark_context_id": "5998195893958609953"
      },
      "start_time": 1523552029282,
      "setup_duration": 211000,
      "execution_duration": 33000,
      "cleanup_duration": 2000,
      "trigger": "ONE_TIME",
      "creator_user_name": "...",
      "run_name": "SparkPi JAR job",
      "run_page_url": "<databricks-instance>/?o=3901135158661429#job/35/run/1",
      "run_type": "JOB_RUN"
    }
    
  8. To view the job output, visit the job run details page.

    Executing command, time = 1523552263909.
    Pi is roughly 3.13973913973914
    
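Because the run is asynchronous, it is often convenient to poll runs/get until the run reaches a terminal life_cycle_state. A minimal sketch using the run_id returned by run-now:

import time
import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'
RUN_ID = 30  # replace with the run_id returned by jobs/run-now

# Poll the run until it reaches a terminal life_cycle_state.
while True:
  run = requests.get(
    'https://%s/api/2.0/jobs/runs/get' % DOMAIN,
    headers={'Authorization': 'Bearer %s' % TOKEN},
    params={'run_id': RUN_ID}
  ).json()
  if run['state']['life_cycle_state'] in ('TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'):
    break
  time.sleep(30)

print(run['state'].get('result_state'), run['run_page_url'])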

Create cluster enabled for table access control example

To create a cluster enabled for table access control, specify the following spark_conf property in your request body:

curl -n -X POST -H 'Content-Type: application/json' https://<databricks-instance>/api/2.0/clusters/create -d '
{
  "cluster_name": "my-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "spark_conf": {
    "spark.databricks.acl.dfAclsEnabled":true,
    "spark.databricks.repl.allowedLanguages": "python,sql"
  },
  "aws_attributes": {
    "availability": "SPOT",
    "zone_id": "us-west-2a"
  },
  "num_workers": 1,
  "custom_tags":{
     "costcenter":"Tags",
     "applicationname":"Tags1"
  }
}'

Cluster log delivery examples

While you can view the Spark driver and executor logs in the Spark UI, Databricks can also deliver the logs to DBFS and S3 destinations. See the following examples.

Create a cluster with logs delivered to a DBFS location

The following cURL command creates a cluster named cluster_log_dbfs and requests that Databricks send its logs to dbfs:/logs with the cluster ID as the path prefix.

curl -n -X POST -H 'Content-Type: application/json' -d \
'{
  "cluster_name": "cluster_log_dbfs",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 1,
  "cluster_log_conf": {
    "dbfs": {
      "destination": "dbfs:/logs"
    }
  }
}' https://<databricks-instance>/api/2.0/clusters/create

The response should contain the cluster ID:

{"cluster_id":"1111-223344-abc55"}

After cluster creation, Databricks syncs log files to the destination every 5 minutes. It uploads driver logs to dbfs:/logs/1111-223344-abc55/driver and executor logs to dbfs:/logs/1111-223344-abc55/executor.
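
You can confirm delivery by listing the destination through the DBFS API. A minimal sketch, assuming the cluster ID shown above:

import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

# List the delivered driver logs for the cluster created above.
response = requests.get(
  'https://%s/api/2.0/dbfs/list' % DOMAIN,
  headers={'Authorization': 'Bearer %s' % TOKEN},
  params={'path': '/logs/1111-223344-abc55/driver'}
)
for f in response.json().get('files', []):
  print(f['path'], f['file_size'])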

Create a cluster with logs delivered to an S3 location

Databricks supports delivering logs to an S3 location using cluster instance profiles. The following command creates a cluster named cluster_log_s3 and requests Databricks to send its logs to s3://my-bucket/logs using the specified instance profile.

curl -n -X POST -H 'Content-Type: application/json' -d \
'{
  "cluster_name": "cluster_log_s3",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "aws_attributes": {
    "availability": "SPOT",
    "zone_id": "us-west-2c",
    "instance_profile_arn": "arn:aws:iam::12345678901234:instance-profile/YOURIAM"
  },
  "num_workers": 1,
  "cluster_log_conf": {
    "s3": {
      "destination": "s3://my-bucket/logs",
      "region": "us-west-2"
    }
  }
}' https://<databricks-instance>/api/2.0/clusters/create

Databricks delivers the logs to the S3 destination using the corresponding instance profile. Databricks supports encryption with both Amazon S3-Managed Keys (SSE-S3) and AWS KMS-Managed Keys (SSE-KMS). See Encrypt data in S3 buckets for details.

Important

You should make sure the IAM role for the instance profile has permission to upload logs to the S3 destination and to read them afterward. Otherwise, by default only the AWS account owner of the S3 bucket can access the logs. Use canned_acl in the API request to change the default permission, as shown in the sketch below.
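
For example, an S3 cluster_log_conf that turns on SSE-S3 encryption and grants the bucket owner full control over the delivered logs might look like the following fragment (a sketch only; merge it into the clusters/create request body and adjust the values for your bucket policy):

# Sketch of an S3 cluster_log_conf with encryption and a canned ACL;
# include this object in the clusters/create request body shown above.
cluster_log_conf = {
  "s3": {
    "destination": "s3://my-bucket/logs",
    "region": "us-west-2",
    "enable_encryption": True,
    "encryption_type": "sse-s3",
    "canned_acl": "bucket-owner-full-control"
  }
}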

Check log delivery status

You can retrieve cluster information, including the log delivery status, via the API:

curl -n -H 'Content-Type: application/json' -d \
'{
  "cluster_id": "1111-223344-abc55"
}' https://<databricks-instance>/api/2.0/clusters/get

If the latest batch of logs was uploaded successfully, the response contains only the timestamp of the last attempt:

{
  "cluster_log_status": {
    "last_attempted": 1479338561
  }
}

If there are errors, the error message appears in the response:

{
  "cluster_log_status": {
    "last_attempted": 1479338561,
    "last_exception": "Exception: Access Denied ..."
  }
}

Workspace examples

Here are some examples of using the Workspace API to list, get information about, create, delete, export, and import workspace objects.

List a notebook or a folder

The following cURL command lists a path in the workspace.

curl -n -X GET -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/"
}' https://<databricks-instance>/api/2.0/workspace/list

The response should contain a list of statuses:

{
  "objects": [
    {
     "object_type": "DIRECTORY",
     "path": "/Users/user@example.com/folder"
    },
    {
     "object_type": "NOTEBOOK",
     "language": "PYTHON",
     "path": "/Users/user@example.com/notebook1"
    },
    {
     "object_type": "NOTEBOOK",
     "language": "SCALA",
     "path": "/Users/user@example.com/notebook2"
    }
  ]
}

If the path is a notebook, the response contains an array containing the status of the input notebook.
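
The list endpoint returns a single level of the hierarchy. Here is a minimal sketch that walks an entire folder tree by recursing into DIRECTORY entries:

import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

def list_workspace(path):
  """Recursively print every object under the given workspace path."""
  response = requests.get(
    'https://%s/api/2.0/workspace/list' % DOMAIN,
    headers={'Authorization': 'Bearer %s' % TOKEN},
    params={'path': path}
  )
  for obj in response.json().get('objects', []):
    print(obj['object_type'], obj['path'])
    if obj['object_type'] == 'DIRECTORY':
      list_workspace(obj['path'])

list_workspace('/Users/user@example.com/')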

Get information about a notebook or a folder

The following cURL command gets the status of a path in the workspace.

curl -n  -X GET -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/"
}' https://<databricks-instance>/api/2.0/workspace/get-status

The response should contain the status of the input path:

{
  "object_type": "DIRECTORY",
  "path": "/Users/user@example.com"
}

Create a folder

The following cURL command creates a folder. It creates the folder recursively like mkdir -p. If the folder already exists, it will do nothing and succeed.

curl -n -X POST -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/new/folder"
}' https://<databricks-instance>/api/2.0/workspace/mkdirs

If the request succeeds, an empty JSON string is returned.

Delete a notebook or folder

The following cURL command deletes a notebook or folder. You can enable recursive to delete a non-empty folder and everything in it.

curl -n -X POST -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/new/folder",
  "recursive": "false"
}' https://<databricks-instance>/api/2.0/workspace/delete

If the request succeeds, an empty JSON string is returned.

Export a notebook or folder

The following cURL command exports a notebook. Notebooks can be exported in the following formats: SOURCE, HTML, JUPYTER, DBC. A folder can be exported only as DBC.

curl -n -X GET -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/notebook",
  "format": "SOURCE"
}' https://<databricks-instance>/api/2.0/workspace/export

The response contains the base64-encoded notebook content.

{
  "content": "Ly8gRGF0YWJyaWNrcyBub3RlYm9vayBzb3VyY2UKcHJpbnQoImhlbGxvLCB3b3JsZCIpCgovLyBDT01NQU5EIC0tLS0tLS0tLS0KCg=="
}
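
To work with the exported notebook locally, decode the content field. A minimal Python sketch that exports the notebook source and writes the decoded bytes to a local file (the file name here is just an example):

import base64
import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

# Export the notebook source and decode the base64 content field.
response = requests.get(
  'https://%s/api/2.0/workspace/export' % DOMAIN,
  headers={'Authorization': 'Bearer %s' % TOKEN},
  params={'path': '/Users/user@example.com/notebook', 'format': 'SOURCE'}
)
with open('notebook-source.txt', 'wb') as f:
  f.write(base64.b64decode(response.json()['content']))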

Alternatively, you can download the exported notebook directly.

curl -n -X GET "https://<databricks-instance>/api/2.0/workspace/export?format=SOURCE&direct_download=true&path=/Users/user@example.com/notebook"

The response will be the exported notebook content.

Import a notebook or directory

The following cURL command imports a notebook into the workspace. Multiple formats (SOURCE, HTML, JUPYTER, DBC) are supported. If the format is SOURCE, you must specify language. The content parameter contains the base64-encoded notebook content. You can enable overwrite to overwrite the existing notebook.

curl -n -X POST -H 'Content-Type: application/json' -d \
'{
  "path": "/Users/user@example.com/new-notebook",
  "format": "SOURCE",
  "language": "SCALA",
  "content": "Ly8gRGF0YWJyaWNrcyBub3RlYm9vayBzb3VyY2UKcHJpbnQoImhlbGxvLCB3b3JsZCIpCgovLyBDT01NQU5EIC0tLS0tLS0tLS0KCg==",
  "overwrite": "false"
}' https://<databricks-instance>/api/2.0/workspace/import

If the request succeeds, an empty JSON string is returned.
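
If you are building the request in Python, here is a minimal sketch, assuming a local source file named notebook.scala, that base64-encodes the content and calls the import endpoint:

import base64
import requests

DOMAIN = '<databricks-instance>'
TOKEN = '<your-token>'

# Read the local notebook source and base64-encode it for the content field.
with open('notebook.scala', 'rb') as f:
  content = base64.standard_b64encode(f.read()).decode()

response = requests.post(
  'https://%s/api/2.0/workspace/import' % DOMAIN,
  headers={'Authorization': 'Bearer %s' % TOKEN},
  json={
    "path": "/Users/user@example.com/new-notebook",
    "format": "SOURCE",
    "language": "SCALA",
    "content": content,
    "overwrite": "false"
  }
)
print(response.json())  # {} on success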

Alternatively, you can import a notebook via multipart form post.

curl -n -X POST https://<databricks-instance>/api/2.0/workspace/import \
       -F path="/Users/user@example.com/new-notebook" -F format=SOURCE -F language=SCALA -F overwrite=true -F content=@notebook.scala