API Examples

Before working through these examples, read the documentation on Authentication. The cURL examples assume that you store your Databricks API credentials in your .netrc file, which is what cURL's -n flag reads.
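
For reference, a minimal .netrc entry for a Databricks deployment looks like the following (substitute your own account host and credentials):

machine YOURACCOUNT.cloud.databricks.com
login USERNAME
password PASSWORD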

Running a Jar Job From the REST API

First, we’ll create a fat jar that bundles all of our dependencies together. The package I will be using in this tutorial is this GitHub repository, on the workflow-example branch.

sbt assembly

To confirm that it worked, you should see ScalaSparkTemplate-assembly-0.2-SNAPSHOT.jar inside target/scala-2.10.

Now that we’ve created our jar, let’s use the Databricks REST API to upload it to DBFS. Here’s some sample code to do that:

curl -n \
-F filedata=@"target/scala-2.10/ScalaSparkTemplate-assembly-0.2-SNAPSHOT.jar" \
-F path="/docs/tutorial.jar" \
-F overwrite=true \
https://YOURACCOUNT.cloud.databricks.com/api/2.0/dbfs/put

On success, only {} is returned. Otherwise you will see an error message.

Before creating the job, you may want to get a list of all available Spark versions. Let’s fetch them from the API.

curl -n \
https://YOURACCOUNT.cloud.databricks.com/api/2.0/clusters/spark-versions
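
If you prefer Python, here is a minimal sketch of the same call using the requests library (substitute your own credentials for USERNAME and PASSWORD); it assumes the response contains a "versions" list whose entries have "key" and "name" fields, as described in the Cluster API documentation.

import requests

# List the Spark versions available for new clusters and print the keys
# that can be used in the "spark_version" field of API requests.
response = requests.get(
    'https://YOURACCOUNT.cloud.databricks.com/api/2.0/clusters/spark-versions',
    auth=(USERNAME, PASSWORD)
)
response.raise_for_status()
for version in response.json()['versions']:
    print(version['key'], '-', version['name'])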

I’m going to use version 2.0.1-db1-scala2.10, which refers to a specific Databricks release of Spark. See Databricks Runtime Versions for more information about Spark cluster versions.

Now we need to create our job using the REST API. Below, we specify our jar as a library and reference the main class name in the spark_jar_task.

curl -n \
-X POST -H 'Content-Type: application/json' \
-d '{
      "name": "my example job",
      "new_cluster": {
        "spark_version": "2.0.1-db1-scala2.10",
        "node_type_id": "r3.xlarge",
        "aws_attributes": {"availability": "ON_DEMAND"},
        "num_workers": 1
      },
      "libraries": [{"jar": "dbfs:/docs/tutorial.jar"}],
      "spark_jar_task": {
        "main_class_name": "databricks.examples.ExampleClass",
        "parameters": ["hello", "there"]
      }
}' \
https://YOURACCOUNT.cloud.databricks.com/api/2.0/jobs/create

This will return a job_id that you can then use to run the job. The job_id I received was 84005. Now I’ll perform a “run now” to make sure that things are working as expected.

curl -n \
-X POST -H 'Content-Type: application/json' \
-d '{
      "job_id": 84005,
      "jar_params": ["hello", "params", "that are overwritten at run time"]
}' \
https://YOURACCOUNT.cloud.databricks.com/api/2.0/jobs/run-now

Now navigate to https://YOURACCOUNT.cloud.databricks.com/#job/84005 and you’ll be able to see your job run executing and hopefully succeeding!

We can also check on the run from the API, using the run_id returned by the previous request.

curl -n https://YOURACCOUNT.cloud.databricks.com/api/2.0/jobs/runs/get?run_id=786200

This will tell you exactly what happened on that job’s run.
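
If you would rather poll the run from Python until it completes, here is a minimal sketch using the requests library (again with USERNAME and PASSWORD placeholders); it assumes the runs/get response includes a "state" object with life_cycle_state and result_state fields, per the Jobs API documentation.

import time
import requests

run_id = 786200  # the run_id returned by the jobs/run-now call above

while True:
    response = requests.get(
        'https://YOURACCOUNT.cloud.databricks.com/api/2.0/jobs/runs/get',
        auth=(USERNAME, PASSWORD),
        params={'run_id': run_id}
    )
    response.raise_for_status()
    state = response.json()['state']
    # life_cycle_state becomes TERMINATED once the run has finished;
    # result_state then reports SUCCESS or a failure state.
    if state['life_cycle_state'] in ('TERMINATED', 'SKIPPED', 'INTERNAL_ERROR'):
        print('Run finished:', state.get('result_state', state['life_cycle_state']))
        break
    time.sleep(30)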

Upload big files into DBFS

The amount of data that can be uploaded by a single API call cannot exceed 1 MB. To upload a file larger than 1 MB to DBFS, use the streaming API, which is a combination of create, add-block, and close.

Here is an example of how to perform this action using Python (this example uses the Python 2 urllib API).

import json
import base64
import urllib

ACCOUNT = 'YOURACCOUNT'
USER = 'USERNAME'
PASSWORD = 'PASSWORD'
BASE_URL = 'https://%s:%s@%s.cloud.databricks.com/api/2.0/dbfs/' %\
    (urllib.quote(USER), urllib.quote(PASSWORD), ACCOUNT)

def dbfs_rpc(action, body):
    """ A helper function to make the DBFS API request, request/response is encoded/decoded as JSON """
    res = urllib.urlopen(BASE_URL + action, json.dumps(body)).read()
    return json.loads(res)

# Create a handle which will be used to add blocks
handle = dbfs_rpc("create", {"path": "/temp/upload_large_file", "overwrite": "true"})['handle']
with open('/a/local/file', 'rb') as f:
    while True:
        # A block can be at most 1MB
        block = f.read(1 << 20)
        if not block:
            break
        data = base64.standard_b64encode(block)
        dbfs_rpc("add-block", {"handle": handle, "data": data})
# close the handle to finish uploading
dbfs_rpc("close", {"handle": handle})

Create a Python 3 cluster

The following example Python code shows how to launch a Python 3 cluster using the Databricks REST API and the popular requests Python HTTP library:

import requests

response = requests.post(
  'https://YOURACCOUNT.cloud.databricks.com/api/2.0/clusters/create',
  auth=(USERNAME, PASSWORD),
  json={
    'cluster_name': 'python-3-demo',
    'num_workers': 1,
    'spark_version': '2.0.2-db3',
    'spark_env_vars': {
      'PYSPARK_PYTHON': '/databricks/python3/bin/python3',
    }
  }
)

if response.status_code == 200:
  print(response.json()['cluster_id'])
else:
  print("Error launching cluster: %s: %s" % (response.json()["error_code"], response.json()["message"]))

Cluster Log Delivery Examples

New in version 2.33.

In addition to showing Spark driver and executor logs in the Spark UI, Databricks can deliver the logs to a DBFS or S3 destination. Currently this feature is available only through the REST API; see the Cluster API documentation for the detailed specification. We provide several examples below.

Create a cluster with logs delivered to a DBFS location

The following cURL command creates a cluster named “cluster_log_dbfs” and requests Databricks to ship its logs to dbfs:/logs with the cluster ID as the prefix.

curl -n -H "Content-Type: application/json" -X POST -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/clusters/create <<JSON
{
  "cluster_name": "cluster_log_dbfs",
  "spark_version": "2.0.1-db1-scala2.10",
  "aws_attributes": {
    "availability": "SPOT",
    "zone_id": "us-west-2c"
  },
  "num_workers": 1,
  "cluster_log_conf": {
    "dbfs": {
      "destination": "dbfs:/logs"
    }
  }
}
JSON

The response should contain the cluster ID:

{"cluster_id":"1111-223344-abc55"}

After cluster creation, Databricks syncs log files to the destination every 5 minutes. It uploads driver logs to dbfs:/logs/1111-223344-abc55/driver and executor logs to dbfs:/logs/1111-223344-abc55/executor.
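
As a quick sanity check from Python, you can list the delivered driver logs with the DBFS list endpoint. This is a minimal sketch assuming the requests library and the USERNAME/PASSWORD placeholders used elsewhere on this page.

import requests

# List the driver log files delivered for cluster 1111-223344-abc55.
response = requests.get(
    'https://YOURACCOUNT.cloud.databricks.com/api/2.0/dbfs/list',
    auth=(USERNAME, PASSWORD),
    params={'path': '/logs/1111-223344-abc55/driver'}
)
for entry in response.json().get('files', []):
    print(entry['path'], entry['file_size'])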

Create a cluster with logs delivered to an S3 location

Databricks also supports delivering logs to an S3 location using cluster IAM roles. The following command creates a cluster named “cluster_log_s3” and requests Databricks to ship its logs to s3://my-bucket/logs using the IAM role associated with the instance profile specified.

curl -n -H "Content-Type: application/json" -X POST -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/clusters/create <<JSON
{
  "cluster_name": "cluster_log_s3",
  "spark_version": "2.0.1-db1-scala2.10",
  "aws_attributes": {
    "availability": "SPOT",
    "zone_id": "us-west-2c",
    "instance_profile_arn": "arn:aws:iam::12345678901234:instance-profile/YOURIAM"
  },
  "num_workers": 1,
  "cluster_log_conf": {
    "s3": {
      "destination": "s3://my-bucket/logs",
      "region": "us-west-2"
    }
  }
}
JSON

Databricks delivers the logs to the S3 destination using the corresponding IAM role. We support encryption with both Amazon S3-managed keys (SSE-S3) and AWS KMS-managed keys (SSE-KMS). See the Cluster API documentation for more details.

Note

Users should make sure the IAM role has permission to upload logs to the S3 destination and to read them afterwards. Otherwise, by default only the AWS account owner of the S3 bucket can access the logs. Use canned_acl in the API request to change the default permission.

Check log delivery status

Users can retrieve cluster information, including the log delivery status, via the API:

curl -n -H "Content-Type: application/json" -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/clusters/get <<JSON
{
  "cluster_id": "1111-223344-abc55"
}
JSON

If the latest batch of logs was uploaded successfully, the response contains only the timestamp of the last attempt:

{
  "cluster_log_status": {
    "last_attempted": 1479338561
  }
}

In case of errors, the error message appears in the response:

{
  "cluster_log_status": {
    "last_attempted": 1479338561,
    "last_exception": "AmazonS3Exception: Access Denied ..."
  }
}

Workspace API Examples

New in version 2.37.

Here are some examples of using the Workspace API to list, import, export, and delete notebooks.

List a notebook or a folder

The following cURL command lists a path in the workspace.

curl -n -H "Content-Type: application/json" -X GET -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/list <<JSON
{
  "path": "/Users/user@example.com/"
}
JSON

The response should contain a list of statuses:

{
  "objects": [
    {
      "object_type": "DIRECTORY",
      "path": "/Users/user@example.com/folder"
    },
    {
      "object_type": "NOTEBOOK",
      "language": "PYTHON",
      "path": "/Users/user@example.com/notebook1"
    },
    {
      "object_type": "NOTEBOOK",
      "language": "SCALA",
      "path": "/Users/user@example.com/notebook2"
    }
  ]
}

If the path is a notebook, the response contains an array containing the status of the input notebook.
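
The list call composes naturally into a recursive walk of the workspace. Here is a minimal Python sketch assuming the requests library and the same credential placeholders; it descends into DIRECTORY entries and prints every notebook it finds.

import requests

def list_notebooks(path):
    """Recursively print every notebook under the given workspace path."""
    response = requests.get(
        'https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/list',
        auth=(USERNAME, PASSWORD),
        params={'path': path}
    )
    for obj in response.json().get('objects', []):
        if obj['object_type'] == 'DIRECTORY':
            list_notebooks(obj['path'])
        elif obj['object_type'] == 'NOTEBOOK':
            print(obj['language'], obj['path'])

list_notebooks('/Users/user@example.com/')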

Get information of a notebook or a folder

The following cURL command gets the status of a path in the workspace.

curl -n -H "Content-Type: application/json" -X GET -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/get-status <<JSON
{
  "path": "/Users/user@example.com/"
}
JSON

The response should contain the status of the input path:

{
  "object_type": "DIRECTORY",
  "path": "/Users/user@example.com"
}

Create a folder

The following cURL command creates a folder in the workspace. It creates the folder recursively like mkdir -p. If the folder already exists, it will do nothing and succeed.

curl -n -H "Content-Type: application/json" -X POST -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/mkdirs <<JSON
{
  "path": "/Users/user@example.com/new/folder"
}
JSON

If the request succeeds, an empty JSON object ({}) is returned.

Delete a notebook or folder

The following cURL command deletes a notebook or folder in the workspace. recursive can be enabled to recursively delete a non-empty folder.

curl -n -H "Content-Type: application/json" -X POST -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/delete <<JSON
{
  "path": "/Users/user@example.com/new/folder",
  "recursive": "false"
}
JSON

If the request succeeds, an empty JSON object ({}) is returned.

Export a notebook or folder

The following cURL command exports a notebook from the workspace. A notebook can be exported in several formats; currently SOURCE, HTML, JUPYTER, and DBC are supported. Note that a folder can only be exported as DBC.

curl -n -H "Content-Type: application/json" -X GET -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/export <<JSON
{
  "path": "/Users/user@example.com/notebook",
  "format": "SOURCE"
}
JSON

The response contains the base64-encoded notebook content:

{
  "content": "Ly8gRGF0YWJyaWNrcyBub3RlYm9vayBzb3VyY2UKcHJpbnQoImhlbGxvLCB3b3JsZCIpCgovLyBDT01NQU5EIC0tLS0tLS0tLS0KCg=="
}

Alternatively, you can download the exported notebook directly.

curl -n -X GET "https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/export?format=SOURCE&direct_download=true&path=/Users/user@example.com/notebook"

The response will be the exported notebook content.
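
To save an exported notebook to a local file from Python, call the export endpoint and decode the base64 content yourself. A minimal sketch assuming the requests library and the same credential placeholders (notebook.scala is just an illustrative local file name):

import base64
import requests

# Export the notebook as SOURCE and write the decoded content to a local file.
response = requests.get(
    'https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/export',
    auth=(USERNAME, PASSWORD),
    params={'path': '/Users/user@example.com/notebook', 'format': 'SOURCE'}
)
content = base64.b64decode(response.json()['content'])
with open('notebook.scala', 'wb') as f:
    f.write(content)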

Import a notebook or directory

The following cURL command imports a notebook into the workspace. Multiple formats (SOURCE, HTML, JUPYTER, DBC) are supported. If the format is SOURCE, language must be provided. The content parameter contains the base64-encoded notebook content. overwrite can be enabled to overwrite an existing notebook.

curl -n -H "Content-Type: application/json" -X POST -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/import <<JSON
{
  "path": "/Users/user@example.com/new-notebook",
  "format": "SOURCE",
  "language": "SCALA",
  "content": "Ly8gRGF0YWJyaWNrcyBub3RlYm9vayBzb3VyY2UKcHJpbnQoImhlbGxvLCB3b3JsZCIpCgovLyBDT01NQU5EIC0tLS0tLS0tLS0KCg==",
  "overwrite": "false"
}
JSON

If the request succeeds, an empty JSON object ({}) is returned.

Alternatively, you can import a notebook via a multipart form POST.

curl -n -X POST https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/import \
     -F path="/Users/user@example.com/new-notebook" -F format=SOURCE -F language=SCALA -F overwrite=true -F content=@notebook.scala
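
The Python equivalent is to base64-encode the local file yourself and POST it to the import endpoint. A minimal sketch assuming the requests library, the same credential placeholders, and a local notebook.scala file:

import base64
import requests

# Read a local Scala source file and import it as a new notebook.
with open('notebook.scala', 'rb') as f:
    content = base64.b64encode(f.read()).decode('utf-8')

response = requests.post(
    'https://YOURACCOUNT.cloud.databricks.com/api/2.0/workspace/import',
    auth=(USERNAME, PASSWORD),
    json={
        'path': '/Users/user@example.com/new-notebook',
        'format': 'SOURCE',
        'language': 'SCALA',
        'content': content,
        'overwrite': 'false'
    }
)
print(response.json())  # {} on success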

Spark-submit Job API Examples

New in version 2.46.

Currently, spark-submit jobs are available only through the REST API; see the Job API documentation for the detailed specification.

Create a spark submit jar job

The following cURL command creates a job that runs SparkPi, assuming the Spark examples jar has already been uploaded to dbfs:/path/to/examples.jar.

curl -n -H "Content-Type: application/json" -X POST -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/jobs/create <<JSON
{
  "name": "my spark submit jar job",
  "new_cluster": {
    "spark_version": "2.1.1-db5-scala2.10",
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "--class",
      "org.apache.spark.examples.SparkPi",
      "dbfs:/path/to/examples.jar"
    ]
  }
}
JSON

Create a spark submit Python job

The following cURL command creates a job that runs the Python Spark Pi estimation, assuming pi.py has already been uploaded to dbfs:/pi.py.

curl -n -H "Content-Type: application/json" -X POST -d @- https://YOURACCOUNT.cloud.databricks.com/api/2.0/jobs/create <<JSON
{
  "name": "my spark submit python job",
  "new_cluster": {
    "spark_version": "2.1.1-db5-scala2.10",
    "num_workers": 1
  },
  "spark_submit_task": {
    "parameters": [
      "dbfs:/pi.py"
    ]
  }
}
JSON
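
Either spark-submit job is triggered the same way as the jar job earlier on this page. Here is a minimal Python sketch using the requests library, assuming the same credential placeholders and the job_id returned by the jobs/create call above:

import requests

job_id = 12345  # placeholder: use the job_id returned by jobs/create

# Trigger a run of the spark-submit job created above.
response = requests.post(
    'https://YOURACCOUNT.cloud.databricks.com/api/2.0/jobs/run-now',
    auth=(USERNAME, PASSWORD),
    json={'job_id': job_id}
)
print(response.json())  # contains the run_id of the new run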