REST API 1.2

The Databricks REST API allows you to programmatically access Databricks instead of going through our Web UI.

This document covers REST API 1.2.

If you want to create and manage jobs, you will have to use our REST API V2.0. However, the 2.0 API does not yet include all of the functionality of version 1.2, so version 1.2 is not yet deprecated.

REST API Use Cases

  • Create a hook in Jenkins to replace an old version of your library jar with the latest version.
  • Start Spark jobs triggered from your existing production systems or from systems like Airflow and Luigi.
  • Programmatically bring up a cluster of a certain size at a fixed time of day and then shut it down at night.
  • and more...

Terminology

  • Cluster management: creating new clusters and describing existing clusters
  • Execution contexts: creating unique variable namespaces where Spark commands can be called
  • Command execution: executing commands within a specific execution context
  • Libraries: uploading third-party libraries that can be used in submitted commands

Details

  • This REST API runs over HTTPS.
  • For retrieving information, HTTP GET is used.
  • For state modification, HTTP POST is used.
  • For file upload, multipart/form-data is used.
  • Otherwise application/x-www-form-urlencoded is used.
  • The response content type is JSON.
  • Starting with version 1.1, basic authentication is used to authenticate the user for every API call.
  • The user’s credentials are base64-encoded and sent in the HTTP header of every API call, for example “Authorization: Basic YWRtaW46YWRtaW4=”.
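
As an illustration of the last two points, here is a minimal sketch that builds the Authorization header by hand. Python's requests library is just one HTTP client option, and admin:admin is a placeholder credential:

import base64

import requests  # any HTTP client with basic-auth support works the same way

# Placeholder credentials; "admin:admin" base64-encodes to "YWRtaW46YWRtaW4=".
token = base64.b64encode(b"admin:admin").decode()

resp = requests.get(
    "https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/clusters/list",
    headers={"Authorization": "Basic " + token},
)
print(resp.json())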

Getting Started

Test Your Connection

telnet YOUR_INSTANCE_NAME.cloud.databricks.com 443

Trying 52.11.163.202...
Connected to ec2-52-11-163-202.us-west-2.compute.amazonaws.com.
Escape character is '^]'.

nc -v -z YOUR_INSTANCE_NAME.cloud.databricks.com 443
found 1 connections:
     1: flags=82<CONNECTED,PREFERRED>
    outif utun0
    src x.x.x.x port 59063
    dst y.y.y.y port 443
    rank info not available
    TCP aux info available

Connection to YOUR_INSTANCE_NAME.cloud.databricks.com port 443 [TCP/HTTPS] succeeded!

You can use either tool above to test the connection. Port 443 is the default HTTPS port, and the REST API runs over this port. To verify that you can reach the API itself, issue a simple authenticated request:

curl -u user:password https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/clusters/list

If you cannot connect to port 443, please reach out to support@databricks.com with your account URL.
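
If you prefer to test from code, a plain TCP connect is a minimal equivalent of the telnet/nc probes above (a sketch; the hostname is a placeholder):

import socket

# Raises an exception if port 443 cannot be reached within the timeout.
with socket.create_connection(("YOUR_INSTANCE_NAME.cloud.databricks.com", 443), timeout=10):
    print("port 443 is reachable")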

Sample REST API calls

We cover some sample curl commands below, but you can also use an HTTP library in your programming language of choice.

GET request

curl -u username:password https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/clusters/list

Note: If your URL has the “&” character in it because the query string has more than one parameter, you’ll need to quote that URL so your shell won’t interpret “&” as a command separator:

curl -u username:password 'https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/commands/status?clusterId=batVenom&contextId=35585555555555&commandId=45382422555555555'
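
The same call can be made from an HTTP library. Below is a sketch using Python's requests package (one option among many); it URL-encodes the query string itself, so the “&” needs no shell quoting:

import requests

resp = requests.get(
    "https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/commands/status",
    auth=("username", "password"),  # sent as the basic-auth header
    params={
        "clusterId": "batVenom",
        "contextId": "35585555555555",
        "commandId": "45382422555555555",
    },
)
print(resp.json())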

POST Request with multipart/form-data for file uploads

curl -X POST -u username:password https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/libraries/upload \
  -F language=scala -F clusterId=batVenom -F name=YOUR_LIB_NAME -F uri=@/Users/???/something.jar
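
A hedged Python equivalent: passing a files argument makes requests send the body as multipart/form-data (the jar path and library name are placeholders):

import requests

with open("/Users/me/something.jar", "rb") as jar:  # placeholder path
    resp = requests.post(
        "https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/libraries/upload",
        auth=("username", "password"),
        data={"language": "scala", "clusterId": "batVenom", "name": "YOUR_LIB_NAME"},
        files={"uri": jar},  # the file part of the multipart body
    )
print(resp.json())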

POST Request with application/x-www-form-urlencoded

curl -X POST -u username:password https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/contexts/create \
  -d "language=scala&clusterId=batVenom"

List of API calls

Cluster Management

Warning

Use the REST API V2.0 to perform cluster management tasks.

  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/clusters/list – lists all Spark clusters, including id, name, and state
    • GET Request.
      • No arguments.
      • The response cluster status is one of “Pending”, “Running”, “Reconfiguring”, “Terminating”, “Terminated”, or “Error”.
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/clusters/status – retrieves information about a single Spark cluster
    • GET Request.
      • Example arguments: clusterId=peaceJam
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/clusters/restart – restarts a Spark cluster
    • POST Request with application/x-www-form-urlencoded:
      • data = {"clusterId": "peaceJam"}
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/clusters/delete – deletes a Spark cluster
    • POST Request with application/x-www-form-urlencoded:
      • data = {"clusterId": "peaceJam"}
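
As a sketch of the calls above (the cluster id peaceJam is a placeholder):

import requests

BASE = "https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2"
AUTH = ("username", "password")

# List all clusters and print their ids and states.
for cluster in requests.get(BASE + "/clusters/list", auth=AUTH).json():
    print(cluster["id"], cluster["status"])

# Restart a specific cluster by id.
requests.post(BASE + "/clusters/restart", auth=AUTH, data={"clusterId": "peaceJam"})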

Execution Context

  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/contexts/create – creates an execution context on a specified cluster for a given programming language
    • POST Request with application/x-www-form-urlencoded:
      • data = {"language": "scala", "clusterId": "peaceJam"}
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/contexts/status – shows the status of an existing execution context
    • GET Request.
      • Example arguments: clusterId=peaceJam&contextId=179365396413324
      • The context status is one of “Pending”, “Running”, or “Error”.
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/contexts/destroy – destroys an execution context
    • POST Request with application/x-www-form-urlencoded:
      • data = {"contextId": "1793653964133248955", "clusterId": "peaceJam"}
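
A minimal sketch of a context's lifecycle, with the same placeholder names as above:

import requests

BASE = "https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2"
AUTH = ("username", "password")

# Create a Scala execution context on the cluster.
ctx = requests.post(BASE + "/contexts/create", auth=AUTH,
                    data={"language": "scala", "clusterId": "peaceJam"}).json()

# Check its status ("Pending", "Running", or "Error"), then destroy it.
status = requests.get(BASE + "/contexts/status", auth=AUTH,
                      params={"clusterId": "peaceJam", "contextId": ctx["id"]}).json()
print(status)

requests.post(BASE + "/contexts/destroy", auth=AUTH,
              data={"clusterId": "peaceJam", "contextId": ctx["id"]})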

Command Execution

Known limitations: command execution does not support %run.

  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/commands/execute – runs a given command or file
    • POST Request with application/x-www-form-urlencoded:
      • data = {"language": "scala", "clusterId": "peaceJam", "contextId": "5456852751451433082", "command": "sc.parallelize(1 to 10).collect"}
    • POST Request with multipart/form-data:
      • data = {"language": "python", "clusterId": "peaceJam", "contextId": "5456852751451433082"}
      • files = {"command": "./myfile.py"}
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/commands/status – shows one command’s status or result
    • GET Request.
      • Example arguments: clusterId=peaceJam&contextId=5456852751451433082&commandId=5220029674192230006
      • The command status is one of “Queued”, “Running”, “Cancelling”, “Finished”, “Cancelled”, or “Error”.
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/commands/cancel – cancels one command
    • POST Request with application/x-www-form-urlencoded:
      • data = {"clusterId": "peaceJam", "contextId": "5456852751451433082", "commandId": "2245426871786618466"}
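
Putting these calls together, a sketch that executes a command and polls its status until it reaches a terminal state (the context id is assumed to exist already):

import time

import requests

BASE = "https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2"
AUTH = ("username", "password")
CONTEXT_ID = "5456852751451433082"  # placeholder: an existing execution context

cmd = requests.post(BASE + "/commands/execute", auth=AUTH, data={
    "language": "scala",
    "clusterId": "peaceJam",
    "contextId": CONTEXT_ID,
    "command": "sc.parallelize(1 to 10).collect",
}).json()

# Poll until the command is no longer queued, running, or cancelling.
while True:
    info = requests.get(BASE + "/commands/status", auth=AUTH, params={
        "clusterId": "peaceJam",
        "contextId": CONTEXT_ID,
        "commandId": cmd["id"],
    }).json()
    if info["status"] in ("Finished", "Cancelled", "Error"):
        break
    time.sleep(1)

print(info)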

Library Upload

Note: the library API is more experimental than the rest of the API and may change significantly in future versions. The behavior is undefined if two libraries containing the same class file are added.

  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/libraries/list – shows all uploaded libraries
    • GET Request.
      • No arguments.
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/libraries/status – shows a library’s status
    • GET Request.
      • Example arguments: libraryId=1234
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/libraries/upload – uploads a Java JAR, Python egg, or Python PyPI library file
    • POST Request with multipart/form-data:
      • data = {"libType": "python-egg", "name": "mylib.egg", "folder": "/mylibraries", "attachToAllClusters": "false"}
      • libType is one of “java-jar”, “python-pypi”, or “python-egg”.
      • files = {"uri": "./spark/python/test_support/userlib-0.1-py2.7.egg"}
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/libraries/delete – deletes a library
    • POST Request with application/x-www-form-urlencoded:
      • data = {"libraryId": "1234"}
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/libraries/attach – schedules an action to attach an uploaded library to a cluster or to all clusters
    • POST Request with application/x-www-form-urlencoded:
      • data = {"libraryId": "1234", "clusterId": "0223-221237-tract"}
      • Use “__ALL_CLUSTERS” as the clusterId to attach to every cluster.
  • https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/libraries/detach – schedules an action to detach a library from a cluster or from all clusters
    • POST Request with application/x-www-form-urlencoded:
      • data = {"libraryId": "1234", "clusterId": "0223-221237-tract"}
      • Use “__ALL_CLUSTERS” as the clusterId to detach from every cluster.
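
A sketch of an upload followed by an attach (the egg path and cluster id are placeholders; the upload response's id field is shown in the example at the end of this document):

import requests

BASE = "https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2"
AUTH = ("username", "password")

# Upload a Python egg into a workspace folder without auto-attaching it.
with open("./userlib-0.1-py2.7.egg", "rb") as egg:  # placeholder path
    lib = requests.post(BASE + "/libraries/upload", auth=AUTH,
                        data={"libType": "python-egg", "name": "mylib.egg",
                              "folder": "/mylibraries",
                              "attachToAllClusters": "false"},
                        files={"uri": egg}).json()

# Attach the uploaded library to one cluster (or pass "__ALL_CLUSTERS").
requests.post(BASE + "/libraries/attach", auth=AUTH,
              data={"libraryId": lib["id"], "clusterId": "0223-221237-tract"})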

Example: Upload Spark Jar and Run in Databricks

  • List the Spark clusters in Databricks.
curl -u username:password https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/clusters/list
[
  {
    "driverIp": "10.0.236.4",
    "id": "batVenom",
    "jdbcPort": 10000,
    "name": "Mini-Cluster",
    "numWorkers": 2,
    "status": "Running"
  }
]
  • Upload your local jar and attach it to all clusters. (You could also upload it and attach it only to specific clusters.)
curl -X POST -u username:password  https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/libraries/upload \
-F language=scala -F clusterId=batVenom -F name=MY_JAR_HASH_9sd8fsdf -F uri=@/Users/me/something.jar -F attachToAllClusters=true
{
  "id": "1333132"
}
  • Create an execution context.
curl -X POST -u username:password  https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/contexts/create \
-d "language=scala&clusterId=batVenom"
{
  "id": "3558513128163162828"
}
  • Execute a command that uses your jar.
curl -X POST -u username:password https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/commands/execute \
-d 'language=scala&clusterId=batVenom&contextId=3558513128163162828&command=println(com.databricks.apps.logs.chapter1.LogAnalyzer.processLogFile(sc,null,"dbfs:/somefile.log"))'
{
  "id": "4538242203822083978"
}
  • Check on the status of your command. The result may not be available right away if you are running a lengthy Spark job; poll until the status is “Finished”.
curl -u username:password 'https://YOUR_INSTANCE_NAME.cloud.databricks.com/api/1.2/commands/status?clusterId=batVenom&contextId=3558513128163162828&commandId=4538242203822083978'
{
  "id": "4538242203822083978",
  "results": {
    "data": "Content Size Avg: 1234, Min: 1234, Max: 1234",
    "resultType": "text"
  },
  "status": "Finished"
}