REST API 1.2

The Databricks REST API allows you to programmatically access Databricks instead of going through the web UI.

This topic covers REST API 1.2. For most use cases, we recommend using REST API 2.0, which supports most of the functionality of the 1.2 API as well as additional functionality.

Important

Use the REST API 2.0 for cluster and library management. Support for the 1.2 cluster management and library management APIs ended on Dec 31, 2017. The 1.2 execution context and command execution APIs will continue to be supported.

REST API use cases

  • Create a hook in Jenkins to replace an old version of your library JAR with the latest version.
  • Start Spark jobs triggered from your existing production systems or from workflow systems.
  • Programmatically bring up a cluster of a certain size at a fixed time of day and then shut it down at night.

API categories

  • Cluster management: create new clusters and describe existing clusters.
  • Library management: upload third-party libraries that can be used in the submitted commands.
  • Execution context: create unique variable namespaces where Spark commands can be called.
  • Command execution: run commands within a specific execution context.

Details

  • This REST API runs over HTTPS.
  • For retrieving information, use HTTP GET.
  • For modifying state, use HTTP POST.
  • For file uploads, use multipart/form-data; otherwise, use application/x-www-form-urlencoded.
  • The response content type is JSON.
  • Every API call is authenticated with HTTP basic authentication.
  • User credentials are Base64-encoded and sent in the Authorization HTTP header of every API call. For example, Authorization: Basic YWRtaW46YWRtaW4=.
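
The cURL examples below use the -n flag, which tells cURL to read credentials from your ~/.netrc file. A minimal sketch of the corresponding entry, assuming a hypothetical user named admin:

machine <databricks-instance> login admin password <password>

Alternatively, pass the credentials inline with -u instead of -n:

curl -u 'admin:<password>' https://<databricks-instance>/api/1.2/clusters/list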

Get started

In the following examples, replace <databricks-instance> with the <ACCOUNT>.cloud.databricks.com domain name of your Databricks deployment.

Test your connection

> telnet <databricks-instance> 443

Trying 52.11.163.202...
Connected to <databricks-instance>.
Escape character is '^]'.

> nc -v -z <databricks-instance> 443
found 1 connections:
     1: flags=82<CONNECTED,PREFERRED>
    outif utun0
    src x.x.x.x port 59063
    dst y.y.y.y port 443
    rank info not available
    TCP aux info available

Connection to <databricks-instance> port 443 [TCP/HTTPS] succeeded!

You can use either tool above to test the connection. Port 443 is the default HTTPS port, and the REST API is served on this port. If you cannot connect to port 443, contact support@databricks.com with your account URL.

Sample API calls

We cover some sample cURL commands below, but you can also use an HTTP library in your programming language of choice.

GET request

curl -n https://<databricks-instance>/api/1.2/clusters/list

Note

If your URL contains the & character, you must quote the URL so your shell doesn't interpret it as a command separator:

curl -n 'https://<databricks-instance>/api/1.2/commands/status?clusterId=batVenom&contextId=35585555555555&commandId=45382422555555555'

POST request with multipart/form-data for file uploads

curl -X POST -n https://<databricks-instance>/api/1.2/libraries/upload \
  -F language=scala -F clusterId=batVenom -F name=YOUR_LIB_NAME -F uri=@/Users/???/something.jar

POST request with application/x-www-form-urlencoded

curl -X POST -n https://<databricks-instance>/api/1.2/contexts/create -d "language=scala&clusterId=batVenom"

API endpoints by category

Cluster management

Important

Use the REST API 2.0 for cluster and library management. Support for the 1.2 cluster management and library management APIs ended on Dec 31, 2017. The 1.2 execution context and command execution endpoints will continue to be supported.

  • https://<databricks-instance>/api/1.2/clusters/list – lists all Spark clusters, including id, name, and state
    • GET Request
      • No arguments.
      • The response cluster status is one of “Pending”, “Running”, “Reconfiguring”, “Terminating”, “Terminated”, or “Error”.
  • https://<databricks-instance>/api/1.2/clusters/status – retrieves information about a single Spark cluster
    • GET Request
      • Example arguments: clusterId=peaceJam
  • https://<databricks-instance>/api/1.2/clusters/restart – restarts a Spark cluster
    • POST request with application/x-www-form-urlencoded:
      • data = {"clusterId": "peaceJam"}
  • https://<databricks-instance>/api/1.2/clusters/delete – deletes a Spark cluster
    • POST request with application/x-www-form-urlencoded:
      • data = {"clusterId": "peaceJam"}

Library management

Important

Use the REST API 2.0 for cluster and library management. Support for the 1.2 cluster management and library management APIs ended on Dec 31, 2017. The 1.2 execution context and command execution endpoints will continue to be supported.

  • https://<databricks-instance>/api/1.2/libraries/list – shows all uploaded libraries
    • GET Request
      • No arguments.
  • https://<databricks-instance>/api/1.2/libraries/status – shows library statuses
    • GET Request
      • Example arguments: libraryId=1234
  • https://<databricks-instance>/api/1.2/libraries/upload – uploads a Java JAR, Python egg, or Python PyPI library file
    • POST request with multipart/form-data
      • data = {"libType": "python-egg", "name" : "mylib.egg", "folder": "/mylibraries", "attachToAllClusters": "false"}
      • libType can be ["java-jar", "python-pypi", "python-egg"]
      • files = {"uri": "./spark/python/test_support/userlib-0.1-py2.7.egg"}
  • https://<databricks-instance>/api/1.2/libraries/delete – deletes a library
    • POST request with application/x-www-form-urlencoded:
      • data = {"libraryId": "1234"}
  • https://<databricks-instance>/api/1.2/libraries/attach – attaches an uploaded library to a cluster or all clusters. The behavior is undefined if two libraries containing the same class file are attached to a cluster.
    • POST request with application/x-www-form-urlencoded:
      • data = {"libraryId": "1234", "clusterId" : "0223-221237-tract"}
      • Use "__ALL_CLUSTERS" to specify every cluster.
  • https://<databricks-instance>/api/1.2/libraries/detach – detaches a library from a cluster or all clusters. You must restart your cluster for the library to be removed from the cluster.
    • POST request with application/x-www-form-urlencoded:
      • data = {"libraryId": "1234", "clusterId" : "0223-221237-tract"}
      • Use "__ALL_CLUSTERS" to specify every cluster.

Execution context

  • https://<databricks-instance>/api/1.2/contexts/create – creates an execution context on a specified cluster for a given programming language
    • POST request with application/x-www-form-urlencoded:
      • data = {"language": "scala", "clusterId": "peaceJam"}
  • https://<databricks-instance>/api/1.2/contexts/status – shows the status of an existing execution context
    • GET request:
      • Example arguments: clusterId=peaceJam&contextId=179365396413324
      • Status is one of [“Pending”, “Running”, “Error”]
  • https://<databricks-instance>/api/1.2/contexts/destroy – destroys an execution context
    • POST request with application/x-www-form-urlencoded:
      • data = {"contextId" : "1793653964133248955", "clusterId" : "peaceJam"}

Command execution

Known limitations: command execution does not support %run.

  • https://<databricks-instance>/api/1.2/commands/execute – runs a command or file.
    • POST request with application/x-www-form-urlencoded:
      • data = {"language": "scala", "clusterId": "peaceJam", "contextId" : "5456852751451433082", "command": "sc.parallelize(1 to 10).collect"}
    • POST request with multipart/form-data:
      • data = {"language": "python", "clusterId": "peaceJam", "contextId" : "5456852751451433082"}
      • files = {"command": "./myfile.py"}
  • https://<databricks-instance>/api/1.2/commands/status – shows one command’s status or result
    • GET Request
      • Example arguments: clusterId=peaceJam&contextId=5456852751451433082&commandId=5220029674192230006
      • status can be [“Queued”, “Running”, “Cancelling”, “Finished”, “Cancelled”, “Error”]
  • https://<databricks-instance>/api/1.2/commands/cancel – cancels one command
    • POST request with application/x-www-form-urlencoded:
      • data = {"clusterId": "peaceJam", "contextId" : "5456852751451433082", "commandId" : "2245426871786618466"}

Example: Upload and run a Spark JAR

Upload a JAR

  1. List the Spark clusters.

    curl -n https://<databricks-instance>/api/1.2/clusters/list
    
    {
      "driverIp": "10.0.236.4",
      "id": "batVenom",
      "jdbcPort": 10000,
      "name": "Mini-Cluster",
      "numWorkers": 2,
      "status": "Running"
    }
    
  2. Upload your local JAR and attach it to all clusters. (You could also upload it and attach it only to specific clusters.)

    curl -X POST -n https://<databricks-instance>/api/1.2/libraries/upload \
      -F language=scala -F clusterId=batVenom -F name=MY_JAR_HASH_9sd8fsdf -F uri=@/Users/me/something.jar -F attachToAllClusters=true
    
    {
      "id": "1333132"
    }
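
    You can verify the upload with the library ID from the response; a quick check, using the libraries/status endpoint described above:

    curl -n 'https://<databricks-instance>/api/1.2/libraries/status?libraryId=1333132'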
    

Run a JAR

  1. Create an execution context.

    curl -X POST -n https://<databricks-instance>/api/1.2/contexts/create -d "language=scala&clusterId=batVenom"
    
    {
      "id": "3558513128163162828"
    }
    
  2. Execute a command that uses your JAR.

    curl -X POST -n https://<databricks-instance>/api/1.2/commands/execute \
      -d 'language=scala&clusterId=batVenom&contextId=3558513128163162828&command=println(com.databricks.apps.logs.chapter1.LogAnalyzer.processLogFile(sc,null,"dbfs:/somefile.log"))'
    
    {
      "id": "4538242203822083978"
    }
    
  3. Check the status of your command. A lengthy Spark job may not finish immediately; see the polling sketch after this example.

    curl -n 'https://<databricks-instance>/api/1.2/commands/status?clusterId=batVenom&contextId=3558513128163162828&commandId=4538242203822083978'
    
    {
      "id": "4538242203822083978",
      "results": {
        "data": "Content Size Avg: 1234, Min: 1234, Max: 1234",
        "resultType": "text"
      },
      "status": "Finished"
    }
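
If the command is long-running, you can poll commands/status until it reaches a terminal state. A minimal shell sketch, assuming the IDs from this example and the response formatting shown above:

until curl -s -n 'https://<databricks-instance>/api/1.2/commands/status?clusterId=batVenom&contextId=3558513128163162828&commandId=4538242203822083978' \
    | grep -qE '"status": *"(Finished|Cancelled|Error)"'; do
  # wait before polling again
  sleep 5
done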