DBFS API

The Databricks File System (DBFS) API makes it simple to interact with different data sources without having to include your credentials every time you read a file. See Databricks File System - DBFS for more information. For an easy-to-use command line client of the DBFS API, see Databricks CLI.


Add Block

Endpoint HTTP Method
2.0/dbfs/add-block POST

Appends a block of data to the stream specified by the input handle. If the handle does not exist, this call will throw an exception with RESOURCE_DOES_NOT_EXIST. If the block of data exceeds 1 MB, this call will throw an exception with MAX_BLOCK_SIZE_EXCEEDED. Example of request:

{
  "data": "ZGF0YWJyaWNrcwo=",
  "handle": 7904256
}

Request Structure

Field Name Type Description
handle INT64 The handle on an open stream. This field is required.
data BYTES The base64-encoded data to append to the stream. This has a limit of 1 MB. This field is required.
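As a minimal sketch of how such a request body might be built in Python, the helper below base64-encodes a raw chunk and rejects oversized chunks before anything is sent. The name build_add_block_body is hypothetical, and reading "1 MB" as 1024 * 1024 bytes of unencoded data is an assumption; the reference does not say whether the limit applies before or after encoding.

```python
import base64
import json

# Assumption: "1 MB" is taken as 1024 * 1024 bytes of raw (unencoded) data.
MAX_BLOCK_BYTES = 1024 * 1024


def build_add_block_body(handle: int, chunk: bytes) -> str:
    """Return the JSON body for a 2.0/dbfs/add-block request (hypothetical helper)."""
    if len(chunk) > MAX_BLOCK_BYTES:
        # Fail locally instead of waiting for MAX_BLOCK_SIZE_EXCEEDED from the server.
        raise ValueError("chunk exceeds the 1 MB block limit")
    encoded = base64.b64encode(chunk).decode("ascii")
    return json.dumps({"handle": handle, "data": encoded})


body = build_add_block_body(7904256, b"databricks\n")
print(body)
```

The encoded data field produced here is the same string as in the example request above, which is simply the text "databricks" followed by a newline.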

Close

Endpoint HTTP Method
2.0/dbfs/close POST

Closes the stream specified by the input handle. If the handle does not exist, this call will throw an exception with RESOURCE_DOES_NOT_EXIST.

Request Structure

Field Name Type Description
handle INT64 The handle on an open stream. This field is required.

Create

Endpoint HTTP Method
2.0/dbfs/create POST

Opens a stream to write to a file and returns a handle to this stream. There is a 10 minute idle timeout on this handle. If a file or directory already exists on the given path and overwrite is set to false, this call will throw an exception with RESOURCE_ALREADY_EXISTS. A typical workflow for file upload would be:

  1. Issue a create call and get a handle.
  2. Issue one or more add-block calls with the handle you have.
  3. Issue a close call with the handle you have.

Request Structure

Field Name Type Description
path STRING The path of the new file. The path should be the absolute DBFS path (e.g. “/mnt/foo.txt”). This field is required.
overwrite BOOL The flag that specifies whether to overwrite existing file/files.

Response Structure

Field Name Type Description
handle INT64 The handle that should subsequently be passed into the add-block and close calls when writing to a file through a stream.
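The create, add-block, close workflow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference client: the post callable is a hypothetical transport that sends a POST to the given endpoint and returns the decoded JSON reply, and the block size assumes that "1 MB" means 1024 * 1024 raw bytes.

```python
import base64
import io
from typing import BinaryIO, Callable

# Hypothetical transport: takes an endpoint such as "2.0/dbfs/create" and a
# dict body, performs the POST, and returns the decoded JSON reply as a dict.
PostFn = Callable[[str, dict], dict]

# Assumption: each block is kept at or below 1024 * 1024 raw bytes.
BLOCK_BYTES = 1024 * 1024


def upload_stream(post: PostFn, path: str, src: BinaryIO,
                  overwrite: bool = False) -> int:
    """Upload src to a DBFS path block by block; return the number of blocks sent."""
    # Step 1: open the stream and keep the returned handle.
    handle = post("2.0/dbfs/create", {"path": path, "overwrite": overwrite})["handle"]
    blocks = 0
    try:
        # Step 2: send the data as one or more base64-encoded blocks.
        while True:
            chunk = src.read(BLOCK_BYTES)
            if not chunk:
                break
            data = base64.b64encode(chunk).decode("ascii")
            post("2.0/dbfs/add-block", {"handle": handle, "data": data})
            blocks += 1
    finally:
        # Step 3: close the handle, even if a block failed, rather than
        # leaving it open until the idle timeout.
        post("2.0/dbfs/close", {"handle": handle})
    return blocks
```

Passing the transport in as a parameter keeps authentication and HTTP details out of the sketch, and makes it possible to exercise the call sequence with a stub, for example upload_stream(stub, "/mnt/foo.txt", io.BytesIO(b"...")).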

Delete

Endpoint HTTP Method
2.0/dbfs/delete POST

Deletes the file or directory (optionally recursively deleting all files in the directory). This call will throw an exception with IO_ERROR if the path is a non-empty directory and recursive is set to false, or on other similar errors.

Request Structure

Field Name Type Description
path STRING The path of the file or directory to delete. The path should be the absolute DBFS path (e.g. “/mnt/foo/”). This field is required.
recursive BOOL Whether or not to recursively delete the directory’s contents. Deleting empty directories can be done without providing the recursive flag.

Get Status

Endpoint HTTP Method
2.0/dbfs/get-status GET

Gets the file information of a file or directory. If the file or directory does not exist, this call will throw an exception with RESOURCE_DOES_NOT_EXIST.

Request Structure

Field Name Type Description
path STRING The path of the file or directory. The path should be the absolute DBFS path (e.g. “/mnt/foo/”). This field is required.

Response Structure

Field Name Type Description
path STRING The path of the file or directory.
is_dir BOOL True if the path is a directory.
file_size INT64 The length of the file in bytes or zero if the path is a directory.
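One common use of get-status is an existence check. The sketch below treats RESOURCE_DOES_NOT_EXIST as "not found" and lets every other error propagate. Both the DbfsError class and the get callable are hypothetical: this reference names the error codes but does not describe how a client surfaces them, so the transport is assumed to raise an exception that carries the code.

```python
from typing import Callable, Optional


class DbfsError(Exception):
    """Hypothetical error raised by the transport; carries the API error code."""

    def __init__(self, error_code: str, message: str = ""):
        super().__init__(f"{error_code}: {message}")
        self.error_code = error_code


# Hypothetical transport: performs a GET on the endpoint with the given query
# parameters and returns the decoded JSON reply, or raises DbfsError.
GetFn = Callable[[str, dict], dict]


def get_status_or_none(get: GetFn, path: str) -> Optional[dict]:
    """Return the status fields for path, or None if the path does not exist."""
    try:
        return get("2.0/dbfs/get-status", {"path": path})
    except DbfsError as err:
        if err.error_code == "RESOURCE_DOES_NOT_EXIST":
            return None
        raise  # any other error is unexpected here and is passed on
```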

List

Endpoint HTTP Method
2.0/dbfs/list GET

Lists the contents of a directory, or details of the file. If the file or directory does not exist, this call will throw an exception with RESOURCE_DOES_NOT_EXIST. Example of reply:

{
  "files": [
    {
      "path": "/a.cpp",
      "is_dir": false,
      "file_size": 261
    },
    {
      "path": "/databricks-results",
      "is_dir": true,
      "file_size": 0
    }
  ]
}

Request Structure

Field Name Type Description
path STRING The path of the file or directory. The path should be the absolute DBFS path (e.g. “/mnt/foo/”). This field is required.

Response Structure

Field Name Type Description
files An array of FileInfo A list of FileInfo objects that describe the contents of the directory or file. See example above.
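To illustrate how a reply like the example above might be consumed, the following sketch parses it into small typed records. The FileInfo dataclass mirrors the three fields documented in this reference; the parse_list_reply name is hypothetical, and defaulting to an empty list when "files" is missing is a defensive assumption rather than documented behavior.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileInfo:
    path: str
    is_dir: bool
    file_size: int


def parse_list_reply(reply_text: str) -> List[FileInfo]:
    """Convert the JSON text of a 2.0/dbfs/list reply into FileInfo records."""
    reply = json.loads(reply_text)
    # Assumption: tolerate a reply without "files" by treating it as empty.
    return [
        FileInfo(f["path"], f["is_dir"], f["file_size"])
        for f in reply.get("files", [])
    ]


reply_text = """
{
  "files": [
    {"path": "/a.cpp", "is_dir": false, "file_size": 261},
    {"path": "/databricks-results", "is_dir": true, "file_size": 0}
  ]
}
"""
entries = parse_list_reply(reply_text)
directories = [e.path for e in entries if e.is_dir]
print(directories)  # ['/databricks-results']
```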

Mkdirs

Endpoint HTTP Method
2.0/dbfs/mkdirs POST

Creates the given directory and necessary parent directories if they do not exist. If there exists a file (not a directory) at any prefix of the input path, this call will throw an exception with RESOURCE_ALREADY_EXISTS. Note that if this operation fails it may have succeeded in creating some of the necessary parent directories.

Request Structure

Field Name Type Description
path STRING The path of the new directory. The path should be the absolute DBFS path (e.g. “/mnt/foo/”). This field is required.

Move

Endpoint HTTP Method
2.0/dbfs/move POST

Moves a file from one location to another location within DBFS. If the source file does not exist, this call will throw an exception with RESOURCE_DOES_NOT_EXIST. If a file already exists at the destination path, this call will throw an exception with RESOURCE_ALREADY_EXISTS. If the given source path is a directory, this call will always recursively move all files.

Request Structure

Field Name Type Description
source_path STRING The source path of the file or directory. The path should be the absolute DBFS path (e.g. “/mnt/foo/”). This field is required.
destination_path STRING The destination path of the file or directory. The path should be the absolute DBFS path (e.g. “/mnt/bar/”). This field is required.

Put

Endpoint HTTP Method
2.0/dbfs/put POST

Uploads a file through the use of multipart form post. It is mainly used for streaming uploads, but can also be used as a convenient single call for data upload. Example usage:

In the following examples, replace YOUR_DOMAIN with the <ACCOUNT>.cloud.databricks.com domain name of your Databricks deployment.

curl -u USER:PASS -F contents=@localsrc -F path="PATH" \
        https://YOUR_DOMAIN/api/2.0/dbfs/put

Please note that localsrc is the path to a local file to upload and this usage is only supported with multipart form post (i.e. using -F or --form with curl).

Alternatively you can pass contents as base64 string. Examples:

curl -u USER:PASS -F contents="BASE64" -F path="PATH" \
        https://YOUR_DOMAIN/api/2.0/dbfs/put

curl -u USER:PASS -H "Content-Type: application/json" \
        -d '{"path":"PATH","contents":"BASE64"}' https://YOUR_DOMAIN/api/2.0/dbfs/put

The amount of data that can be passed using the contents parameter (i.e. not streaming) is limited to 1 MB; MAX_BLOCK_SIZE_EXCEEDED will be thrown if this limit is exceeded. To upload large files, use the streaming upload; see Create, Add Block, and Close for details.

Request Structure

Field Name Type Description
path STRING The path of the new file. The path should be the absolute DBFS path (e.g. “/mnt/foo/”). This field is required.
contents BYTES The base64-encoded contents of the file. This parameter may be absent, in which case a posted file is used instead.
overwrite BOOL The flag that specifies whether to overwrite existing file/files.
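For readers who prefer Python to curl, the JSON variant of the put call above could be assembled along these lines. This is a sketch that only builds the request and does not send it. The function name build_put_request is hypothetical, and the Basic authorization header is an assumption that mirrors what curl -u USER:PASS does; YOUR_DOMAIN, USER, and PASS are the same placeholders used in the curl examples.

```python
import base64
import json
import urllib.request


def build_put_request(domain: str, user: str, password: str,
                      path: str, contents: bytes,
                      overwrite: bool = False) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to 2.0/dbfs/put."""
    body = {
        "path": path,
        # The contents field carries the file data as a base64 string.
        "contents": base64.b64encode(contents).decode("ascii"),
        "overwrite": overwrite,
    }
    # Assumption: HTTP Basic authentication, as with curl -u USER:PASS.
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"https://{domain}/api/2.0/dbfs/put",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )


req = build_put_request("YOUR_DOMAIN", "USER", "PASS", "/mnt/foo.txt", b"hello")
# Sending is left to the caller, for example with urllib.request.urlopen(req).
```

Keeping request construction separate from sending makes the payload easy to inspect before any data leaves the machine.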

Read

Endpoint HTTP Method
2.0/dbfs/read GET

Returns the contents of a file. If the file does not exist, this call will throw an exception with RESOURCE_DOES_NOT_EXIST. If the path is a directory, the read length is negative, or the offset is negative, this call will throw an exception with INVALID_PARAMETER_VALUE. If the read length exceeds 1 MB, this call will throw an exception with MAX_READ_SIZE_EXCEEDED. If offset + length exceeds the number of bytes in the file, the contents are read until the end of the file.

Request Structure

Field Name Type Description
path STRING The path of the file to read. The path should be the absolute DBFS path (e.g. “/mnt/foo/”). This field is required.
offset INT64 The offset to read from in bytes.
length INT64 The number of bytes to read starting from the offset. This has a limit of 1 MB, and a default value of 0.5 MB.

Response Structure

Field Name Type Description
bytes_read INT64 The number of bytes read (could be less than length if we hit end of file). This refers to number of bytes read in unencoded version (response data is base64-encoded).
data BYTES The base64-encoded contents of the file read.
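Because a single read is capped at 1 MB, reading a larger file means looping over offsets. The sketch below shows one way to do that, advancing by bytes_read and decoding each base64 chunk. The get callable is a hypothetical transport that performs the GET and returns the decoded JSON reply. The sketch also assumes that "1 MB" means 1024 * 1024 bytes and that a read starting exactly at the end of the file reports zero bytes read; neither is confirmed by this reference.

```python
import base64
from typing import Callable

# Hypothetical transport: performs a GET on the endpoint with the given query
# parameters and returns the decoded JSON reply as a dict.
GetFn = Callable[[str, dict], dict]

# Assumption: "1 MB" is taken as 1024 * 1024 bytes.
MAX_READ_BYTES = 1024 * 1024


def read_file(get: GetFn, path: str, chunk_bytes: int = MAX_READ_BYTES) -> bytes:
    """Read a whole DBFS file by issuing repeated 2.0/dbfs/read calls."""
    parts = []
    offset = 0
    while True:
        reply = get("2.0/dbfs/read",
                    {"path": path, "offset": offset, "length": chunk_bytes})
        bytes_read = reply.get("bytes_read", 0)
        if bytes_read == 0:
            break  # nothing left to read
        parts.append(base64.b64decode(reply["data"]))
        # bytes_read counts unencoded bytes, so it is the right step size.
        offset += bytes_read
        if bytes_read < chunk_bytes:
            break  # a short read means the end of the file was reached
    return b"".join(parts)
```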

Data Structures

FileInfo

Stores the attributes of a file or directory.

Field Name Type Description
path STRING The path of the file or directory.
is_dir BOOL True if the path is a directory.
file_size INT64 The length of the file in bytes or zero if the path is a directory.