Manage files in volumes with the Databricks JDBC Driver (OSS)

Databricks offers bulk ingestion capabilities using Unity Catalog volumes, which allow users to transfer datasets to and from local files, such as CSV files. See What are Unity Catalog volumes?

This article describes how to manage files in volumes, as well as read and write streams to and from volumes, using the Databricks JDBC Driver (OSS).

Enable volume operations

To enable Unity Catalog volume operations, set the connection property VolumeOperationAllowedLocalPaths to a comma-separated list of local paths that volume operations are allowed to use. See Other feature properties.
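As a minimal sketch of this setting, the helper below builds a java.util.Properties object that carries VolumeOperationAllowedLocalPaths as a comma-separated list. The class name VolumeConnectionProps, the method name, and the example paths are illustrative assumptions; only the property name comes from this article.

```java
import java.util.List;
import java.util.Properties;

final class VolumeConnectionProps {
    // Property name as given in this article.
    static final String ALLOWED_PATHS_KEY = "VolumeOperationAllowedLocalPaths";

    /** Returns connection properties that allow volume operations on the given local paths. */
    static Properties withAllowedLocalPaths(List<String> localPaths) {
        if (localPaths.isEmpty()) {
            throw new IllegalArgumentException("At least one local path is required");
        }
        Properties props = new Properties();
        // The driver expects a single comma-separated value.
        props.setProperty(ALLOWED_PATHS_KEY, String.join(",", localPaths));
        return props;
    }

    public static void main(String[] args) {
        // Example paths are placeholders.
        Properties props = withAllowedLocalPaths(List.of("/tmp", "/data/exports"));
        System.out.println(props.getProperty(ALLOWED_PATHS_KEY));
    }
}
```

The returned object can be extended with your authentication properties and passed to DriverManager.getConnection(url, props).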

Unity Catalog must be enabled to use this feature. Similar functionality is available using the Databricks UI. See Upload files to a Unity Catalog volume.

The Unity Catalog ingestion commands are SQL statements. The examples below demonstrate PUT, GET, and REMOVE operations.

Upload a local file

To upload a local file /tmp/test.csv into a Unity Catalog volume path as /Volumes/main/default/e2etests/file1.csv, use the PUT operation:

Text
  PUT '/tmp/test.csv' INTO '/Volumes/main/default/e2etests/file1.csv' OVERWRITE

Download a file

To download a file from the Unity Catalog volume path /Volumes/main/default/e2etests/file1.csv into a local file /tmp/test.csv, use the GET operation:

Text
  GET '/Volumes/main/default/e2etests/file1.csv' TO '/tmp/test.csv'

Delete a file

To delete a file with a Unity Catalog volume path /Volumes/main/default/e2etests/file1.csv, use the REMOVE operation:

Text
  REMOVE '/Volumes/main/default/e2etests/file1.csv'
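Because these commands are SQL statements, one way to issue them from Java is through a regular java.sql.Statement. The following is a sketch under that assumption. The VolumeSql class and its method names are hypothetical, the statement text follows the examples above, and omitting OVERWRITE is assumed to be valid. The connection is assumed to have been opened with VolumeOperationAllowedLocalPaths covering the local path. Paths that contain a single quote are rejected rather than escaped, because this article does not describe an escaping rule.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class VolumeSql {
    // Wraps a path in single quotes; rejects embedded quotes instead of guessing an escaping rule.
    private static String quote(String path) {
        if (path.indexOf('\'') >= 0) {
            throw new IllegalArgumentException("Path must not contain a single quote: " + path);
        }
        return "'" + path + "'";
    }

    /** Builds a PUT statement that uploads localPath to volumePath. */
    static String put(String localPath, String volumePath, boolean overwrite) {
        return "PUT " + quote(localPath) + " INTO " + quote(volumePath) + (overwrite ? " OVERWRITE" : "");
    }

    /** Builds a GET statement that downloads volumePath to localPath. */
    static String get(String volumePath, String localPath) {
        return "GET " + quote(volumePath) + " TO " + quote(localPath);
    }

    /** Builds a REMOVE statement that deletes volumePath. */
    static String remove(String volumePath) {
        return "REMOVE " + quote(volumePath);
    }

    /** Sends one volume command over an open connection. */
    static void run(Connection conn, String sql) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(sql);
        }
    }
}
```

For example, VolumeSql.run(conn, VolumeSql.put("/tmp/test.csv", "/Volumes/main/default/e2etests/file1.csv", true)) sends the same PUT statement shown in the upload example.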

Read/write data using a stream

The JDBC driver provides the IDatabricksVolumeClient interface for streaming data to and from Unity Catalog volumes. See IDatabricksVolumeClient reference for available APIs.

The IDatabricksVolumeClient can be initialized using the DatabricksVolumeClientFactory factory utility:

Java
import com.databricks.jdbc.api.impl.volume.DatabricksVolumeClientFactory;
import com.databricks.jdbc.api.volume.IDatabricksVolumeClient;

// conn is an open java.sql.Connection created with the Databricks JDBC Driver (OSS)
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(conn);
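The client methods take the catalog, schema, volume, and object path as separate arguments, whereas the SQL commands use a single path of the form /Volumes/<catalog>/<schema>/<volume>/<path>. As a sketch of how to move between the two, the hypothetical VolumePath helper below splits a full volume path into its parts. It is not part of the driver, and it assumes the object path is passed to the client without a leading slash.

```java
final class VolumePath {
    final String catalog;
    final String schema;
    final String volume;
    final String objectPath;

    private VolumePath(String catalog, String schema, String volume, String objectPath) {
        this.catalog = catalog;
        this.schema = schema;
        this.volume = volume;
        this.objectPath = objectPath;
    }

    /** Splits /Volumes/<catalog>/<schema>/<volume>/<path> into its four parts. */
    static VolumePath parse(String fullPath) {
        // A limit of 6 keeps any further slashes inside the object path.
        String[] parts = fullPath.split("/", 6);
        boolean valid = parts.length == 6 && parts[0].isEmpty() && "Volumes".equals(parts[1]);
        for (int i = 2; valid && i < 6; i++) {
            valid = !parts[i].isEmpty();
        }
        if (!valid) {
            throw new IllegalArgumentException(
                "Expected /Volumes/<catalog>/<schema>/<volume>/<path>: " + fullPath);
        }
        return new VolumePath(parts[2], parts[3], parts[4], parts[5]);
    }
}
```

With VolumePath p = VolumePath.parse("/Volumes/main/default/e2etests/file1.csv"), the fields p.catalog, p.schema, p.volume, and p.objectPath can be passed to the client methods in that order.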

Write a file into a volume from a stream

Java
import java.io.File;
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;

Connection connection = DriverManager.getConnection(url, prop);
File file = new File("/tmp/test.csv");

// Upload the file stream to the UC volume path
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(connection);
try (FileInputStream fileInputStream = new FileInputStream(file)) {
    volumeClient.putObject(catalog, schema, volume, objectPath, fileInputStream, file.length(), true /* overwrite */);
}
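The stream does not have to come from a file. As a sketch, the helper below uploads an in-memory string and derives contentLength from the encoded byte count rather than the character count. The InMemoryUpload class and its StreamPut interface are hypothetical: StreamPut is a local stand-in that mirrors the putObject (stream) signature documented in this article, so the example compiles without the driver on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.sql.SQLException;

final class InMemoryUpload {
    /** Hypothetical stand-in that mirrors the documented putObject (stream) signature. */
    @FunctionalInterface
    interface StreamPut {
        boolean putObject(String catalog, String schema, String volume, String objectPath,
                          InputStream inputStream, long contentLength, boolean toOverwrite)
                throws SQLException;
    }

    /** Uploads text as UTF-8 bytes, overwriting any existing object. */
    static boolean uploadText(StreamPut client, String catalog, String schema, String volume,
                              String objectPath, String text) throws SQLException {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        // contentLength is the number of bytes in the stream, not the number of characters.
        return client.putObject(catalog, schema, volume, objectPath,
                new ByteArrayInputStream(bytes), bytes.length, true);
    }
}
```

With a real client, the method reference volumeClient::putObject can be passed as the first argument, for example InMemoryUpload.uploadText(volumeClient::putObject, "main", "default", "e2etests", "file1.csv", "id,name\n1,a\n").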

Read a volume file as a stream

Java
import org.apache.http.entity.InputStreamEntity;

Connection connection = DriverManager.getConnection(url, prop);

// Read the file at the UC volume path as a stream
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(connection);
InputStreamEntity volumeDataStream = volumeClient.getObject(catalog, schema, volume, objectPath);
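The returned InputStreamEntity is an Apache HttpCore entity, so its content is available as a java.io.InputStream through getContent(). As a sketch of what to do with that stream, the hypothetical StreamSaver helper below copies any InputStream to a local file and closes it afterwards. It uses only the Java standard library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamSaver {
    /** Copies the stream to target, replacing any existing file, and returns the byte count. */
    static long saveTo(InputStream in, Path target) throws IOException {
        // try-with-resources closes the stream even if the copy fails.
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For example, StreamSaver.saveTo(volumeDataStream.getContent(), Paths.get("/tmp/test.csv")) would write the volume file to /tmp/test.csv.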

IDatabricksVolumeClient interface

prefixExists

boolean prefixExists(String catalog, String schema, String volume, String prefix, boolean caseSensitive) throws SQLException

Determines if a specific prefix (folder-like structure) exists in the Unity Catalog volume. The prefix must be a part of the file name.

Parameters:

  • catalog - the catalog name of the cloud storage.
  • schema - the schema name of the cloud storage.
  • volume - the Unity Catalog volume name of the cloud storage.
  • prefix - the prefix to check for existence along with the relative path from the volume as the root directory.
  • caseSensitive - whether the check should be case-sensitive or not.

Returns:

A boolean indicating whether the prefix exists or not.

objectExists

boolean objectExists(String catalog, String schema, String volume, String objectPath, boolean caseSensitive) throws SQLException

Determines if a specific object (file) exists in the Unity Catalog volume. The object must match the file name exactly.

Parameters:

  • catalog - the catalog name of the cloud storage.
  • schema - the schema name of the cloud storage.
  • volume - the Unity Catalog volume name of the cloud storage.
  • objectPath - the path of the object (file) from the volume as the root directory to check for existence within the volume (inside any sub-folder).
  • caseSensitive - a boolean indicating whether the check should be case-sensitive or not.

Returns:

A boolean indicating whether the object exists or not.

volumeExists

boolean volumeExists(String catalog, String schema, String volumeName, boolean caseSensitive) throws SQLException

Determines if a specific volume exists in the given catalog and schema. The volume must match the volume name exactly.

Parameters:

  • catalog - the catalog name of the cloud storage.
  • schema - the schema name of the cloud storage.
  • volumeName - the name of the volume to check for existence.
  • caseSensitive - a boolean indicating whether the check should be case-sensitive or not.

Returns:

A boolean indicating whether the volume exists or not.

listObjects

List<String> listObjects(String catalog, String schema, String volume, String prefix, boolean caseSensitive) throws SQLException

Returns the list of all filenames in the Unity Catalog volume that start with a specified prefix. The prefix must be a part of the file path from the volume as the root.

Parameters:

  • catalog - the catalog name of the cloud storage.
  • schema - the schema name of the cloud storage.
  • volume - the UC volume name of the cloud storage.
  • prefix - the prefix of the filenames to list. This includes the relative path from the volume as the root directory.
  • caseSensitive - a boolean indicating whether the check should be case-sensitive or not.

Returns:

A list of strings indicating the filenames that start with the specified prefix.
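Because listObjects matches on a prefix only, any further filtering happens on the client side. As a sketch, the hypothetical ListingFilter helper below takes a list of file names such as the one listObjects returns and keeps the names with a given extension, ignoring case. The file names in the usage note are made up.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class ListingFilter {
    /** Returns the names that end with the extension (case-insensitive), in sorted order. */
    static List<String> withExtension(List<String> names, String extension) {
        String suffix = extension.toLowerCase(Locale.ROOT);
        return names.stream()
                .filter(name -> name.toLowerCase(Locale.ROOT).endsWith(suffix))
                .sorted()
                .collect(Collectors.toList());
    }
}
```

For example, ListingFilter.withExtension(List.of("file2.CSV", "file1.csv", "notes.txt"), ".csv") keeps file1.csv and file2.CSV and drops notes.txt.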

getObject (file)

boolean getObject(String catalog, String schema, String volume, String objectPath, String localPath) throws SQLException

Retrieves an object (file) from the Unity Catalog volume and stores it in the specified local path.

Parameters:

  • catalog - the catalog name of the cloud storage.
  • schema - the schema name of the cloud storage.
  • volume - the UC volume name of the cloud storage.
  • objectPath - the path of the object (file) from the volume as the root directory.
  • localPath - the local path where the retrieved data is to be stored.

Returns:

A boolean value indicating the status of the GET operation.

getObject (stream)

InputStreamEntity getObject(String catalog, String schema, String volume, String objectPath) throws SQLException

Retrieves an object as an input stream from the Unity Catalog volume.

Parameters:

  • catalog - the catalog name of the cloud storage.
  • schema - the schema name of the cloud storage.
  • volume - the UC volume name of the cloud storage.
  • objectPath - the path of the object (file) from the volume as the root directory.

Returns:

An instance of the input stream entity.

putObject (file)

boolean putObject(String catalog, String schema, String volume, String objectPath, String localPath, boolean toOverwrite) throws SQLException

Uploads data from a local path to a specified path within a Unity Catalog volume.

Parameters:

  • catalog - the catalog name of the cloud storage.
  • schema - the schema name of the cloud storage.
  • volume - the UC volume name of the cloud storage.
  • objectPath - the destination path where the object (file) is to be uploaded from the volume as the root directory.
  • localPath - the local path from where the data is to be uploaded.
  • toOverwrite - a boolean indicating whether to overwrite the object if it already exists.

Returns:

A boolean value indicating the status of the PUT operation.

putObject (stream)

boolean putObject(String catalog, String schema, String volume, String objectPath, InputStream inputStream, long contentLength, boolean toOverwrite) throws SQLException

Uploads data from an input stream to a specified path within a Unity Catalog volume.

Parameters:

  • catalog - the catalog name of the cloud storage.
  • schema - the schema name of the cloud storage.
  • volume - the UC volume name of the cloud storage.
  • objectPath - the destination path where the object (file) is to be uploaded from the volume as the root directory.
  • inputStream - the input stream from where the data is to be uploaded.
  • contentLength - the length of the input stream.
  • toOverwrite - a boolean indicating whether to overwrite the object if it already exists.

Returns:

A boolean value indicating the status of the PUT operation.

deleteObject

boolean deleteObject(String catalog, String schema, String volume, String objectPath) throws SQLException

Removes an object from a specified path within a Unity Catalog volume.

Parameters:

  • catalog - the catalog name of the cloud storage.
  • schema - the schema name of the cloud storage.
  • volume - the UC volume name of the cloud storage.
  • objectPath - the path of the object (file) from the volume as the root directory to delete.

Returns:

A boolean value indicating the status of the DELETE operation.