Manage files in volumes with the Databricks JDBC Driver (OSS)

Databricks offers bulk ingestion capabilities using Unity Catalog volumes, which allow users to transfer datasets to and from local files, such as CSV files. See What are Unity Catalog volumes?.

This article describes how to manage files in volumes, as well as read and write streams to and from volumes, using the Databricks JDBC Driver (OSS).

Enable volume operations

To enable Unity Catalog volume operations, set the connection property VolumeOperationAllowedLocalPaths to a comma-separated list of local paths that are allowed for volume operations. See Other feature properties.

Unity Catalog must be enabled to use this feature. Similar functionality is available using the Databricks UI. See Upload files to a Unity Catalog volume.
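As a minimal sketch, the allowed-paths property can be assembled into the connection Properties passed to DriverManager.getConnection. The helper class and method names below are illustrative, not part of the driver API; only the property name VolumeOperationAllowedLocalPaths comes from the driver:

```java
import java.util.Properties;

public class VolumeProperties {
    // Build connection properties that permit volume operations
    // for the given local directories (joined as a comma-separated list).
    public static Properties withAllowedPaths(String... paths) {
        Properties props = new Properties();
        props.setProperty("VolumeOperationAllowedLocalPaths", String.join(",", paths));
        return props;
    }
}
```

Pass the resulting Properties (together with your authentication properties) to DriverManager.getConnection when opening the JDBC connection.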

The Unity Catalog ingestion commands are SQL statements. The examples below demonstrate PUT, GET, and REMOVE operations.

Upload a local file

To upload a local file /tmp/test.csv to the Unity Catalog volume path /Volumes/main/default/e2etests/file1.csv, use the PUT operation:

Text
  PUT '/tmp/test.csv' INTO '/Volumes/main/default/e2etests/file1.csv' OVERWRITE
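Because these ingestion commands are ordinary SQL statements, they can be run through a standard JDBC Statement. A minimal sketch, assuming an open connection with VolumeOperationAllowedLocalPaths set (the helper names are illustrative, not part of the driver API); GET and REMOVE statements are executed the same way:

```java
import java.sql.Connection;
import java.sql.Statement;

public class VolumePut {
    // Build the PUT statement for a local file and a volume destination.
    // Paths are assumed to be literal and not to contain single quotes.
    public static String putSql(String localPath, String volumePath) {
        return "PUT '" + localPath + "' INTO '" + volumePath + "' OVERWRITE";
    }

    // Execute the PUT against an open JDBC connection.
    public static void upload(Connection conn, String localPath, String volumePath)
            throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(putSql(localPath, volumePath));
        }
    }
}
```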

Download a file

To download a file from the Unity Catalog volume path /Volumes/main/default/e2etests/file1.csv into a local file /tmp/test.csv, use the GET operation:

Text
  GET '/Volumes/main/default/e2etests/file1.csv' TO '/tmp/test.csv'

Delete a file

To delete a file with a Unity Catalog volume path /Volumes/main/default/e2etests/file1.csv, use the REMOVE operation:

Text
  REMOVE '/Volumes/main/default/e2etests/file1.csv'

Read/write data using a stream

The JDBC driver supports streaming to read and write data from and to Unity Catalog volumes by providing the interface IDatabricksVolumeClient. See IDatabricksVolumeClient reference for available APIs.

The IDatabricksVolumeClient can be initialized using the DatabricksVolumeClientFactory factory utility:

Java
import com.databricks.jdbc.api.impl.volume.DatabricksVolumeClientFactory;
import com.databricks.jdbc.api.volume.IDatabricksVolumeClient;

// conn is an open java.sql.Connection
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(conn);

Write a file into a volume from a stream

Java
Connection connection = DriverManager.getConnection(url, prop);
File file = new File("/tmp/test.csv");
FileInputStream fileInputStream = new FileInputStream(file);

// Upload the file stream to the UC volume path
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(connection);
volumeClient.putObject(catalog, schema, volume, objectPath, fileInputStream, file.length(), true /* overwrite */);

Read a volume file as a stream

Java
import org.apache.http.entity.InputStreamEntity;

Connection connection = DriverManager.getConnection(url, prop);

// Download the UC volume file as a stream
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(connection);
InputStreamEntity volumeDataStream = volumeClient.getObject(catalog, schema, volume, objectPath);
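The returned InputStreamEntity wraps the file's bytes, and its content can be consumed like any InputStream (for example, via entity.getContent()). A minimal sketch of draining that stream to a destination; the helper class is illustrative, not part of the driver API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class VolumeStreamCopy {
    // Copy an input stream (e.g. volumeDataStream.getContent()) to an
    // output stream, closing both and returning the bytes transferred.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        try (in; out) {
            return in.transferTo(out); // java.io.InputStream.transferTo, Java 9+
        }
    }
}
```

For example, VolumeStreamCopy.copy(volumeDataStream.getContent(), new FileOutputStream("/tmp/test.csv")) would write the volume file to a local path.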