Manage files in volumes with the Databricks JDBC Driver (OSS)
Databricks offers bulk ingestion capabilities using Unity Catalog volumes, which allow you to transfer datasets to and from local files, such as CSV files. See What are Unity Catalog volumes?.
This article describes how to manage files in volumes, as well as read and write streams to and from volumes, using the Databricks JDBC Driver (OSS).
Enable volume operations
To enable Unity Catalog volume operations, set the connection property VolumeOperationAllowedLocalPaths to a comma-separated list of local paths that are allowed for volume operations. See Other feature properties.
Unity Catalog must be enabled to use this feature. Similar functionality is available using the Databricks UI. See Upload files to a Unity Catalog volume.
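For example, the allowed-paths property can be supplied in the Properties object passed to DriverManager.getConnection. The sketch below assumes a placeholder JDBC URL; substitute your workspace hostname and HTTP path, and note that the class and method names here are illustrative, not part of the driver API:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class VolumeConnectionExample {
    // Allow volume operations only against local files under /tmp.
    static Properties volumeProperties() {
        Properties prop = new Properties();
        prop.setProperty("VolumeOperationAllowedLocalPaths", "/tmp");
        return prop;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder URL; fill in your workspace hostname and HTTP path.
        String url = "jdbc:databricks://<server-hostname>:443/default;transportMode=http;httpPath=<http-path>";
        try (Connection conn = DriverManager.getConnection(url, volumeProperties())) {
            // The connection can now run PUT, GET, and REMOVE statements.
        }
    }
}
```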
The Unity Catalog ingestion commands are SQL statements. The examples below demonstrate PUT, GET, and REMOVE operations.
Upload a local file
To upload a local file /tmp/test.csv to a Unity Catalog volume path such as /Volumes/main/default/e2etests/file1.csv, use the PUT operation:
PUT '/tmp/test.csv' INTO '/Volumes/main/default/e2etests/file1.csv' OVERWRITE
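Because PUT is a SQL statement, it can be issued through a plain JDBC Statement. A minimal sketch, assuming an already-open Connection (the helper names putSql and upload are illustrative, not driver APIs):

```java
import java.sql.Connection;
import java.sql.Statement;

public class VolumePutExample {
    // Build the PUT statement for a given local path and volume path.
    static String putSql(String localPath, String volumePath) {
        return "PUT '" + localPath + "' INTO '" + volumePath + "' OVERWRITE";
    }

    // Upload a local file to a Unity Catalog volume over an open connection.
    static void upload(Connection conn, String localPath, String volumePath) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(putSql(localPath, volumePath));
        }
    }
}
```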
Download a file
To download a file from the Unity Catalog volume path /Volumes/main/default/e2etests/file1.csv to a local file /tmp/test.csv, use the GET operation:
GET '/Volumes/main/default/e2etests/file1.csv' TO '/tmp/test.csv'
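GET can likewise be executed as a SQL statement through JDBC. A minimal sketch, again with illustrative helper names:

```java
import java.sql.Connection;
import java.sql.Statement;

public class VolumeGetExample {
    // Build the GET statement for a given volume path and local target.
    static String getSql(String volumePath, String localPath) {
        return "GET '" + volumePath + "' TO '" + localPath + "'";
    }

    // Download a Unity Catalog volume file to a local path over an open connection.
    static void download(Connection conn, String volumePath, String localPath) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(getSql(volumePath, localPath));
        }
    }
}
```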
Delete a file
To delete a file at the Unity Catalog volume path /Volumes/main/default/e2etests/file1.csv, use the REMOVE operation:
REMOVE '/Volumes/main/default/e2etests/file1.csv'
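REMOVE follows the same pattern through a JDBC Statement. A minimal sketch with illustrative helper names:

```java
import java.sql.Connection;
import java.sql.Statement;

public class VolumeRemoveExample {
    // Build the REMOVE statement for a given volume path.
    static String removeSql(String volumePath) {
        return "REMOVE '" + volumePath + "'";
    }

    // Delete a Unity Catalog volume file over an open connection.
    static void delete(Connection conn, String volumePath) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(removeSql(volumePath));
        }
    }
}
```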
Read/write data using a stream
The JDBC driver supports streaming reads and writes to and from Unity Catalog volumes through the IDatabricksVolumeClient interface. See the IDatabricksVolumeClient reference for available APIs.
An IDatabricksVolumeClient can be initialized using the DatabricksVolumeClientFactory factory utility:
import com.databricks.jdbc.api.impl.volume.DatabricksVolumeClientFactory;
import com.databricks.jdbc.api.volume.IDatabricksVolumeClient;
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(conn);
Write a file into a volume from a stream
Connection connection = DriverManager.getConnection(url, prop);
File file = new File("/tmp/test.csv");
FileInputStream fileInputStream = new FileInputStream(file);
// Upload the file stream to UC Volume path
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(connection);
volumeClient.putObject(catalog, schema, volume, objectPath, fileInputStream, file.length(), true /* overwrite */);
Read a volume file as a stream
import org.apache.http.entity.InputStreamEntity;
Connection connection = DriverManager.getConnection(url, prop);
// Download the UC volume file as a stream
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(connection);
InputStreamEntity volumeDataStream = volumeClient.getObject(catalog, schema, volume, objectPath);
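The returned InputStreamEntity wraps the object's bytes; its content can be drained like any InputStream. A sketch of a small helper (VolumeStreamReader and copyToFile are illustrative names, not driver APIs):

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class VolumeStreamReader {
    // Drain a stream into a local file; returns the number of bytes copied.
    static long copyToFile(InputStream in, Path target) throws IOException {
        try (InputStream source = in) {
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

For example, `VolumeStreamReader.copyToFile(volumeDataStream.getContent(), Path.of("/tmp/out.csv"))` writes the downloaded object to a local file, where getContent() is the standard HttpEntity accessor for the underlying stream.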