Manage files in volumes with the Databricks JDBC Driver (OSS)
Databricks offers bulk ingestion capabilities using Unity Catalog volumes, which allow users to transfer datasets to and from local files, such as CSV files. See What are Unity Catalog volumes?
This article describes how to manage files in volumes, as well as read and write streams to and from volumes, using the Databricks JDBC Driver (OSS).
Enable volume operations
To enable Unity Catalog volume operations, set the connection property VolumeOperationAllowedLocalPaths to a comma-separated list of allowed local paths for the volume operations. See Other feature properties.
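As a minimal sketch of the property described above, the following Java helper joins a list of local directories into the comma-separated value and stores it in a java.util.Properties object, which could then be passed to DriverManager.getConnection(url, props). The class name VolumeConnectionProps and the example directories are illustrative assumptions; only the property name VolumeOperationAllowedLocalPaths comes from this article. Authentication properties are omitted.

```java
import java.util.List;
import java.util.Properties;

final class VolumeConnectionProps {
    // Property name taken from this article.
    static final String ALLOWED_PATHS_KEY = "VolumeOperationAllowedLocalPaths";

    // Builds connection properties that allow volume operations on the given local paths.
    static Properties withAllowedLocalPaths(List<String> localPaths) {
        if (localPaths.isEmpty()) {
            throw new IllegalArgumentException("at least one local path is required");
        }
        for (String path : localPaths) {
            // A comma inside a path would be read as a separator, so reject it here.
            if (path.contains(",")) {
                throw new IllegalArgumentException("path must not contain a comma: " + path);
            }
        }
        Properties props = new Properties();
        props.setProperty(ALLOWED_PATHS_KEY, String.join(",", localPaths));
        return props;
    }
}
```

For example, VolumeConnectionProps.withAllowedLocalPaths(List.of("/tmp", "/data/exports")) yields a Properties object whose VolumeOperationAllowedLocalPaths value is /tmp,/data/exports.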
Unity Catalog must be enabled to use this feature. Similar functionality is available using the Databricks UI. See Upload files to a Unity Catalog volume.
The Unity Catalog ingestion commands are SQL statements. The examples below demonstrate PUT, GET, and REMOVE operations.
Upload a local file
To upload a local file /tmp/test.csv into a Unity Catalog volume path as /Volumes/main/default/e2etests/file1.csv, use the PUT operation:
PUT '/tmp/test.csv' INTO '/Volumes/main/default/e2etests/file1.csv' OVERWRITE
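Because the ingestion commands are SQL statements, the PUT statement above can be sent through a regular java.sql.Statement. The sketch below is one hedged way to do that; the class VolumePut and its method names are hypothetical, and rejecting single quotes (rather than escaping them) is an assumption made here because quoting rules are not covered in this article.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class VolumePut {
    // Builds the PUT ... OVERWRITE statement in the form shown in this article.
    static String putSql(String localPath, String volumePath) {
        if (localPath.contains("'") || volumePath.contains("'")) {
            throw new IllegalArgumentException("paths must not contain single quotes");
        }
        return "PUT '" + localPath + "' INTO '" + volumePath + "' OVERWRITE";
    }

    // Runs the statement on an existing connection. The connection is assumed to have
    // been opened with VolumeOperationAllowedLocalPaths covering localPath.
    static void upload(Connection conn, String localPath, String volumePath) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(putSql(localPath, volumePath));
        }
    }
}
```

Keeping the string construction separate from execution makes the generated SQL easy to inspect or log before it is sent.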
Download a file
To download a file from the Unity Catalog volume path /Volumes/main/default/e2etests/file1.csv into a local file /tmp/test.csv, use the GET operation:
GET '/Volumes/main/default/e2etests/file1.csv' TO '/tmp/test.csv'
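Since local paths must fall under the directories listed in VolumeOperationAllowedLocalPaths, it can be useful to check the download target on the client before building the GET statement. The following is a sketch of such a pre-check, not the driver's own enforcement logic, which may differ. The class VolumeGet and its methods are hypothetical names.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class VolumeGet {
    // Returns true if the normalized target lies under one of the allowed directories.
    static boolean isUnderAllowedPath(String target, List<String> allowedDirs) {
        Path normalized = Paths.get(target).toAbsolutePath().normalize();
        for (String dir : allowedDirs) {
            // Path.startsWith compares whole path components, so /tmpx does not match /tmp.
            if (normalized.startsWith(Paths.get(dir).toAbsolutePath().normalize())) {
                return true;
            }
        }
        return false;
    }

    // Builds the GET statement in the form shown in this article, after the local pre-check.
    static String getSql(String volumePath, String localTarget, List<String> allowedDirs) {
        if (volumePath.contains("'") || localTarget.contains("'")) {
            throw new IllegalArgumentException("paths must not contain single quotes");
        }
        if (!isUnderAllowedPath(localTarget, allowedDirs)) {
            throw new IllegalArgumentException("target is outside the allowed local paths: " + localTarget);
        }
        return "GET '" + volumePath + "' TO '" + localTarget + "'";
    }
}
```

Normalizing before the comparison means a target such as /tmp/../etc/passwd is treated as /etc/passwd and is rejected when only /tmp is allowed.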
Delete a file
To delete a file with a Unity Catalog volume path /Volumes/main/default/e2etests/file1.csv, use the REMOVE operation:
REMOVE '/Volumes/main/default/e2etests/file1.csv'
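The volume paths in these statements appear to follow the layout /Volumes/<catalog>/<schema>/<volume>/<object path>; that layout is inferred from the example paths in this article rather than stated by it. Under that assumption, the sketch below validates a path and splits it into its parts before building a REMOVE statement, which helps avoid deleting from an unintended location. The class VolumePath is a hypothetical helper.

```java
final class VolumePath {
    final String catalog;
    final String schema;
    final String volume;
    final String objectPath;

    private VolumePath(String catalog, String schema, String volume, String objectPath) {
        this.catalog = catalog;
        this.schema = schema;
        this.volume = volume;
        this.objectPath = objectPath;
    }

    // Parses /Volumes/<catalog>/<schema>/<volume>/<object path> (assumed layout).
    static VolumePath parse(String path) {
        // A limit of 6 keeps any further slashes inside the object path.
        String[] parts = path.split("/", 6);
        boolean valid = !path.contains("'")
                && parts.length == 6
                && parts[0].isEmpty()
                && parts[1].equals("Volumes");
        for (int i = 2; valid && i < 6; i++) {
            valid = !parts[i].isEmpty();
        }
        if (!valid) {
            throw new IllegalArgumentException("not a volume file path: " + path);
        }
        return new VolumePath(parts[2], parts[3], parts[4], parts[5]);
    }

    // Builds the REMOVE statement in the form shown in this article.
    String removeSql() {
        return "REMOVE '/Volumes/" + catalog + "/" + schema + "/" + volume + "/" + objectPath + "'";
    }
}
```

The parsed fields line up with the catalog, schema, volume, and objectPath arguments used by the streaming examples later in this article.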
Read/write data using a stream
The JDBC driver supports reading data from and writing data to Unity Catalog volumes as streams through the IDatabricksVolumeClient interface. See IDatabricksVolumeClient reference for available APIs.
The IDatabricksVolumeClient can be initialized using the DatabricksVolumeClientFactory factory utility:
import com.databricks.jdbc.api.impl.volume.DatabricksVolumeClientFactory;
import com.databricks.jdbc.api.volume.IDatabricksVolumeClient;
// conn is an open java.sql.Connection to Databricks
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(conn);
Write a file into a volume from a stream
import java.io.File;
import java.io.FileInputStream;
import java.sql.Connection;
import java.sql.DriverManager;
// url, prop, catalog, schema, volume, and objectPath are defined by the caller
Connection connection = DriverManager.getConnection(url, prop);
File file = new File("/tmp/test.csv");
FileInputStream fileInputStream = new FileInputStream(file);
// Upload the file stream to the UC volume path
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(connection);
volumeClient.putObject(catalog, schema, volume, objectPath, fileInputStream, file.length(), true /* overwrite */);
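The putObject call above takes a content length alongside the stream. When the data is generated in memory rather than read from a file, one way to obtain both is to encode the text once and reuse the resulting byte array. The sketch below assumes that the content length is the number of bytes in the stream, which this article does not state explicitly; InMemoryUpload is a hypothetical helper class.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class InMemoryUpload {
    final InputStream stream;
    final long contentLength;

    private InMemoryUpload(byte[] bytes) {
        this.stream = new ByteArrayInputStream(bytes);
        this.contentLength = bytes.length;
    }

    // Encodes the text as UTF-8 and records the byte count, which can be larger
    // than String.length() when the text contains non-ASCII characters.
    static InMemoryUpload fromText(String text) {
        return new InMemoryUpload(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

The two fields could then be passed as the stream and content length arguments of putObject, in place of fileInputStream and the file size.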
Read a volume file as a stream
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.http.entity.InputStreamEntity;
// url, prop, catalog, schema, volume, and objectPath are defined by the caller
Connection connection = DriverManager.getConnection(url, prop);
// Read the file at the UC volume path as a stream
IDatabricksVolumeClient volumeClient = DatabricksVolumeClientFactory.getVolumeClient(connection);
InputStreamEntity volumeDataStream = volumeClient.getObject(catalog, schema, volume, objectPath);
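The returned InputStreamEntity is an Apache HttpCore entity, and its getContent() method exposes the underlying java.io.InputStream. As a sketch of what consuming that stream might look like, the helper below reads any InputStream line by line. It assumes the volume file is UTF-8 text, such as a CSV file; VolumeStreamReader is a hypothetical class name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class VolumeStreamReader {
    // Reads all lines from the stream as UTF-8 text and closes the stream afterwards.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }
}
```

With the example above, this could be called as VolumeStreamReader.readLines(volumeDataStream.getContent()). For large files, processing each line inside the loop instead of collecting them all would keep memory use bounded.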
IDatabricksVolumeClient interface
prefixExists
Determines if a specific prefix (folder-like structure) exists in the Unity Catalog volume. The prefix must be a part of the file name.
Parameters:
Returns: A boolean indicating whether the prefix exists or not.
objectExists
Determines if a specific object (file) exists in the Unity Catalog volume. The object must match the file name exactly.
Parameters:
Returns: A boolean indicating whether the object exists or not.
volumeExists
Determines if a specific volume exists in the given catalog and schema. The volume must match the volume name exactly.
Parameters:
Returns: A boolean indicating whether the volume exists or not.
listObjects
Returns the list of all filenames in the Unity Catalog volume that start with a specified prefix. The prefix must be a part of the file path from the volume as the root.
Parameters:
Returns: A list of strings indicating the filenames that start with the specified prefix.
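The difference between prefix matching (prefixExists, listObjects) and exact matching (objectExists) can be illustrated with a small local model. The sketch below works on an in-memory list of file names and does not call the driver. It assumes case-sensitive comparison, which this article does not specify, and PrefixMatchModel is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;

final class PrefixMatchModel {
    // Models listObjects: every name that starts with the prefix.
    static List<String> listObjects(List<String> names, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    // Models prefixExists: true if at least one name starts with the prefix.
    static boolean prefixExists(List<String> names, String prefix) {
        return !listObjects(names, prefix).isEmpty();
    }

    // Models objectExists: true only if a name matches exactly.
    static boolean objectExists(List<String> names, String name) {
        return names.contains(name);
    }
}
```

Given the names file1.csv, file2.csv, and notes.txt, the prefix file matches the first two names, so the prefix exists, but no object is named exactly file.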
getObject (file)
Retrieves an object (file) from the Unity Catalog volume and stores it in the specified local path.
Parameters:
Returns: A boolean value indicating the status of the GET operation.
getObject (stream)
Retrieves an object as an input stream from the Unity Catalog volume.
Parameters: catalog, schema, volume, and objectPath, as passed in the read example above.
Returns: An instance of the input stream entity.
putObject (file)
Uploads data from a local path to a specified path within a Unity Catalog volume.
Parameters:
Returns: A boolean value indicating the status of the PUT operation.
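putObject (stream)
Uploads data from an input stream to a specified path within a Unity Catalog volume.
Parameters: catalog, schema, volume, objectPath, the input stream, its content length, and an overwrite flag, as passed in the write example above.
Returns: A boolean value indicating the status of the PUT operation.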
deleteObject
Removes an object from a specified path within a Unity Catalog volume.
Parameters:
Returns: A boolean value indicating the status of the DELETE operation.