Volumes

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 13.2 and later check marked yes Unity Catalog only

Volumes are Unity Catalog objects representing a logical volume of storage in a cloud object storage location. Volumes provide capabilities for accessing, storing, governing, and organizing files. While tables provide governance over tabular datasets, volumes add governance over non-tabular datasets. You can use volumes to store and access files in any format, including structured, semi-structured, and unstructured data.

Volumes are siblings to tables, views, and other objects organized under a schema in Unity Catalog.

A volume can be managed or external.

Managed volume

A managed volume is a Unity Catalog-governed storage volume created within the default storage location of the containing schema. Managed volumes allow the creation of governed storage for working with files without the overhead of external locations and storage credentials. You do not need to specify a location when creating a managed volume, and all file access for data in managed volumes is through paths managed by Unity Catalog.

External volume

An external volume is a Unity Catalog-governed storage volume registered against a directory within an external location.

Volume naming and reference

A volume name is an identifier that can be qualified with a catalog and schema name in SQL commands.

The path to access files in volumes uses the following format:

/Volumes/<catalog_identifier>/<schema_identifier>/<volume_identifier>/<path>/<file_name>

Note that Databricks normalizes the identifiers to lower case.

Databricks also supports an optional dbfs:/ scheme, so the following path also works:

dbfs:/Volumes/<catalog_identifier>/<schema_identifier>/<volume_identifier>/<path>/<file_name>

Examples

--- Create an external volume under the directory “my-path”
> CREATE EXTERNAL VOLUME IF NOT EXISTS myCatalog.mySchema.myExternalVolume
        COMMENT 'This is my example external volume'
        LOCATION 's3://my-bucket/my-location/my-path'
 OK

--- Set the current catalog
> USE CATALOG myCatalog;
 OK

--- Set the current schema
> USE SCHEMA mySchema;
 OK

--- Create a managed volume; it is not necessary to specify a location
> CREATE VOLUME myManagedVolume
    COMMENT 'This is my example managed volume';
 OK

--- List the files inside the volume, all names are lowercase
> LIST '/Volumes/mycatalog/myschema/myexternalvolume'
 sample.csv

> LIST 'dbfs:/Volumes/mycatalog/myschema/mymanagedvolume'
 sample.csv

--- Print the content of a csv file
> SELECT * FROM csv.`/Volumes/mycatalog/myschema/myexternalvolume/sample.csv`
 20

> SELECT * FROM csv.`dbfs:/Volumes/mycatalog/myschema/mymanagedvolume/sample.csv`
 20