Path rules and access in Unity Catalog volumes
This article explains restrictions around path overlaps in Unity Catalog, details path-based access patterns for data files in Unity Catalog objects, and describes how Unity Catalog manages paths for tables and volumes.
Volumes are only supported on Databricks Runtime 13.3 LTS and above. In Databricks Runtime 12.2 LTS and below, operations against /Volumes paths might succeed, but they can only write data to ephemeral storage disks attached to compute clusters rather than persisting data to Unity Catalog volumes as expected.
Path overlap restrictions in Unity Catalog
Unity Catalog enforces data governance by preventing managed directories of data from overlapping. To do so, it applies the following rules:
- External locations can't overlap other external locations.
- Tables and volumes store data files in external locations or the metastore root location.
- Volumes can't overlap other volumes.
- Tables can't overlap other tables.
- Tables and volumes can't overlap each other.
- Managed storage locations can't overlap each other. See Specify a managed storage location in Unity Catalog.
- External volumes can't overlap managed storage locations.
- External tables can't overlap managed storage locations.
These rules mean that the following restrictions exist in Unity Catalog:
- You can't define an external location within another external location.
- You can't define a volume within another volume.
- You can't define a table within another table.
- You can't define a table on any data files or directories within a volume.
- You can't define a volume on a directory within a table.
You can always use path-based access to read or write data files in volumes, including Delta Lake files. However, you can't register these data files as tables in the Unity Catalog metastore.
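To make "overlap" concrete, the following is a minimal sketch of one way to reason about it: two paths overlap when they are the same directory or when one sits inside the other. The function name `paths_overlap` and the example bucket paths are hypothetical, and this is not how Unity Catalog implements its checks. It only illustrates the rule, comparing whole path segments rather than raw string prefixes.

```python
def _segments(path):
    # Split a path or cloud URI into its non-empty segments.
    return [part for part in path.split("/") if part]


def paths_overlap(a, b):
    """Return True if the paths are equal or one contains the other.

    Illustrative only: a segment-wise prefix comparison, not the
    check that Unity Catalog itself performs.
    """
    seg_a, seg_b = _segments(a), _segments(b)
    shared = min(len(seg_a), len(seg_b))
    return seg_a[:shared] == seg_b[:shared]


# A directory nested inside another one overlaps it.
print(paths_overlap("s3://bucket/data", "s3://bucket/data/sales"))
# Sibling directories do not overlap, even if their names share a prefix.
print(paths_overlap("s3://bucket/data/sales", "s3://bucket/data/salesforce"))
```

Comparing segments rather than characters matters here: `sales` and `salesforce` share a string prefix but are separate directories.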
Fully managed paths for tables and volumes
When you create a managed table or a managed volume, Unity Catalog creates a new directory in the Unity Catalog-configured storage location associated with the containing schema. The name of this directory is randomly generated to avoid any potential collision with other directories already present.
This behavior differs from how Hive metastore creates managed tables. Databricks recommends always interacting with Unity Catalog managed tables using table names and Unity Catalog managed volumes using volume paths.
External location paths for tables and volumes
When you create an external table or volume, you specify a path within an external location governed by Unity Catalog.
To avoid path conflicts, Databricks recommends creating external tables and volumes in sub-directories rather than at the root of an external location.
For ease of use, interact with Unity Catalog external tables using table names, and external volumes using volume paths. Users with sufficient privileges can also access data directly using the full cloud storage path.
Access to data through cloud URIs for these objects is fully governed by Unity Catalog privileges, which override any privileges on the external location itself. See Path overlap restrictions in Unity Catalog and Unity Catalog privileges and securable objects.
Access data in Unity Catalog
Unity Catalog objects provide access to data through object identifiers, volume paths, or cloud URIs. You can use these values to access data associated with volumes and tables.
Unity Catalog tables are accessed using a three-tier identifier with the following pattern:
<catalog_name>.<schema_name>.<table_name>
Volume file paths in Unity Catalog
Volumes provide a file path to access data files with the following pattern:
/Volumes/<catalog_name>/<schema_name>/<volume_name>/<path_to_file>
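As a small illustration of this pattern, the sketch below assembles a volume file path from its components. The helper `volume_file_path` and the names `main`, `default`, and `raw` are hypothetical placeholders, not part of any Databricks API.

```python
def volume_file_path(catalog, schema, volume, path_to_file=""):
    """Build a path of the form
    /Volumes/<catalog_name>/<schema_name>/<volume_name>/<path_to_file>.
    """
    parts = ["/Volumes", catalog, schema, volume]
    # Drop leading and trailing slashes so the join never doubles them.
    relative = path_to_file.strip("/")
    if relative:
        parts.append(relative)
    return "/".join(parts)


print(volume_file_path("main", "default", "raw", "2024/events.json"))
# /Volumes/main/default/raw/2024/events.json
```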
Cloud URIs require users to provide the driver, storage container identifier, and full path to the target files, as in the following example:
s3://<bucket_name>/<path>
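The three parts of a cloud URI can be pulled apart with Python's standard `urllib.parse.urlsplit`, as in the sketch below. The helper name `split_cloud_uri` and the bucket and file names are made up for illustration.

```python
from urllib.parse import urlsplit


def split_cloud_uri(uri):
    """Split a cloud URI into (driver, container identifier, path)."""
    parts = urlsplit(uri)
    # scheme is the driver (for example "s3"), netloc is the bucket or
    # container identifier, and path is the location of the target files.
    return parts.scheme, parts.netloc, parts.path.lstrip("/")


print(split_cloud_uri("s3://my-bucket/landing/2024/events.json"))
# ('s3', 'my-bucket', 'landing/2024/events.json')
```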
The following table shows the access methods allowed for Unity Catalog objects:
Object | Object identifier | File path | Cloud URI |
---|---|---|---|
External location | no | no | yes |
Managed table | yes | no | no |
External table | yes | no | yes |
Managed volume | no | yes | no |
External volume | no | yes | yes |
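The same matrix can be expressed as a lookup, which may be convenient when validating access patterns in tooling. This is only a restatement of the table above as data. The names `ACCESS_METHODS` and `is_allowed` are hypothetical.

```python
# Access methods allowed per object type, transcribed from the table above.
ACCESS_METHODS = {
    "external location": {"cloud URI"},
    "managed table": {"object identifier"},
    "external table": {"object identifier", "cloud URI"},
    "managed volume": {"file path"},
    "external volume": {"file path", "cloud URI"},
}


def is_allowed(object_type, method):
    """Return True if the table lists `method` as allowed for `object_type`."""
    return method in ACCESS_METHODS[object_type.lower()]


print(is_allowed("Managed table", "cloud URI"))    # False
print(is_allowed("External volume", "file path"))  # True
```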
Unity Catalog volumes use three-tier object identifiers with the following pattern for management commands (such as CREATE VOLUME and DROP VOLUME):
<catalog_name>.<schema_name>.<volume_name>
To actually work with files in volumes, you must use path-based access.
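Because management commands take an identifier while file operations take a path, it can help to convert between the two forms. The sketch below shows one way to do that. Both function names are hypothetical, and the code assumes simple names without dots or backtick quoting.

```python
def volume_identifier_to_path(identifier):
    """Map <catalog>.<schema>.<volume> to the volume's root file path."""
    parts = identifier.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <catalog>.<schema>.<volume>, got {identifier!r}")
    return "/Volumes/" + "/".join(parts)


def volume_path_to_identifier(path):
    """Map a /Volumes/... file path back to its three-tier identifier."""
    segments = path.strip("/").split("/")
    if len(segments) < 4 or segments[0] != "Volumes":
        raise ValueError(f"not a volume path: {path!r}")
    # Segments 1 to 3 are the catalog, schema, and volume names.
    return ".".join(segments[1:4])


print(volume_identifier_to_path("main.default.raw"))
# /Volumes/main/default/raw
print(volume_path_to_identifier("/Volumes/main/default/raw/2024/events.json"))
# main.default.raw
```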