Use COPY INTO to load data with Unity Catalog
Learn how to use COPY INTO to ingest data to Unity Catalog managed or external tables from any source and file format supported by COPY INTO. Unity Catalog adds new options for configuring secure access to raw data, allowing you to use Unity Catalog external locations or storage credentials to access data in cloud object storage.
COPY INTO a Unity Catalog table
You can target a Unity Catalog table using a three tier identifier (<catalog-name>.<database-name>.<table-name>
). You can use the USE CATALOG <catalog-name>
and USE <database-name>
commands to set the default catalog and database for your current query or notebook.
If you wish to COPY INTO an existing table, you must have USE CATALOG
and USE SCHEMA
privileges on the catalog and database (schema) containing the table and MODIFY
privileges on the table.
You can COPY INTO from any source location you can access, including cloud object storage locations configured with temporary credentials.
Load data from external locations
To load data from a Unity Catalog external location, you must have the READ FILES
privilege granted on that location. External location privileges apply to all nested directories under the specified location.
For example, if you have access to an external location defined with the URL s3://landing-bucket/raw-data
, the following commands are valid:
COPY INTO landing_table
FROM 's3://landing-bucket/raw-data'
FILEFORMAT = PARQUET;
COPY INTO json_table
FROM 's3://landing-bucket/raw-data/json'
FILEFORMAT = JSON;
Permissions on this external location do not grant any privileges on directories above or parallel to the location specified. For example, neither of the following commands are valid:
COPY INTO parent_table
FROM 's3://landing-bucket'
FILEFORMAT = PARQUET;
COPY INTO sibling_table
FROM 's3://landing-bucket/json-data'
FILEFORMAT = JSON;
Specifying storage credentials to access data
Databricks recommends using external locations to explicitly grant permissions on directories used in data ingestion. COPY INTO also supports using Unity Catalog storage credentials to access data. The READ FILES
privilege will grant you access to ingest data with COPY INTO from all cloud object storage locations accessible with the storage credential. Because storage credentials may have broader privileges than external locations, use caution when granting access to storage credentials.
The following example loads data from S3 and ADLS Gen2 using Unity Catalog storage credentials to provide access to the source data.
COPY INTO my_json_data
FROM 's3://landing-bucket/json-data' WITH (
CREDENTIAL `bucket-creds`
)
FILEFORMAT = JSON;
COPY INTO my_json_data
FROM 'abfss://container@storageAccount.dfs.core.windows.net/jsonData' WITH (
CREDENTIAL `adls-creds`
)
FILEFORMAT = JSON;