You can use COPY INTO to ingest data to Unity Catalog managed or external tables from any source and file format supported by COPY INTO. Unity Catalog adds new options for configuring secure access to raw data, allowing you to use Unity Catalog external locations or storage credentials to access data in cloud object storage.
You can target a Unity Catalog table using a three tier identifier (
<catalog_name>.<database_name>.<table_name>). You can use the
USE CATALOG <catalog_name> and
USE <database_name> commands to set the default catalog and database for your current query or notebook.
If you wish to COPY INTO an existing table, you must have
USAGE privileges on the catalog and database containing the table and
MODIFY privileges on the table.
You can COPY INTO from any source location you can access, including cloud object storage locations configured with temporary credentials.
To load data from a Unity Catalog external location, you must have the
READ FILES privilege granted on that location. External location privileges apply to all nested directories under the specified location.
For example, if you have access to an external location defined with the URL
s3://landing-bucket/raw-data, the following commands are valid:
COPY INTO landing_table FROM 's3://landing-bucket/raw-data' FILEFORMAT = PARQUET; COPY INTO json_table FROM 's3://landing-bucket/raw-data/json' FILEFORMAT = JSON;
Permissions on this external location do not grant any privileges on directories above or parallel to the location specified. For example, neither of the following commands are valid:
COPY INTO parent_table FROM 's3://landing-bucket' FILEFORMAT = PARQUET; COPY INTO sibling_table FROM 's3://landing-bucket/json-data' FILEFORMAT = JSON;
Databricks recommends using external locations to explicitly grant permissions on directories used in data ingestion. COPY INTO also supports using Unity Catalog storage credentials to access data. The
READ FILES privilege will grant you access to ingest data with COPY INTO from all cloud object storage locations accessible with the storage credential. Because storage credentials may have broader privileges than external locations, use caution when granting access to storage credentials.
The following example loads data from S3 and ADLS Gen2 using Unity Catalog storage credentials to provide access to the source data.
COPY INTO my_json_data FROM 's3://landing-bucket/json-data' WITH ( CREDENTIAL `bucket-creds` ) FILEFORMAT = JSON; COPY INTO my_json_data FROM 'abfss://container@storageAccount.dfs.core.windows.net/jsonData' WITH ( CREDENTIAL `adls-creds` ) FILEFORMAT = JSON;