Load data using COPY INTO with Unity Catalog volumes or external locations
Learn how to use COPY INTO
to ingest data to Unity Catalog managed or external tables from any source and file format supported by COPY INTO. Unity Catalog adds new options for configuring secure access to raw data. You can use Unity Catalog volumes or external locations to access data in cloud object storage.
Databricks recommends using volumes to access files in cloud storage as part of the ingestion process using COPY INTO
. For more information about recommendations for using volumes and external locations, see Unity Catalog best practices.
This article describes how to use the COPY INTO
command to load data from an Amazon S3 (S3) bucket in your AWS account into a table in Databricks SQL.
The steps in this article assume that your admin has configured a Unity Catalog volume or external location so that you can access your source files in S3. If your admin configured a compute resource to use an AWS instance profile, see Load data using COPY INTO with an instance profile or Tutorial: COPY INTO with Spark SQL instead. If your admin gave you temporary credentials (an AWS access key ID, a secret key, and a session token), see Load data using COPY INTO with temporary credentials instead.
Before you begin
Before you use COPY INTO
to load data from a Unity Catalog volume or from a cloud object storage path that’s defined as a Unity Catalog external location, you must have the following:
The
READ VOLUME
privilege on a volume or theREAD FILES
privilege on an external location.For more information about creating volumes, see What are Unity Catalog volumes?.
For more information about creating external locations, see Create an external location to connect cloud storage to Databricks.
The path to your source data in the form of a cloud object storage URL or a volume path.
Example cloud object storage URL:
s3://landing-bucket/raw-data/json
.Example volume path:
/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json
.The
USE SCHEMA
privilege on the schema that contains the target table.The
USE CATALOG
privilege on the parent catalog.
For more information about Unity Catalog privileges, see Unity Catalog privileges and securable objects.
Load data from a volume
To load data from a Unity Catalog volume, you must have the READ VOLUME
privilege. Volume privileges apply to all nested directories under the specified volume.
For example, if you have access to a volume with the path /Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/
, the following commands are valid:
COPY INTO landing_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;
COPY INTO json_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json'
FILEFORMAT = JSON;
Optionally, you can also use a volume path with the dbfs scheme. For example, the following commands are also valid:
COPY INTO landing_table
FROM 'dbfs:/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;
COPY INTO json_table
FROM 'dbfs:/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json'
FILEFORMAT = JSON;
Load data using an external location
The following example loads data from S3 into a table using Unity Catalog external locations to provide access to the source code.
COPY INTO my_json_data
FROM 's3://landing-bucket/json-data'
FILEFORMAT = JSON;
External location privilege inheritance
External location privileges apply to all nested directories under the specified location.
For example, if you have access to an external location defined with the URL s3://landing-bucket/raw-data
, the following commands are valid:
COPY INTO landing_table
FROM 's3://landing-bucket/raw-data'
FILEFORMAT = PARQUET;
COPY INTO json_table
FROM 's3://landing-bucket/raw-data/json'
FILEFORMAT = JSON;
Permissions on this external location do not grant any privileges on directories above or parallel to the location specified. For example, neither of the following commands are valid:
COPY INTO parent_table
FROM 's3://landing-bucket'
FILEFORMAT = PARQUET;
COPY INTO sibling_table
FROM 's3://landing-bucket/json-data'
FILEFORMAT = JSON;
Three-level namespace for target tables
You can target a Unity Catalog table using a three tier identifier (<catalog_name>.<database_name>.<table_name>
). You can use the USE CATALOG <catalog_name>
and USE <database_name>
commands to set the default catalog and database for your current query or notebook.