Learn how to use COPY INTO to ingest data into Unity Catalog managed or external tables from any source and file format supported by COPY INTO. Unity Catalog adds new options for configuring secure access to raw data. You can use Unity Catalog volumes or external locations to access data in cloud object storage.
Databricks recommends using volumes to access files in cloud storage as part of the ingestion process with COPY INTO. For more information about recommendations for using volumes and external locations, see Unity Catalog best practices.
This article describes how to use the COPY INTO command to load data from an Amazon S3 bucket in your AWS account into a table in Databricks SQL.
The steps in this article assume that your admin has configured a Unity Catalog volume or external location so that you can access your source files in S3. If your admin configured a compute resource to use an AWS instance profile, see Load data using COPY INTO with an instance profile or Tutorial: COPY INTO with Spark SQL instead. If your admin gave you temporary credentials (an AWS access key ID, a secret key, and a session token), see Load data using COPY INTO with temporary credentials instead.
Before you use COPY INTO to load data from a Unity Catalog volume or from a cloud object storage path that's defined as a Unity Catalog external location, you must have the following:

- The READ VOLUME privilege on a volume or the READ FILES privilege on an external location.
  - For more information about creating volumes, see Create volumes.
  - For more information about creating external locations, see Manage external locations and storage credentials.
- The path to your source data in the form of a cloud object storage URL or a volume path.
  - Example cloud object storage URL: `s3://landing-bucket/raw-data`
  - Example volume path: `/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data`
- The USE SCHEMA privilege on the schema that contains the target table.
- The USE CATALOG privilege on the parent catalog.

For more information about Unity Catalog privileges, see Unity Catalog privileges and securable objects.
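As an illustrative sketch of how an admin might grant these privileges with SQL, assuming hypothetical catalog, schema, volume, external location, and group names (substitute your own):

```sql
-- Hypothetical names throughout; replace with your own objects and principal.
GRANT USE CATALOG ON CATALOG quickstart_catalog TO `data_engineers`;
GRANT USE SCHEMA ON SCHEMA quickstart_catalog.quickstart_schema TO `data_engineers`;

-- One of the following, depending on whether raw data access goes through
-- a volume or an external location:
GRANT READ VOLUME ON VOLUME quickstart_catalog.quickstart_schema.quickstart_volume TO `data_engineers`;
GRANT READ FILES ON EXTERNAL LOCATION my_landing_location TO `data_engineers`;
```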
To load data from a Unity Catalog volume, you must have the READ VOLUME privilege. Volume privileges apply to all nested directories under the specified volume.

For example, if you have access to a volume with the path /Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/, the following commands are valid:
```sql
COPY INTO landing_table
  FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
  FILEFORMAT = PARQUET;

COPY INTO json_table
  FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json'
  FILEFORMAT = JSON;
```
Optionally, you can also use a volume path with the dbfs scheme. For example, the following commands are also valid:
```sql
COPY INTO landing_table
  FROM 'dbfs:/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
  FILEFORMAT = PARQUET;

COPY INTO json_table
  FROM 'dbfs:/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json'
  FILEFORMAT = JSON;
```
The following example loads data from S3 and ADLS Gen2 into a table, using Unity Catalog external locations to provide access to the source data.
```sql
COPY INTO my_json_data
  FROM 's3://landing-bucket/json-data'
  FILEFORMAT = JSON;

COPY INTO my_json_data
  FROM 'abfss://container@storageAccount.dfs.core.windows.net/jsonData'
  FILEFORMAT = JSON;
```
External location privileges apply to all nested directories under the specified location.
For example, if you have access to an external location defined with the URL s3://landing-bucket/raw-data, the following commands are valid:
```sql
COPY INTO landing_table
  FROM 's3://landing-bucket/raw-data'
  FILEFORMAT = PARQUET;

COPY INTO json_table
  FROM 's3://landing-bucket/raw-data/json'
  FILEFORMAT = JSON;
```
Permissions on an external location do not grant any privileges on directories above or parallel to the specified location. For example, neither of the following commands is valid:
```sql
COPY INTO parent_table
  FROM 's3://landing-bucket'
  FILEFORMAT = PARQUET;

COPY INTO sibling_table
  FROM 's3://landing-bucket/json-data'
  FILEFORMAT = JSON;
```
You can target a Unity Catalog table using a three-tier identifier (<catalog_name>.<database_name>.<table_name>). You can use the USE CATALOG <catalog_name> and USE <database_name> commands to set the default catalog and database for your current query or notebook.
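As a sketch, reusing the quickstart names from the volume examples above (the table name is a placeholder), both of the following approaches target the same table:

```sql
-- Fully qualified three-tier identifier:
COPY INTO quickstart_catalog.quickstart_schema.landing_table
  FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
  FILEFORMAT = PARQUET;

-- Or set the default catalog and database first, then use the bare table name:
USE CATALOG quickstart_catalog;
USE quickstart_schema;

COPY INTO landing_table
  FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
  FILEFORMAT = PARQUET;
```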