Load data using COPY INTO with Unity Catalog volumes or external locations

Learn how to use COPY INTO to ingest data into Unity Catalog managed or external tables from any source and file format that COPY INTO supports. Unity Catalog adds new options for configuring secure access to raw data. You can use Unity Catalog volumes or external locations to access data in cloud object storage.

Databricks recommends using volumes to access files in cloud storage as part of the ingestion process using COPY INTO. For more information about recommendations for using volumes and external locations, see Unity Catalog best practices.

This article describes how to use the COPY INTO command to load data from an Amazon S3 (S3) bucket in your AWS account into a table in Databricks SQL.

The steps in this article assume that your admin has configured a Unity Catalog volume or external location so that you can access your source files in S3. If your admin configured a compute resource to use an AWS instance profile, see Load data using COPY INTO with an instance profile or Tutorial: COPY INTO with Spark SQL instead. If your admin gave you temporary credentials (an AWS access key ID, a secret key, and a session token), see Load data using COPY INTO with temporary credentials instead.

Before you begin

Before you use COPY INTO to load data from a Unity Catalog volume or from a cloud object storage path that’s defined as a Unity Catalog external location, you must have the following:

  • The READ VOLUME privilege on a volume or the READ FILES privilege on an external location.

    For more information about creating volumes, see Create and work with volumes.

    For more information about creating external locations, see Create an external location to connect cloud storage to Databricks.

  • The path to your source data in the form of a cloud object storage URL or a volume path.

    Example cloud object storage URL: s3://landing-bucket/raw-data/json.

    Example volume path: /Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json.

  • The USE SCHEMA privilege on the schema that contains the target table.

  • The USE CATALOG privilege on the parent catalog.

For more information about Unity Catalog privileges, see Unity Catalog privileges and securable objects.
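
The following sketch shows how an admin might grant these privileges. The principal name (data_engineers), the external location name (landing_location), and the catalog and schema names are illustrative, not taken from your environment:

-- Illustrative grants; adjust object and principal names to your environment.
GRANT READ VOLUME ON VOLUME quickstart_catalog.quickstart_schema.quickstart_volume TO `data_engineers`;

GRANT READ FILES ON EXTERNAL LOCATION landing_location TO `data_engineers`;

GRANT USE SCHEMA ON SCHEMA quickstart_catalog.quickstart_schema TO `data_engineers`;

GRANT USE CATALOG ON CATALOG quickstart_catalog TO `data_engineers`;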

Load data from a volume

To load data from a Unity Catalog volume, you must have the READ VOLUME privilege. Volume privileges apply to all nested directories under the specified volume.

For example, if you have access to a volume with the path /Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/, the following commands are valid:

COPY INTO landing_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;

COPY INTO json_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json'
FILEFORMAT = JSON;

You can also use a volume path with the dbfs scheme. For example, the following commands are equally valid:

COPY INTO landing_table
FROM 'dbfs:/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;

COPY INTO json_table
FROM 'dbfs:/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data/json'
FILEFORMAT = JSON;
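
In these examples, COPY INTO loads into an existing Delta table. If the target table does not exist yet, one approach is to create an empty placeholder table and let COPY INTO infer the schema during the load. The following is a minimal sketch, assuming the volume path used above and relying on the mergeSchema options:

-- Sketch: create an empty placeholder table, then load with schema inference.
CREATE TABLE IF NOT EXISTS landing_table;

COPY INTO landing_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET
FORMAT_OPTIONS ('mergeSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');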

Load data using an external location

The following example loads data from S3 into a table, using a Unity Catalog external location to provide access to the source data.

COPY INTO my_json_data
FROM 's3://landing-bucket/json-data'
FILEFORMAT = JSON;
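
COPY INTO also accepts reader options through FORMAT_OPTIONS and load behavior options through COPY_OPTIONS. The following sketch is illustrative: the csv-data path is hypothetical, and the option values depend on your source files.

-- Hypothetical CSV source under the same external location; options are illustrative.
COPY INTO my_csv_data
FROM 's3://landing-bucket/csv-data'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');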

External location privilege inheritance

External location privileges apply to all nested directories under the specified location.

For example, if you have access to an external location defined with the URL s3://landing-bucket/raw-data, the following commands are valid:

COPY INTO landing_table
FROM 's3://landing-bucket/raw-data'
FILEFORMAT = PARQUET;

COPY INTO json_table
FROM 's3://landing-bucket/raw-data/json'
FILEFORMAT = JSON;

Permissions on this external location do not grant any privileges on directories above or parallel to the location specified. For example, neither of the following commands is valid:

COPY INTO parent_table
FROM 's3://landing-bucket'
FILEFORMAT = PARQUET;

COPY INTO sibling_table
FROM 's3://landing-bucket/json-data'
FILEFORMAT = JSON;
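
To load from a sibling path such as s3://landing-bucket/json-data, an admin must grant you access through an external location (or volume) that covers that path. The following is a hypothetical sketch; the location name, the principal, and the storage credential my_storage_credential are assumptions:

-- Hypothetical: define a separate external location for the sibling path, then grant read access.
CREATE EXTERNAL LOCATION IF NOT EXISTS json_landing
URL 's3://landing-bucket/json-data'
WITH (STORAGE CREDENTIAL my_storage_credential);

GRANT READ FILES ON EXTERNAL LOCATION json_landing TO `data_engineers`;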

Three-level namespace for target tables

You can target a Unity Catalog table using a three-level identifier (<catalog_name>.<database_name>.<table_name>). You can use the USE CATALOG <catalog_name> and USE <database_name> commands to set the default catalog and database for your current query or notebook.
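
For example, the following sketch reuses the catalog and schema names from the volume examples above; both forms target the same table:

-- Fully qualified three-level table name.
COPY INTO quickstart_catalog.quickstart_schema.landing_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;

-- Or set the default catalog and schema first, then use the bare table name.
USE CATALOG quickstart_catalog;

USE quickstart_schema;

COPY INTO landing_table
FROM '/Volumes/quickstart_catalog/quickstart_schema/quickstart_volume/raw_data'
FILEFORMAT = PARQUET;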