Configure data access for ingestion
This article describes how admin users can configure access to data in an Amazon S3 (S3) bucket so that Databricks users can load data from S3 into a table in Databricks.
This article describes the following ways to configure secure access to source data:
(Recommended) Create a Unity Catalog volume.
Create a Unity Catalog external location with a storage credential.
Launch a compute resource that uses an AWS instance profile.
Generate temporary credentials (an AWS access key ID, a secret key, and a session token).
Before you begin
Before you configure access to data in S3, make sure you have the following:
Data in an S3 bucket in your AWS account. To create a bucket, see Creating a bucket in the AWS documentation.
To access data using a Unity Catalog volume (recommended), the READ VOLUME privilege on the volume. For more information, see What are Unity Catalog volumes? and Unity Catalog privileges and securable objects.
To access data using a Unity Catalog external location, the READ FILES privilege on the external location. For more information, see Create an external location to connect cloud storage to Databricks. (Example GRANT statements for both privileges follow this list.)
To access data using a compute resource with an AWS instance profile, Databricks workspace admin permissions.
A Databricks SQL warehouse. To create a SQL warehouse, see Create a SQL warehouse.
Familiarity with the Databricks SQL user interface.
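If the privileges above have not been granted yet, an admin can grant them in Databricks SQL. The following is a minimal sketch; the catalog, schema, volume, external location, and user names are hypothetical placeholders, so substitute your own.
-- Grant read access on a Unity Catalog volume (recommended).
GRANT READ VOLUME ON VOLUME main.ingest.landing_volume TO `data.analyst@example.com`;
-- Grant read access on a Unity Catalog external location.
GRANT READ FILES ON EXTERNAL LOCATION my_s3_external_location TO `data.analyst@example.com`;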
Configure access to cloud storage
Use one of the following methods to configure access to S3. A Databricks SQL sketch of the volume-based approach follows the list.
(Recommended) Create a Unity Catalog volume. For more information, see What are Unity Catalog volumes?.
Configure a Unity Catalog external location with a storage credential. For more information about external locations, see Create an external location to connect cloud storage to Databricks.
Configure a compute resource to use an AWS instance profile. For more information, see Configure a SQL warehouse to use an instance profile.
Generate temporary credentials (an AWS access key ID, a secret key, and a session token) to share with other Databricks users. For more information, see Generate temporary credentials for ingestion.
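As an illustration of the volume-based approach, the following Databricks SQL sketch creates an external location on the S3 bucket and an external volume on top of it. It assumes a storage credential named my_s3_credential already exists; the bucket, catalog, schema, and object names are placeholders.
-- Assumes an existing storage credential named my_s3_credential.
CREATE EXTERNAL LOCATION IF NOT EXISTS my_s3_external_location
  URL 's3://my-ingest-bucket/landing'
  WITH (STORAGE CREDENTIAL my_s3_credential);
-- Create a volume backed by that location so users can browse and load its files.
CREATE EXTERNAL VOLUME IF NOT EXISTS main.ingest.landing_volume
  LOCATION 's3://my-ingest-bucket/landing';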
Clean up
If you no longer want to keep the associated resources in your cloud account and in Databricks, you can clean them up as follows.
Delete the AWS CLI named profile
In your ~/.aws/credentials file for Unix, Linux, and macOS, or in your %USERPROFILE%\.aws\credentials file for Windows, remove the following portion of the file, and then save the file:
[<named-profile>]
aws_access_key_id = <access-key-id>
aws_secret_access_key = <secret-access-key>
Delete the IAM user
Open the IAM console in your AWS account, typically at https://console.aws.amazon.com/iam.
In the sidebar, click Users.
Select the box next to the user, and then click Delete.
Enter the name of the user, and then click Delete.
Delete the IAM policy
Open the IAM console in your AWS account, if it is not already open, typically at https://console.aws.amazon.com/iam.
In the sidebar, click Policies.
Select the option next to the policy, and then click Actions > Delete.
Enter the name of the policy, and then click Delete.
Delete the S3 bucket
Open the Amazon S3 console in your AWS account, typically at https://console.aws.amazon.com/s3.
Select the option next to the bucket, and then click Empty.
Enter permanently delete, and then click Empty.
In the sidebar, click Buckets.
Select the option next to the bucket, and then click Delete.
Enter the name of the bucket, and then click Delete bucket.
Next steps
After you complete the steps in this article, users can run the COPY INTO command to load the data from the S3 bucket into your Databricks workspace. An example follows the list below.
To load data using a Unity Catalog volume or external location, see Load data using COPY INTO with Unity Catalog volumes or external locations.
To load data using a SQL warehouse with an AWS instance profile, see Load data using COPY INTO with an instance profile.
To load data using temporary credentials (an AWS access key ID, a secret key, and a session token), see Load data using COPY INTO with temporary credentials.
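For example, loading CSV files from a Unity Catalog volume might look like the following sketch. The table name, volume path, and format options are hypothetical; adjust them for your data.
-- Create an empty Delta table that COPY INTO can evolve the schema into.
CREATE TABLE IF NOT EXISTS main.ingest.sales_raw;
-- Load CSV files from the volume into the table.
COPY INTO main.ingest.sales_raw
  FROM '/Volumes/main/ingest/landing_volume/sales/'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
  COPY_OPTIONS ('mergeSchema' = 'true');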