Configure data access for ingestion
This article describes how admin users can configure access to data in an Amazon S3 (S3) bucket so that Databricks users can load data from S3 into a table in Databricks.
This article describes the following ways to configure secure access to source data:
(Recommended) Create a Unity Catalog volume.
Create a Unity Catalog external location with a storage credential.
Launch a compute resource that uses an AWS instance profile.
Generate temporary credentials (an AWS access key ID, a secret key, and a session token).
Before you begin
Before you configure access to data in S3, make sure you have the following:
An S3 bucket in your AWS account. To create a bucket, see Creating a bucket in the AWS documentation.
To access data using a Unity Catalog volume (recommended), the READ VOLUME privilege on the volume. For more information, see Create volumes and Unity Catalog privileges and securable objects.
To access data using a Unity Catalog external location, the READ FILES privilege on the external location. For more information, see Manage external locations and storage credentials.
To access data using a compute resource with an AWS instance profile, Databricks workspace admin permissions.
A Databricks SQL warehouse. To create a SQL warehouse, see Configure SQL warehouses.
Familiarity with the Databricks SQL user interface.
Step 1: Upload data to cloud storage
This step describes how to upload data to a folder in an S3 bucket.
Sign in to the AWS Management Console, then open the Amazon S3 console in your AWS account, typically at https://console.aws.amazon.com/s3.
Browse to and click on your S3 bucket.
Click Create folder.
Enter a name for the folder, then click Create folder.
Click the folder.
Click Upload.
Follow the on-screen instructions to upload data into this folder.
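If you prefer to script this step instead of using the console, the following sketch uploads a local file with the AWS SDK for Python (boto3). The bucket name, folder, and file path are placeholders, and it assumes your local AWS credentials already have write access to the bucket.

```python
import boto3

# Placeholders -- replace with your own bucket, folder, and local file.
BUCKET = "<bucket-name>"
FOLDER = "<folder-name>"
LOCAL_FILE = "data.csv"

s3 = boto3.client("s3")

# Uploading to a key with a prefix implicitly creates the folder in S3.
s3.upload_file(
    Filename=LOCAL_FILE,
    Bucket=BUCKET,
    Key=f"{FOLDER}/{LOCAL_FILE}",
)
print(f"Uploaded {LOCAL_FILE} to s3://{BUCKET}/{FOLDER}/")
```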
Step 2: Configure access to cloud storage
Use one of the following methods to configure access to S3:
(Recommended) Create a Unity Catalog volume. For more information, see Create volumes.
Configure a Unity Catalog external location with a storage credential. For more information about external locations, see Manage external locations and storage credentials.
Configure a compute resource to use an AWS instance profile. For more information, see Configure a SQL warehouse to use an instance profile.
Generate temporary credentials (an AWS access key ID, a secret key, and a session token) to share with other Databricks users. For more information, see Generate temporary credentials.
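As a concrete illustration of the recommended approach, the following sketch creates a Unity Catalog external volume over the S3 folder by running SQL against a SQL warehouse with the Databricks SQL Connector for Python. The catalog, schema, volume, bucket, and connection values are placeholders, and it assumes an external location that covers the S3 path already exists.

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholders -- replace with your workspace and warehouse details.
conn = sql.connect(
    server_hostname="<workspace-hostname>",
    http_path="<warehouse-http-path>",
    access_token="<personal-access-token>",
)

with conn.cursor() as cursor:
    # Requires CREATE VOLUME on the schema and an external location
    # that already covers the S3 path below.
    cursor.execute("""
        CREATE EXTERNAL VOLUME IF NOT EXISTS main.default.ingest_volume
        LOCATION 's3://<bucket-name>/<folder-name>'
    """)

conn.close()
```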
Clean up
You can clean up the associated resources in your cloud account and Databricks if you no longer want to keep them.
Delete the AWS CLI named profile
In your ~/.aws/credentials file for Unix, Linux, and macOS, or in your %USERPROFILE%\.aws\credentials file for Windows, remove the following portion of the file, and then save the file:
[<named-profile>]
aws_access_key_id = <access-key-id>
aws_secret_access_key = <secret-access-key>
Delete the IAM user
Open the IAM console in your AWS account, typically at https://console.aws.amazon.com/iam.
In the sidebar, click Users.
Select the box next to the user, and then click Delete.
Enter the name of the user, and then click Delete.
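If you scripted the setup, you can also script this cleanup. The following is a minimal boto3 sketch; the user name is a placeholder, and an IAM user can only be deleted after its access keys and attached policies are removed.

```python
import boto3

iam = boto3.client("iam")
user_name = "<iam-user-name>"  # placeholder

# Delete the user's access keys first; delete_user fails while any remain.
for key in iam.list_access_keys(UserName=user_name)["AccessKeyMetadata"]:
    iam.delete_access_key(UserName=user_name, AccessKeyId=key["AccessKeyId"])

# Detach any managed policies attached directly to the user.
for policy in iam.list_attached_user_policies(UserName=user_name)["AttachedPolicies"]:
    iam.detach_user_policy(UserName=user_name, PolicyArn=policy["PolicyArn"])

iam.delete_user(UserName=user_name)
```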
Delete the IAM policy
If it is not already open, open the IAM console in your AWS account, typically at https://console.aws.amazon.com/iam.
In the sidebar, click Policies.
Select the option next to the policy, and then click Actions > Delete.
Enter the name of the policy, and then click Delete.
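A similar boto3 sketch for deleting the policy; the ARN is a placeholder, and the policy must already be detached from all users, groups, and roles (as in the previous sketch) before delete_policy succeeds.

```python
import boto3

iam = boto3.client("iam")
policy_arn = "<policy-arn>"  # placeholder

# Remove non-default policy versions; delete_policy fails while any remain.
for version in iam.list_policy_versions(PolicyArn=policy_arn)["Versions"]:
    if not version["IsDefaultVersion"]:
        iam.delete_policy_version(PolicyArn=policy_arn, VersionId=version["VersionId"])

iam.delete_policy(PolicyArn=policy_arn)
```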
Delete the S3 bucket
Open the Amazon S3 console in your AWS account, typically at https://console.aws.amazon.com/s3.
Select the option next to the bucket, and then click Empty.
Enter permanently delete, and then click Empty.
In the sidebar, click Buckets.
Select the option next to the bucket, and then click Delete.
Enter the name of the bucket, and then click Delete bucket.
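The same cleanup can be scripted with boto3; the bucket name is a placeholder, and the sketch assumes versioning is disabled on the bucket (a versioned bucket also needs its object versions deleted).

```python
import boto3

bucket_name = "<bucket-name>"  # placeholder
bucket = boto3.resource("s3").Bucket(bucket_name)

# Empty the bucket, then delete it; delete_bucket fails on a non-empty bucket.
bucket.objects.all().delete()
bucket.delete()
```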
Next steps
After you complete the steps in this article, users can run the COPY INTO command to load the data from the S3 bucket into your Databricks workspace.
To load data using a Unity Catalog volume or external location, see Load data using COPY INTO with Unity Catalog volumes or external locations.
To load data using a SQL warehouse with an AWS instance profile, see Load data using COPY INTO with an instance profile.
To load data using temporary credentials (an AWS access key ID, a secret key, and a session token), see Load data using COPY INTO with temporary credentials.
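For orientation, here is a minimal COPY INTO sketch run through the Databricks SQL Connector for Python against a SQL warehouse, using the volume path from the earlier sketch. The table name, volume path, file format, and options are placeholders and assumptions; the articles linked above cover the supported options in detail.

```python
from databricks import sql  # pip install databricks-sql-connector

# Placeholders -- replace with your workspace and warehouse details.
conn = sql.connect(
    server_hostname="<workspace-hostname>",
    http_path="<warehouse-http-path>",
    access_token="<personal-access-token>",
)

with conn.cursor() as cursor:
    # Assumes the target table and the ingest_volume volume from the earlier
    # sketch exist; CSV and the header/inferSchema options are illustrative.
    cursor.execute("""
        COPY INTO main.default.my_table
        FROM '/Volumes/main/default/ingest_volume/'
        FILEFORMAT = CSV
        FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
        COPY_OPTIONS ('mergeSchema' = 'true')
    """)

conn.close()
```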