Configure data access for ingestion

This article describes how admin users can configure access to data in an Amazon S3 (S3) bucket so that Databricks users can load that data into a table in Databricks.

It covers the following ways to configure secure access to source data:

  • (Recommended) Create a Unity Catalog volume.

  • Create a Unity Catalog external location with a storage credential.

  • Launch a compute resource that uses an AWS instance profile.

  • Generate temporary credentials (an AWS access key ID, a secret key, and a session token).
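As an illustration of the last option, temporary credentials can be requested from the AWS Security Token Service (STS). The following is a minimal sketch, not a definitive implementation. It assumes that the boto3 library is installed and that an STS client has been created from long-term credentials such as an AWS CLI named profile. The function name get_temporary_credentials and the default duration are hypothetical choices for this example.

```python
def get_temporary_credentials(sts_client, duration_seconds=3600):
    """Return (access key ID, secret key, session token) from AWS STS.

    sts_client is assumed to be a boto3 STS client, for example:
        import boto3
        sts_client = boto3.Session(profile_name="<named-profile>").client("sts")
    """
    response = sts_client.get_session_token(DurationSeconds=duration_seconds)
    creds = response["Credentials"]
    # The three values below are the ones that the temporary-credentials
    # option needs. They stop working when the session expires.
    return creds["AccessKeyId"], creds["SecretAccessKey"], creds["SessionToken"]
```

The client is passed in as an argument so that the profile and region stay under the caller's control.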

Before you begin

Before you configure access to data in S3, make sure you have the following:

  • Data in an S3 bucket in your AWS account. To create a bucket, see Creating a bucket in the AWS documentation.

  • Databricks workspace admin permissions, if you plan to access data using a compute resource with an AWS instance profile.

Configure access to cloud storage

Use one of the methods listed at the beginning of this article to configure access to S3.

Clean up

If you no longer want to keep the resources associated with this article, you can delete them from your AWS account and from Databricks.

Delete the AWS CLI named profile

In your ~/.aws/credentials file for Unix, Linux, and macOS, or in your %USERPROFILE%\.aws\credentials file for Windows, remove the following portion of the file, and then save the file:

[<named-profile>]
aws_access_key_id = <access-key-id>
aws_secret_access_key = <secret-access-key>
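
If you prefer to script this step, the section can be removed with Python's standard configparser module. This is a sketch under stated assumptions: the function name remove_named_profile is hypothetical, and rewriting the file this way discards any comments it contains, so back up the file first if that matters.

```python
import configparser


def remove_named_profile(credentials_path, profile_name):
    """Remove the [profile_name] section from an AWS credentials file.

    Returns True if the section was found and removed, False otherwise.
    """
    # RawConfigParser avoids interpolating '%' characters in stored values.
    parser = configparser.RawConfigParser()
    parser.read(credentials_path)
    if not parser.has_section(profile_name):
        return False
    parser.remove_section(profile_name)
    # Rewrite the file without the removed section (comments are not kept).
    with open(credentials_path, "w") as handle:
        parser.write(handle)
    return True
```

For example, remove_named_profile(Path.home() / ".aws" / "credentials", "<named-profile>") targets the default location on Unix, Linux, macOS, and Windows, where Path comes from the standard pathlib module.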

Delete the IAM user

  1. Open the IAM console in your AWS account, typically at https://console.aws.amazon.com/iam.

  2. In the sidebar, click Users.

  3. Select the box next to the user, and then click Delete.

  4. Enter the name of the user, and then click Delete.
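
The console steps above can also be approximated in code. The sketch below assumes boto3 is installed and that iam_client is a boto3 IAM client with permission to manage users. Unlike the console, the IAM API does not delete a user who still has access keys or policies, so the sketch removes those first. The function name delete_iam_user is hypothetical, and the sketch does not handle pagination, group membership, login profiles, or MFA devices.

```python
def delete_iam_user(iam_client, user_name):
    """Delete an IAM user after removing its access keys and policies.

    iam_client is assumed to be created with boto3.client("iam").
    """
    # Delete the user's access keys.
    keys = iam_client.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    for key in keys:
        iam_client.delete_access_key(UserName=user_name, AccessKeyId=key["AccessKeyId"])

    # Detach managed policies.
    attached = iam_client.list_attached_user_policies(UserName=user_name)["AttachedPolicies"]
    for policy in attached:
        iam_client.detach_user_policy(UserName=user_name, PolicyArn=policy["PolicyArn"])

    # Delete inline policies.
    for name in iam_client.list_user_policies(UserName=user_name)["PolicyNames"]:
        iam_client.delete_user_policy(UserName=user_name, PolicyName=name)

    # The user can be deleted once nothing is attached to it.
    iam_client.delete_user(UserName=user_name)
```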

Delete the IAM policy

  1. Open the IAM console in your AWS account, if it is not already open, typically at https://console.aws.amazon.com/iam.

  2. In the sidebar, click Policies.

  3. Select the option next to the policy, and then click Actions > Delete.

  4. Enter the name of the policy, and then click Delete.
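
A scripted equivalent of these steps might look like the following sketch. It assumes boto3 is installed, that iam_client is a boto3 IAM client, and that the policy is a customer managed policy identified by its ARN. Through the API, a policy must be detached from every user, group, and role, and its non-default versions must be deleted, before the policy itself can be deleted. The function name delete_iam_policy is hypothetical, and pagination is not handled.

```python
def delete_iam_policy(iam_client, policy_arn):
    """Detach a managed policy everywhere, then delete it.

    iam_client is assumed to be created with boto3.client("iam").
    """
    entities = iam_client.list_entities_for_policy(PolicyArn=policy_arn)
    for user in entities["PolicyUsers"]:
        iam_client.detach_user_policy(UserName=user["UserName"], PolicyArn=policy_arn)
    for group in entities["PolicyGroups"]:
        iam_client.detach_group_policy(GroupName=group["GroupName"], PolicyArn=policy_arn)
    for role in entities["PolicyRoles"]:
        iam_client.detach_role_policy(RoleName=role["RoleName"], PolicyArn=policy_arn)

    # Non-default versions must be removed before the policy can be deleted.
    versions = iam_client.list_policy_versions(PolicyArn=policy_arn)["Versions"]
    for version in versions:
        if not version["IsDefaultVersion"]:
            iam_client.delete_policy_version(
                PolicyArn=policy_arn, VersionId=version["VersionId"]
            )

    iam_client.delete_policy(PolicyArn=policy_arn)
```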

Delete the S3 bucket

  1. Open the Amazon S3 console in your AWS account, typically at https://console.aws.amazon.com/s3.

  2. Select the option next to the bucket, and then click Empty.

  3. Enter the phrase permanently delete to confirm, and then click Empty.

  4. In the sidebar, click Buckets.

  5. Select the option next to the bucket, and then click Delete.

  6. Enter the name of the bucket, and then click Delete bucket.
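
The same empty-then-delete sequence can be sketched with the boto3 resource API. This example assumes boto3 is installed and that bucket is a boto3 S3 Bucket resource. The function name empty_and_delete_bucket is hypothetical. As in the console, the deletion is permanent.

```python
def empty_and_delete_bucket(bucket):
    """Permanently remove every object in a bucket, then the bucket itself.

    bucket is assumed to be a boto3 resource, for example:
        import boto3
        bucket = boto3.resource("s3").Bucket("<bucket-name>")
    """
    # Deletes all objects, including old versions and delete markers
    # if versioning was ever enabled on the bucket.
    bucket.object_versions.delete()
    # A bucket can only be deleted once it is empty.
    bucket.delete()
```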

Stop the SQL warehouse

If you are not using the SQL warehouse for any other tasks, stop it to avoid additional costs.

  1. In the SQL persona, on the sidebar, click SQL Warehouses.

  2. Next to the name of the SQL warehouse, click Stop.

  3. When prompted, click Stop again.
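
As an alternative to the UI steps above, a warehouse can be stopped through the Databricks SQL Warehouses REST API. The sketch below only builds the request, using Python's standard urllib module. The workspace URL, warehouse ID, and personal access token are placeholders that you must supply, and the function name build_stop_warehouse_request is hypothetical.

```python
import urllib.request


def build_stop_warehouse_request(workspace_url, warehouse_id, token):
    """Build a POST request for /api/2.0/sql/warehouses/{id}/stop."""
    base = workspace_url.rstrip("/")
    url = f"{base}/api/2.0/sql/warehouses/{warehouse_id}/stop"
    return urllib.request.Request(
        url,
        data=b"",  # the stop call takes no request body
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


# To send the request (not executed here):
#     request = build_stop_warehouse_request("https://<workspace-host>", "<warehouse-id>", "<token>")
#     urllib.request.urlopen(request)
```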

Next steps

After you complete the steps in this article, users can run the COPY INTO command to load the data from the S3 bucket into a table in your Databricks workspace.
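
To give a sense of what such a command looks like, the sketch below assembles a basic COPY INTO statement as a string. The function name build_copy_into, the table name, and the S3 path are hypothetical, and the sketch does no escaping of its inputs, so use it only with trusted values. In a Databricks notebook, the resulting string could be passed to spark.sql.

```python
def build_copy_into(table_name, source_path, file_format="CSV", format_options=None):
    """Return a COPY INTO statement for loading files into a table."""
    lines = [
        f"COPY INTO {table_name}",
        f"FROM '{source_path}'",
        f"FILEFORMAT = {file_format}",
    ]
    if format_options:
        # Render options as 'key' = 'value' pairs, in insertion order.
        rendered = ", ".join(f"'{k}' = '{v}'" for k, v in format_options.items())
        lines.append(f"FORMAT_OPTIONS ({rendered})")
    return "\n".join(lines)


statement = build_copy_into(
    "my_catalog.my_schema.my_table",  # hypothetical target table
    "s3://<bucket-name>/<folder>/",   # placeholder source path
    format_options={"header": "true"},
)
```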