Configure AWS storage

Databricks stores your account-wide assets, such as libraries, in an Amazon Web Services (AWS) S3 bucket. This article walks you through configuring that bucket to complete your Databricks deployment.

Important

You can configure AWS storage settings using the Account Console only when you initially set up your account. To change settings afterwards, contact help@databricks.com.

Step 1: Generate S3 bucket policy

  1. In the Databricks Account Console, click the AWS Storage tab.

  2. In the S3 bucket in <region> field, enter the name of your S3 bucket. For help with creating an S3 bucket, see Create a Bucket in the AWS documentation.

    Important

    • The S3 bucket must be in the same region as the Databricks deployment.
    • As a best practice, Databricks recommends using an S3 bucket dedicated to Databricks.
  3. Click Generate Policy.

  4. Copy the generated policy. It should have the following form, where 414351767826 is the Databricks AWS account ID and <s3-bucket-name> is the bucket name you entered in step 2:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Grant Databricks Access",
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:root"
          },
          "Action": [
            "s3:GetObject",
            "s3:GetObjectVersion",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketLocation"
          ],
          "Resource": [
            "arn:aws:s3:::<s3-bucket-name>/*",
            "arn:aws:s3:::<s3-bucket-name>"
          ]
        }
      ]
    }
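If you script parts of this setup, the following Python sketch rebuilds a policy with the same shape as the one above. It also reads the bucket's region so you can check it against the same-region requirement. This is not a Databricks tool. The function names `build_bucket_policy` and `bucket_region` are hypothetical, the Account Console output remains authoritative, and `bucket_region` assumes you pass in a boto3 S3 client (for example, `boto3.client("s3")`).

```python
from typing import Any

# Databricks AWS account ID that appears in the generated policy above.
DATABRICKS_ACCOUNT_ID = "414351767826"


def build_bucket_policy(bucket_name: str) -> dict:
    """Build a policy with the same shape as the one the Account Console generates."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Grant Databricks Access",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{DATABRICKS_ACCOUNT_ID}:root"},
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Resource": [
                    # Object-level actions (Get/Put/Delete) match keys inside the bucket.
                    f"arn:aws:s3:::{bucket_name}/*",
                    # Bucket-level actions (ListBucket, GetBucketLocation) match the bucket itself.
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }


def bucket_region(s3_client: Any, bucket_name: str) -> str:
    """Return the bucket's region so it can be compared with the deployment region."""
    location = s3_client.get_bucket_location(Bucket=bucket_name).get("LocationConstraint")
    # S3 reports buckets in us-east-1 with a null LocationConstraint.
    # Very old eu-west-1 buckets may report the legacy value "EU"; this sketch does not normalize it.
    return location or "us-east-1"
```

Both resource ARNs are needed. The object-level actions apply to `<s3-bucket-name>/*`, and `s3:ListBucket` and `s3:GetBucketLocation` apply to the bucket ARN itself.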
    

Step 2: Configure the S3 bucket

To configure the S3 bucket, you apply the bucket policy generated in the Databricks Account Console and optionally set up bucket versioning.

Warning

Databricks strongly recommends that you enable bucket versioning. Versioning allows you to restore earlier versions of files in the bucket if files are accidentally modified or deleted.

  1. In the AWS Console, go to the S3 service.
  2. Click the bucket name.

Step 2a: Apply bucket policy

  1. Click the Permissions tab.

  2. Click the Bucket Policy button.

  3. Paste the policy that you copied in Step 1 and click Save.
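You can also script this step. Below is a minimal sketch. It assumes boto3 is installed and that your credentials allow `s3:PutBucketPolicy` on the bucket. `apply_bucket_policy` is a hypothetical helper name. `put_bucket_policy` replaces any existing bucket policy, so merge in existing statements first if the bucket already has one.

```python
import json
from typing import Any


def apply_bucket_policy(s3_client: Any, bucket_name: str, policy: dict) -> None:
    """Attach `policy` to `bucket_name`, replacing any existing bucket policy."""
    # put_bucket_policy expects the policy document as a JSON string, not a dict.
    s3_client.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))
```

For example, call `apply_bucket_policy(boto3.client("s3"), "<s3-bucket-name>", json.loads(copied_policy_text))`, where `copied_policy_text` is the policy you copied in Step 1.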

Step 2b: Enable bucket versioning

For information on versioning, see Using Versioning in the AWS documentation.

  1. Click the Properties tab.
  2. Click the Versioning tile.
  3. Click Enable versioning and click Save.
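Here is a minimal boto3-style sketch of the same setting. It assumes the caller has `s3:PutBucketVersioning` and `s3:GetBucketVersioning`, and `enable_versioning` is a hypothetical helper name. The function reads the status back after enabling it. A bucket whose versioning was never enabled returns no `Status` field at all.

```python
from typing import Any, Optional


def enable_versioning(s3_client: Any, bucket_name: str) -> Optional[str]:
    """Turn on versioning and return the status S3 reports afterward."""
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # Expected to be "Enabled"; None means S3 reported no versioning status.
    return s3_client.get_bucket_versioning(Bucket=bucket_name).get("Status")
```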

Warning

Versioning can degrade file-listing performance. To keep performance acceptable, Databricks recommends configuring a lifecycle policy that eventually purges old (noncurrent) versions of files. For instructions, see How Do I Create a Lifecycle Policy for an S3 Bucket? in the AWS documentation.
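Below is a sketch of such a lifecycle rule using boto3's `put_bucket_lifecycle_configuration`. The 30-day window for noncurrent versions is an arbitrary example value, not a Databricks recommendation. The helper and rule ID are hypothetical. The call replaces the bucket's entire lifecycle configuration, so include any existing rules you want to keep.

```python
from typing import Any


def expire_noncurrent_versions(s3_client: Any, bucket_name: str, days: int = 30) -> dict:
    """Install a rule that deletes object versions `days` after they become noncurrent."""
    config = {
        "Rules": [
            {
                "ID": "expire-noncurrent-versions",  # hypothetical rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix: applies to every object
                "NoncurrentVersionExpiration": {"NoncurrentDays": days},
            }
        ]
    }
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=config
    )
    return config
```

Current versions are untouched by this rule. Only versions that have been overwritten or deleted are purged, which is what keeps listings from growing without bound.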

Step 3: Apply the change to your Databricks account

  1. In the Databricks Account Console, go to the AWS Storage tab.
  2. Click Apply Change.

Resolve validation failures

Bucket policy permissions can take a few minutes to propagate. If validation fails because of a permissions error, wait a few minutes and click Apply Change again.
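If you automate the validation, a retry loop with exponential backoff follows the advice above. In this sketch, `check` is a hypothetical callable that returns True once validation succeeds. The default attempt count and delays are assumptions, not documented propagation times.

```python
import time
from typing import Callable


def retry_until_valid(
    check: Callable[[], bool],
    attempts: int = 5,
    initial_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `attempts` times, doubling the wait between failures."""
    delay = initial_delay
    for attempt in range(attempts):
        if check():
            return True
        # Do not sleep after the final failed attempt.
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2
    return False
```

The `sleep` parameter exists so tests can record the delays instead of waiting.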