AWS Storage

Databricks stores your account-wide assets, such as libraries, in an Amazon Web Services (AWS) S3 bucket. This topic describes how to configure that bucket to complete your Databricks deployment.

Important

You can configure AWS storage settings in the Account Console only when you initially set up your account. To change the settings afterwards, contact support@databricks.com.

Step 1: Generate S3 bucket policy

  1. In the Databricks Account Console, click the AWS Storage tab.

  2. In the S3 bucket in <region> field, enter the name of your S3 bucket. For help with creating an S3 bucket, see Create a Bucket in the AWS documentation.

    Important

    • The S3 bucket must be in the same region as the Databricks deployment.
    • As a best practice, Databricks recommends that you use an S3 bucket dedicated to Databricks.
  3. Click Generate Policy.

  4. Copy the generated policy. It should have the following form, where 414351767826 is the Databricks account ID and <s3-bucket-name> is the S3 bucket that you specified in step 2:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Grant Databricks Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::414351767826:root"
      },
      "Action": [
        "s3:GetObject",
        "s3:GetObjectVersion",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": [
        "arn:aws:s3:::<s3-bucket-name>/*",
        "arn:aws:s3:::<s3-bucket-name>"
      ]
    }
  ]
}
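
If you want to review the policy or keep it under version control, you can also reproduce it in code. The following is a minimal sketch in Python using only the standard library. The helper name `build_bucket_policy` and the example bucket name are hypothetical, and the policy generated by the Account Console remains the authoritative version.

```python
import json

# Databricks account ID as it appears in the generated policy above.
DATABRICKS_ACCOUNT_ID = "414351767826"


def build_bucket_policy(bucket_name, account_id=DATABRICKS_ACCOUNT_ID):
    """Return the bucket policy as a dict (hypothetical helper, not a Databricks API)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Grant Databricks Access",
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                # Object-level actions apply to "<bucket>/*";
                # bucket-level actions apply to the bucket ARN itself.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",
                    f"arn:aws:s3:::{bucket_name}",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # "my-databricks-bucket" is a placeholder bucket name.
    print(json.dumps(build_bucket_policy("my-databricks-bucket"), indent=2))
```

Comparing this output with the policy shown in the Account Console is a quick way to confirm that the bucket name was entered correctly.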

Step 2: Configure the S3 bucket

To configure the S3 bucket, you apply the bucket policy generated in the Databricks Account Console and optionally set up bucket versioning.

Warning

Databricks strongly recommends that you enable bucket versioning, which allows you to restore earlier versions of files in the bucket. If your S3 bucket becomes corrupted, it can prevent Databricks from functioning, and the easiest remedy is to roll back to an earlier version.

  1. In the AWS Console, go to the S3 service.
  2. Click the bucket name.

Step 2a: Apply bucket policy

  1. Click the Permissions tab.

  2. Click the Bucket Policy button.

  3. Paste the policy that you copied in Step 1 and click Save.
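
As an alternative to the AWS Console, the policy can be attached from a script. The sketch below assumes the AWS SDK for Python (boto3) and a policy saved to a local file. The helper name `apply_bucket_policy` and the file name `policy.json` are hypothetical. The S3 client is passed in as a parameter so the function itself does not import boto3.

```python
import json


def apply_bucket_policy(s3_client, bucket_name, policy):
    """Attach `policy` (a dict) to the bucket through the given S3 client."""
    # put_bucket_policy expects the policy as a JSON string, not a dict.
    s3_client.put_bucket_policy(Bucket=bucket_name, Policy=json.dumps(policy))


# Usage, assuming boto3 is installed and AWS credentials are configured:
#
#   import boto3
#   with open("policy.json") as f:   # policy copied in Step 1
#       policy = json.load(f)
#   apply_bucket_policy(boto3.client("s3"), "<s3-bucket-name>", policy)
```

The call replaces any policy already attached to the bucket, which is one more reason to use a bucket dedicated to Databricks.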

Step 2b: Enable bucket versioning

For information on versioning, see Using Versioning in the AWS documentation.

  1. Click the Properties tab.
  2. Click the Versioning tile.
  3. Click Enable versioning and click Save.
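
Versioning can also be enabled and checked from a script. This is a minimal sketch that assumes the AWS SDK for Python (boto3). The helper names are hypothetical, and the S3 client is passed in as a parameter.

```python
def enable_versioning(s3_client, bucket_name):
    """Turn on versioning for the bucket."""
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )


def is_versioning_enabled(s3_client, bucket_name):
    """Return True if the bucket reports versioning as enabled."""
    response = s3_client.get_bucket_versioning(Bucket=bucket_name)
    # "Status" is absent from the response if versioning was never configured.
    return response.get("Status") == "Enabled"


# Usage, assuming boto3 is installed and AWS credentials are configured:
#
#   import boto3
#   s3 = boto3.client("s3")
#   enable_versioning(s3, "<s3-bucket-name>")
#   print(is_versioning_enabled(s3, "<s3-bucket-name>"))
```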

After versioning is enabled, you can optionally configure lifecycle rules by following the instructions in How Do I Create a Lifecycle Policy for an S3 Bucket? in the AWS documentation. Your lifecycle policy should ensure that old versions of deleted files are eventually purged.
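
A lifecycle rule of this kind might look like the following sketch, which assumes the AWS SDK for Python (boto3). The helper names, the rule ID, and the 30-day retention period are illustrative assumptions, not Databricks recommendations. Choose a retention period that matches how far back you need to be able to roll back.

```python
def build_lifecycle_configuration(noncurrent_days=30):
    """Build a lifecycle configuration that purges old object versions.

    The rule ID and the default of 30 days are illustrative assumptions.
    """
    return {
        "Rules": [
            {
                "ID": "purge-noncurrent-versions",
                "Status": "Enabled",
                # An empty prefix applies the rule to every object in the bucket.
                "Filter": {"Prefix": ""},
                # Permanently delete versions N days after they become noncurrent.
                "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
                # Remove delete markers that no longer have any versions behind them.
                "Expiration": {"ExpiredObjectDeleteMarker": True},
            }
        ]
    }


def apply_lifecycle_configuration(s3_client, bucket_name, noncurrent_days=30):
    """Apply the configuration; this replaces any existing lifecycle rules."""
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=build_lifecycle_configuration(noncurrent_days),
    )


# Usage, assuming boto3 is installed and AWS credentials are configured:
#
#   import boto3
#   apply_lifecycle_configuration(boto3.client("s3"), "<s3-bucket-name>")
```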

Step 3: Apply the change to your Databricks account

  1. In the Databricks Account Console, go to the AWS Storage tab.
  2. Click Apply Change.

Resolve validation failures

Bucket policy permissions can take a few minutes to propagate. If validation fails because of permissions, wait a few minutes and then retry.
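
If you script any part of this check, the wait-and-retry pattern can be sketched as follows. This is a generic illustration using only the Python standard library. The `validate` callable is hypothetical and stands in for whatever check you run (this topic only describes validation through the Account Console), and the attempt count and delay are arbitrary assumptions.

```python
import time


def retry_validation(validate, attempts=5, delay_seconds=60, sleep=time.sleep):
    """Call `validate` until it returns True or the attempts run out.

    `validate` is a hypothetical zero-argument callable that returns
    True on success. `sleep` is injectable so the wait can be skipped in tests.
    """
    for attempt in range(1, attempts + 1):
        if validate():
            return True
        # Wait before the next attempt, but not after the final one.
        if attempt < attempts:
            sleep(delay_seconds)
    return False
```

A fixed delay is used here because the propagation time is described only as "a few minutes". If validation still fails after several attempts, the cause is more likely a mistake in the policy or bucket name than propagation delay.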