Configure AWS storage (Multi-workspace API)

Preview

This article describes how to configure storage for workspaces created using the Multi-workspace API and for billable usage log delivery, both of which are in Public Preview.

This article describes how to configure Amazon Web Services S3 buckets as:

  • Root storage for workspace objects like cluster logs, notebook revisions, job results, and libraries.
  • Storage for delivery of billable usage logs.

Note

If you are configuring root storage in AWS while deploying a Databricks workspace through the Account Console, see Configure AWS storage (Account Console) instead.

Step 1: Create an S3 bucket

  1. Log in to your AWS Console as a user with administrator privileges and go to the S3 service.

  2. Create an S3 bucket. See Create a Bucket in the AWS documentation.

    Important

    • The S3 bucket must be in the same AWS region as the Databricks deployment.
    • Databricks recommends that you use an S3 bucket dedicated to Databricks and not shared with other resources or services.
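
The bucket can also be created from a script rather than the console. The following is a minimal sketch, not Databricks tooling: it assumes the boto3 SDK, with `s3_client` obtained from `boto3.client("s3", region_name=region)`. The function name `create_dedicated_bucket` is hypothetical.

```python
def create_dedicated_bucket(s3_client, bucket_name, region):
    """Create an S3 bucket in the given region (illustrative sketch).

    s3_client is assumed to behave like boto3.client("s3", region_name=region).
    The region should match the region of the Databricks deployment.
    """
    params = {"Bucket": bucket_name}
    # S3 rejects an explicit LocationConstraint for us-east-1, so it is
    # only sent for other regions.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return s3_client.create_bucket(**params)
```

Passing the client in as an argument keeps the function independent of how credentials are configured.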

Step 2: Apply bucket policy (workspace creation only)

Note

This step is necessary only if you are setting up root storage for a new workspace that you create with the Multi-workspace API. Skip this step if you are setting up storage for billable usage log delivery.

  1. In the AWS Console, go to the S3 service.

  2. Click the bucket name.

  3. Click the Permissions tab.

  4. Click the Bucket Policy button.

  5. Copy and modify this bucket policy. Replace <s3-bucket-name> with the S3 bucket name:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Grant Databricks Access",
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:root"
          },
          "Action": [
            "s3:GetObject",
            "s3:GetObjectVersion",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketLocation"
          ],
          "Resource": [
            "arn:aws:s3:::<s3-bucket-name>/*",
            "arn:aws:s3:::<s3-bucket-name>"
          ]
        }
      ]
    }
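
To avoid hand-editing the placeholder, the policy above can be generated from the bucket name. This is a sketch under assumptions: the helper names `build_bucket_policy` and `apply_bucket_policy` are hypothetical, and `s3_client` is assumed to be a boto3 S3 client. The principal and actions are copied from the policy in this step.

```python
import json

# Principal taken from the bucket policy shown in this step.
DATABRICKS_PRINCIPAL = "arn:aws:iam::414351767826:root"

def build_bucket_policy(bucket_name):
    """Return the bucket policy as a dict with the bucket name filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Grant Databricks Access",
                "Effect": "Allow",
                "Principal": {"AWS": DATABRICKS_PRINCIPAL},
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                ],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}/*",  # objects in the bucket
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                ],
            }
        ],
    }

def apply_bucket_policy(s3_client, bucket_name):
    """Attach the policy; s3_client is assumed to be boto3.client("s3")."""
    policy_json = json.dumps(build_bucket_policy(bucket_name))
    s3_client.put_bucket_policy(Bucket=bucket_name, Policy=policy_json)
    return policy_json
```

Calling `print(json.dumps(build_bucket_policy("my-bucket"), indent=2))` produces text that can be pasted into the console policy editor instead.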
    

Step 3: Enable bucket versioning

Important

Databricks strongly recommends that you enable bucket versioning. Versioning allows you to restore earlier versions of files in the bucket if files are accidentally modified or deleted.

For information on versioning, see Using Versioning in the AWS documentation.

  1. In the AWS Console, go to the S3 service.
  2. Click the bucket name.
  3. Click the Properties tab.
  4. Click the Versioning tile.
  5. Click Enable versioning and click Save.
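
The same setting can be applied programmatically. The sketch below assumes the boto3 SDK (`s3_client` from `boto3.client("s3")`); the function name `enable_bucket_versioning` is hypothetical.

```python
def enable_bucket_versioning(s3_client, bucket_name):
    """Turn on versioning and confirm the bucket reports it as enabled."""
    s3_client.put_bucket_versioning(
        Bucket=bucket_name,
        VersioningConfiguration={"Status": "Enabled"},
    )
    # The response has no "Status" key if versioning was never configured.
    response = s3_client.get_bucket_versioning(Bucket=bucket_name)
    return response.get("Status") == "Enabled"
```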

Warning

Versioning can slow down file listing. To maintain acceptable performance, Databricks recommends that you configure a lifecycle policy that eventually purges old versions of files. Follow the instructions in "How Do I Create a Lifecycle Policy for an S3 Bucket?" in the AWS documentation.
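
One way to express such a lifecycle policy in code is sketched below. It assumes the boto3 SDK (`s3_client` from `boto3.client("s3")`). The helper names, the rule ID, and the 30-day retention period are illustrative assumptions, not Databricks recommendations; choose a retention period that fits your recovery needs.

```python
def build_noncurrent_expiration_rule(noncurrent_days=30):
    """Build a rule that deletes object versions N days after they become noncurrent."""
    return {
        "ID": "expire-noncurrent-versions",  # hypothetical rule name
        "Status": "Enabled",
        "Filter": {"Prefix": ""},            # empty prefix: applies to every object
        "NoncurrentVersionExpiration": {"NoncurrentDays": noncurrent_days},
    }

def apply_lifecycle_policy(s3_client, bucket_name, noncurrent_days=30):
    """Apply the rule to the bucket, replacing any existing lifecycle configuration."""
    rule = build_noncurrent_expiration_rule(noncurrent_days)
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration={"Rules": [rule]},
    )
    return rule
```

Note that `put_bucket_lifecycle_configuration` overwrites the bucket's existing lifecycle rules, so any rules already in place would need to be included in the `Rules` list.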

Resolve validation failures

Bucket policy permissions can take a few minutes to propagate. If validation fails because of a permissions error, wait a few minutes and retry this procedure.
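
If the validation is triggered from a script, the retry can be automated. The following is a generic sketch: `validate` is a hypothetical callable that you supply, returning `True` once validation succeeds, and the attempt count and delay are arbitrary defaults rather than values from Databricks.

```python
import time

def retry_validation(validate, attempts=5, delay_seconds=60, sleep=time.sleep):
    """Call validate() until it returns True or the attempts run out.

    Waits delay_seconds between attempts to give the bucket policy
    time to propagate. Returns True on success, False otherwise.
    """
    for attempt in range(1, attempts + 1):
        if validate():
            return True
        if attempt < attempts:
            sleep(delay_seconds)  # no wait after the final attempt
    return False
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies.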