Configure AWS storage

This article describes how to configure Amazon Web Services S3 buckets for two different use cases:

  • Root storage for a workspace: Root storage for workspace objects such as cluster logs, notebook revisions, job results, and libraries. To create a new workspace using the account console or the Account API, you must first set up an S3 bucket to use as your workspace’s root storage.
  • Log delivery (all deployment types): Storage for delivery of logs such as billable usage or audit logs.

Databricks recommends that you review Security Best Practices for S3 for guidance around protecting the data in your bucket from unwanted access.

Step 1: Create an S3 bucket

  1. Log in to your AWS Console as a user with administrator privileges and go to the S3 service.

  2. Create an S3 bucket. See Create a Bucket in the AWS documentation.

    Important

    • The S3 bucket must be in the same AWS region as the Databricks deployment.
    • As a best practice, Databricks recommends that you use an S3 bucket that is dedicated to Databricks and not shared with other resources or services.
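If you prefer to script this step, the sketch below shows how the CreateBucket parameters can be built for boto3. This is an illustrative sketch, not part of the official procedure: the bucket and region names are placeholders, and the live call is commented out because it requires AWS credentials. Note that `us-east-1` is special-cased by the S3 API, which rejects an explicit `LocationConstraint` for that region.

```python
def create_bucket_params(bucket_name, region):
    """Build kwargs for the S3 CreateBucket call.

    us-east-1 must not include a CreateBucketConfiguration;
    every other region requires an explicit LocationConstraint.
    """
    params = {"Bucket": bucket_name}
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


# Hypothetical usage (requires boto3 and configured credentials).
# The region must match your Databricks deployment:
#
# import boto3
# s3 = boto3.client("s3", region_name="us-west-2")
# s3.create_bucket(**create_bucket_params("my-databricks-root", "us-west-2"))
```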

Step 2: Apply bucket policy (workspace creation only)

Note

This step is necessary only if you are setting up root storage for a new workspace that you create with the Account API. Skip this step if you are setting up storage for log delivery.

  1. In the AWS Console, go to the S3 service.

  2. Click the bucket name.

  3. Click the Permissions tab.

  4. Click the Bucket Policy button.

  5. Copy and modify this bucket policy. Replace <s3-bucket-name> with the S3 bucket name:

    Note

    If you are creating your storage configuration using the account console, you can also generate the bucket policy directly from the Add Storage Configuration dialog. See Manage storage configurations using the account console (E2).

       {
         "Version": "2012-10-17",
         "Statement": [
           {
             "Sid": "Grant Databricks Access",
             "Effect": "Allow",
             "Principal": {
               "AWS": "arn:aws:iam::414351767826:root"
             },
             "Action": [
               "s3:GetObject",
               "s3:GetObjectVersion",
               "s3:PutObject",
               "s3:DeleteObject",
               "s3:ListBucket",
               "s3:GetBucketLocation"
             ],
             "Resource": [
               "arn:aws:s3:::<s3-bucket-name>/*",
               "arn:aws:s3:::<s3-bucket-name>"
             ]
           }
         ]
       }
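If you are applying the policy programmatically rather than pasting it in the console, the policy above can be rendered with the bucket name filled in, as in this sketch (the bucket name is a placeholder, and the `put_bucket_policy` call is commented out because it requires boto3 and AWS credentials):

```python
import json


def databricks_bucket_policy(bucket_name):
    """Return the bucket policy from the step above as a JSON string,
    with <s3-bucket-name> replaced by the given bucket name."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "Grant Databricks Access",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::414351767826:root"},
            "Action": [
                "s3:GetObject",
                "s3:GetObjectVersion",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetBucketLocation",
            ],
            "Resource": [
                f"arn:aws:s3:::{bucket_name}/*",
                f"arn:aws:s3:::{bucket_name}",
            ],
        }],
    })


# Hypothetical usage (requires boto3 and configured credentials):
#
# import boto3
# boto3.client("s3").put_bucket_policy(
#     Bucket="my-databricks-root",
#     Policy=databricks_bucket_policy("my-databricks-root"),
# )
```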
    

Step 3: Set S3 object ownership (log delivery only)

Note

This step is necessary only if you are setting up storage for log delivery. Skip this step if you are setting up root storage for a new workspace.

Access to the logs depends on how you set up the S3 bucket. Databricks delivers logs to your S3 bucket with AWS’s built-in BucketOwnerFullControl canned ACL so that account owners and their designees can download the logs directly.

To support bucket ownership for newly-created objects, you must set your bucket’s S3 Object Ownership setting to the value Bucket owner preferred.

Important

If instead you set your bucket’s S3 Object Ownership setting to Object writer, new objects such as your logs remain owned by the uploading account, which by default is the IAM role that Databricks uses to access the bucket. This can make the logs difficult to access: even as the bucket owner, you cannot read them from the AWS console or from automation tools authenticated with your own account.
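This setting can also be applied via the S3 PutBucketOwnershipControls API, where Bucket owner preferred corresponds to the value `BucketOwnerPreferred`. A minimal sketch (the bucket name is a placeholder; the live call is commented out because it requires boto3 and AWS credentials):

```python
def ownership_controls(rule="BucketOwnerPreferred"):
    """Build the OwnershipControls payload for PutBucketOwnershipControls.

    "BucketOwnerPreferred" matches the console setting
    "Bucket owner preferred"; "ObjectWriter" matches "Object writer".
    """
    return {"Rules": [{"ObjectOwnership": rule}]}


# Hypothetical usage (requires boto3 and configured credentials):
#
# import boto3
# boto3.client("s3").put_bucket_ownership_controls(
#     Bucket="my-log-delivery-bucket",
#     OwnershipControls=ownership_controls(),
# )
```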

Resolve validation failures

Bucket policy permissions can take a few minutes to propagate. If validation fails with a permissions error, wait briefly and retry this procedure.
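In automation, the retry can be handled with a simple exponential backoff around the validation call. This is a generic sketch, not a Databricks API: `fn` stands in for whatever validation step you run (for example, a storage-configuration creation request), and is assumed to raise an exception on a permissions failure.

```python
import time


def retry_with_backoff(fn, attempts=5, base_delay=2.0):
    """Call fn, retrying with exponentially increasing delays to ride out
    S3 bucket-policy propagation. fn should raise on failure and return a
    result on success; the last failure is re-raised if all attempts fail."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```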