Create an S3 bucket for workspace deployment

This article describes how to create and configure root storage for a custom Databricks workspace deployment. You can also automate this step and the entire workspace creation by using the AWS Quick Start template or the Databricks Terraform provider to deploy your workspace.

Requirements

Best practices for root storage creation

The bucket you use for workspace deployment is referred to as your workspace’s root storage. Do not use your root storage to store production customer data. Instead, create additional S3 buckets or other data sources for production data and optionally create DBFS mount points for them.

Additionally, before you create your S3 bucket, review the following best practices:

  • The S3 bucket must be in the same AWS region as the Databricks workspace deployment.

  • Databricks recommends that you use an S3 bucket that is dedicated to Databricks and not shared with other resources or services.

  • Do not reuse a bucket from legacy Databricks workspaces. For example, if you are migrating to E2, create a new AWS bucket for your E2 setup.

Step 1: Create a storage configuration and generate a bucket policy

  1. In the account console, click Cloud resources.

  2. Click Storage configuration.

  3. Click Add storage configuration.

  4. In the Storage configuration name field, enter a human-readable name for your new storage configuration.

  5. In the Bucket Name field, enter the name of the S3 bucket that you will create.

    Important

    The bucket name cannot include periods (.). It must be globally unique and cannot include spaces or uppercase letters. For more bucket naming guidance, see the AWS bucket naming rules.

  6. Click Generate Policy and copy the policy that is generated. You add this policy to your S3 bucket configuration in AWS in the next step.

  7. Click Add.
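The naming constraints in step 5 can be checked locally before you type the name into the account console. The following is a minimal sketch, not part of any Databricks or AWS tooling: the function name `bucket_name_problems` is hypothetical, and the 3 to 63 character limit and allowed character set are taken from the general AWS bucket naming rules rather than from this article. Global uniqueness cannot be checked offline.

```python
import re

# Lowercase letters, digits, and hyphens only; must start and end with a
# letter or digit. Periods are excluded on purpose, per the note above.
_ALLOWED = re.compile(r"^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$")


def bucket_name_problems(name: str) -> list[str]:
    """Return human-readable problems; an empty list means no problem was found."""
    problems = []
    if "." in name:
        problems.append("contains a period")
    if " " in name:
        problems.append("contains a space")
    if name != name.lower():
        problems.append("contains uppercase letters")
    if not 3 <= len(name) <= 63:
        problems.append("length is not between 3 and 63 characters")
    # Only run the catch-all pattern when nothing more specific was reported.
    if not problems and not _ALLOWED.match(name):
        problems.append(
            "must use only lowercase letters, digits, and hyphens, "
            "and start and end with a letter or digit"
        )
    return problems
```

An empty result only means the name passes these local checks; AWS can still reject it if another account already owns the name.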

Step 2: Create the S3 bucket

  1. Log in to your AWS Console as a user with administrator privileges and go to the S3 service.

  2. Click the Create bucket button.

  3. In Bucket name, enter the bucket name that you specified in Step 1.

  4. Select the same AWS region that you will use for your Databricks workspace deployment.

  5. Click Create bucket.

  6. Click the Permissions tab.

  7. In the Bucket policy section, click Edit.

  8. Paste the bucket policy that you generated and copied from Databricks.

  9. Save the bucket policy.
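If you prefer to script Step 2 instead of using the AWS Console, the same two actions (create the bucket, then attach the policy) can be sketched with the AWS SDK for Python. This is an illustration under assumptions: `create_bucket_kwargs` and `create_root_bucket` are hypothetical helper names, `s3_client` is expected to be a boto3 S3 client that you construct yourself, and `policy_json` is the policy text you copied from Databricks in Step 1.

```python
def create_bucket_kwargs(bucket_name: str, region: str) -> dict:
    """Build the arguments for the S3 CreateBucket call.

    us-east-1 is the S3 default region and must not be passed as a
    LocationConstraint; any other region has to be passed explicitly.
    """
    kwargs = {"Bucket": bucket_name}
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_root_bucket(s3_client, bucket_name: str, region: str, policy_json: str) -> None:
    """Create the bucket, then attach the policy generated in Step 1."""
    s3_client.create_bucket(**create_bucket_kwargs(bucket_name, region))
    s3_client.put_bucket_policy(Bucket=bucket_name, Policy=policy_json)


# Example usage (requires the third-party boto3 package and AWS credentials;
# the bucket name, region, and file name are placeholders):
#   import boto3
#   client = boto3.client("s3", region_name="us-west-2")
#   with open("policy.json") as f:
#       create_root_bucket(client, "my-databricks-root-bucket", "us-west-2", f.read())
```

Use the same region here that you plan to use for the workspace deployment.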

Resolve validation failures

Bucket policy permissions can take a few minutes to propagate. Retry this procedure if validation fails due to permissions.
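When validation is scripted, the retry can be a simple loop with a pause between attempts. The sketch below is generic and makes no assumption about how validation is triggered: `attempt` is a hypothetical callable that you supply, returning True on success and False on a permissions failure. The default retry count and delay are illustrative values, not documented limits.

```python
import time


def retry_until_ok(attempt, retries=5, delay_seconds=30.0, sleep=time.sleep):
    """Call attempt() until it returns True or the retries run out.

    Returns True as soon as one attempt succeeds, otherwise False.
    The sleep parameter exists so the wait can be replaced in tests.
    """
    for i in range(retries):
        if attempt():
            return True
        if i < retries - 1:
            # Give the bucket policy time to propagate before trying again.
            sleep(delay_seconds)
    return False
```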

Verify correct permissions

When creating a storage configuration for your bucket, Databricks checks whether your bucket has been set up with correct permissions. One of these checks writes a file in your bucket and immediately deletes it. However, if the delete operation fails, the temporary object remains at the root of your bucket. The object name begins with databricks-verification-<uuid>.

If you see this object, the likely cause is a misconfigured bucket policy that grants Databricks PUT permissions but not DELETE permissions. Review the bucket policy and verify that the permissions are configured correctly.
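Both checks described above can be approximated in a script. The following sketch makes several assumptions: the function names are hypothetical, the PUT and DELETE permissions are taken to correspond to the `s3:PutObject` and `s3:DeleteObject` actions, and `s3_client` is a boto3 S3 client that you construct yourself. The policy check is deliberately shallow: it ignores Principal, Resource, Condition, and Deny statements, so treat a clean result as a hint rather than proof.

```python
import json

REQUIRED_OBJECT_ACTIONS = ("s3:PutObject", "s3:DeleteObject")


def _as_list(value):
    # Policy fields such as Action may be a single string or a list.
    return value if isinstance(value, list) else [value]


def missing_object_actions(policy_json: str) -> list[str]:
    """Return the required actions that no Allow statement grants."""
    policy = json.loads(policy_json)
    allowed = set()
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") == "Allow":
            allowed.update(_as_list(statement.get("Action", [])))
    if "s3:*" in allowed or "*" in allowed:
        return []
    return [a for a in REQUIRED_OBJECT_ACTIONS if a not in allowed]


def leftover_verification_keys(s3_client, bucket_name: str) -> list[str]:
    """List leftover verification objects at the root of the bucket.

    Reads a single page of results only, which is enough for a spot check.
    """
    response = s3_client.list_objects_v2(
        Bucket=bucket_name, Prefix="databricks-verification-"
    )
    return [obj["Key"] for obj in response.get("Contents", [])]
```

If `missing_object_actions` reports `s3:DeleteObject`, that matches the symptom described above: the write succeeds and the cleanup fails.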

Delete a storage configuration

Storage configurations cannot be edited after creation. If the configuration has incorrect data or if you no longer need it, delete the storage configuration:

  1. In the account console, click Cloud resources.

  2. Click Storage configuration.

  3. On the storage configuration row, click the Actions menu icon, and select Delete.

    You can also click the storage configuration name and click Delete in the pop-up dialog.

  4. In the confirmation dialog, click Confirm Delete.
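Deletion can also be scripted against the Account API instead of the account console. The sketch below only builds the HTTP request and does not send it. The host name, the `/storage-configurations/` path, and bearer-token authentication are assumptions that you should confirm against the Account API reference; `build_delete_request` is a hypothetical helper name.

```python
import urllib.request

# Assumed account-level host for Databricks on AWS; verify before use.
ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"


def build_delete_request(account_id: str, storage_configuration_id: str, token: str):
    """Build (but do not send) a DELETE request for one storage configuration."""
    url = (
        f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}"
        f"/storage-configurations/{storage_configuration_id}"
    )
    return urllib.request.Request(
        url,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )


# To send the request after reviewing it:
#   with urllib.request.urlopen(build_delete_request(acct, config_id, token)) as resp:
#       print(resp.status)
```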

Encrypt your root S3 bucket using customer-managed keys (optional)

You can encrypt your root S3 bucket using customer-managed keys, which requires using the Account API.

You can either add an encryption key when you create a new workspace using the Account API or add the key later. For more information, see Step 5: Configure customer-managed keys (optional) and Customer-managed keys for encryption.