This article describes how to create and configure root storage for a custom Databricks workspace deployment. You can also automate this step and the entire workspace creation by using the AWS Quick Start template or the Databricks Terraform provider to deploy your workspace.
You must be a Databricks account admin.
The bucket you use for workspace deployment is referred to as your workspace’s root storage. Do not use your root storage to store production customer data. Instead, create additional S3 buckets or other data sources for production data and optionally create DBFS mount points for them.
Additionally, before you create your S3 bucket, review the following best practices:
The S3 bucket must be in the same AWS region as the Databricks workspace deployment.
Databricks recommends that you use an S3 bucket that is dedicated to Databricks and not shared with other resources or services.
Do not reuse a bucket from legacy Databricks workspaces. For example, if you are migrating to E2, create a new AWS bucket for your E2 setup.
Log in to the account console, and click Cloud resources.
Click Storage configuration.
Click Add storage configuration.
In the Storage configuration name field, enter a human-readable name for your new storage configuration.
In the Bucket Name field, enter the name of the S3 bucket that you will create.
The bucket name cannot include dot notation (.). It must be globally unique and cannot include spaces or uppercase letters. For more bucket naming guidance, see the AWS bucket naming rules.
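The naming rules above can be checked programmatically before you submit the form. The sketch below is illustrative (the helper name is hypothetical, and global uniqueness can only be confirmed by AWS when the bucket is actually created):

```python
import re

def validate_bucket_name(name: str) -> bool:
    """Check an S3 bucket name against the rules described above:
    lowercase letters, digits, and hyphens only (no dots, spaces,
    or uppercase letters), 3-63 characters, starting and ending
    with a letter or digit. Global uniqueness is NOT checked here."""
    return re.fullmatch(r"[a-z0-9][a-z0-9-]{1,61}[a-z0-9]", name) is not None

print(validate_bucket_name("my-databricks-root-bucket"))  # valid name
print(validate_bucket_name("My.Bucket"))  # dot and uppercase: invalid
```

This catches the most common rejections (dots and uppercase letters) locally, before AWS or Databricks reports a validation error.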
Click Generate Policy and copy the policy that is generated. You add this policy to your S3 bucket configuration in AWS in the next step.
Log in to your AWS Console as a user with administrator privileges and go to the S3 service.
Click the Create bucket button.
In Bucket name, enter the bucket name that you specified in Step 1.
Select the same AWS region that you will use for your Databricks workspace deployment.
Click Create bucket.
Click the Permissions tab.
In the Bucket policy section, click Edit.
Paste the bucket policy that you generated and copied from Databricks.
Save the bucket.
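The policy you paste is generated specifically for your Databricks account, but its general shape resembles the sketch below. The principal account ID, bucket name, and action list here are illustrative placeholders only; always paste the policy that Databricks generates rather than writing your own:

```python
import json

# Illustrative only: <databricks-account-id> and <my-root-bucket> are
# placeholders. Use the exact policy generated in the account console.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Grant Databricks Access",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::<databricks-account-id>:root"},
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:ListBucket",
                "s3:GetBucketLocation",
            ],
            "Resource": [
                "arn:aws:s3:::<my-root-bucket>/*",
                "arn:aws:s3:::<my-root-bucket>",
            ],
        }
    ],
}

print(json.dumps(bucket_policy, indent=2))
```

Note that the statement grants both object-level actions (on the `/*` resource) and bucket-level actions (on the bucket itself); missing either resource ARN is a common cause of validation failures.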
Databricks strongly recommends that you enable S3 object-level logging for your root storage bucket. This makes it faster to investigate any issues that arise. Be aware that S3 object-level logging can increase AWS usage costs.
For instructions, see the AWS documentation on CloudTrail event logging for S3 buckets and objects.
Bucket policy permissions can take a few minutes to propagate. Retry this procedure if validation fails due to permissions.
When creating a storage configuration for your bucket, Databricks checks whether your bucket has been set up with correct permissions. One of these checks writes a file to your bucket and immediately deletes it. However, if the delete operation fails, the temporary object remains at the root of your bucket, with a Databricks-specific name prefix.
If you see this object, it is likely due to a misconfiguration in the bucket policy: Databricks has PUT permissions but not DELETE permissions. Review the bucket policy and verify that the permissions are configured correctly.
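One way to sanity-check a saved copy of your bucket policy is to confirm that some Allow statement grants both the PUT and DELETE actions. The helper below is a simplified sketch (it ignores wildcard actions, Deny statements, and resource-level restrictions, so a passing check does not guarantee a fully correct policy):

```python
import json

def has_put_and_delete(policy_json: str) -> bool:
    """Return True if any Allow statement grants both s3:PutObject
    and s3:DeleteObject. Simplified: does not handle wildcard
    actions, Deny statements, or per-resource restrictions."""
    policy = json.loads(policy_json)
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if {"s3:PutObject", "s3:DeleteObject"} <= set(actions):
            return True
    return False

# A policy that can PUT but not DELETE produces the leftover-object symptom.
broken = json.dumps({"Statement": [
    {"Effect": "Allow", "Action": ["s3:PutObject"]}]})
print(has_put_and_delete(broken))  # False
```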
Storage configurations cannot be edited after creation. If the configuration has incorrect data or if you no longer need it, delete the storage configuration:
In the account console, click Cloud resources.
Click Storage configuration.
On the storage configuration row, click the Actions menu icon, and select Delete.
You can also click the storage configuration name and then click Delete in the dialog that appears.
In the confirmation dialog, click Confirm Delete.
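Deleting a storage configuration can also be scripted against the Databricks Account API. The sketch below only builds the HTTP request without sending it; the account ID, configuration ID, and bearer token are placeholders you must supply, and you should confirm the endpoint against the current Account API reference:

```python
import urllib.request

ACCOUNT_ID = "<databricks-account-id>"    # placeholder
CONFIG_ID = "<storage-configuration-id>"  # placeholder

url = (
    "https://accounts.cloud.databricks.com"
    f"/api/2.0/accounts/{ACCOUNT_ID}/storage-configurations/{CONFIG_ID}"
)

# Build (but do not send) a DELETE request; supply real credentials
# before calling urllib.request.urlopen(req).
req = urllib.request.Request(url, method="DELETE")
req.add_header("Authorization", "Bearer <token>")  # placeholder token

print(req.get_method(), req.full_url)
```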
You can encrypt your root S3 bucket using customer-managed keys, which requires using the Account API.