Create a workspace with manual AWS configurations

This page explains how to deploy a classic workspace using manually created AWS resources. Use this method if you want to create your own AWS resources or need to deploy a workspace with custom configurations such as your own VPC, specific IAM policies, or pre-existing S3 buckets.

For most deployments, Databricks recommends using automated configuration, which uses AWS IAM temporary delegation to automatically provision all required resources.

Requirements

To create a workspace with manual configuration, you must:

  • Be an account admin in your Databricks account.
  • Have permissions to provision IAM roles, S3 buckets, and access policies in your AWS account.
  • Have an available VPC and NAT gateway in your AWS account in the workspace's region. You can view your available quotas and request increases using the AWS Service Quotas console.
  • Have the STS endpoint activated for us-west-2. For details, see the AWS documentation.

Create a Databricks workspace with manual AWS configurations

To create a workspace with manually configured AWS resources:

  1. Go to the account console and click the Workspaces icon.
  2. Click Create Workspace.
  3. Under Basics, confirm the workspace name and region.
  4. In the Compute credentials dropdown, select or create the workspace's compute credentials.
  5. In the Network configuration dropdown, select or create the workspace's network configuration. This is set to Databricks-managed VPC by default. To create a new customer-managed VPC, see Configure a customer-managed VPC.
  6. In the Workspace storage dropdown, select or create a workspace storage configuration.
  7. (Optional) The Metastore setting allows you to choose the metastore for your workspace to use. By default, the metastore is automatically selected if a metastore already exists in the workspace's region. If this is the first workspace you are deploying in a region, the metastore is created automatically.
  8. (Optional) In the Private access setting dropdown, select or create a private link configuration. To enable PrivateLink, you must first create the required regional VPC endpoints and register them with your network configuration. See Private access settings.
  9. (Optional) Under Advanced, you can configure any advanced settings for your workspace. See Advanced configurations.
  10. Click Create workspace. You are automatically redirected to the workspace details page.
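The console steps above can also be driven programmatically through the Databricks Account API's workspaces endpoint. A minimal sketch of the request payload follows; the account ID and configuration IDs are placeholders, and the exact field set should be checked against the Account API reference:

```python
import json

# Placeholder -- substitute your own Databricks account ID.
account_id = "00000000-0000-0000-0000-000000000000"

# Payload for POST /api/2.0/accounts/{account_id}/workspaces.
# The credentials_id, storage_configuration_id, and network_id values
# come from the credential, storage, and network configuration objects
# described later on this page.
payload = {
    "workspace_name": "my-workspace",
    "aws_region": "us-west-2",
    "credentials_id": "<credential-configuration-id>",
    "storage_configuration_id": "<storage-configuration-id>",
    # Optional: omit network_id to use a Databricks-managed VPC.
    "network_id": "<network-configuration-id>",
}

url = f"https://accounts.cloud.databricks.com/api/2.0/accounts/{account_id}/workspaces"
print(url)
print(json.dumps(payload, indent=2))
```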

Create a credential configuration

The credential configuration gives Databricks access to launch compute resources in your AWS account. This step requires you to create a new cross-account IAM role with an access policy.

Step 1: Create a cross-account IAM role

  1. Get your Databricks account ID. See Locate your account ID.
  2. Log into your AWS Console as a user with administrator privileges and go to the IAM console.
  3. Click the Roles tab in the sidebar.
  4. Click Create role.
    1. In Select type of trusted entity, click the AWS account tile.
    2. Select the Another AWS account checkbox.
    3. In the Account ID field, enter the Databricks account ID 414351767826. This is not the Account ID you copied from the Databricks account console. If you are using Databricks on AWS GovCloud, use the Databricks account ID 044793339203 for AWS GovCloud or 170661010020 for AWS GovCloud DoD.
    4. Select the Require external ID checkbox.
    5. In the External ID field, enter your Databricks account ID, which you copied from the Databricks account console.
    6. Click the Next button.
    7. On the Add Permissions page, click the Next button. You should now be on the Name, review, and create page.
    8. In the Role name field, enter a role name.
    9. Click Create role. The list of roles appears.
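The console choices above produce a cross-account trust policy on the new role. A sketch that builds the equivalent JSON locally, so you can verify what the wizard generated (the external ID placeholder stands in for your own Databricks account ID; 414351767826 is the Databricks commercial account from sub-step 3):

```python
import json

databricks_aws_account_id = "414351767826"  # Databricks AWS account (commercial)
your_databricks_account_id = "<YOUR-DATABRICKS-ACCOUNT-ID>"  # placeholder, from the account console

# Trust policy equivalent to the console choices: the Databricks AWS
# account may assume this role, gated by your account ID as external ID.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{databricks_aws_account_id}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {"sts:ExternalId": your_databricks_account_id}
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```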

Step 2: Create an access policy

The access policy you add to the role depends on your Amazon VPC (Virtual Private Cloud) deployment type. For information about how Databricks uses each permission, see Permissions in cross-account IAM roles.

  1. In the Roles section of the IAM console, click the IAM role you created in Step 1.
  2. Click the Add permissions drop-down and select Create inline policy.
  3. In the policy editor, click the JSON tab.
  4. Copy and paste the appropriate access policy for your deployment type:

A single VPC that Databricks creates and configures in your AWS account.

JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1403287045000",
      "Effect": "Allow",
      "Action": [
        "ec2:AllocateAddress",
        "ec2:AssignPrivateIpAddresses",
        "ec2:AssociateDhcpOptions",
        "ec2:AssociateIamInstanceProfile",
        "ec2:AssociateRouteTable",
        "ec2:AttachInternetGateway",
        "ec2:AttachVolume",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CancelSpotInstanceRequests",
        "ec2:CreateDhcpOptions",
        "ec2:CreateFleet",
        "ec2:CreateInternetGateway",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:CreateNatGateway",
        "ec2:CreateRoute",
        "ec2:CreateRouteTable",
        "ec2:CreateSecurityGroup",
        "ec2:CreateSubnet",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateVpc",
        "ec2:CreateVpcEndpoint",
        "ec2:DeleteDhcpOptions",
        "ec2:DeleteFleets",
        "ec2:DeleteInternetGateway",
        "ec2:DeleteLaunchTemplate",
        "ec2:DeleteLaunchTemplateVersions",
        "ec2:DeleteNatGateway",
        "ec2:DeleteRoute",
        "ec2:DeleteRouteTable",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteSubnet",
        "ec2:DeleteTags",
        "ec2:DeleteVolume",
        "ec2:DeleteVpc",
        "ec2:DeleteVpcEndpoints",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeFleetHistory",
        "ec2:DescribeFleetInstances",
        "ec2:DescribeFleets",
        "ec2:DescribeIamInstanceProfileAssociations",
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeNatGateways",
        "ec2:DescribePrefixLists",
        "ec2:DescribeReservedInstancesOfferings",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSpotInstanceRequests",
        "ec2:DescribeSpotPriceHistory",
        "ec2:DescribeSubnets",
        "ec2:DescribeVolumes",
        "ec2:DescribeVpcs",
        "ec2:DetachInternetGateway",
        "ec2:DisassociateIamInstanceProfile",
        "ec2:DisassociateRouteTable",
        "ec2:GetLaunchTemplateData",
        "ec2:GetSpotPlacementScores",
        "ec2:ModifyFleet",
        "ec2:ModifyLaunchTemplate",
        "ec2:ModifyVpcAttribute",
        "ec2:ReleaseAddress",
        "ec2:ReplaceIamInstanceProfileAssociation",
        "ec2:RequestSpotInstances",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RunInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": ["iam:CreateServiceLinkedRole", "iam:PutRolePolicy"],
      "Resource": "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": "spot.amazonaws.com"
        }
      }
    }
  ]
}
  5. Click Review policy.
  6. In the Name field, enter a policy name.
  7. Click Create policy.
  8. (Optional) If you use Service Control Policies to deny certain actions at the AWS account level, ensure that sts:AssumeRole is allowlisted so Databricks can assume the cross-account role.
  9. In the role summary, copy the Role ARN to paste into the credential configuration step.
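Before pasting the copied value into the account console, it can help to confirm it has the shape of an IAM role ARN rather than, say, the policy ARN or the role name alone. A small sketch (the role name here is a placeholder):

```python
import re

# Placeholder ARN for illustration -- use the one copied from the role summary.
role_arn = "arn:aws:iam::123456789012:role/my-databricks-cross-account-role"

# IAM role ARNs have the form arn:aws:iam::<12-digit-account-id>:role/<name>.
ARN_PATTERN = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")

assert ARN_PATTERN.match(role_arn), f"not a valid IAM role ARN: {role_arn}"
print("ARN looks valid:", role_arn)
```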

Step 3: Create the credential configuration

The credential configuration is a Databricks configuration object that represents the IAM role that you created in the previous step.

To create a credential configuration:

  1. When creating the new workspace, in the Cloud credential dropdown menu, select Add cloud credential.
  2. Select Add manually.
  3. In the Cloud credential name field, enter a human-readable name for your new credential configuration.
  4. In the Role ARN field, paste the role ARN that you copied in the previous step.
  5. Click OK.

Databricks validates the credential configuration during this step. Possible errors can include an invalid ARN or incorrect permissions for the role, among others.
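For automation, the same credential configuration can be created through the Account API's credentials endpoint. A hedged sketch of the request body; the account ID and role ARN are placeholders, and field names should be verified against the Account API reference:

```python
import json

account_id = "<YOUR-DATABRICKS-ACCOUNT-ID>"  # placeholder

# Body for POST /api/2.0/accounts/{account_id}/credentials
# (Databricks Account API). The role_arn is the cross-account
# role ARN copied in the previous step.
body = {
    "credentials_name": "my-credentials",
    "aws_credentials": {
        "sts_role": {
            "role_arn": "arn:aws:iam::123456789012:role/my-cross-account-role"  # placeholder
        }
    },
}

print(json.dumps(body, indent=2))
```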

Create a storage configuration

In the storage configuration step, you create a storage bucket to store your Databricks workspace assets such as data, libraries, and logs. This is also where the workspace's default catalog is stored. As part of the storage configuration, you also create an IAM role that Databricks uses to access the storage location.

Step 1: Create an S3 bucket

  1. Log into your AWS Console as a user with administrator privileges and go to the S3 service.

  2. Click the Create bucket button.

  3. Enter a name for the bucket.

  4. Select the AWS region that you will use for your Databricks workspace deployment.

  5. Click Create bucket.

  6. Click the Permissions tab.

  7. In the Bucket policy section, click Edit.

  8. Add the following bucket policy, replacing <BUCKET-NAME> with your bucket's name and <YOUR-DATABRICKS-ACCOUNT-ID> with your Databricks account ID.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Grant Databricks Access",
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:root"
          },
          "Action": [
            "s3:GetObject",
            "s3:GetObjectVersion",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketLocation"
          ],
          "Resource": ["arn:aws:s3:::<BUCKET-NAME>/*", "arn:aws:s3:::<BUCKET-NAME>"],
          "Condition": {
            "StringEquals": {
              "aws:PrincipalTag/DatabricksAccountId": ["<YOUR-DATABRICKS-ACCOUNT-ID>"]
            }
          }
        },
        {
          "Sid": "Prevent DBFS from accessing Unity Catalog metastore",
          "Effect": "Deny",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:root"
          },
          "Action": ["s3:*"],
          "Resource": ["arn:aws:s3:::<BUCKET-NAME>/unity-catalog/*"]
        }
      ]
    }
  9. Save the bucket policy.
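To avoid copy-paste mistakes, the two placeholders in the bucket policy can be filled in programmatically before pasting. A sketch using a template (the bucket name and account ID values are placeholders, and the template is abbreviated; the full policy is shown above):

```python
import json
from string import Template

# Abbreviated bucket-policy template; ${bucket} and
# ${databricks_account_id} mirror the <...> placeholders above.
template = Template(json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "Grant Databricks Access",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::414351767826:root"},
        "Action": ["s3:GetObject", "s3:PutObject", "s3:ListBucket"],
        "Resource": ["arn:aws:s3:::${bucket}/*", "arn:aws:s3:::${bucket}"],
        "Condition": {
            "StringEquals": {
                "aws:PrincipalTag/DatabricksAccountId": ["${databricks_account_id}"]
            }
        },
    }],
}))

policy = json.loads(template.substitute(
    bucket="my-workspace-bucket",              # placeholder
    databricks_account_id="1234567890123456",  # placeholder
))
print(json.dumps(policy, indent=2))
```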

Step 2: Create an IAM role with a custom trust policy

This IAM role and trust policy establish a cross-account trust relationship so that Databricks can access data in the S3 bucket on behalf of Databricks users. The ARN in the Principal section is a static value that references a role created by Databricks. The ARN is slightly different if you use Databricks on AWS GovCloud.

  1. In your AWS account, create an IAM role with a Custom Trust Policy.

  2. In the Custom Trust Policy field, paste the following policy JSON:

    The policy sets the external ID to 0000 as a placeholder. You update this to the account ID of your Databricks account in a later step.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": ["arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"]
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "0000"
            }
          }
        }
      ]
    }
  3. Save the IAM role.

    Now that you have created the role, you must update its trust policy to make it self-assuming.

  4. In the IAM role you just created, go to the Trust Relationships tab and edit the trust relationship policy as follows, replacing the <YOUR-AWS-ACCOUNT-ID>, <THIS-ROLE-NAME>, and <YOUR-DATABRICKS-ACCOUNT-ID> values.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": [
              "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
              "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>"
            ]
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<YOUR-DATABRICKS-ACCOUNT-ID>"
            }
          }
        }
      ]
    }
  5. Skip the permissions policy configuration. You'll go back to add that in a later step.

  6. Copy the IAM role ARN, which you'll paste into the storage configuration step.
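The two-phase edit above is needed because the role's own ARN does not exist until the role is created. A sketch that builds the final, self-assuming trust policy as a function of your IDs (all arguments are placeholders; the Unity Catalog role ARN is the static value shown above):

```python
import json

# Static Databricks-owned role referenced in the Principal section above.
UC_MASTER_ROLE = "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"

def final_trust_policy(aws_account_id, role_name, databricks_account_id):
    """Self-assuming cross-account trust policy from step 4 above."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    UC_MASTER_ROLE,
                    # The role trusts itself, which requires the ARN
                    # that only exists after the role is created.
                    f"arn:aws:iam::{aws_account_id}:role/{role_name}",
                ]
            },
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": databricks_account_id}},
        }],
    }

# Placeholder values for illustration:
policy = final_trust_policy("123456789012", "my-uc-role", "<YOUR-DATABRICKS-ACCOUNT-ID>")
print(json.dumps(policy, indent=2))
```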

Step 3: Create an IAM policy to grant read and write access

  1. Create an IAM policy in the same account as the S3 bucket, replacing the following values:

    • <BUCKET>: The name of the S3 bucket.
    • <AWS-ACCOUNT-ID>: The Account ID of your AWS account (not your Databricks account).
    • <AWS-IAM-ROLE-NAME>: The name of the AWS IAM role that you created in the previous step.
    • <KMS-KEY> (Optional): If encryption is enabled, provide the name of the KMS key that encrypts the S3 bucket contents. If encryption is disabled, remove the entire KMS section of the IAM policy.

    This IAM policy grants read and write access. You can also create a policy that grants read access only. However, this may be unnecessary, because you can mark the storage credential as read-only, and any write access granted by this IAM role will be ignored.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
          "Resource": "arn:aws:s3:::<BUCKET>/unity-catalog/*"
        },
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
          "Resource": "arn:aws:s3:::<BUCKET>"
        },
        {
          "Effect": "Allow",
          "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
          "Resource": ["arn:aws:kms:<KMS-KEY>"]
        },
        {
          "Effect": "Allow",
          "Action": ["sts:AssumeRole"],
          "Resource": ["arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"]
        }
      ]
    }
    note

    If you need a more restrictive IAM policy for Unity Catalog, contact your Databricks account team for assistance.

  2. Create a separate IAM policy for file events in the same account as the S3 bucket.

    note

    This step is optional but highly recommended. If you do not grant Databricks access to configure file events on your behalf, you must configure file events manually for each location, and you may have limited access to future features that depend on file events.

    The IAM policy grants Databricks permission to update your bucket's event notification configuration, create an SNS topic, create an SQS queue, and subscribe the SQS queue to the SNS topic. These are required resources for features that use file events. Replace <BUCKET> with the name of the S3 bucket.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "ManagedFileEventsSetupStatement",
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketNotification",
            "s3:PutBucketNotification",
            "sns:ListSubscriptionsByTopic",
            "sns:GetTopicAttributes",
            "sns:SetTopicAttributes",
            "sns:CreateTopic",
            "sns:TagResource",
            "sns:Publish",
            "sns:Subscribe",
            "sqs:CreateQueue",
            "sqs:DeleteMessage",
            "sqs:ReceiveMessage",
            "sqs:SendMessage",
            "sqs:GetQueueUrl",
            "sqs:GetQueueAttributes",
            "sqs:SetQueueAttributes",
            "sqs:TagQueue",
            "sqs:ChangeMessageVisibility",
            "sqs:PurgeQueue"
          ],
          "Resource": ["arn:aws:s3:::<BUCKET>", "arn:aws:sqs:*:*:*", "arn:aws:sns:*:*:*"]
        },
        {
          "Sid": "ManagedFileEventsListStatement",
          "Effect": "Allow",
          "Action": ["sqs:ListQueues", "sqs:ListQueueTags", "sns:ListTopics"],
          "Resource": "*"
        },
        {
          "Sid": "ManagedFileEventsTeardownStatement",
          "Effect": "Allow",
          "Action": ["sns:Unsubscribe", "sns:DeleteTopic", "sqs:DeleteQueue"],
          "Resource": ["arn:aws:sqs:*:*:*", "arn:aws:sns:*:*:*"]
        }
      ]
    }
  3. Return to the IAM role that you created in Step 2.

  4. In the Permissions tab, attach the IAM policies that you just created.
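Because the KMS statement in the read/write policy must be removed when encryption is disabled, it can help to generate that policy rather than edit it by hand. A sketch that includes the KMS statement only when a key ARN is supplied (all argument values are placeholders):

```python
import json

def storage_policy(bucket, aws_account_id, role_name, kms_key_arn=None):
    """Read/write IAM policy from step 3; the KMS statement is
    included only when encryption is enabled (a key ARN is given)."""
    statements = [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": f"arn:aws:s3:::{bucket}/unity-catalog/*",
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
            "Resource": f"arn:aws:s3:::{bucket}",
        },
        {
            "Effect": "Allow",
            "Action": ["sts:AssumeRole"],
            "Resource": [f"arn:aws:iam::{aws_account_id}:role/{role_name}"],
        },
    ]
    if kms_key_arn:
        # Encryption enabled: insert the KMS statement before sts:AssumeRole,
        # matching the statement order shown in step 3.
        statements.insert(2, {
            "Effect": "Allow",
            "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
            "Resource": [kms_key_arn],
        })
    return {"Version": "2012-10-17", "Statement": statements}

# Placeholder values for illustration (encryption disabled):
print(json.dumps(storage_policy("my-bucket", "123456789012", "my-uc-role"), indent=2))
```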

Step 4: Create the storage configuration

Now, return to the workspace creation flow so you can manually create the storage configuration in Databricks:

  1. In the Cloud storage dropdown menu, select Add new cloud storage.
  2. Select Add manually.
  3. In the Storage configuration name field, enter a human-readable name for the storage configuration.
  4. In the Bucket name field, enter the name of the S3 bucket you created in your AWS account.
  5. In the IAM role ARN field, paste the ARN of the IAM role you created in step 2.
  6. Click OK.
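As with credentials, the storage configuration can also be created through the Account API. A hedged sketch of the request body; values are placeholders, and field names should be verified against the Account API reference:

```python
import json

account_id = "<YOUR-DATABRICKS-ACCOUNT-ID>"  # placeholder

# Body for POST /api/2.0/accounts/{account_id}/storage-configurations
# (Databricks Account API). bucket_name is the S3 bucket from step 1.
body = {
    "storage_configuration_name": "my-storage-config",
    "root_bucket_info": {"bucket_name": "my-workspace-bucket"},  # placeholder bucket
}

print(json.dumps(body, indent=2))
```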

Advanced configurations

The following configurations are optional when you create a new workspace. To view these settings, click the Advanced dropdown in the workspace creation page.

  • Encryption: You can add encryption keys to your workspace deployment for managed services and workspace storage. The key for managed services encrypts notebooks, secrets, and Databricks SQL query data in the control plane. The key for workspace storage encrypts your workspace storage bucket and the EBS volumes of compute resources in the classic compute plane. For more guidance, see Configure customer-managed keys for encryption.
  • Security and compliance: These checkboxes allow you to enable the compliance security profile, add compliance standards, and enable enhanced security monitoring for your workspace. For more information, see Configure enhanced security and compliance settings.

View workspace status

After you create a workspace, you can view its status on the Workspaces page.

  • Provisioning: In progress. Wait a few minutes and refresh the page.
  • Running: Successful workspace deployment.
  • Failed: Failed deployment.
  • Banned: Contact your Databricks account team.
  • Cancelling: In the process of cancellation.

Log into a workspace

  1. Go to the account console and click the Workspaces icon.
  2. On the row with your workspace, click Open.
  3. To log in as a workspace administrator, log in with your account owner or account administrator email address and password. If you configured single sign-on, click the Single Sign On button.

Next steps

Now that you have deployed a workspace, you can start building out your data strategy. Databricks recommends the following articles: