Create a workspace with manual AWS configurations

This page explains how to deploy a classic workspace using manually created AWS resources. Use this method if you want to create your own AWS resources or need to deploy a workspace with custom configurations such as your own VPC, specific IAM policies, or pre-existing S3 buckets.

For most deployments, Databricks recommends using automated configuration, which uses AWS IAM temporary delegation to automatically provision all required resources.

Requirements

To create a workspace with manual configuration, you must:

  • Be an account admin in your Databricks account.
  • Have permissions to provision IAM roles, S3 buckets, and access policies in your AWS account.
  • Have an available VPC and NAT gateway in your AWS account in the workspace's region. You can view your available quotas and request increases using the AWS Service Quotas console.
  • Have the STS endpoint activated for us-west-2. For details, see the AWS documentation.

Create a Databricks workspace with manual AWS configurations

To create a workspace with manually configured AWS resources:

  1. Go to the account console and click the Workspaces icon.
  2. Click Create Workspace.
  3. In the Workspace name field, enter a human-readable name for this workspace. It can contain spaces.
  4. In the Region field, select an AWS region for your workspace's network and compute resources.
  5. In the Cloud credentials dropdown, select or create a credential configuration. If you create a new credential configuration, see Create a credential configuration.
  6. In the Cloud storage dropdown, select or create the storage configuration you'll use for this workspace. If you create a new storage configuration, see Create a storage configuration.
  7. (Optional) Set up any Advanced configurations. See Advanced configurations.
  8. Click Next.
  9. Review your workspace details and click Create workspace.
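
If you prefer to script this procedure, the same workspace can be created with the Databricks Account API. The following is a minimal sketch rather than a definitive implementation: it assumes you have an OAuth access token for the account console and the credential and storage configuration IDs produced by the sections below, and the endpoint and field names follow the Account API reference.

Python
import requests

ACCOUNT_ID = "<YOUR-DATABRICKS-ACCOUNT-ID>"  # from the account console
TOKEN = "<ACCOUNT-CONSOLE-OAUTH-TOKEN>"      # assumed: a token with account admin access

resp = requests.post(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}/workspaces",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "workspace_name": "my-workspace",                   # human-readable name; spaces are allowed
        "aws_region": "us-west-2",                          # region for network and compute resources
        "credentials_id": "<CREDENTIAL-CONFIG-ID>",         # see Create a credential configuration
        "storage_configuration_id": "<STORAGE-CONFIG-ID>",  # see Create a storage configuration
    },
)
resp.raise_for_status()
workspace = resp.json()
print(workspace["workspace_id"], workspace["workspace_status"])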

Create a credential configuration

The credential configuration gives Databricks permission to launch compute resources in your AWS account. In this step, you create a new cross-account IAM role with an access policy.

Step 1: Create a cross-account IAM role

  1. Get your Databricks account ID. See Locate your account ID.
  2. Log into your AWS Console as a user with administrator privileges and go to the IAM console.
  3. Click the Roles tab in the sidebar.
  4. Click Create role.
    1. In Select type of trusted entity, click the AWS account tile.
    2. Select the Another AWS account checkbox.
    3. In the Account ID field, enter the Databricks account ID 414351767826. This is not the account ID that you copied from the Databricks account console. If you are using Databricks on AWS GovCloud, use the Databricks account ID 044793339203 for AWS GovCloud or 170661010020 for AWS GovCloud DoD.
    4. Select the Require external ID checkbox.
    5. In the External ID field, enter your Databricks account ID, which you copied from the Databricks account console.
    6. Click the Next button.
    7. On the Add Permissions page, click the Next button. You should now be on the Name, review, and create page.
    8. In the Role name field, enter a role name.
    9. Click Create role. The list of roles appears.
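
If you manage IAM with code rather than the console, the steps above reduce to a single create_role call with a trust policy that names the Databricks account as the principal and your Databricks account ID as the external ID. The following boto3 sketch is illustrative only, under those assumptions; the role name is a placeholder.

Python
import json
import boto3

iam = boto3.client("iam")

DATABRICKS_AWS_ACCOUNT = "414351767826"       # Databricks account ID; use the GovCloud IDs above if applicable
EXTERNAL_ID = "<YOUR-DATABRICKS-ACCOUNT-ID>"  # your Databricks account ID from the account console

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{DATABRICKS_AWS_ACCOUNT}:root"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

role = iam.create_role(
    RoleName="databricks-cross-account-role",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
print(role["Role"]["Arn"])  # you copy this ARN after adding the access policy in Step 2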

Step 2: Create an access policy

The access policy you add to the role depends on your Amazon VPC (Virtual Private Cloud) deployment type. For information about how Databricks uses each permission, see IAM permissions for Databricks-managed VPCs.

  1. In the Roles section of the IAM console, click the IAM role you created in Step 1.
  2. Click the Add permissions drop-down and select Create inline policy.
  3. In the policy editor, click the JSON tab.
  4. Copy and paste the appropriate access policy for your deployment type:

The following policy is for the default deployment type: a single VPC that Databricks creates and configures in your AWS account. If you deploy with a customer-managed VPC, use an access policy that supports customer-managed VPCs instead.

JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1403287045000",
      "Effect": "Allow",
      "Action": [
        "ec2:AllocateAddress",
        "ec2:AssignPrivateIpAddresses",
        "ec2:AssociateDhcpOptions",
        "ec2:AssociateIamInstanceProfile",
        "ec2:AssociateRouteTable",
        "ec2:AttachInternetGateway",
        "ec2:AttachVolume",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CancelSpotInstanceRequests",
        "ec2:CreateDhcpOptions",
        "ec2:CreateFleet",
        "ec2:CreateInternetGateway",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion",
        "ec2:CreateNatGateway",
        "ec2:CreateRoute",
        "ec2:CreateRouteTable",
        "ec2:CreateSecurityGroup",
        "ec2:CreateSubnet",
        "ec2:CreateTags",
        "ec2:CreateVolume",
        "ec2:CreateVpc",
        "ec2:CreateVpcEndpoint",
        "ec2:DeleteDhcpOptions",
        "ec2:DeleteFleets",
        "ec2:DeleteInternetGateway",
        "ec2:DeleteLaunchTemplate",
        "ec2:DeleteLaunchTemplateVersions",
        "ec2:DeleteNatGateway",
        "ec2:DeleteRoute",
        "ec2:DeleteRouteTable",
        "ec2:DeleteSecurityGroup",
        "ec2:DeleteSubnet",
        "ec2:DeleteTags",
        "ec2:DeleteVolume",
        "ec2:DeleteVpc",
        "ec2:DeleteVpcEndpoints",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeFleetHistory",
        "ec2:DescribeFleetInstances",
        "ec2:DescribeFleets",
        "ec2:DescribeIamInstanceProfileAssociations",
        "ec2:DescribeInstanceStatus",
        "ec2:DescribeInstances",
        "ec2:DescribeInternetGateways",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeLaunchTemplateVersions",
        "ec2:DescribeNatGateways",
        "ec2:DescribePrefixLists",
        "ec2:DescribeReservedInstancesOfferings",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSpotInstanceRequests",
        "ec2:DescribeSpotPriceHistory",
        "ec2:DescribeSubnets",
        "ec2:DescribeVolumes",
        "ec2:DescribeVpcs",
        "ec2:DetachInternetGateway",
        "ec2:DisassociateIamInstanceProfile",
        "ec2:DisassociateRouteTable",
        "ec2:GetLaunchTemplateData",
        "ec2:GetSpotPlacementScores",
        "ec2:ModifyFleet",
        "ec2:ModifyLaunchTemplate",
        "ec2:ModifyVpcAttribute",
        "ec2:ReleaseAddress",
        "ec2:ReplaceIamInstanceProfileAssociation",
        "ec2:RequestSpotInstances",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress",
        "ec2:RunInstances",
        "ec2:TerminateInstances"
      ],
      "Resource": ["*"]
    },
    {
      "Effect": "Allow",
      "Action": ["iam:CreateServiceLinkedRole", "iam:PutRolePolicy"],
      "Resource": "arn:aws:iam::*:role/aws-service-role/spot.amazonaws.com/AWSServiceRoleForEC2Spot",
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": "spot.amazonaws.com"
        }
      }
    }
  ]
}
  5. Click Review policy.
  6. In the Name field, enter a policy name.
  7. Click Create policy.
  8. (Optional) If you use Service Control Policies to deny certain actions at the AWS account level, ensure that sts:AssumeRole is allowlisted so Databricks can assume the cross-account role.
  9. In the role summary, copy the Role ARN to paste into the credential configuration step.
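
If you scripted the role creation, the inline policy above can be attached the same way. Below is a minimal boto3 sketch, assuming the policy JSON has been saved locally; the file name and policy name are placeholders.

Python
import boto3

iam = boto3.client("iam")

# Hypothetical local copy of the access policy JSON shown above.
with open("databricks-cross-account-policy.json") as f:
    access_policy = f.read()

iam.put_role_policy(
    RoleName="databricks-cross-account-role",      # the role created in Step 1
    PolicyName="databricks-cross-account-policy",  # placeholder inline policy name
    PolicyDocument=access_policy,
)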

Step 3: Create the credential configuration

The credential configuration is a Databricks configuration object that represents the IAM role that you created in the previous step.

To create a credential configuration:

  1. When creating the new workspace, in the Cloud credential dropdown menu, select Add cloud credential.
  2. Select Add manually.
  3. In the Cloud credential name field, enter a human-readable name for your new credential configuration.
  4. In the Role ARN field, paste the role ARN that you copied in the previous step.
  5. Click OK.

Databricks validates the credential configuration during this step. Errors can include an invalid ARN or incorrect permissions on the role, among others.
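
A credential configuration can also be registered with the Account API rather than the console. The sketch below is illustrative only, assuming an account console OAuth token; the endpoint and field names follow the Account API reference.

Python
import requests

ACCOUNT_ID = "<YOUR-DATABRICKS-ACCOUNT-ID>"
TOKEN = "<ACCOUNT-CONSOLE-OAUTH-TOKEN>"  # assumed: a token with account admin access

resp = requests.post(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}/credentials",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "credentials_name": "my-credential-configuration",  # human-readable name
        "aws_credentials": {
            "sts_role": {"role_arn": "<CROSS-ACCOUNT-ROLE-ARN>"}  # ARN copied in Step 2
        },
    },
)
resp.raise_for_status()
print(resp.json()["credentials_id"])  # use this ID if you create the workspace through the API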

Create a storage configuration

In the storage configuration step, you create a storage bucket to store your Databricks workspace assets such as data, libraries, and logs. This is also where the workspace's default catalog is stored. As part of the storage configuration, you also create an IAM role that Databricks uses to access the storage location.

Step 1: Create an S3 bucket

  1. Log into your AWS Console as a user with administrator privileges and go to the S3 service.

  2. Click the Create bucket button.

  3. Enter a name for the bucket.

  4. Select the AWS region that you will use for your Databricks workspace deployment.

  5. Click Create bucket.

  6. Click the Permissions tab.

  7. In the Bucket policy section, click Edit.

  8. Add the following bucket policy, replacing <BUCKET-NAME> with your bucket's name and <YOUR-DATABRICKS-ACCOUNT-ID> with your Databricks account ID.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "Grant Databricks Access",
          "Effect": "Allow",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:root"
          },
          "Action": [
            "s3:GetObject",
            "s3:GetObjectVersion",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:ListBucket",
            "s3:GetBucketLocation"
          ],
          "Resource": ["arn:aws:s3:::<BUCKET-NAME>/*", "arn:aws:s3:::<BUCKET-NAME>"],
          "Condition": {
            "StringEquals": {
              "aws:PrincipalTag/DatabricksAccountId": ["<YOUR-DATABRICKS-ACCOUNT-ID>"]
            }
          }
        },
        {
          "Sid": "Prevent DBFS from accessing Unity Catalog metastore",
          "Effect": "Deny",
          "Principal": {
            "AWS": "arn:aws:iam::414351767826:root"
          },
          "Action": ["s3:*"],
          "Resource": ["arn:aws:s3:::<BUCKET-NAME>/unity-catalog/*"]
        }
      ]
    }
  9. Save the bucket.
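
For scripted setups, the bucket and its policy can be created with boto3. This is a minimal sketch, assuming the bucket policy above has been saved locally with the placeholders substituted; the file name is hypothetical.

Python
import boto3

s3 = boto3.client("s3")

BUCKET = "<BUCKET-NAME>"
REGION = "us-west-2"  # use the same region as your workspace deployment

s3.create_bucket(
    Bucket=BUCKET,
    CreateBucketConfiguration={"LocationConstraint": REGION},  # omit this argument for us-east-1
)

# Hypothetical local copy of the bucket policy shown above, with <BUCKET-NAME>
# and <YOUR-DATABRICKS-ACCOUNT-ID> already substituted.
with open("databricks-root-bucket-policy.json") as f:
    s3.put_bucket_policy(Bucket=BUCKET, Policy=f.read())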

Step 2: Create an IAM role with a custom trust policy

This IAM role and its trust policy establish a cross-account trust relationship so that Databricks can access data in the S3 bucket on behalf of Databricks users. The ARN in the Principal section is a static value that references a role created by Databricks. The ARN is slightly different if you use Databricks on AWS GovCloud.

  1. In your AWS account, create an IAM role with a Custom Trust Policy.

  2. In the Custom Trust Policy field, paste the following policy JSON:

    The policy sets the external ID to 0000 as a placeholder. You update this to the account ID of your Databricks account in a later step.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": ["arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"]
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "0000"
            }
          }
        }
      ]
    }
  3. Save the IAM role.

    Now that you have created the role, you must update its trust policy to make it self-assuming.

  4. In the IAM role you just created, go to the Trust Relationships tab and edit the trust relationship policy as follows, replacing the <YOUR-AWS-ACCOUNT-ID>, <THIS-ROLE-NAME>, and <YOUR-DATABRICKS-ACCOUNT-ID> values.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": [
              "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
              "arn:aws:iam::<YOUR-AWS-ACCOUNT-ID>:role/<THIS-ROLE-NAME>"
            ]
          },
          "Action": "sts:AssumeRole",
          "Condition": {
            "StringEquals": {
              "sts:ExternalId": "<YOUR-DATABRICKS-ACCOUNT-ID>"
            }
          }
        }
      ]
    }
  5. Skip the permissions policy configuration. You'll go back to add that in a later step.

  6. Copy the IAM role ARN, which you'll paste into the storage configuration step.
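
The two-pass trust-policy edit above (create the role with the 0000 placeholder, then make it self-assuming with your Databricks account ID as the external ID) can also be scripted. The following boto3 sketch is illustrative only, under those assumptions; the role name is a placeholder.

Python
import json
import boto3

iam = boto3.client("iam")

ROLE_NAME = "databricks-storage-role"  # placeholder role name
UC_MASTER_ROLE = "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL"
AWS_ACCOUNT_ID = "<YOUR-AWS-ACCOUNT-ID>"
DATABRICKS_ACCOUNT_ID = "<YOUR-DATABRICKS-ACCOUNT-ID>"

def trust_policy(principals, external_id):
    """Build the trust policy document used in both passes."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principals},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    })

# Pass 1: the role cannot reference itself before it exists, so start with the 0000 placeholder.
role = iam.create_role(
    RoleName=ROLE_NAME,
    AssumeRolePolicyDocument=trust_policy([UC_MASTER_ROLE], "0000"),
)

# Pass 2: add the role's own ARN as a trusted principal and set the real external ID.
self_arn = f"arn:aws:iam::{AWS_ACCOUNT_ID}:role/{ROLE_NAME}"
iam.update_assume_role_policy(
    RoleName=ROLE_NAME,
    PolicyDocument=trust_policy([UC_MASTER_ROLE, self_arn], DATABRICKS_ACCOUNT_ID),
)

print(role["Role"]["Arn"])  # paste this ARN into the storage configuration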

Step 3: Create an IAM policy to grant read and write access

  1. Create an IAM policy in the same account as the S3 bucket, replacing the following values:

    • <BUCKET>: The name of the S3 bucket.
    • <AWS-ACCOUNT-ID>: The Account ID of your AWS account (not your Databricks account).
    • <AWS-IAM-ROLE-NAME>: The name of the AWS IAM role that you created in the previous step.
    • <KMS-KEY> (Optional): If encryption is enabled, provide the name of the KMS key that encrypts the S3 bucket contents. If encryption is disabled, remove the entire KMS section of the IAM policy.

    This IAM policy grants read and write access. You can also create a policy that grants read access only. However, this may be unnecessary, because you can mark the storage credential as read-only, and any write access granted by this IAM role will be ignored.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
          "Resource": "arn:aws:s3:::<BUCKET>/unity-catalog/*"
        },
        {
          "Effect": "Allow",
          "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
          "Resource": "arn:aws:s3:::<BUCKET>"
        },
        {
          "Action": ["kms:Decrypt", "kms:Encrypt", "kms:GenerateDataKey*"],
          "Resource": ["arn:aws:kms:<KMS-KEY>"],
          "Effect": "Allow"
        },
        {
          "Action": ["sts:AssumeRole"],
          "Resource": ["arn:aws:iam::<AWS-ACCOUNT-ID>:role/<AWS-IAM-ROLE-NAME>"],
          "Effect": "Allow"
        }
      ]
    }
    note

    If you need a more restrictive IAM policy for Unity Catalog, contact your Databricks account team for assistance.

  2. Create a separate IAM policy for file events in the same account as the S3 bucket.

    note

    This step is optional but highly recommended. If you do not grant Databricks access to configure file events on your behalf, you must configure file events manually for each location. If you do not, you will have limited access to critical features that Databricks releases in the future.

    The IAM policy grants Databricks permission to update your bucket's event notification configuration, create an SNS topic, create an SQS queue, and subscribe the SQS queue to the SNS topic. These are required resources for features that use file events. Replace <BUCKET> with the name of the S3 bucket.

    JSON
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "ManagedFileEventsSetupStatement",
          "Effect": "Allow",
          "Action": [
            "s3:GetBucketNotification",
            "s3:PutBucketNotification",
            "sns:ListSubscriptionsByTopic",
            "sns:GetTopicAttributes",
            "sns:SetTopicAttributes",
            "sns:CreateTopic",
            "sns:TagResource",
            "sns:Publish",
            "sns:Subscribe",
            "sqs:CreateQueue",
            "sqs:DeleteMessage",
            "sqs:ReceiveMessage",
            "sqs:SendMessage",
            "sqs:GetQueueUrl",
            "sqs:GetQueueAttributes",
            "sqs:SetQueueAttributes",
            "sqs:TagQueue",
            "sqs:ChangeMessageVisibility",
            "sqs:PurgeQueue"
          ],
          "Resource": ["arn:aws:s3:::<BUCKET>", "arn:aws:sqs:*:*:*", "arn:aws:sns:*:*:*"]
        },
        {
          "Sid": "ManagedFileEventsListStatement",
          "Effect": "Allow",
          "Action": ["sqs:ListQueues", "sqs:ListQueueTags", "sns:ListTopics"],
          "Resource": "*"
        },
        {
          "Sid": "ManagedFileEventsTeardownStatement",
          "Effect": "Allow",
          "Action": ["sns:Unsubscribe", "sns:DeleteTopic", "sqs:DeleteQueue"],
          "Resource": ["arn:aws:sqs:*:*:*", "arn:aws:sns:*:*:*"]
        }
      ]
    }
  3. Return to the IAM role that you created in Step 2.

  4. In the Permissions tab, attach the IAM policies that you just created.
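
If you are scripting the setup, both policies can be attached as inline policies on the role. Here is a minimal boto3 sketch, assuming the two policy documents above were saved locally with the placeholders substituted; the file and policy names are hypothetical.

Python
import boto3

iam = boto3.client("iam")

ROLE_NAME = "databricks-storage-role"  # the role created in Step 2

# Hypothetical local copies of the two policy documents shown above.
inline_policies = {
    "databricks-unity-catalog-access": "unity-catalog-access-policy.json",
    "databricks-file-events": "file-events-policy.json",
}

for policy_name, path in inline_policies.items():
    with open(path) as f:
        iam.put_role_policy(
            RoleName=ROLE_NAME,
            PolicyName=policy_name,
            PolicyDocument=f.read(),
        )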

Step 4: Create the storage configuration

Now, return to the workspace creation flow so you can manually create the storage configuration in Databricks:

  1. In the Cloud storage dropdown menu, select Add new cloud storage.
  2. Select Add manually.
  3. In the Storage configuration name field, enter a human-readable name for the storage configuration.
  4. In the Bucket name field, enter the name of the S3 bucket you created in your AWS account.
  5. In the IAM role ARN field, paste the ARN of the IAM role you created in step 2.
  6. Click OK.
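
A storage configuration can also be registered with the Account API. The sketch below is illustrative only, assuming an account console OAuth token; it registers the bucket name, and the endpoint and field names follow the Account API reference. Whether the IAM role ARN shown in the console can be supplied through this endpoint depends on your account, so treat that pairing as a console step unless the API reference says otherwise.

Python
import requests

ACCOUNT_ID = "<YOUR-DATABRICKS-ACCOUNT-ID>"
TOKEN = "<ACCOUNT-CONSOLE-OAUTH-TOKEN>"  # assumed: a token with account admin access

resp = requests.post(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}/storage-configurations",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "storage_configuration_name": "my-storage-configuration",  # human-readable name
        "root_bucket_info": {"bucket_name": "<BUCKET-NAME>"},       # bucket created in Step 1
    },
)
resp.raise_for_status()
print(resp.json()["storage_configuration_id"])  # use this ID if you create the workspace through the API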

Advanced configurations

The following configurations are optional when you create a new workspace. To view these settings, click the Advanced configurations dropdown in the Credentials step.

  • Metastore: Confirm the metastore assignment for your workspace. The metastore is automatically selected if a Unity Catalog metastore already exists in the workspace's region and the metastore is configured to be automatically assigned to new workspaces. If this is the first workspace you are deploying in a region, the metastore is created automatically. Metastores are created without metastore-level storage by default. If you want metastore-level storage, you can add it. See Add managed storage to an existing metastore.
  • Network configuration: To create the workspace in your own VPC, select or add a Network configuration. For instructions on configuring your own VPC, see Configure a customer-managed VPC. If you are using a customer-managed VPC, ensure your IAM role uses an access policy that supports customer-managed VPCs.
  • Private Link: To enable PrivateLink, select or add a private access setting. This requires that you create the correct regional VPC endpoints, register them, and reference them from your network configuration.
  • Customer-managed keys: You can add encryption keys to your workspace deployment for managed services and workspace storage. The key for managed services encrypts notebooks, secrets, and Databricks SQL query data in the control plane. The key for workspace storage encrypts your workspace storage bucket and the EBS volumes of compute resources in the classic compute plane. For more guidance, see Configure customer-managed keys for encryption.
  • Security and compliance: These checkboxes allow you to enable the compliance security profile, add compliance standards, and enable enhanced security monitoring for your workspace. For more information, see Configure enhanced security and compliance settings.
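
If you create the workspace through the Account API instead of the console, these advanced options map to optional fields on the workspace creation request. The snippet below is a hedged sketch: field names follow the Account API reference, and every ID is a placeholder for an object (network configuration, private access settings, key configurations) that you must register beforehand. Security and compliance settings are configured as described above.

Python
# Illustrative payload only; pass it as the JSON body of the workspace creation call.
workspace_payload = {
    "workspace_name": "my-workspace",
    "aws_region": "us-west-2",
    "credentials_id": "<CREDENTIAL-CONFIG-ID>",
    "storage_configuration_id": "<STORAGE-CONFIG-ID>",
    # Customer-managed VPC (network configuration):
    "network_id": "<NETWORK-CONFIG-ID>",
    # PrivateLink (private access settings object):
    "private_access_settings_id": "<PRIVATE-ACCESS-SETTINGS-ID>",
    # Customer-managed keys for managed services and workspace storage:
    "managed_services_customer_managed_key_id": "<MANAGED-SERVICES-KEY-CONFIG-ID>",
    "storage_customer_managed_key_id": "<STORAGE-KEY-CONFIG-ID>",
}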

View workspace status

After you create a workspace, you can view its status on the Workspaces page.

  • Provisioning: In progress. Wait a few minutes and refresh the page.
  • Running: Successful workspace deployment.
  • Failed: Failed deployment.
  • Banned: Contact your Databricks account team.
  • Cancelling: In the process of cancellation.
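
The same status values are returned by the Account API, which is convenient for polling a scripted deployment. A minimal sketch, assuming an account console OAuth token and the workspace ID returned at creation time:

Python
import requests

ACCOUNT_ID = "<YOUR-DATABRICKS-ACCOUNT-ID>"
WORKSPACE_ID = "<WORKSPACE-ID>"          # returned when the workspace was created
TOKEN = "<ACCOUNT-CONSOLE-OAUTH-TOKEN>"  # assumed: a token with account admin access

resp = requests.get(
    f"https://accounts.cloud.databricks.com/api/2.0/accounts/{ACCOUNT_ID}/workspaces/{WORKSPACE_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()
workspace = resp.json()
print(workspace["workspace_status"], workspace.get("workspace_status_message", ""))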

Log into a workspace

  1. Go to the account console and click the Workspaces icon.
  2. On the row with your workspace, click Open.
  3. To log in as a workspace administrator, log in with your account owner or account administrator email address and password. If you configured single sign-on, click the Single Sign On button.

Next steps

Now that you have deployed a workspace, you can start building out your data strategy. Databricks recommends the following articles: