Create a workspace using the Account API
Note
These instructions apply to accounts created before November 8, 2023. If your Databricks account was created after November 8, 2023, see Create workspace API.
This article teaches you how to create a workspace using the Account API. You can also create workspaces using the AWS Quick Start template, the account console, or Terraform.
You must use the Account API to create workspaces if you want to use customer-managed keys for managed services or AWS PrivateLink.
Before you begin
Ensure you have access to your account ID.
Determine if your workspace will enable the following features:
Customer-managed VPC: Provide your own Amazon Virtual Private Cloud (VPC). A customer-managed VPC is required if you want to use AWS PrivateLink for any type of connection.
Customer-managed keys for encryption:
Customer-managed keys for managed services in the control plane: Provide KMS keys to encrypt notebook and secret data in the Databricks-managed control plane.
Customer-managed keys for workspace storage: Provide KMS keys to encrypt your workspace’s S3 bucket (the workspace’s DBFS root, job results, and more) and optionally cluster node EBS volumes.
AWS PrivateLink: AWS PrivateLink provides private connectivity from AWS VPCs and on-premises networks to AWS services without exposing the traffic to the public network.
Determine the regions to use for your workspace’s compute plane (VPC). The control plane region is determined by the compute plane region.
Workspace compute plane VPCs can be in AWS regions ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-west-1, eu-west-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2. However, you cannot use a VPC in us-west-1 if you want to use customer-managed keys for encryption.
How to use the Account API
To authenticate to the Account API, use Databricks OAuth for service principals or Databricks OAuth for users. Databricks strongly recommends these OAuth methods over any legacy alternatives. A service principal is an identity that you create in Databricks for use with automated tools, jobs, and applications. See Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
Use the following examples to authenticate to a Databricks account. You can use OAuth for service principals or OAuth for users. For background, see:
For OAuth for service principals, see Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
For OAuth for users, see Authenticate access to Databricks with a user account using OAuth (OAuth U2M).
Note
Basic authentication using a Databricks username and password reached end of life on July 10, 2024. See End of life for Databricks-managed passwords.
For authentication examples, choose from the following:
OAuth machine-to-machine (M2M) authentication:
Install Databricks CLI version 0.205 or above. See Install or update the Databricks CLI.
Complete the steps to configure OAuth M2M authentication for service principals in the account. See Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
Identify or manually create a Databricks configuration profile in your .databrickscfg file, with the profile's host, account_id, client_id, and client_secret fields set correctly for the service principal. See OAuth machine-to-machine (M2M) authentication.
Run your target Databricks CLI command, where <profile-name> represents the name of the configuration profile in your .databrickscfg file:
databricks account <command-name> <subcommand-name> -p <profile-name>
For example, to list all users in the account:
databricks account users list -p MY-AWS-ACCOUNT
For a list of available account commands, run databricks account -h. For a list of available subcommands for an account command, run databricks account <command-name> -h.
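For reference, a minimal sketch of what such a profile might look like in your .databrickscfg file. The profile name and values here are placeholders only.
[MY-AWS-ACCOUNT]
host          = https://accounts.cloud.databricks.com
account_id    = 00000000-0000-0000-0000-000000000000
client_id     = <service-principal-client-id>
client_secret = <service-principal-oauth-secret>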
OAuth user-to-machine (U2M) authentication:
Install Databricks CLI version 0.205 or above. See Install or update the Databricks CLI.
Complete the steps to configure OAuth U2M authentication for users in the account. See Authenticate access to Databricks with a user account using OAuth (OAuth U2M).
Start the user authentication process by running the following Databricks CLI command:
databricks auth login --host <account-console-url> --account-id <account-id>
For example:
databricks auth login --host https://accounts.cloud.databricks.com --account-id 00000000-0000-0000-0000-000000000000
Note
If you have an existing Databricks configuration profile with the host and account_id fields already set, you can substitute --host <account-console-url> --account-id <account-id> with --profile <profile-name>.
Follow the on-screen instructions to have the Databricks CLI automatically create the related Databricks configuration profile in your .databrickscfg file.
Continue following the on-screen instructions to sign in to your Databricks account through your web browser.
Run your target Databricks CLI command, where <profile-name> represents the name of the configuration profile in your .databrickscfg file:
databricks account <command-name> <subcommand-name> -p <profile-name>
For example, to list all users in the account:
databricks account users list -p ACCOUNT-00000000-0000-0000-0000-000000000000
For a list of available account commands, run databricks account -h. For a list of available subcommands for an account command, run databricks account <command-name> -h.
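The curl examples in the following steps assume an OAUTH_TOKEN environment variable that holds a valid account-level OAuth access token. As a minimal sketch (assuming Databricks CLI 0.205 or above, jq installed, and that you have completed the OAuth login or profile setup above), you could populate it like this:
# Print an OAuth access token for the account and export it for the curl examples below.
export OAUTH_TOKEN=$(databricks auth token --host https://accounts.cloud.databricks.com --account-id <databricks-account-id> | jq -r .access_token)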
Step 1: Configure cross-account authentication
Databricks needs access to a cross-account service IAM role in your AWS account so that Databricks can deploy clusters in the appropriate VPC for the new workspace.
If such a role does not yet exist, see Create an IAM role for workspace deployment to create an appropriate role and policy for your deployment type. You provide the ARN for your new role (the role_arn) later in this procedure.
Note
You can share a cross-account IAM role with multiple workspaces. You are not required to create a new cross-account IAM role for each workspace. If you already have a cross-account IAM role, you can skip this step.
Create a Databricks credential configuration ID for your AWS role. Call the Create credential configuration API (POST /accounts/<databricks-account-id>/credentials). This request establishes cross-account trust and returns a reference ID to use when you create a new workspace.
Note
You can share a credentials configuration ID with multiple workspaces. It is not required to create a new one for each workspace. If you already have one, you can skip this step.
Replace <databricks-account-id> with your Databricks account ID. For authentication, see How to use the Account API. In the request body:
Set credentials_name to a name for these credentials. The name must be unique within your account.
Set aws_credentials to an object that contains an sts_role property. That object must contain a role_arn property that specifies the AWS role ARN for the role you've created.
The response body includes a credentials_id field, which is the Databricks credentials configuration ID that you need to create the new workspace. Copy and save this value; you will use it in a later step to create the workspace.
For example:
curl -X POST \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/credentials' \
--header "Authorization: Bearer $OAUTH_TOKEN" \
-d '{
  "credentials_name": "databricks-workspace-credentials-v1",
  "aws_credentials": {
    "sts_role": {
      "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role"
    }
  }
}'
Example response:
{ "credentials_id": "<databricks-credentials-id>", "account_id": "<databricks-account-id>", "aws_credentials": { "sts_role": { "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role", "external_id": "<databricks-account-id>" } }, "credentials_name": "databricks-workspace-credentials-v1", "creation_time": 1579753556257 }
Copy the credentials_id field from the response for later use.
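To reduce copy-paste errors in later steps, you can capture the ID programmatically. The following is a minimal sketch; it assumes jq is installed and that you saved the response above to a hypothetical file named credentials.json.
# Extract the Databricks credentials configuration ID from the saved response.
CREDENTIALS_ID=$(jq -r .credentials_id credentials.json)
echo "credentials_id: $CREDENTIALS_ID"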
Step 2: Configure root storage
The root storage S3 bucket in your account stores objects like cluster logs, notebook revisions, and job results. You can also use the root storage S3 bucket to store non-production data, like data you need for testing.
Note
You can share a root S3 bucket with multiple workspaces in a single account. You do not have to create new buckets for each workspace. If you share a root S3 bucket for multiple workspaces in an account, data on the root S3 bucket is partitioned into separate directories by workspace. If you already have a bucket and an associated storage configuration ID generated by the Account API, you can skip this step. However, do not reuse a bucket from legacy workspaces. For example, if you are migrating to E2, create a new AWS bucket for your E2 setup.
Create the root S3 bucket using the instructions in Create an S3 bucket for workspace deployment. (A minimal AWS CLI sketch for creating the bucket appears after this procedure.)
Create a storage configuration record that represents the root S3 bucket. Specify your root S3 bucket by name by calling the create storage configuration API (POST /accounts/<account-id>/storage-configurations). The request returns a storage configuration ID that represents your S3 bucket.
Pass the following:
storage_configuration_name: New unique storage configuration name.
root_bucket_info: A JSON object that contains a bucket_name field with your S3 bucket name.
The response body includes a storage_configuration_id property, which is that bucket's storage configuration ID. Copy that value for later use.
For example:
curl -X POST \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/storage-configurations' \
--header "Authorization: Bearer $OAUTH_TOKEN" \
-d '{
  "storage_configuration_name": "databricks-workspace-storageconf-v1",
  "root_bucket_info": {
    "bucket_name": "my-company-example-bucket"
  }
}'
Response:
{ "storage_configuration_id": "<databricks-storage-config-id>", "account_id": "<databricks-account-id>", "root_bucket_info": { "bucket_name": "my-company-example-bucket" }, "storage_configuration_name": "databricks-workspace-storageconf-v1", "creation_time": 1579754875555 }
Step 3: Configure PrivateLink (optional)
This step is necessary only if you want to use AWS PrivateLink.
AWS PrivateLink provides private connectivity from your AWS VPC and on-premises networks to AWS services without exposing the traffic to the public network.
Databricks workspaces support adding PrivateLink connections for two connection types:
User to workspace (front-end)
Compute plane to control plane (back-end)
For PrivateLink connections for a new workspace:
Carefully read the article AWS PrivateLink and confirm the prerequisites before proceeding.
Create your AWS VPC endpoints in AWS console or with automation tools. See Step 2: Create VPC endpoints.
Review Enable private connectivity using AWS PrivateLink to create VPC endpoint registrations, network configurations, and private access settings objects. (A sample registration call appears after this list.)
Continue to the next step in this article. If you want to implement any type of PrivateLink connection (including front-end only), you must use a customer-managed VPC.
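For reference, the following is a minimal sketch of registering an AWS VPC endpoint with Databricks using the VPC endpoint configurations API. The endpoint name, region, and AWS VPC endpoint ID are placeholders; see Enable private connectivity using AWS PrivateLink for the authoritative steps and fields.
curl -X POST \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/vpc-endpoints' \
--header "Authorization: Bearer $OAUTH_TOKEN" \
-d '{
  "vpc_endpoint_name": "databricks-backend-rest-vpce",
  "region": "us-west-2",
  "aws_vpc_endpoint_id": "<aws-vpc-endpoint-id>"
}'
The response includes a vpc_endpoint_id value. That value is the Databricks VPC endpoint registration ID referenced later in the network configuration's vpc_endpoints object.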
Step 4: Configure customer-managed VPC (optional, but required if you use PrivateLink)
By default, Databricks creates a VPC in your AWS account for each workspace. Databricks uses it for running clusters in the workspace. Optionally, you can use your own VPC for the workspace, using the feature customer-managed VPC. Databricks recommends that you provide your own VPC so that you can configure it according to your organization’s enterprise cloud standards while still conforming to Databricks requirements. You cannot migrate an existing workspace to your own VPC.
Important
To configure your workspace to use AWS PrivateLink for any type of connection (including front-end only), your workspace must use a customer-managed VPC. See Enable private connectivity using AWS PrivateLink.
Set up your VPC, subnets, and security groups, using the instructions in Configure a customer-managed VPC. Copy the IDs for each of those objects for the next step, in which you register them with Databricks and get a network ID to represent your new network.
Important
If you plan to share a VPC and subnets across multiple workspaces, be sure to size your VPC and subnets to be large enough to scale with usage. You cannot reuse a network configuration object across workspaces.
To register your network configuration with Databricks, call the create network configuration API (POST /accounts/<account-id>/networks).
Pass the following:
network_name: New unique network name.
vpc_id: VPC ID.
subnet_ids: Subnet IDs, as an array.
security_group_ids: Security group IDs, as an array.
vpc_endpoints: Used only for AWS PrivateLink. Required if you are deploying a back-end (compute plane to control plane) PrivateLink connection, in which case this object must have two properties that reference your VPC endpoint registrations:
rest_api: Set this to a JSON array containing only the Databricks ID for your workspace VPC endpoint registration. This is the Databricks VPC endpoint registration ID, not the AWS VPC endpoint ID.
Important
In this release, after you register a VPC endpoint to the workspace VPC endpoint service for either a front-end connection or back-end REST API connection for any workspace, Databricks enables front-end (web application and REST API) access from that VPC endpoint to all PrivateLink-enabled workspaces in your Databricks account in that AWS region.
dataplane_relay: Set this to a JSON array containing only the Databricks ID for your secure cluster connectivity VPC endpoint registration. This is the Databricks VPC endpoint registration ID, not the AWS VPC endpoint ID.
For more details on these objects, see Enable private connectivity using AWS PrivateLink. These IDs were returned during VPC endpoint registration, as described in Enable private connectivity using AWS PrivateLink.
For example:
curl -X POST \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks' \
--header "Authorization: Bearer $OAUTH_TOKEN" \
-d '{
  "network_name": "mycompany-vpc-example",
  "vpc_id": "<aws-vpc-id>",
  "subnet_ids": [
    "<aws-subnet-id-1>",
    "<aws-subnet-id-2>"
  ],
  "security_group_ids": [
    "<aws-security-group-id>"
  ],
  "vpc_endpoints": {
    "dataplane_relay": [
      "<databricks-vpce-id-for-scc>"
    ],
    "rest_api": [
      "<databricks-vpce-id-for-rest-apis>"
    ]
  }
}'
Copy the network_id from the response body for later use. This is the network ID that represents the network for your new workspace.
Example response:
{ "network_id": "<databricks-network-id>", "account_id": "<databricks-account-id>", "vpc_id": "<aws-vpc-id>", "subnet_ids": [ "<aws-subnet-id-1>", "<aws-subnet-id-2>" ], "security_group_ids": [ "<aws-security-group-id>" ], "vpc_status": "UNATTACHED", "network_name": "mycompany-vpc-example", "creation_time": 1579767389544, "vpc_endpoints": { "dataplane_relay": [ "<databricks-vpce-id-for-scc>" ], "rest_api": [ "<databricks-vpce-id-for-rest-apis>" ] } }
Step 5: Configure customer-managed keys (optional)
Important
This feature requires that your account is on the Enterprise pricing tier.
Workspace compute plane VPCs can be in AWS regions ap-northeast-1, ap-northeast-2, ap-south-1, ap-southeast-1, ap-southeast-2, ca-central-1, eu-west-1, eu-west-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2. However, you cannot use a VPC in us-west-1 if you want to use customer-managed keys for encryption.
There are two use cases for customer-managed encryption keys:
Encrypt managed services, which includes notebook and secret data in the control plane.
Encrypt workspace storage, which includes the workspace’s root S3 bucket and optionally cluster EBS volumes.
You can choose to configure neither, one, or both of these use cases. If you implement encryption for both, you can optionally share a key, and even the same key configuration object, between them.
For both use cases, you can configure the key during workspace creation or add the key to a running workspace. For a customer-managed key for managed services, you can rotate (update) the key later. For a customer-managed key for workspace storage, you cannot rotate the key later.
You can share a customer-managed key or its key configuration object across workspaces. When creating a new workspace, a key configuration can represent both encryption use cases by setting its use_cases field to include both enumeration values.
Note
To add a workspace storage key to an existing workspace that already uses notebook encryption, you must create a new key configuration object for workspace storage. See Configure customer-managed keys for encryption.
To implement one encryption use case or both encryption use cases with the same key, perform the following procedure exactly once. To add encryption for both encryption use cases with different keys, perform the procedure two times, once for each use case.
Create the AWS KMS key. Follow the instructions in either of the relevant sections, which differ only in the human-readable description field (sid) in the policy that identifies the use case. See Step 1: Create or select a key in AWS KMS. To share the key and configuration for both use cases, update the sid field accordingly.
To register your KMS key with Databricks, call the create customer-managed key configuration API (POST /accounts/<account-id>/customer-managed-keys).
Pass the following parameters:
use_cases: An array that specifies the use cases for which to use the key. Specify one or both of the following:
MANAGED_SERVICES: This key encrypts managed services in the control plane, which includes notebook and secret data in the control plane.
STORAGE: This key encrypts workspace storage, which includes the workspace's DBFS root and cluster EBS volumes.
aws_key_info: A JSON object with the following properties:
key_arn: AWS KMS key ARN. Note that Databricks infers the AWS region from the key ARN.
key_alias: (Optional) AWS KMS key alias.
reuse_key_for_cluster_volumes: (Optional) Used only if the use_cases array contains STORAGE. Specifies whether to also use the key to encrypt cluster EBS volumes. The default value is true, which means Databricks also uses the key for cluster volumes. If you set this to false, Databricks does not encrypt the EBS volumes with your specified key. In that case, your Databricks EBS volumes are encrypted either with default AWS SSE encryption or, if you enabled AWS account-level EBS encryption by default, AWS enforces account-level EBS encryption using the separate key that you provided for it. Note that if reuse_key_for_cluster_volumes is true and you revoke permission for the key, it does not affect running clusters but does affect new and restarted clusters.
Example request:
curl -X POST \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/customer-managed-keys' \
--header "Authorization: Bearer $OAUTH_TOKEN" \
-d '{
  "use_cases": ["MANAGED_SERVICES", "STORAGE"],
  "aws_key_info": {
    "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
    "key_alias": "my-example-key",
    "reuse_key_for_cluster_volumes": true
  }
}'
Example response:
{ "use_cases": ["MANAGED_SERVICES", "STORAGE"], "customer_managed_key_id": "<aws-kms-key-id>", "creation_time": 1586447506984, "account_id": "<databricks-account-id>", "aws_key_info": { "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>", "key_alias": "my-example-key", "reuse_key_for_cluster_volumes": true, "key_region": "us-west-2" } }
From the response JSON, copy the customer_managed_key_id. You use that ID in the next step to set your workspace configuration object's managed_services_customer_managed_key_id property, storage_customer_managed_key_id property, or both, depending on which encryption use cases this object represents.
Step 6: Create the workspace
To create the new workspace, call the create workspace API (POST /accounts/<account-id>/workspaces).
Pass the following parameters, which are values that you copied in previous steps:
aws_region: The AWS region of the workspace's compute plane.
workspace_name: Human-readable name for your workspace. This is the workspace name users see in the Databricks UI.
deployment_name: (Recommended but optional) Unique deployment name for your workspace. For details, see the notes about deployment name.
credentials_id: Your credentials configuration ID, which represents your cross-account role credentials. This is the ID from the credentials configuration object.
storage_configuration_id: Your storage configuration ID, which represents your root S3 bucket. This is the ID from the storage configuration object.
network_id: (Optional) Used only for customer-managed VPC. This is the ID from the network configuration object.
managed_services_customer_managed_key_id: (Optional) Used only to encrypt managed services such as notebook and secret data in the control plane. See Customer-managed keys for managed services. This is your key configuration ID for managed services, which is the customer_managed_key_id field from a key configuration object. If you want to support this encryption use case, you must configure it at workspace creation time.
storage_customer_managed_key_id: (Optional) Used only to encrypt workspace storage. This is your key configuration ID for workspace storage, which is the customer_managed_key_id field from a key configuration object. If you want to support this encryption use case, you can configure it at workspace creation time, but you can also add it later to a running workspace.
private_access_settings_id: (Optional) Used only for AWS PrivateLink. This is the ID of the private access settings object that you created for this workspace. See Manage private access settings. This field is required for PrivateLink access of any connection type (front-end, back-end, or both).
custom_tags: (Optional) Key-value pairs that act as metadata for organizing resources. Tags can help you manage, identify, organize, search, and filter resources, as well as monitor cost and attribute usage.
Notes about deployment name:
Choose your deployment_name value carefully. The deployment name defines part of the subdomain for the workspace. The workspace URL for the web application and REST APIs is <deployment-name>.cloud.databricks.com. For example, if the deployment name is ABCSales, your workspace URL is https://abcsales.cloud.databricks.com. This property supports characters a-z and 0-9. Hyphens are also allowed, but not as the first or last character.
Accounts can have a deployment name prefix. Contact your Databricks account team to add an account deployment name prefix to your account. If your account has a non-empty deployment name prefix at workspace creation time, the workspace deployment name is updated so that it begins with the account prefix and a hyphen. For example, if your account's deployment prefix is acme and the workspace deployment name is workspace-1, the deployment_name field becomes acme-workspace-1. In this example, the workspace URL is acme-workspace-1.cloud.databricks.com.
After this modification with the account prefix, the new value is what is returned in JSON responses for this workspace's deployment_name field.
If your account has a non-empty deployment name prefix and you set deployment_name to the reserved keyword EMPTY, deployment_name is the account prefix only. For example, if your account's deployment prefix is acme and the workspace deployment name is EMPTY, deployment_name becomes acme only, and the workspace URL is acme.cloud.databricks.com. If your account does not yet have a deployment name prefix, the special deployment name value EMPTY is invalid.
The JSON response includes the property workspace_id. Copy this value for later use.
For example:
curl -X POST \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces' \
--header "Authorization: Bearer $OAUTH_TOKEN" \
-d '{
"workspace_name": "my-company-example",
"deployment_name": "my-company-example",
"aws_region": "us-west-2",
"credentials_id": "<aws-credentials-id>",
"storage_configuration_id": "<databricks-storage-config-id>",
"network_id": "<databricks-network-id>",
"managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
"storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>",
"private_access_settings_id": "<private-access-settings-id>",
"custom_tags": {
"Organization": "Marketing",
"Env": "Prod"
}
}'
Example response:
{
"workspace_id": 123456789,
"workspace_name": "my-company-example",
"aws_region": "us-west-2",
"creation_time": 1579768294842,
"deployment_name": "my-company-example",
"workspace_status": "PROVISIONING",
"account_id": "<databricks-account-id>",
"credentials_id": "<aws-credentials-id>",
"storage_configuration_id": "<databricks-storage-config-id>",
"workspace_status_message": "Workspace resources are being set up.",
"network_id": "<databricks-network-id>",
"managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
"storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>",
"private_access_settings_id": "<private-access-settings-id>",
"pricing_tier": "ENTERPRISE",
"custom_tags": {
"Organization": "Marketing",
"Env": "Prod"
}
}
If you specified a customer-managed VPC and the workspace creation step returns a network-related error, you can call the get network configuration API (endpoint /networks/<network-id>) to validate the network settings. See Troubleshoot workspace creation errors.
Step 7: Confirm the new workspace
To check workspace status, call the get workspace API (GET /accounts/<account-id>/workspaces/<workspace-id>).
Use the workspace_id value from the JSON response returned when you created the workspace.
In the response, possible workspace_status values are:
NOT_PROVISIONED: Not yet provisioned.
PROVISIONING: Still provisioning. Wait a few minutes and repeat this API request.
RUNNING: Successful deployment and now running.
FAILED: Failed deployment.
BANNED: Banned.
CANCELLING: In the process of cancellation.
See Troubleshoot workspace creation errors for how to handle unsuccessful status values.
For example:
curl -X GET \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>' \
--header "Authorization: Bearer $OAUTH_TOKEN"
Response:
{
"workspace_id": 123456789,
"workspace_name": "my-company-example",
"aws_region": "us-west-2",
"creation_time": 1579768294842,
"deployment_name": "my-company-example",
"workspace_status": "RUNNING",
"account_id": "<databricks-account-id>",
"credentials_id": "<aws-credentials-id>",
"storage_configuration_id": "<databricks-storage-config-id>",
"workspace_status_message": "Workspace is running.",
"network_id": "339f16b9-b8a3-4d50-9d1b-7e29e49448c3",
"managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
"storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>",
"pricing_tier": "ENTERPRISE"
}
In this example, the workspace status (workspace_status) is set to RUNNING, so the deployment was successful. If it is PROVISIONING, repeat this API request until the status is RUNNING.
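If you prefer to poll from a script rather than repeat the request by hand, the following is a minimal sketch. It assumes jq is installed and that you replace the placeholders with your account and workspace IDs.
# Poll the get workspace API until the workspace leaves the PROVISIONING state.
while true; do
  STATUS=$(curl -s -X GET \
    'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>' \
    --header "Authorization: Bearer $OAUTH_TOKEN" | jq -r .workspace_status)
  echo "workspace_status: $STATUS"
  if [ "$STATUS" != "PROVISIONING" ]; then break; fi
  sleep 30
done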
The pricing tier defaults to the plan associated with your account. See AWS platform tiers.
Test your new workspace after its status is RUNNING:
User interface login on the new workspace: Confirm that you can log in to the web application at https://<deployment-name>.cloud.databricks.com. For example, if the deployment name you specified during workspace creation is ABCSales, your workspace URL is https://abcsales.cloud.databricks.com. Log in using your account username.
REST API login on the new workspace: Confirm that you can access the REST API. The following example calls the Workspace Users API to get a list of users.
curl -X GET \
'https://<deployment-name>.cloud.databricks.com/api/2.0/scim/v2/Users' \
--header "Authorization: Bearer $OAUTH_TOKEN"
For more information about using Databricks REST APIs, including other authentication options, see the Workspace API.
Step 8: Post-deployment PrivateLink configuration (optional)
This step is necessary only if you are configuring AWS PrivateLink.
After workspace creation:
If you are implementing a front-end PrivateLink connection, implement relevant DNS configuration changes as described in Step 5: Configure internal DNS to redirect user requests to the web application (front-end).
Optionally create other VPC endpoints, as described in Step 7: Add VPC endpoints for other AWS services.
Step 9: Other optional post-deployment configuration
You might want to consider these optional configuration steps for your new workspace.
Enable IP Access Lists
Configure which IP addresses can connect to the web application, REST APIs, JDBC/ODBC endpoints, and DBConnect. You can specify allow lists and block lists as IP addresses or ranges. See Configure IP access lists for workspaces.
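For reference, the following is a minimal sketch of creating an allow list with the workspace-level IP access list API. It assumes the feature is enabled as described in the linked article, targets the workspace URL rather than the account console, and uses a hypothetical workspace-level token ($WORKSPACE_TOKEN) plus placeholder label and CIDR values.
curl -X POST \
'https://<deployment-name>.cloud.databricks.com/api/2.0/ip-access-lists' \
--header "Authorization: Bearer $WORKSPACE_TOKEN" \
-d '{
  "label": "office-vpn",
  "list_type": "ALLOW",
  "ip_addresses": ["203.0.113.0/24"]
}'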
Enable audit log system table
Databricks strongly recommends that you enable the audit log system table to monitor the activities performed and usage incurred by your Databricks users. Your workspace must have Unity Catalog enabled. See Monitor usage with system tables for instructions.
Troubleshoot workspace creation errors
The following sections provide solutions for common workspace creation errors.
The maximum number of addresses has been reached
When Databricks creates a VPC on your behalf, you must have at least one unused Elastic IP. Otherwise, the VPC isn’t created and the following error occurs:
The maximum number of addresses has been reached.
Increase the number of Elastic IPs and try again.
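To check how many Elastic IP addresses are currently allocated in the region before retrying, the following is a minimal AWS CLI sketch (assuming the AWS CLI is configured for the workspace's region):
# Count the Elastic IP addresses currently allocated in this region.
aws ec2 describe-addresses --query 'length(Addresses)' --output text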
General troubleshooting steps
For all workspace creation errors, try the following troubleshooting steps in the order provided.
Validate network
If the workspace creation or status check steps indicate a network-related error, call the get network configuration API to ensure that the network settings are correct. This API endpoint has the form:
/accounts/<databricks-account-id>/networks/<databricks-network-id>
For example:
curl -X GET \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks/<databricks-network-id>' \
--header "Authorization: Bearer $OAUTH_TOKEN"
In the response, view the warning_messages and error_messages fields. If both arrays are empty, there are no warnings or errors.
Otherwise, review the warnings and error JSON objects carefully:
For warnings, the warning_type enumeration indicates that the problem was with either a subnet or a security group. The warning_message field provides additional details. Be aware that if you have a firewall or NAT instance (instead of a NAT gateway), the network validation always issues a warning.
For errors, the error_type enumeration indicates that the problem was with credentials, VPC, subnet, security group, or network ACL. The error_message field provides additional details.
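A minimal sketch for pulling just the warning and error arrays out of the response (assuming jq is installed):
curl -s -X GET \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks/<databricks-network-id>' \
--header "Authorization: Bearer $OAUTH_TOKEN" \
| jq '{warning_messages, error_messages}'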
Fix infrastructure issues
Depending on the errors in the response to the get network configuration API request, confirm that:
Your security group complies with the customer-managed VPC requirements.
Your cross-account IAM policy includes the required permissions. See Create an IAM role for workspace deployment for the policy to use for your deployment type.
Update the failed workspace
To update the failed workspace, call the update workspace and redeploy API (PATCH /accounts/<account-id>/workspaces/<workspace-id>).
For a workspace that failed during creation, the update workspace API supports changing only the configurations for credentials, storage, network (for customer-managed VPC), and keys (for encrypting notebooks).
Note
You can use the same API to update a running (successfully deployed) workspace, but you can only change the credentials and network configurations.
You can pass these workspace configuration fields to change them: credentials_id, storage_configuration_id, network_id, managed_services_customer_managed_key_id, and storage_customer_managed_key_id.
If the workspace_status value returns PROVISIONING, keep checking for the RUNNING state using the get workspace API.
For example:
curl -X PATCH 'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>' \
--header "Authorization: Bearer $OAUTH_TOKEN" \
-d '{
"aws_region": "us-west-2",
"credentials_id": "<aws-credentials-id>",
"storage_configuration_id": "<databricks-storage-config-id>",
"network_id": "<databricks-network-id>",
"managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
"storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>"
}'
Response:
{
"workspace_id": 123456789,
"workspace_name": "my-company-example",
"aws_region": "us-west-2",
"creation_time": 1579768294842,
"deployment_name": "my-company-example",
"workspace_status": "PROVISIONING",
"account_id": "<databricks-account-id>",
"credentials_id": "<aws-credentials-id>",
"storage_configuration_id": "<databricks-storage-config-id>",
"workspace_status_message": "Workspace resources are being set up.",
"network_id": "<databricks-network-id>",
"managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
"storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>",
"pricing_tier": "ENTERPRISE"
}
If the workspace update fails, recreate the network and workspace
If the update workspace API doesn’t work, you must delete and recreate the network (if you provided your own VPC) and the failed workspace in the following order.
Delete the workspace using the delete workspace API (DELETE /accounts/<account-id>/workspaces/<workspace-id>).
For example:
curl -X DELETE \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>' \
--header "Authorization: Bearer $OAUTH_TOKEN"
If you provided your own VPC, delete the Databricks network configuration using the delete network configuration API (DELETE /accounts/<account-id>/networks/<network-id>).
For example:
curl -X DELETE \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks/<databricks-network-id>' \
--header "Authorization: Bearer $OAUTH_TOKEN"
Recreate the network using the correct values for vpc_id, subnet_ids, and security_group_ids.
Recreate the workspace using the correct values for credentials_id, storage_configuration_id, network_id, managed_services_customer_managed_key_id, and storage_customer_managed_key_id.
If you get the workspace_status value PROVISIONING, keep checking for the RUNNING state using the get workspace API.