Create a new workspace using the Account API

You can create workspaces using the Account API if your account is on the E2 version of the platform or on a select custom plan that allows multiple workspaces per account. All new Databricks accounts and most existing accounts are now E2. If you are unsure which account type you have, or to request access to multiple workspaces, contact your Databricks representative.

The Account API lets you programmatically create multiple new Databricks workspaces associated with a single Databricks account. Each workspace you create can have different configuration settings.

You must use the Account API to create workspaces if you want to use customer-managed keys for managed services or AWS PrivateLink (both Public Preview).

You can also perform many of the tasks described in this article using the account console for E2 accounts.

Note

Many workspace creation steps can be automated using templates. Templates can help you implement fast, consistent, automated workspace deployments. See Use automation templates to create a new workspace using the Account API.

Requirements

Only account owners and account admins on Databricks accounts that are enabled for multiple workspaces can use the API. After your Databricks representative updates your account subscription to support multiple workspaces, you will receive a welcome email.

Before you create new workspaces using the Account API, you must:

  • Review your welcome email for the following information:

    • Account ID, which is used as the external ID for cross-account access and is required for many API calls. This article uses the variable <databricks-account-id> to represent this identifier in sample API requests and responses.

      Important

      Protect your account ID like a credential.

    • Account username, which is your email address. This value is case sensitive. Use the same capitalization as when you sent it to your Databricks representative.

    • Account password. Click the link in the email to reset your password. You can also reset it again later.

  • Determine whether your workspace will enable features that require your account to be on the E2 version of the platform, such as customer-managed keys or AWS PrivateLink.

  • Determine the regions to use for your workspace’s data plane (VPC). The control plane region is determined by the data plane region. Workspace data plane VPCs can be in AWS regions ap-northeast-1, ap-south-1, ap-southeast-2, ca-central-1, eu-west-1, eu-west-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2. However, you cannot use a VPC in us-west-1 if you want to use customer-managed keys to encrypt managed services or workspace storage.
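
The region constraints above can be encoded as a small shell sanity check. This is a sketch, and the region list is current only as of this writing:

```shell
# Supported data plane (VPC) regions, per the list above.
SUPPORTED_REGIONS="ap-northeast-1 ap-south-1 ap-southeast-2 ca-central-1 eu-west-1 eu-west-2 eu-central-1 us-east-1 us-east-2 us-west-1 us-west-2"

# Return success if the region is in the supported list.
region_supported() {
  case " $SUPPORTED_REGIONS " in
    *" $1 "*) return 0 ;;
    *)        return 1 ;;
  esac
}

# us-west-1 is supported in general, but not together with
# customer-managed keys for managed services or workspace storage.
region_ok_for_cmk() {
  [ "$1" != "us-west-1" ] && region_supported "$1"
}

region_supported us-west-2 && echo "us-west-2: ok"
region_ok_for_cmk us-west-1 || echo "us-west-1: not valid with customer-managed keys"
```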

How to use the Account API

The Account API is published on the accounts.cloud.databricks.com base endpoint for all AWS regional deployments.

Use the following base URL for API requests: https://accounts.cloud.databricks.com/api/2.0/.

This REST API requires HTTP basic authentication, which means setting the Authorization HTTP header. In this section, username refers to your account email address. The email address is case sensitive, so use the same capitalization as when you sent it to your Databricks representative. There are several ways to provide your credentials to tools such as curl.

  • Pass your username and account password separately in the headers of each request in <username>:<password> syntax.

    For example:

    curl -X GET -u '<username>:<password>' -H "Content-Type: application/json" \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<accountId>/<endpoint>'
    
  • Apply base64 encoding to your <username>:<password> string and provide it directly in the HTTP header:

    curl -X GET -H "Content-Type: application/json" \
      -H 'Authorization: Basic <base64-username-pw>' \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<accountId>/<endpoint>'
    
  • Create a .netrc file with machine, login, and password properties:

    machine accounts.cloud.databricks.com
    login <username>
    password <password>
    

    To invoke the .netrc file, use -n in your curl command:

    curl -n -X GET 'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/workspaces'
    

    This article’s examples use this authentication style.
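
The authentication styles above can be scripted. The sketch below computes the base64 Authorization value and writes a .netrc entry to a temporary file; the credentials are placeholders, and in practice you would write to ~/.netrc and protect it with chmod 600:

```shell
# Build the base64-encoded <username>:<password> string for the
# Authorization header (example credentials, not real ones).
USERNAME="you@example.com"
PASSWORD="example-password"
AUTH_B64=$(printf '%s' "$USERNAME:$PASSWORD" | base64)
echo "Authorization: Basic $AUTH_B64"

# Write a .netrc entry to a temporary file for illustration; curl can be
# pointed at it with --netrc-file instead of the default ~/.netrc.
NETRC_FILE=$(mktemp)
cat > "$NETRC_FILE" <<EOF
machine accounts.cloud.databricks.com
login $USERNAME
password $PASSWORD
EOF
chmod 600 "$NETRC_FILE"
```

With a custom file, the invocation becomes curl --netrc-file "$NETRC_FILE" instead of curl -n.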

For the complete API reference, see Account API.

Step 1: Configure cross-account authentication

Databricks needs access to a cross-account service IAM role in your AWS account so that Databricks can deploy clusters in the appropriate VPC for the new workspace.

  1. If such a role does not yet exist, see Create a cross-account IAM role to create an appropriate role and policy for your deployment type. You will need the ARN for your new role (the role_arn) later in this procedure.

    Note

    You can share a cross-account IAM role with multiple workspaces. You are not required to create a new one for each workspace. If you already have one, you can skip this step.

  2. Create a Databricks credentials configuration ID for your AWS role. Call the Create credential configuration API (POST /accounts/<accountId>/credentials). This request establishes cross-account trust and returns a reference ID to use when you create a new workspace.

    Note

    You can share a credentials configuration ID with multiple workspaces. It is not required to create a new one for each workspace. If you already have one, you can skip this step.

    Replace <accountId> with your Databricks account ID. For authentication, see How to use the Account API earlier on this page. In the request body:

    • Set credentials_name to a name for these credentials. The name must be unique within your account.
    • Set aws_credentials to an object that contains an sts_role property. That object must contain a role_arn property that specifies the AWS role ARN for the role you’ve created.

    The response body will include a credentials_id field, which is the Databricks credentials configuration ID that you need to create the new workspace. Copy this field for later use.

    For example:

     curl -X POST -n \
       'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/credentials' \
       -d '{
       "credentials_name": "databricks-workspace-credentials-v1",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role"
         }
       }
     }'
    

    Example response:

     {
       "credentials_id": "<databricks-credentials-id>",
       "account_id": "<databricks-account-id>",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role",
           "external_id": "<databricks-account-id>"
         }
       },
       "credentials_name": "databricks-workspace-credentials-v1",
       "creation_time": 1579753556257
     }
    

    Copy the credentials_id field from the response for later use.
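
In scripts, the credentials_id can be captured directly from the response. A sketch, assuming python3 is available for JSON parsing (jq would work equally well) and using a canned response in place of the live curl call:

```shell
# Stand-in for the response body returned by the create credential
# configuration call (placeholder IDs, not real values).
RESPONSE='{"credentials_id": "ccc-0000-1111", "credentials_name": "databricks-workspace-credentials-v1"}'

# Extract the credentials_id field for use in the create workspace request.
CREDENTIALS_ID=$(printf '%s' "$RESPONSE" |
  python3 -c 'import json, sys; print(json.load(sys.stdin)["credentials_id"])')
echo "$CREDENTIALS_ID"
```

In a real script, RESPONSE would be captured from the curl -X POST call shown above.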

Step 2: Configure root storage

The root storage S3 bucket in your account stores objects like cluster logs, notebook revisions, and job results. You can also use the root storage S3 bucket to store non-production data, like data you need for testing.

Note

You can share a root S3 bucket with multiple workspaces in a single account. You do not have to create new ones for each workspace. If you share a root S3 bucket for multiple workspaces in an account, data on the root S3 bucket is partitioned into separate directories by workspace. If you already have a bucket and an associated storage configuration ID generated by the Account API, you can skip this step.

  1. Create the root S3 bucket using the instructions in Configure AWS storage.

  2. Create a storage configuration record that represents the root S3 bucket. Specify your root S3 bucket by name by calling the create storage configuration API (POST /accounts/<account-id>/storage-configurations).

    The request returns a storage configuration ID that represents your S3 bucket.

    Pass the following:

    • storage_configuration_name: New unique storage configuration name.
    • root_bucket_info: A JSON object that contains a bucket_name field that contains your S3 bucket name.

    The response body includes a storage_configuration_id property, which is that bucket’s storage configuration ID. Copy that value for later use.

    For example:

    curl -X POST -n \
        'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/storage-configurations' \
      -d '{
        "storage_configuration_name": "databricks-workspace-storageconf-v1",
        "root_bucket_info": {
          "bucket_name": "my-company-example-bucket"
        }
      }'
    

    Response:

    {
      "storage_configuration_id": "<databricks-storage-config-id>",
      "account_id": "<databricks-account-id>",
      "root_bucket_info": {
        "bucket_name": "my-company-example-bucket"
      },
      "storage_configuration_name": "databricks-workspace-storageconf-v1",
      "creation_time": 1579754875555
    }
    

Step 5: Configure customer-managed keys (optional)

Preview

This feature is in Public Preview.

Important

  • This feature requires that your account is on the E2 version of the Databricks platform and on the Enterprise pricing tier.
  • Workspace data plane VPCs can be in AWS regions ap-northeast-1, ap-south-1, ap-southeast-2, ca-central-1, eu-west-1, eu-west-2, eu-central-1, us-east-1, us-east-2, us-west-1, and us-west-2. However, you cannot use a VPC in us-west-1 if you want to use customer-managed keys to encrypt managed services or workspace storage.

There are two use cases for customer-managed encryption keys:

  • Customer-managed keys for managed services: notebook and secret data in the Databricks control plane.
  • Customer-managed keys for workspace storage: your workspace's root S3 bucket and, optionally, cluster EBS volumes.

You can choose to configure neither, one, or both of these. If you implement encryption for both use cases, you can optionally share a key, and optionally even the same key configuration object, between them.

The two use cases differ in when you can add the keys:

  • For a customer-managed key for managed services, you must configure it during workspace creation.
  • For a customer-managed key for storage, you can configure it during workspace creation but you can also add the key to a running workspace.

You can share a customer-managed key or its key configuration object across workspaces. When creating a new workspace, a key configuration can represent both encryption use cases by setting its use_cases field to include both enumeration values.

Note

To add a workspace storage key to an existing workspace that already uses notebook encryption, you must create a new key configuration object for workspace storage. See Customer-managed keys for workspace storage.

To implement one encryption use case, or both use cases with the same key, perform the following procedure exactly once. To use a different key for each use case, perform the procedure twice, once per use case.

  1. Create the AWS KMS key. Follow the instructions in Create the key for managed services or workspace storage; they differ only in the human-readable description field (sid) in the key policy, which identifies the use case. To share the key and its configuration for both use cases, update the sid field accordingly.

  2. To register your KMS key with Databricks, call the create customer-managed key configuration API (POST /accounts/<account-id>/customer-managed-keys).

    Pass the following parameters:

    • use_cases: An array that specifies the use cases for which to use the key. Specify one or both of the enumeration values MANAGED_SERVICES and STORAGE.
    • aws_key_info: A JSON object with the following properties:
      • key_arn: AWS KMS key ARN. Note that Databricks infers the AWS region from the key ARN.
      • key_alias: (Optional) AWS KMS key alias.
      • reuse_key_for_cluster_volumes: (Optional) Used only if the use_cases array contains STORAGE. Specifies whether to also use the key to encrypt cluster EBS volumes; the default is true, which means Databricks also uses the key for cluster volumes. If you set this to false, Databricks does not encrypt the EBS volumes with your specified key. In that case, your EBS volumes are encrypted either with default AWS SSE encryption or, if you enabled AWS account-level EBS encryption by default, with the separate account-level key that you provided to AWS.

    Example request:

    curl -X POST -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/customer-managed-keys' \
      -d '{
      "use_cases": ["MANAGED_SERVICES", "STORAGE"],
      "aws_key_info": {
        "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
        "key_alias": "my-example-key",
        "reuse_key_for_cluster_volumes": true
      }
    }'
    

    Example response:

    {
      "use_cases": ["MANAGED_SERVICES", "STORAGE"],
      "customer_managed_key_id": "<aws-kms-key-id>",
      "creation_time": 1586447506984,
      "account_id": "<databricks-account-id>",
      "aws_key_info": {
          "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
          "key_alias": "my-example-key",
          "reuse_key_for_cluster_volumes": true,
          "key_region": "us-west-2"
      }
    }
    
  3. From the response JSON, copy the customer_managed_key_id. You use that ID in the next step to set your workspace configuration object’s property managed_services_customer_managed_key_id, storage_customer_managed_key_id, or both, depending on which encryption use cases this object represents.
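
The request body for the key configuration can be assembled from shell variables rather than hand-edited JSON. A sketch; the ARN below is a placeholder, not a real key:

```shell
# Placeholder KMS key ARN; substitute your real key's ARN.
KEY_ARN="arn:aws:kms:us-west-2:123456789012:key/00000000-0000-0000-0000-000000000000"

# Build the JSON request body with python3 so quoting stays correct.
BODY=$(python3 - "$KEY_ARN" <<'EOF'
import json, sys
print(json.dumps({
    "use_cases": ["MANAGED_SERVICES", "STORAGE"],
    "aws_key_info": {
        "key_arn": sys.argv[1],
        "reuse_key_for_cluster_volumes": True,
    },
}))
EOF
)
echo "$BODY"
```

The body would then be passed to curl with -d "$BODY".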

Step 6: Create the workspace

To create the new workspace, call the create workspace API (POST /accounts/<account-id>/workspaces).

Note

Secure cluster connectivity is enabled by default for all workspaces created using the Account API after September 1, 2020. The pricing tier defaults to the plan associated with your account. See AWS Pricing.

Pass the following parameters, which are values that you copied in previous steps:

  • aws_region: The AWS region of the workspace’s data plane.
  • workspace_name: Human-readable name for your workspace.
  • deployment_name: (Recommended but optional) Unique deployment name for your workspace. For details, see the discussion after this list.
  • credentials_id: Your credential ID, which represents your cross-account role credentials. This is the ID from the credentials configuration object.
  • storage_configuration_id: Your storage configuration ID, which represents your root S3 bucket. This is the ID from the storage configuration object.
  • network_id: (Optional) Used only for a customer-managed VPC. This is the ID from the network configuration object.
  • managed_services_customer_managed_key_id: (Optional) Used only to encrypt managed services such as notebook and secret data in the control plane, which is in Public Preview. This is your key configuration ID for managed services, which is the customer_managed_key_id field from a key configuration object. If you want to support this encryption use case, you must configure it at workspace creation time.
  • storage_customer_managed_key_id: (Optional) Used only to encrypt workspace storage, which is in Public Preview. This is your key configuration ID for workspace storage, which is the customer_managed_key_id field from the key configuration object. If you want to support this encryption use case, you can configure it at workspace creation time, but you can also add it later to a running workspace.
  • private_access_settings_id: (Optional) Used only for AWS PrivateLink, which is in Public Preview. This is the ID of the private access settings object that you created for this workspace. See Create a private access settings configuration using Account API in the PrivateLink article. This is a required field for PrivateLink access for all connection types (front-end, back-end, or both).

Notes about deployment name:

  • Choose your deployment_name value carefully. The deployment name defines part of the subdomain for the workspace. The workspace URL for web application and REST APIs is <deployment-name>.cloud.databricks.com. For example, if the deployment name is ABCSales, your workspace URL will be https://abcsales.cloud.databricks.com. This property supports characters a-z and 0-9. Hyphens are also allowed but not as the first or last character.
  • Accounts can have a deployment name prefix. Contact your Databricks representative to add an account deployment name prefix to your account. If your account has a non-empty deployment name prefix at workspace creation time, the workspace deployment name is updated so that it begins with the account prefix and a hyphen. For example, if your account’s deployment prefix is acme and the workspace deployment name is workspace-1, the deployment_name field becomes acme-workspace-1. In this example, the workspace URL is acme-workspace-1.cloud.databricks.com.
  • After this modification with the account prefix, the new value is what is returned in JSON responses for this workspace’s deployment_name field.
  • If your account has a non-empty deployment name prefix and you set deployment_name to the reserved keyword EMPTY, deployment_name is the account prefix only. For example, if your account’s deployment prefix is acme and the workspace deployment name is EMPTY, deployment_name becomes acme only, and the workspace URL is acme.cloud.databricks.com. If your account does not yet have a deployment name prefix, the special deployment name value EMPTY is invalid.
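
The naming and prefix rules above can be sketched as shell helpers. These are hypothetical illustrations of the documented behavior, not Databricks code, and the validator reflects one reading of the character rules (the ABCSales example suggests uppercase input is lowercased in the URL):

```shell
# Check that a candidate name uses only a-z and 0-9, with hyphens
# allowed anywhere except as the first or last character.
valid_deployment_name() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([a-z0-9-]*[a-z0-9])?$'
}

# Model how an account deployment name prefix is applied: the reserved
# keyword EMPTY yields the prefix alone; otherwise prefix-name.
apply_prefix() {
  prefix="$1"; name="$2"
  if [ "$name" = "EMPTY" ]; then
    printf '%s\n' "$prefix"
  else
    printf '%s-%s\n' "$prefix" "$name"
  fi
}

apply_prefix acme workspace-1   # prints acme-workspace-1
apply_prefix acme EMPTY         # prints acme
```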

The JSON response includes the property workspace_id. Copy this value for later use.

For example:

curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces' \
  -d '{
  "workspace_name": "my-company-example",
  "deployment_name": "my-company-example",
  "aws_region": "us-west-2",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "network_id": "<databricks-network-id>",
  "managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
  "storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>",
  "private_access_settings_id": "<private-access-settings-id>"
}'

Example response:

{
  "workspace_id": 123456789,
  "workspace_name": "my-company-example",
  "aws_region": "us-west-2",
  "creation_time": 1579768294842,
  "deployment_name": "my-company-example",
  "workspace_status": "PROVISIONING",
  "account_id": "<databricks-account-id>",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "workspace_status_message": "Workspace resources are being set up.",
  "network_id": "<databricks-network-id>",
  "managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
  "storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>",
  "private_access_settings_id": "<private-access-settings-id>",
  "pricing_tier": "ENTERPRISE"
}
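
A script can pull both the workspace ID and the eventual workspace URL out of this response. A sketch, using a canned response in place of the live call and assuming python3 for JSON parsing:

```shell
# Stand-in for the create workspace response (placeholder values).
RESPONSE='{"workspace_id": 123456789, "deployment_name": "my-company-example", "workspace_status": "PROVISIONING"}'

# Extract the workspace ID (needed for the get workspace status call).
WORKSPACE_ID=$(printf '%s' "$RESPONSE" |
  python3 -c 'import json, sys; print(json.load(sys.stdin)["workspace_id"])')

# The deployment name determines the workspace URL.
DEPLOYMENT_NAME=$(printf '%s' "$RESPONSE" |
  python3 -c 'import json, sys; print(json.load(sys.stdin)["deployment_name"])')
WORKSPACE_URL="https://${DEPLOYMENT_NAME}.cloud.databricks.com"

echo "workspace $WORKSPACE_ID will be at $WORKSPACE_URL"
```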

If you specified a customer-managed VPC and the workspace creation step returns a network-related error, you can call the get network configuration API (endpoint /networks/<network-id>) to validate the network settings. See Troubleshoot a workspace that failed to deploy.

Step 7: Confirm the new workspace

To check workspace status, call the get workspace API (GET /accounts/<account-id>/workspaces/<workspace-id>).

Use the workspace_id value from the JSON response returned when you created the workspace.

In the response, possible workspace_status values are:

  • NOT_PROVISIONED: Not yet provisioned.
  • PROVISIONING: Still provisioning. Wait a few minutes and repeat this API request.
  • RUNNING: Successful deployment and now running.
  • FAILED: Failed deployment.
  • BANNED: Banned.
  • CANCELLING: In process of cancellation.

See Troubleshoot a workspace that failed to deploy for how to handle unsuccessful status values.

For example:

curl -X GET -n \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>'

Response:

{
  "workspace_id": 123456789,
  "workspace_name": "my-company-example",
  "aws_region": "us-west-2",
  "creation_time": 1579768294842,
  "deployment_name": "my-company-example",
  "workspace_status": "RUNNING",
  "account_id": "<databricks-account-id>",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "workspace_status_message": "Workspace is running.",
  "network_id": "339f16b9-b8a3-4d50-9d1b-7e29e49448c3",
  "managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
  "storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>",
  "pricing_tier": "ENTERPRISE"
}

In this example, the workspace status (workspace_status) is set to RUNNING, so it was successful. If it is PROVISIONING, repeat this API request until it succeeds.
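
The repeat-until-RUNNING check lends itself to a polling loop. A sketch with a stubbed status function standing in for the real GET call:

```shell
# Stub standing in for: curl -n -X GET .../workspaces/<workspace-id>
# followed by JSON extraction of workspace_status. Replace with the
# real call in practice.
fetch_status() {
  echo "RUNNING"
}

STATUS=""
for attempt in 1 2 3 4 5 6 7 8 9 10; do
  STATUS=$(fetch_status)
  case "$STATUS" in
    RUNNING|FAILED|BANNED) break ;;   # terminal states: stop polling
  esac
  sleep 60   # wait between checks; provisioning takes a few minutes
done
echo "final status: $STATUS"
```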

The pricing tier defaults to the plan associated with your account. See AWS Pricing.

Test your new workspace after its status is RUNNING:

  • User interface login on the new workspace — Confirm you can log in to the web application at URL https://<deployment-name>.cloud.databricks.com. For example, if the deployment name you specified during workspace creation is ABCSales, your workspace URL is https://abcsales.cloud.databricks.com. Use your account username and password.

  • REST API login on the new workspace — Confirm that you can access the REST API. The following example gets a list of users using the SCIM API. The curl tool will prompt you for a password.

    curl -u <username> -X GET 'https://<deployment-name>.cloud.databricks.com/api/2.0/preview/scim/v2/Users'
    

    For more information about using Databricks REST APIs, including other authentication options, see REST API 2.0.

Step 8: Post-deployment PrivateLink configuration (optional)

Preview

This step is necessary only if you are configuring AWS PrivateLink, which is in Public Preview.

After workspace creation:

  1. If you are implementing a front-end PrivateLink connection, implement relevant DNS configuration changes as described in Step 8: Configure the internal DNS (required only for front-end connections).
  2. Optionally create other VPC endpoints, as described in Step 9: Add VPC endpoints for other AWS services (recommended but optional).

Step 9: Other optional post-deployment configuration

You might want to consider these optional configuration steps for your new workspace.

Enable IP Access Lists

Configure which IP addresses can connect to the web application, REST APIs, JDBC/ODBC endpoints, and DBConnect. You can specify allow lists and block lists as IP addresses or ranges. See IP access lists.

Enable audit logs

Databricks strongly recommends that you configure audit logging to monitor the activities performed and usage incurred by your Databricks users. You must contact your Databricks representative to enable audit logs for your new workspace. See Configure audit logging for instructions.

Troubleshoot a workspace that failed to deploy

The maximum number of addresses has been reached

When Databricks creates a VPC on your behalf, you must have at least one unused Elastic IP. Otherwise, the VPC isn’t created and the following error occurs:

The maximum number of addresses has been reached.

Increase the number of Elastic IPs and try again.

General troubleshooting steps

For all workspace creation errors, try the following troubleshooting steps in the order provided.

Validate network

If the workspace creation or status check steps indicate a network-related error, call the get network configuration API to ensure that the network settings are correct. This API endpoint has the form:

/accounts/<databricks-account-id>/networks/<databricks-network-id>

For example:

  curl -X GET -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks/<databricks-network-id>'

In the response, view the warning_messages and error_messages fields. If both arrays are empty, there are no warnings or errors.

Otherwise, review the warnings and error JSON objects carefully:

  • For warnings, the warning_type enumeration indicates that the problem was with either a subnet or security group. The warning_message provides additional details. Be aware that if you have a firewall or NAT instance (instead of a NAT gateway), the network validation always issues a warning.
  • For errors, the error_type enumeration indicates that the problem was with either credentials, VPC, subnet, security group, or network ACL. The error_message provides additional details.
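
Checking the two arrays can be scripted. A sketch against a canned response (an empty pair of arrays means validation passed):

```shell
# Stand-in for the get network configuration response (placeholder ID).
NETWORK_JSON='{"network_id": "nnn-0000", "warning_messages": [], "error_messages": []}'

# Report OK only when both the warnings and errors arrays are empty.
RESULT=$(printf '%s' "$NETWORK_JSON" | python3 -c '
import json, sys
resp = json.load(sys.stdin)
warnings = resp.get("warning_messages", [])
errors = resp.get("error_messages", [])
print("OK" if not warnings and not errors else "ISSUES")
')
echo "network validation: $RESULT"
```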

Fix infrastructure issues

Depending on the errors in the response to the get network configuration API request, confirm that:

  • Your security group complies with the customer-managed VPC requirements.
  • Your cross-account IAM policy includes the required permissions. See Create a cross-account IAM role for the policy to use for your deployment type.
  • Your Databricks account was enabled by Databricks for multiple workspaces and for any additional features you are using (customer-managed VPC, customer-managed keys, secure cluster connectivity). Contact your Databricks representative to confirm.

Update the failed workspace

To update the failed workspace, call the update workspace and redeploy API (PATCH /accounts/<account-id>/workspaces/<workspace-id>).

For a workspace that failed during creation, the update workspace API supports changing only the region and the configurations for credentials, storage, network (for customer-managed VPC), and keys (for encrypting notebooks).

Note

You can use the same API to update a running (successfully deployed) workspace, but you can change only the credential and network configurations.

You can pass these workspace configuration fields to change them: aws_region, credentials_id, storage_configuration_id, network_id, managed_services_customer_managed_key_id, and storage_customer_managed_key_id.

If the workspace_status value returns PROVISIONING, keep checking for RUNNING state using the get workspace API.

For example:

  curl -X PATCH -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>' \
  -d '{
  "aws_region": "us-west-2",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "network_id": "<databricks-network-id>",
  "managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
  "storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>"
}'

Response:

{
  "workspace_id": 123456789,
  "workspace_name": "my-company-example",
  "aws_region": "us-west-2",
  "creation_time": 1579768294842,
  "deployment_name": "my-company-example",
  "workspace_status": "PROVISIONING",
  "account_id": "<databricks-account-id>",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "workspace_status_message": "Workspace resources are being set up.",
  "network_id": "<databricks-network-id>",
  "managed_services_customer_managed_key_id": "<aws-kms-managed-services-key-id>",
  "storage_customer_managed_key_id": "<aws-kms-notebook-workspace-storage-id>",
  "pricing_tier": "ENTERPRISE"
}

If the workspace update fails, recreate the network and workspace

If the update workspace API doesn’t work, you must delete and recreate the network (if you provided your own VPC) and the failed workspace in the following order.

  1. Delete the workspace using the delete workspace API (DELETE /accounts/<account-id>/workspaces/<workspace-id>).

    For example:

    curl -X DELETE -n \
    'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>'
    
  2. If you provided your own VPC, delete the Databricks network configuration using the delete network configuration API (DELETE /accounts/<account-id>/networks/<network-id>).

    For example:

    curl -X DELETE -n \
    'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks/<databricks-network-id>'
    
  3. Recreate the network using the correct values for vpc_id, subnet_ids, and security_group_ids.

  4. Recreate the workspace using the correct values for credentials_id, storage_configuration_id, network_id, managed_services_customer_managed_key_id, and storage_customer_managed_key_id.

    If you get the workspace_status value PROVISIONING, keep checking for RUNNING state using the get workspace API.