Create a new workspace using the Account API

You can create workspaces using the Account API if your account is on the E2 version of the platform or on a select custom plan that allows multiple workspaces per account. Contact your Databricks representative to request access.

The Account API lets you programmatically create multiple new Databricks workspaces associated with a single Databricks account. Each workspace you create can have different configuration settings.

The Account API is also required if you want to use any of the following features: customer-managed VPCs, secure cluster connectivity, or customer-managed keys for notebooks (Preview). These features require your account to be on the E2 version of the platform.

Note

Many workspace creation steps can be automated using templates. Templates can help you implement fast, consistent, automated workspace deployments. See Using automation templates to create a new workspace using the Account API.

If your account is not on either the E2 version of the platform or on a select custom plan that allows multiple workspaces per account, instead see Set up and deploy your Databricks account.

Requirements

Only account owners of Databricks accounts that are enabled for multiple workspaces can use the API. After your Databricks representative updates your account subscription to support multiple workspaces, you will receive a welcome email.

Before you create new workspaces using the Account API, you must:

  • Review your welcome email for the following information:

    • Account ID, which is used as the external ID for cross-account access and required for many API calls. This article uses the variable <databricks-account-id> to represent this identifier in sample API requests and responses.

      Important

      Protect your account ID like a credential.

    • Account username, which is your email address. This value is case sensitive. Use the same capitalization as when you sent it to your Databricks representative.

    • Account password. Click the link in the email to reset your password. You can also reset it again later.

  • Determine if your workspace will enable the following features, which require that your account be on the E2 version of the platform:

    • Customer-managed VPC — Provide your own Amazon Virtual Private Cloud (VPC).
    • Secure cluster connectivity — Network architecture with no VPC open ports and no public IP addresses on Databricks Runtime workers. In some APIs, this is referred to as No Public IP or NPIP. Note: Secure cluster connectivity is enabled by default for all workspaces created by the Account API after September 1, 2020.
    • Customer-managed keys for notebooks — (Private Preview) Provide KMS keys to encrypt notebooks in the Databricks-managed control plane.
  • Determine the regions to use for your workspace’s data plane (VPC). The control plane region is determined by the data plane region. Workspace data plane VPCs can be in AWS regions us-east-1, us-east-2, us-west-1, us-west-2, eu-west-1, eu-central-1, or ca-central-1. However, you cannot use a VPC in us-west-1 if you want to use customer-managed keys to encrypt notebooks.

How to use the Account API

The Account API is published on the accounts.cloud.databricks.com base endpoint for all AWS regional deployments.

Use the following base URL for API requests: https://accounts.cloud.databricks.com/api/2.0/

This REST API requires HTTP basic authentication, which means setting the Authorization HTTP header. In this section, username refers to your account email address. The email address is case sensitive, so use the same capitalization as when you sent it to your Databricks representative. There are several ways to provide your credentials to tools such as curl.

  • Pass your username and account password directly in each request using <username>:<password> syntax.

    For example:

    curl -X GET -u <username>:<password> -H "Content-Type: application/json" \
     'https://accounts.cloud.databricks.com/api/2.0/accounts/<accountId>/<endpoint>'
    
  • Apply base64 encoding to your <username>:<password> string and provide it directly in the HTTP header (a sketch for generating this value follows this list):

    curl -X GET -H "Content-Type: application/json" \
      -H 'Authorization: Basic <base64-username-pw>' \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<accountId>/<endpoint>'
    
  • Create a .netrc file with machine, login, and password properties:

    machine accounts.cloud.databricks.com
    login <username>
    password <password>
    

    To invoke the .netrc file, use -n in your curl command. Because this file stores your password in plain text, consider restricting its permissions (for example, chmod 600 ~/.netrc):

    curl -n -X GET 'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/workspaces'
    

    This article’s examples use this authentication style.
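
If you use the base64 approach, one way to generate the header value on Unix-like systems is with the standard base64 utility (a minimal sketch; printf is used instead of echo to avoid encoding a trailing newline):

printf '%s' '<username>:<password>' | base64

Pass the output in the header as Authorization: Basic <base64-username-pw>.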

For the complete API reference, see Account API.

Step 1: Configure cross-account authentication

Databricks needs access to a cross-account service IAM role in your AWS account so that Databricks can deploy clusters in the appropriate VPC for the new workspace.

  1. If such a role does not yet exist, see Create a cross-account IAM role for launching multiple workspaces to create an appropriate role and policy for your deployment type. You will need the ARN for your new role (the role_arn) later in this procedure.

    Note

    You can share a cross-account IAM role with multiple workspaces. You are not required to create a new one for each workspace. If you already have one, you can skip this step.

  2. Create a Databricks credentials configuration ID for your AWS role. Call the Create credential configuration API (POST /accounts/<accountId>/credentials). This request establishes cross-account trust and returns a reference ID to use when you create a new workspace.

    Note

    You can share a credentials configuration ID with multiple workspaces. It is not required to create a new one for each workspace. If you already have one, you can skip this step.

    Replace <accountId> with your Databricks account ID. For authentication, see How to use the Account API earlier on this page. In the request body:

    • Set credentials_name to a name for these credentials. The name must be unique within your account.
    • Set aws_credentials to an object that contains an sts_role property. That object must contain a role_arn property that specifies the AWS role ARN for the role you’ve created.

    The response body will include a credentials_id field, which is the Databricks credentials configuration ID that you need to create the new workspace. Copy this field for later use.

    For example:

     curl -X POST -n \
       'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/credentials' \
       -d '{
       "credentials_name": "databricks-workspace-credentials-v1",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role"
         }
       }
     }'
    

    Example response:

     {
       "credentials_id": "<databricks-credentials-id>",
       "account_id": "<databricks-account-id>",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role",
           "external_id": "<databricks-account-id>"
         }
       },
       "credentials_name": "databricks-workspace-credentials-v1",
       "creation_time": 1579753556257
     }
    

    Copy the credentials_id field from the response for later use.
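
    If you script these steps, you can capture the ID directly. The following is a minimal sketch that assumes the jq JSON processor is installed and that you configured a .netrc file as described earlier:

    CREDENTIALS_ID=$(curl -s -X POST -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/credentials' \
      -d '{
        "credentials_name": "databricks-workspace-credentials-v1",
        "aws_credentials": {
          "sts_role": {
            "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role"
          }
        }
      }' | jq -r '.credentials_id')

    echo "Credentials configuration ID: $CREDENTIALS_ID"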

Step 2: Configure root storage

The root storage S3 bucket in your account is required to store objects like cluster logs, notebook revisions, and job results. You can also use the root storage S3 bucket for storage of non-production data, like data you need for testing.

Note

You can share a root S3 bucket with multiple workspaces in a single account. You do not have to create new ones for each workspace. If you share a root S3 bucket for multiple workspaces in an account, data on the root S3 bucket is partitioned into separate directories by workspace. If you already have a bucket and an associated storage configuration ID generated by the Account API, you can skip this step.

  1. Create the root S3 bucket, using the instructions in Configure AWS storage (Account API).

  2. Create a storage configuration record that represents the root S3 bucket by calling the create storage configuration API (POST /accounts/<account-id>/storage-configurations) and specifying your root S3 bucket by name.

    The request returns a storage configuration ID that represents your S3 bucket.

    Pass the following:

    • storage_configuration_name — New unique storage configuration name.
    • root_bucket_info — A JSON object that contains a bucket_name field that contains your S3 bucket name.

    The response body includes a storage_configuration_id property, which is that bucket’s storage configuration ID. Copy that value for later use.

    For example:

    curl -X POST -n \
        'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/storage-configurations' \
      -d '{
        "storage_configuration_name": "databricks-workspace-storageconf-v1",
        "root_bucket_info": {
          "bucket_name": "my-company-example-bucket"
        }
      }'
    

    Response:

    {
      "storage_configuration_id": "<databricks-storage-config-id>",
      "account_id": "<databricks-account-id>",
      "root_bucket_info": {
        "bucket_name": "my-company-example-bucket"
      },
      "storage_configuration_name": "databricks-workspace-storageconf-v1",
      "creation_time": 1579754875555
    }
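
    As with the credentials configuration ID, you can capture this value in a script. A minimal sketch, again assuming jq:

    STORAGE_CONFIGURATION_ID=$(curl -s -X POST -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/storage-configurations' \
      -d '{
        "storage_configuration_name": "databricks-workspace-storageconf-v1",
        "root_bucket_info": { "bucket_name": "my-company-example-bucket" }
      }' | jq -r '.storage_configuration_id')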
    

Step 3: Configure customer-managed VPC (optional)

By default, Databricks creates a VPC in your AWS account for each workspace and uses it to run clusters in that workspace. Optionally, you can use your own VPC for the workspace with the customer-managed VPC feature. Databricks recommends that you provide your own VPC so that you can configure it according to your organization’s enterprise cloud standards while still conforming to Databricks requirements. You cannot migrate an existing workspace to your own VPC.

  1. Set up your VPC, subnets, and security groups, using the instructions in Customer-managed VPC. Copy the IDs for each of those objects for the next step, in which you register them with Databricks and get a network ID to represent your new network.

    Important

    You can share one customer-managed VPC with multiple workspaces in a single account. You do not have to create a new VPC for each workspace. However, you cannot reuse subnets or security groups with any other resources, including other workspaces or non-Databricks resources. If you plan to share one VPC with multiple workspaces, be sure to size your VPC and subnets accordingly. Because a Databricks network ID encapsulates this information, you cannot reuse a network ID across workspaces.

  2. To register your network configuration with Databricks, call the create network configuration API (POST /accounts/<account-id>/networks).

    Pass the following:

    • network_name — New unique network name.
    • vpc_id — VPC ID.
    • subnet_ids — Subnet IDs, as an array.
    • security_group_ids — Security Group IDs, as an array.

    For example:

    curl -X POST -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks' \
      -d '{
      "network_name": "mycompany-vpc-example",
      "vpc_id": "<aws-vpc-id>",
      "subnet_ids": [
        "<aws-subnet-id-1>",
        "<aws-subnet-id-2>"
      ],
      "security_group_ids": [
        "<aws-security-group-id>"
      ]
    }'
    
  3. Copy the network_id from the response body for later use. This is the network ID that represents the network for your new workspace.

    Example Response:

    {
      "network_id": "<databricks-network-id>",
      "account_id": "<databricks-account-id>",
      "vpc_id": "<aws-vpc-id>",
      "subnet_ids": [
        "<aws-subnet-id-1>",
        "<aws-subnet-id-2>"
      ],
      "security_group_ids": [
        "<aws-security-group-id>"
      ],
      "vpc_status": "UNATTACHED",
      "network_name": "mycompany-vpc-example",
      "creation_time": 1579767389544
    }
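
    The vpc_status value in this response is UNATTACHED, which is expected for a network that no workspace uses yet. Before you create the workspace, you can optionally validate the configuration with the get network configuration API, described in Troubleshoot a workspace that failed to deploy. A minimal check, assuming jq:

    curl -s -X GET -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks/<databricks-network-id>' \
      | jq -r '.vpc_status'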
    

Step 4: Configure customer-managed key for notebooks (optional)

By default, notebooks and secrets are encrypted in the control plane using a key that is unique to the control plane but not to the workspace. Optionally, you can specify your own encryption key to encrypt notebooks, a feature known as customer-managed keys for notebooks.

Note

Workspace data plane VPCs can be in AWS regions us-east-1, us-east-2, us-west-1, us-west-2, eu-west-1, eu-central-1, or ca-central-1. However, you cannot use a VPC in us-west-1 if you want to use customer-managed keys to encrypt notebooks.

Preview

The customer-managed keys for notebooks feature in the control plane is in Private Preview.

If your account is enabled to support this feature and you want to use it with this new workspace, you must set up your keys now. You cannot add these keys after you have created the workspace.

Important

You can share a customer-managed key across workspaces. This includes both the key ARN and the Databricks customer-managed key ID that is generated when you register your key with Databricks.

  1. Follow the instructions in Customer managed keys for notebooks to set up a KMS key for use by Databricks. You will need the key_arn, key_alias, and key_region values.

  2. To register the KMS key information with Databricks, call the create customer-managed key API (POST /accounts/<account-id>/customer-managed-keys).

    Pass the following:

    • key_arn — (Required) AWS KMS key ARN. Note that the AWS region is inferred from the ARN.
    • key_alias — (Optional) AWS KMS key alias.

    For example:
    curl -X POST -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/customer-managed-keys' \
      -d '{
      "aws_key_info": {
        "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
        "key_alias": "my-example-key"
      }
    }'
    
  3. Copy the customer_managed_key_id from the response body for use in a later step.

    Example response:

    {
        "customer_managed_key_id": "<aws-kms-key-id>",
        "creation_time": 1586447506984,
        "account_id": "<databricks-account-id>",
        "aws_key_info": {
            "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
            "key_alias": "my-example-key",
            "key_region": "us-west-2"
        }
    }
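
    To confirm that the key was registered, you can list the customer-managed keys recorded for your account. A sketch, assuming the corresponding GET endpoint on the same path:

    curl -s -X GET -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/customer-managed-keys'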
    

Step 5: Create the workspace

To create the new workspace, call the create workspace API (POST /accounts/<account-id>/workspaces).

Pass the following values, some of which you copied in the previous steps:

  • workspace_name — Human-readable name for your workspace.
  • deployment_name — (Recommended but optional) Unique deployment name for your workspace.
  • credentials_id — Your credentials configuration ID, which represents your cross-account role credentials.
  • storage_configuration_id — Your storage configuration ID, which represents your root S3 bucket.
  • network_id — (Optional, only used for customer-managed VPC) Your network ID.
  • customer_managed_key_id — (Optional, only used for customer-managed key) Your key ID.

Choose your deployment_name value carefully. The deployment name defines part of the subdomain for the workspace. The workspace URL for the web application and REST APIs is <deployment-name>.cloud.databricks.com. For example, if the deployment name is ABCSales, your workspace URL will be https://abcsales.cloud.databricks.com. This property supports characters a-z and 0-9. Hyphens are also allowed, but not as the first or last character.

Accounts can have a deployment name prefix. Contact your Databricks representative to add an account deployment name prefix to your account. If your account has a non-empty deployment name prefix at workspace creation time, the workspace deployment name is updated so that it begins with the account prefix and a hyphen. For example, if your account’s deployment prefix is acme and the workspace deployment name is workspace-1, the deployment_name field becomes acme-workspace-1. In this example, the workspace URL is acme-workspace-1.cloud.databricks.com.

After this modification, the prefixed value is what is returned in JSON responses for this workspace’s deployment_name field.

If your account has a non-empty deployment name prefix and you set deployment_name to the reserved keyword EMPTY, deployment_name is the account prefix only. For example, if your account’s deployment prefix is acme and the workspace deployment name is EMPTY, deployment_name becomes acme only, and the workspace URL is acme.cloud.databricks.com. If your account does not yet have a deployment name prefix, the special deployment name value EMPTY is invalid.
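
Based on these naming rules, a quick local check of a candidate deployment name might look like the following sketch (the service performs the authoritative validation):

name="my-company-example"
# a-z and 0-9 only; hyphens allowed, but not as the first or last character
if [[ "$name" =~ ^[a-z0-9]([a-z0-9-]*[a-z0-9])?$ ]]; then
  echo "ok: $name"
else
  echo "invalid deployment name: $name"
fi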

Note

Secure cluster connectivity is enabled by default for workspaces created by the Account API after September 1, 2020. This network architecture has no public IP addresses and no open ports on the data plane VPC.

The JSON response includes the property workspace_id. Copy this value for later use.

For example:

curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces' \
  -d '{
  "workspace_name": "my-company-example",
  "deployment_name": "my-company-example",
  "aws_region": "us-west-2",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "network_id": "<databricks-network-id>",
  "customer_managed_key_id": "<aws-kms-key-id>"
}'

Example Response:

{
  "workspace_id": 123456789,
  "workspace_name": "my-company-example",
  "aws_region": "us-west-2",
  "creation_time": 1579768294842,
  "deployment_name": "my-company-example",
  "workspace_status": "PROVISIONING",
  "account_id": "<databricks-account-id>",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "workspace_status_message": "Workspace resources are being set up.",
  "network_id": "<databricks-network-id>",
  "customer_managed_key_id": "<aws-kms-key-id>",
  "pricing_tier": "ENTERPRISE"
}

The pricing tier defaults to the plan associated with your account. See AWS Pricing.

If you specified a customer-managed VPC and the workspace creation step returns a network-related error, you can call the Get Network API (endpoint /networks/<network-id>) to validate the network settings. See Troubleshoot a workspace that failed to deploy.
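
Putting the step together: if you captured the IDs from the previous steps in shell variables (as in the earlier sketches), the create call can reference them directly. A minimal sketch, assuming CREDENTIALS_ID, STORAGE_CONFIGURATION_ID, and NETWORK_ID are set and jq is installed:

WORKSPACE_ID=$(curl -s -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces' \
  -d '{
    "workspace_name": "my-company-example",
    "deployment_name": "my-company-example",
    "aws_region": "us-west-2",
    "credentials_id": "'"$CREDENTIALS_ID"'",
    "storage_configuration_id": "'"$STORAGE_CONFIGURATION_ID"'",
    "network_id": "'"$NETWORK_ID"'"
  }' | jq -r '.workspace_id')

echo "Workspace ID: $WORKSPACE_ID"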

Step 6: Confirm the new workspace

To check workspace status, call the get workspace API (GET /accounts/<account-id>/workspaces/<workspace-id>).

Use the workspace_id value from the JSON response returned when you created the workspace.

In the response, possible workspace_status values are:

  • NOT_PROVISIONED — Not yet provisioned.
  • PROVISIONING — Still provisioning. Wait a few minutes and repeat this API request.
  • RUNNING — Successful deployment and now running.
  • FAILED — Failed deployment.
  • BANNED — Banned.
  • CANCELLING — In process of cancellation.

See Troubleshoot a workspace that failed to deploy for how to handle unsuccessful status values.

For example:

curl -X GET -n \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>'

Response:

{
  "workspace_id": 123456789,
  "workspace_name": "my-company-example",
  "aws_region": "us-west-2",
  "creation_time": 1579768294842,
  "deployment_name": "my-company-example",
  "workspace_status": "RUNNING",
  "account_id": "<databricks-account-id>",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "workspace_status_message": "Workspace is running.",
  "network_id": "339f16b9-b8a3-4d50-9d1b-7e29e49448c3",
  "customer_managed_key_id": "<aws-kms-key-id>",
  "pricing_tier": "ENTERPRISE"
}

In this example, the workspace status (workspace_status) is RUNNING, so the deployment succeeded. If it is PROVISIONING, repeat this API request until the status changes to RUNNING.
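
A simple way to wait for the RUNNING state is to poll in a loop. A minimal sketch, assuming jq and a .netrc file as described earlier:

while true; do
  status=$(curl -s -X GET -n \
    'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>' \
    | jq -r '.workspace_status')
  echo "workspace_status: $status"
  if [ "$status" != "PROVISIONING" ]; then
    break
  fi
  sleep 60  # check again in a minute
done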

The pricing tier defaults to the plan associated with your account. See AWS Pricing.

Test your new workspace after its status is RUNNING:

  • User interface login on the new workspace — Confirm you can log in to the web application at URL https://<deployment-name>.cloud.databricks.com. For example, if the deployment name you specified during workspace creation is ABCSales, your workspace URL is https://abcsales.cloud.databricks.com. Use your account username and password.

  • REST API login on the new workspace — Confirm that you can access the REST API. The following example gets a list of users using the SCIM API. The curl tool will prompt you for a password.

    curl -u <user-name> -X GET 'https://<deployment-name>.cloud.databricks.com/api/2.0/preview/scim/v2/Users'
    

    For more information about using Databricks REST APIs, including other authentication options, see REST API 2.0.

Step 7: Post-deployment configuration (optional)

There are other configuration steps that you might want to consider for your new workspace:

Enable IP Access Lists

Configure the IP addresses that can connect to the web application, REST APIs, JDBC/ODBC endpoints, and DBConnect. You can specify allow lists and block lists as IP addresses or ranges. See IP access lists.
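
As an illustration, adding an allow list through the workspace-level IP access list API might look like the following sketch. This runs against your new workspace URL rather than the accounts endpoint, so authenticate with workspace credentials (for example, a .netrc entry for the workspace host); see IP access lists for prerequisites and the authoritative request format:

curl -X POST -n \
  'https://<deployment-name>.cloud.databricks.com/api/2.0/ip-access-lists' \
  -d '{
    "label": "office",
    "list_type": "ALLOW",
    "ip_addresses": ["203.0.113.0/24"]
  }'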

Enable audit logs

Databricks strongly recommends that you configure audit logging to monitor the activities performed and usage incurred by your Databricks users. You must contact your Databricks representative to enable audit logs for your new workspace. See Configure audit logging for instructions.

Troubleshoot a workspace that failed to deploy

For all workspace creation errors, try the following troubleshooting steps in the order provided.

Validate network

To validate the network configuration, call the get network configuration API (GET /accounts/<account-id>/networks/<network-id>).

For example:

  curl -X GET -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks/<databricks-network-id>'

In the response, view the warning_messages and error_messages fields. If both arrays are empty, there are no warnings or errors.

Otherwise, review the warnings and error JSON objects carefully:

  • For warnings, the warning_type enumeration indicates that the problem was with either a subnet or security group. The warning_message provides additional details. Be aware that if you have a firewall or NAT instance (instead of a NAT gateway), the network validation always issues a warning.
  • For errors, the error_type enumeration indicates that the problem was with either credentials, VPC, subnet, security group, or network ACL. The error_message provides additional details.
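
To extract just these fields from the response, a sketch assuming jq:

curl -s -X GET -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks/<databricks-network-id>' \
  | jq '{warning_messages, error_messages}'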

Fix infrastructure issues

Depending on the errors in the response to the Get Network API request, confirm that:

  • Your security group complies with the customer-managed VPC requirements.
  • Your cross-account IAM policy includes the required permissions. See Create a cross-account IAM role for launching multiple workspaces for the policy to use for your deployment type.
  • Your Databricks account was enabled by Databricks for multiple workspaces and for any additional features you are using (customer-managed VPC, customer-managed keys for notebooks, secure cluster connectivity). Contact your Databricks representative to confirm.

Update the failed workspace

To update the failed workspace, call the update workspace and redeploy API (PATCH /accounts/<account-id>/workspaces/<workspace-id>).

Important

The update workspace API only supports updates of workspaces and deployments that failed during workspace creation. You cannot use it to update a successfully deployed workspace.

Use the credentials_id, storage_configuration_id, network_id, and (optionally) customer_managed_key_id values from the initial request. The deployment_name field is reused from the initial request implicitly, although you can specify it explicitly without harm.

If the workspace_status value returns PROVISIONING, keep checking for the RUNNING state using the get workspace API.

For example:

  curl -X PATCH -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>' \
  -d '{
  "aws_region": "us-west-2",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "network_id": "<databricks-network-id>",
  "customer_managed_key_id": "<aws-kms-key-id>"
}'

Response:

{
  "workspace_id": 123456789,
  "workspace_name": "my-company-example",
  "aws_region": "us-west-2",
  "creation_time": 1579768294842,
  "deployment_name": "my-company-example",
  "workspace_status": "PROVISIONING",
  "account_id": "<databricks-account-id>",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "workspace_status_message": "Workspace resources are being set up.",
  "network_id": "<databricks-network-id>",
  "customer_managed_key_id": "<aws-kms-key-id>",
  "pricing_tier": "ENTERPRISE"
}

If the workspace update fails, recreate the network and workspace

If the update workspace API doesn’t work, you must delete and recreate the network (if you provided your own VPC) and the failed workspace in the following order.

  1. Delete the workspace using the delete workspace API (DELETE /accounts/<account-id>/workspaces/<workspace-id>).

    For example:

    curl -X DELETE -n \
    'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/workspaces/<databricks-workspace-id>'
    
  2. If you provided your own VPC, delete the Databricks network configuration using the delete network configuration API (DELETE /accounts/<account-id>/networks/<network-id>).

    For example:

    curl -X DELETE -n \
    'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/networks/<databricks-network-id>'
    
  3. Recreate the network using the correct values for vpc_id, subnet_ids, and security_group_ids.

  4. Recreate the workspace using the correct values for credentials_id, storage_configuration_id, network_id, and the optional customer_managed_key_id.

    If you get the workspace_status value PROVISIONING, keep checking for the RUNNING state using the get workspace API.