Create a new workspace using the Multi-workspace API

Preview

The Multi-workspace API is in Public Preview. Contact your Databricks representative to request access.

The Multi-workspace (MWS) API lets you programmatically create multiple new Databricks workspaces associated with a single Databricks account. Each workspace you create can have different configuration settings.

The Multi-workspace API is also required to use the following preview features: customer-managed VPCs, secure cluster connectivity, and customer-managed keys for notebooks. Contact your Databricks representative to determine availability for your subscription and deployment type.

If you need only one workspace for your account or if you do not have a multi-workspace master account ID, instead see Set up and deploy your Databricks account.

Requirements

Only account owners of Databricks accounts that are enabled for multiple workspaces can use the API. After your Databricks representative updates your account subscription to support multiple workspaces, you will receive a welcome email.

Before you create new workspaces using the MWS API, you must:

  • Review your Multi-workspace API welcome email for the following information:

    • Multi-workspace master account ID, which is used as the external ID for cross-account access and required for many API calls. This article uses the variable <databricks-mws-master-account-id> to represent this identifier in sample API requests and responses.

      Important

      Protect your multi-workspace master account ID like a credential.

    • Multi-workspace master account username, which is your email address. This value is case sensitive. Use the same capitalization as when you sent it to your Databricks representative.

    • Multi-workspace master account password. Click the link in the email to reset your password. You can also reset it again later.

  • Determine whether your workspace will enable the following features, which are in Preview and may require enablement or may be available only for some pricing plans. If you have questions about availability, contact your Databricks representative:

    • Customer-managed VPC — (Public Preview) Provide your own Amazon Virtual Private Cloud (VPC).
    • Secure cluster connectivity — (Public Preview) Network architecture with no VPC open ports and no Databricks runtime worker public IP addresses. In some APIs, this is referred to as No Public IP or NPIP.
    • Customer-managed keys for notebooks — (Private Preview) Provide KMS keys to encrypt notebooks in the Databricks-managed control plane.
  • Determine the region to use for your workspace’s data plane (VPC). The control plane region is determined by the data plane region:

    • Workspace data plane VPCs can be in AWS regions us-east-1, us-west-1, or us-west-2.
    • A VPC in us-west-2 or us-east-1 uses the control plane in the same region.
    • A VPC in us-west-1 uses the us-west-2 control plane.

    Important

    To use customer-managed keys to encrypt notebooks, you must deploy your workspace with the data plane in us-east-1 or us-west-2.

How to use the Multi-workspace API

The Multi-workspace API is published on the accounts.cloud.databricks.com base endpoint for all AWS regional deployments.

Use the following base URL for API requests: https://accounts.cloud.databricks.com/api/2.0/

This REST API requires HTTP basic authentication, which means setting the HTTP Authorization header. In this section, username refers to your multi-workspace master account email address. The email address is case sensitive, so use the same capitalization as when you sent it to your Databricks representative. There are several ways to provide your credentials to tools such as curl.

  • Pass your username and account password directly in each request using <username>:<password> syntax.

    For example:

    curl -X GET -u '<username>:<password>' -H "Content-Type: application/json" \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/<endpoint>'
    
  • Apply base64 encoding to your <username>:<password> string and pass it directly in the HTTP Authorization header:

    curl -X GET -H "Content-Type: application/json" \
      -H 'Authorization: Basic <base64-username-pw>' \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/<endpoint>'
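
    To generate the encoded string on a Unix-like system, you can pipe your credentials through the base64 tool; the -n flag keeps echo from appending a newline to the value before it is encoded:

    echo -n '<username>:<password>' | base64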
    
  • Create a .netrc file with machine, login, and password properties:

    machine accounts.cloud.databricks.com
    login <username>
    password <password>
    

    To invoke the .netrc file, use -n in your curl command:

    curl -n -X GET 'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/workspaces'
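
    Because the .netrc file stores your password in plain text, consider restricting its permissions so that only your user can read it:

    chmod 600 ~/.netrc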
    

    This article’s examples use this authentication style.

For the complete API reference, see Multi-workspace API.

Step 1: Configure cross-account authentication

Databricks needs access to a cross-account service IAM role in your AWS account so that Databricks can deploy clusters in the appropriate VPC for the new workspace.

  1. If such a role does not yet exist, see Create a cross-account IAM role for launching multiple workspaces to create an appropriate role and policy for your deployment type. You will need the ARN for your new role (the role_arn) later in this procedure.

    Note

    You can share a cross-account IAM role with multiple workspaces. You are not required to create a new one for each workspace. If you already have one, you can skip this step.

  2. Create a Databricks credentials configuration ID for your AWS role. Call the create credential configuration API (POST /accounts/<account-id>/credentials). This request establishes cross-account trust and returns a reference ID to use when you create a new workspace.

    Note

    You can share a credentials configuration ID with multiple workspaces. It is not required to create a new one for each workspace. If you already have one, you can skip this step.

    Replace <account-id> with your Databricks account ID. For authentication, see How to use the Multi-workspace API earlier on this page. In the request body:

    • Set credentials_name to a name for these credentials. The name must be unique within your account.
    • Set aws_credentials to an object that contains an sts_role property. That object must contain a role_arn property that specifies the AWS role ARN for the role you’ve created.

    The response body includes a credentials_id field, which is the Databricks credentials configuration ID that you need to create the new workspace.

    For example:

     curl -X POST -n \
       'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/credentials' \
       -d '{
       "credentials_name": "databricks-mws-workspace-credentials-v1",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role"
         }
       }
     }'
    

    Example response:

     {
       "credentials_id": "<databricks-credentials-id>",
       "account_id": "<databricks-mws-master-account-id>",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role",
           "external_id": "<databricks-mws-master-account-id>"
         }
       },
       "credentials_name": "databricks-mws-workspace-credentials-v1",
       "creation_time": 1579753556257
     }
    

    Copy the credentials_id field from the response for later use.
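
    If you script these steps, you can capture the returned ID instead of copying it by hand. A sketch that assumes the jq JSON processor is installed (the same pattern works for the IDs returned in later steps):

     CREDENTIALS_ID=$(curl -X POST -n \
       'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/credentials' \
       -d '{"credentials_name": "databricks-mws-workspace-credentials-v1", "aws_credentials": {"sts_role": {"role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role"}}}' \
       | jq -r '.credentials_id')
     echo "$CREDENTIALS_ID"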

Step 2: Configure root storage

The root storage S3 bucket in your account is required to store objects like cluster logs, notebook revisions, and job results. You can also use the root storage S3 bucket for storage of non-production data, like data you need for testing.

Note

You can share a root S3 bucket with multiple workspaces in a single account. You do not have to create new ones for each workspace. If you share a root S3 bucket for multiple workspaces in an account, data on the root S3 bucket is partitioned into separate directories by workspace. If you already have a bucket and an associated storage configuration ID generated by the Multi-workspace API, you can skip this step.

  1. Create the root S3 bucket, using the instructions in Configure AWS storage (Multi-workspace API).

  2. Create a storage configuration record that represents the root S3 bucket. Specify your root S3 bucket by name when you call the create storage configuration API (POST /accounts/<account-id>/storage-configurations).

    The request returns a storage configuration ID that represents your S3 bucket.

    Pass the following:

    • storage_configuration_name — New unique storage configuration name.
    • root_bucket_info — A JSON object that contains a bucket_name field that contains your S3 bucket name.

    The response body includes a storage_configuration_id property, which is that bucket’s storage configuration ID. Copy that value for later use.

    For example:

    curl -X POST -n \
        'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/storage-configurations' \
      -d '{
        "storage_configuration_name": "databricks-mws-workspace-storageconf-v1",
        "root_bucket_info": {
          "bucket_name": "my-company-mws-example-bucket"
        }
      }'
    

    Response:

    {
      "storage_configuration_id": "<databricks-storage-config-id>",
      "account_id": "<databricks-mws-master-account-id>",
      "root_bucket_info": {
        "bucket_name": "my-company-mws-example-bucket"
      },
      "storage_configuration_name": "databricks-mws-workspace-storageconf-v1",
      "creation_time": 1579754875555
    }
    

Step 3: Configure customer-managed VPC (optional)

By default, Databricks creates a VPC in your AWS account for each workspace and uses it to run that workspace’s clusters. Optionally, you can provide your own VPC for the workspace using the customer-managed VPC feature. Databricks recommends that you provide your own VPC so that you can configure it according to your organization’s enterprise cloud standards while still conforming to Databricks requirements. Note that you cannot migrate an existing workspace to your own VPC.

Preview

Customer-managed VPC is in Public Preview.

  1. Set up your VPC, subnets, and security groups, using the instructions in Customer-managed VPC. Copy the IDs for each of those objects for the next step, in which you register them with Databricks and get a multi-workspace network ID to represent your new network.

    Important

    You can share one customer-managed VPC with multiple workspaces in a single account. You do not have to create a new VPC for each workspace. However, you cannot reuse subnets or security groups with any other resources, including other workspaces or non-Databricks resources. If you plan to share one VPC with multiple workspaces, be sure to size your VPC and subnets accordingly. Because a Databricks multi-workspace network ID encapsulates this information, you cannot reuse a Databricks multi-workspace network ID across workspaces.

  2. To register your network configuration with Databricks, call the create network configuration API (POST /accounts/<account-id>/networks).

    Pass the following:

    • network_name — New unique network name.
    • vpc_id — VPC ID.
    • subnet_ids — Subnet IDs, as an array.
    • security_group_ids — Security Group IDs, as an array.

    For example:

    curl -X POST -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/networks' \
      -d '{
      "network_name": "mycompany-vpc-example",
      "vpc_id": "<aws-vpc-id>",
      "subnet_ids": [
        "<aws-subnet-id-1>",
        "<aws-subnet-id-2>"
      ],
      "security_group_ids": [
        "<aws-security-group-id>"
      ]
    }'
    
  3. Copy the network_id from the response body for later use. This is the multi-workspace network ID that represents the network for your new workspace.

    Example response:

    {
      "network_id": "<databricks-mws-network-id>",
      "account_id": "<databricks-mws-master-account-id>",
      "vpc_id": "<aws-vpc-id>",
      "subnet_ids": [
        "<aws-subnet-id-1>",
        "<aws-subnet-id-2>"
      ],
      "security_group_ids": [
        "<aws-security-group-id>"
      ],
      "vpc_status": "UNATTACHED",
      "network_name": "mycompany-vpc-example",
      "creation_time": 1579767389544
    }
    

Step 4: Configure customer-managed key for notebooks (optional)

By default, notebooks and secrets are encrypted in the control plane with a key that is unique to the control plane rather than to your workspace. Optionally, you can provide your own encryption key for notebooks, a feature known as customer-managed keys for notebooks. This feature is available only if your workspace deploys its data plane (VPC) in the us-east-1 or us-west-2 region.

Preview

Customer-managed keys for notebooks is in Private Preview.

If your master account is enabled to support this feature and you want to use it with this new workspace, you must set up your keys now. You cannot add these keys after you have created the workspace.

Important

You can share a customer-managed key across workspaces. This includes both the key ARN and the Databricks multi-workspace customer-managed key ID that is generated when you register your key with Databricks.

  1. Follow the instructions in Customer managed keys for notebooks to set up a KMS key for use by Databricks. You need the key_arn, key_alias, and key_region values for the next step.

  2. To register the KMS key with Databricks, call the create customer-managed key API (POST /accounts/<account-id>/customer-managed-keys).

    Pass the following:

    • key_arn — AWS KMS key ARN.
    • key_alias — AWS KMS key alias.
    • key_region — AWS KMS key region.

    For example:

    curl -X POST -n \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/customer-managed-keys' \
      -d '{
      "aws_key_info": {
        "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
        "key_alias": "my-example-key",
        "key_region": "us-west-2"
      }
    }'
    
  3. Copy the customer_managed_key_id from the response body for use in a later step.

    Example response:

    {
        "customer_managed_key_id": "<aws-kms-key-id>",
        "creation_time": 1586447506984,
        "account_id": "<databricks-mws-master-account-id>",
        "aws_key_info": {
            "key_arn": "arn:aws:kms:us-west-2:<aws-account-id>:key/<key-id>",
            "key_alias": "my-example-key",
            "key_region": "us-west-2"
        }
    }
    

Step 5: Create the workspace

To create the new workspace, call the create workspace API (POST /accounts/<account-id>/workspaces).

You need the following values that you copied in the previous steps:

  • workspace_name — Human-readable name for your workspace.
  • deployment_name — (Recommended but optional) Unique deployment name for your workspace. Choose your deployment_name value carefully: it becomes the subdomain of your workspace URL. For example, if the deployment name is ABCSales, your workspace URL will be https://abcsales.cloud.databricks.com. This value must contain only characters that are allowed in a subdomain. In addition, the deployment_name must be unique across all non-deleted workspaces across all AWS regions. If you do not specify a deployment name, the API generates a unique deployment name with the pattern dbc-xxxxxxxx-xxxx.
  • credentials_id — Your credentials ID, which represents your cross-account role credentials.
  • storage_configuration_id — Your storage configuration ID, which represents your root S3 bucket.
  • network_id — (Optional, only used for customer-managed VPC) Your network ID.
  • customer_managed_key_id — (Optional, only used for customer-managed key) Your key ID.

If the new workspace uses secure cluster connectivity, which is a network architecture with no public IP address or open ports on the data plane VPC, you must also set is_no_public_ip_enabled to true. Your account must be enabled for this feature to set the flag to true.

The JSON response includes the property workspace_id. Copy this value for later use.

For example:

curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/workspaces' \
  -d '{
  "workspace_name": "my-company-mws-example",
  "deployment_name": "my-company-mws-example",
  "aws_region": "us-west-2",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "network_id": "<databricks-mws-network-id>",
  "customer_managed_key_id": "<aws-kms-key-id>",
  "is_no_public_ip_enabled": true
}'

Example response:

{
  "workspace_id": 123456789,
  "workspace_name": "my-company-mws-example",
  "aws_region": "us-west-2",
  "creation_time": 1579768294842,
  "deployment_name": "my-company-mws-example",
  "workspace_status": "PROVISIONING",
  "account_id": "<databricks-mws-master-account-id>",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "workspace_status_message": "Workspace resources are being set up.",
  "network_id": "<databricks-mws-network-id>",
  "customer_managed_key_id": "<aws-kms-key-id>"
}
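
As in earlier steps, you can capture the returned workspace_id for the status checks in Step 6. A sketch that assumes you saved the request body shown above to a file named workspace.json and have jq installed:

WORKSPACE_ID=$(curl -s -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/workspaces' \
  -d @workspace.json | jq -r '.workspace_id')
echo "$WORKSPACE_ID"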

If you specified a customer-managed VPC and the workspace creation step returns a network-related error, you can call the get network API (GET /accounts/<account-id>/networks/<network-id>) to validate the network settings. See Troubleshoot a workspace that failed to deploy.

Step 6: Confirm the new workspace

To check workspace status, call the get workspace API (GET /accounts/<account-id>/workspaces/<workspace-id>).

Use the workspace_id value from the JSON response returned when you created the workspace.

In the response, possible workspace_status values are:

  • NOT_PROVISIONED — Not yet provisioned.
  • PROVISIONING — Still provisioning. Wait a few minutes and repeat this API request.
  • RUNNING — Successful deployment and now running.
  • FAILED — Failed deployment.
  • BANNED — Banned.
  • CANCELLING — In process of cancellation.

See Troubleshoot a workspace that failed to deploy for how to handle unsuccessful status values.

For example:

curl -X GET -n \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/workspaces/<databricks-workspace-id>'

Response:

{
  "workspace_id": 123456789,
  "workspace_name": "my-company-mws-example",
  "aws_region": "us-west-2",
  "creation_time": 1579768294842,
  "deployment_name": "my-company-mws-example",
  "workspace_status": "RUNNING",
  "account_id": "<databricks-mws-master-account-id>",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "workspace_status_message": "Workspace is running.",
  "network_id": "339f16b9-b8a3-4d50-9d1b-7e29e49448c3",
  "customer_managed_key_id": "<aws-kms-key-id>"
}

In this example, the workspace status (workspace_status) is set to RUNNING, so it was successful. If it is PROVISIONING, repeat this API request until it succeeds.
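
If you prefer to script this check rather than rerun it by hand, a minimal polling sketch (assuming bash and jq) might look like the following:

STATUS="PROVISIONING"
while [ "$STATUS" = "PROVISIONING" ]; do
  sleep 60   # wait between status checks
  STATUS=$(curl -s -X GET -n \
    'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/workspaces/<databricks-workspace-id>' \
    | jq -r '.workspace_status')
  echo "workspace_status: $STATUS"
done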

Test your new workspace after its status is RUNNING:

  • User interface login on the new workspace — Confirm you can log in to the web application at URL https://<deployment-name>.cloud.databricks.com. For example, if the deployment name you specified during workspace creation is ABCSales, your workspace URL is https://abcsales.cloud.databricks.com. Use your multi-workspace master account username and password.

  • REST API login on the new workspace — Confirm that you can access the REST API. The following example gets a list of users using the SCIM API. The curl tool will prompt you for a password.

    curl -u <username> -X GET 'https://<deployment-name>.cloud.databricks.com/api/2.0/preview/scim/v2/Users'
    

    For more information about using Databricks REST APIs, including other authentication options, see REST API 2.0.

Step 7: Post-deployment configuration (optional)

There are other configuration steps that you might want to consider for your new workspace:

Enable IP Access Lists

Configure the IP addresses that can connect to the web application, REST APIs, JDBC/ODBC endpoints, and DBConnect. You can specify whitelists and blacklists as IP addresses or ranges. See IP access lists.

Enable audit logs

Databricks strongly recommends that you configure audit logging to monitor the activities performed and usage incurred by your Databricks users. You must contact your Databricks representative to enable audit logs for your new workspace. See Configure audit logging for instructions.

Troubleshoot a workspace that failed to deploy

For all workspace creation errors, try the following troubleshooting steps in the order provided.

Validate network

If the workspace creation or status check steps indicate a network-related error, call the get network API to ensure that the network settings are correct. This API endpoint has the form /accounts/<account-id>/networks/<network-id>.

For example:

  curl -X GET -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/networks/<databricks-mws-network-id>'

In the response, view the warning_messages and error_messages fields. If both arrays are empty, there are no warnings or errors.

Otherwise, review the warnings and error JSON objects carefully:

  • For warnings, the warning_type enumeration indicates that the problem was with either a subnet or security group. The warning_message provides additional details. Be aware that if you have a firewall or NAT instance (instead of a NAT gateway), the network validation always issues a warning.
  • For errors, the error_type enumeration indicates that the problem was with either credentials, VPC, subnet, security group, or network ACL. The error_message provides additional details.
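
For a quick look at both arrays at once, you can filter the response with jq (again assuming it is installed):

  curl -s -X GET -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/networks/<databricks-mws-network-id>' \
  | jq '{warnings: .warning_messages, errors: .error_messages}'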

Fix infrastructure issues

Depending on the errors in the response to the Get Network API request, confirm that:

  • Your security group complies with the customer-managed VPC requirements.
  • Your cross-account IAM policy includes the required permissions. See Create a cross-account IAM role for launching multiple workspaces for the policy to use for your deployment type.
  • Your Databricks account was enabled by Databricks for multiple workspaces and for any additional features you are using (customer-managed VPC, customer-managed keys for notebooks, secure cluster connectivity). Contact your Databricks representative to confirm.

Update the failed workspace

To update the failed workspace, call the update workspace and redeploy API (PATCH /accounts/<account-id>/workspaces/<workspace-id>).

Important

The update workspace API only supports updates of workspaces and deployments that failed during workspace creation. You cannot use it to update a successfully deployed workspace.

Use the credentials_id, storage_configuration_id, network_id (if applicable), and customer_managed_key_id (if applicable) values from the initial request. The deployment_name and is_no_public_ip_enabled fields are implicitly reused from the initial request, although you can specify the latter explicitly without harm.

If the workspace_status value returns PROVISIONING, keep checking for RUNNING state using the get workspace API.

For example:

  curl -X PATCH -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/workspaces/<databricks-workspace-id>' \
  -d '{
  "aws_region": "us-west-2",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "network_id": "<databricks-mws-network-id>",
  "customer_managed_key_id": "<aws-kms-key-id>",
  "is_no_public_ip_enabled": true
}'

Response:

{
  "workspace_id": 123456789,
  "workspace_name": "my-company-mws-example",
  "aws_region": "us-west-2",
  "creation_time": 1579768294842,
  "deployment_name": "my-company-mws-example",
  "workspace_status": "PROVISIONING",
  "account_id": "<databricks-mws-master-account-id>",
  "credentials_id": "<aws-credentials-id>",
  "storage_configuration_id": "<databricks-storage-config-id>",
  "workspace_status_message": "Workspace resources are being set up.",
  "network_id": "<databricks-mws-network-id>",
  "customer_managed_key_id": "<aws-kms-key-id>"
}

If the workspace update fails, recreate the network and workspace

If the update workspace API doesn’t work, you must delete and recreate the network (if you provided your own VPC) and the failed workspace in the following order.

  1. Delete the workspace using the delete workspace API (DELETE /accounts/<account-id>/workspaces/<workspace-id>).

    For example:

    curl -X DELETE -n \
    'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/workspaces/<databricks-workspace-id>'
    
  2. If you provided your own VPC, delete the Databricks network configuration using the delete network configuration API (DELETE /accounts/<account-id>/networks/<network-id>).

    For example:

    curl -X DELETE -n \
    'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-mws-master-account-id>/networks/<databricks-mws-network-id>'
    
  3. Recreate the network using the correct values for vpc_id, subnet_ids, and security_group_ids.

  4. Recreate the workspace using the correct values for credentials_id, storage_configuration_id, network_id, optional customer_managed_key_id, and is_no_public_ip_enabled.

    If you get the workspace_status value PROVISIONING, keep checking for RUNNING state using the get workspace API.