Service principals for Databricks automation

A service principal is an identity created for use with automated tools and systems including scripts, apps, and CI/CD platforms.

As a security best practice, Databricks recommends giving automated tools and systems access to Databricks resources through a Databricks service principal and its Databricks access token, rather than through your Databricks workspace user and that user's personal access token. Benefits of this approach include the following:

  • You can grant and restrict access to Databricks resources for a Databricks service principal independently of a user. For instance, this allows you to prohibit a Databricks service principal from acting as an admin in your Databricks workspace while still allowing other specific users in your workspace to continue to act as admins.

  • Users can safeguard their access tokens from being accessed by automated tools and systems.

  • You can temporarily disable or permanently delete a Databricks service principal without impacting other users. For instance, this allows you to pause or remove access from a Databricks service principal that you suspect is being used in a malicious way.

  • If a user leaves your organization, you can remove that user without impacting any Databricks service principal.

This article describes how to:

  1. Create a Databricks service principal in your Databricks workspace.

  2. Create a Databricks access token for the Databricks service principal.

Use curl or Postman

Follow these instructions to use curl or Postman to create a Databricks service principal in your Databricks workspace and then create a Databricks access token for the Databricks service principal.

To use Terraform instead of curl or Postman, skip to Use Terraform.

Requirements

  • A Databricks personal access token for your Databricks workspace user. This enables you to call the Databricks APIs.

  • curl or Postman to call the Databricks APIs.

If you want to call the Databricks APIs with Postman, note that instead of entering your Databricks workspace instance name (for example, dbc-a1b2345c-d6e7.cloud.databricks.com) and your Databricks personal access token into every Postman example in this article, you can define Postman variables and reference those variables instead.

If you want to call the Databricks APIs with curl, note that this article’s curl examples use two environment variables: DATABRICKS_HOST, representing your Databricks workspace instance URL, for example https://dbc-a1b2345c-d6e7.cloud.databricks.com, and DATABRICKS_TOKEN, representing your Databricks personal access token for your workspace user. To set these environment variables, do the following:

To set the environment variables for only the current terminal session, run the following commands. To set the environment variables for all terminal sessions, enter the following commands into your shell’s startup file and then restart your terminal. Replace the example values here with your own values.

export DATABRICKS_HOST="https://dbc-a1b2345c-d6e78.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi1234567890b2cd34ef5a67bc8de90fa12b"
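On Unix-like shells, a small guard like the following (a minimal sketch; the values shown are the same example placeholders used above) fails fast with a clear message if either variable is missing, instead of sending a request with a blank host or token:

```shell
# Example values; in practice these come from your environment or
# shell startup file.
export DATABRICKS_HOST="https://dbc-a1b2345c-d6e78.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi1234567890b2cd34ef5a67bc8de90fa12b"

# Abort immediately if either variable is unset or empty.
: "${DATABRICKS_HOST:?DATABRICKS_HOST is not set}"
: "${DATABRICKS_TOKEN:?DATABRICKS_TOKEN is not set}"

echo "Using workspace: ${DATABRICKS_HOST}"
```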

To set the environment variables for only the current Command Prompt session, run the following commands. Replace the example values here with your own values.

set DATABRICKS_HOST=https://dbc-a1b2345c-d6e78.cloud.databricks.com
set DATABRICKS_TOKEN=dapi1234567890b2cd34ef5a67bc8de90fa12b

To set the environment variables for all Command Prompt sessions, run the following commands and then restart your Command Prompt. Replace the example values here with your own values.

setx DATABRICKS_HOST "https://dbc-a1b2345c-d6e78.cloud.databricks.com"
setx DATABRICKS_TOKEN "dapi1234567890b2cd34ef5a67bc8de90fa12b"

If you want to call the Databricks APIs with curl, also note the following:

  • This article’s curl examples use shell command formatting for Unix, Linux, and macOS. For the Windows Command shell, replace \ with ^, and replace ${...} with %...%.

  • You can use a tool such as jq to format the JSON-formatted output of curl for easier reading and querying. This article’s curl examples use jq to format the JSON output.

  • If you work with multiple Databricks workspaces, instead of constantly changing the DATABRICKS_HOST and DATABRICKS_TOKEN variables, you can use a .netrc file. If you use a .netrc file, modify this article’s curl examples as follows:

    • Change curl -X to curl --netrc -X

    • Replace ${DATABRICKS_HOST} with your Databricks workspace instance URL, for example https://dbc-a1b2345c-d6e7.cloud.databricks.com

    • Remove --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
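For example, jq can both pretty-print a response and pull out individual fields. The response below is a made-up sample that mirrors the shape of the SCIM ServicePrincipals list response; the display name and ID are placeholder values:

```shell
# A made-up sample with the same shape as the SCIM list response.
response='{"Resources":[{"displayName":"ci-cd-bot","applicationId":"11111111-2222-3333-4444-555555555555"}]}'

# Pretty-print the whole document.
echo "$response" | jq .

# Pull out just the applicationId values (-r strips the quotes).
echo "$response" | jq -r '.Resources[].applicationId'
```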

Create a Databricks service principal

If you already have a Databricks service principal available, skip ahead to the next section to create a Databricks access token for the Databricks service principal.

You can use tools such as curl and Postman to add the Databricks service principal to your Databricks workspace. In the following instructions, replace:

  • <display-name> with a display name for the Databricks service principal.

  • The entitlements array with any additional entitlements for the Databricks service principal. This example grants the Databricks service principal the ability to create clusters. Workspace access and Databricks SQL access are granted to the Databricks service principal by default.

  • <group-id> with the group ID for any group in your Databricks workspace that you want the Databricks service principal to belong to. (It can be easier to set access permissions on groups instead of each Databricks service principal individually.)

    • To add additional groups, add each group ID to the groups array.

    • To get a group ID, call Get groups.

    • To create a group, see Manage groups for user interface options or call the Create group API.

    • To add access permissions to a group, see Manage groups for user interface options or call the Permissions API 2.0.

    • To not add the Databricks service principal to any groups, remove the groups array.

Run the following command. Make sure the create-service-principal.json file is in the same directory where you run this command.

In the output of the command, copy the applicationId value, as you will need it to create a Databricks access token for the Databricks service principal.

curl -X POST \
${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals \
--header "Content-type: application/scim+json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data @create-service-principal.json \
| jq .

create-service-principal.json:

{
  "displayName": "<display-name>",
  "entitlements": [
    {
      "value": "allow-cluster-create"
    }
  ],
  "groups": [
    {
      "value": "<group-id>"
    }
  ],
  "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ],
  "active": true
}
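If you prefer to keep the placeholder values in one place, a small script like the following (a sketch; the display name and group ID below are made-up example values) can generate create-service-principal.json and validate it before you run the curl command:

```shell
display_name="ci-cd-bot"       # example value for <display-name>
group_id="123456789012345"     # example value for <group-id>

# Write the request body, substituting the placeholder values.
cat > create-service-principal.json <<EOF
{
  "displayName": "${display_name}",
  "entitlements": [ { "value": "allow-cluster-create" } ],
  "groups": [ { "value": "${group_id}" } ],
  "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ],
  "active": true
}
EOF

# Fail here, before calling the API, if the file is not valid JSON.
jq -e .displayName create-service-principal.json
```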
To use Postman instead of curl to create the Databricks service principal, do the following:

  1. Create a new HTTP request (File > New > HTTP Request).

  2. In the HTTP verb drop-down list, select POST.

  3. For Enter request URL, enter https://<databricks-instance-name>/api/2.0/preview/scim/v2/ServicePrincipals, where <databricks-instance-name> is your Databricks workspace instance name, for example dbc-a1b2345c-d6e7.cloud.databricks.com.

  4. On the Authorization tab, in the Type list, select Bearer Token.

  5. For Token, enter your Databricks personal access token for your workspace user.

  6. On the Headers tab, add the Key and Value pair of Content-Type and application/scim+json.

  7. On the Body tab, select raw and JSON.

  8. Enter the following body payload:

    {
      "displayName": "<display-name>",
      "entitlements": [
        {
          "value": "allow-cluster-create"
        }
      ],
      "groups": [
        {
          "value": "<group-id>"
        }
      ],
      "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ],
      "active": true
    }
    
  9. Click Send.

  10. In the response payload, copy the applicationId value, as you will need it to create a Databricks access token for the Databricks service principal.

Create a Databricks access token for a Databricks service principal

Step 1: Get the ID for the Databricks service principal

If you already have the ID for the Databricks service principal, skip ahead to Step 2.

You can use tools such as curl and Postman to get the ID for the Databricks service principal. To get the ID, do the following:

Run the following command. In the output of the command, copy the applicationId value for the Databricks service principal.

curl -X GET \
${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
| jq .
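The list response includes every service principal in the workspace, so you typically filter it. A jq filter like the following (run here on a made-up sample response with placeholder names and IDs) selects the applicationId for one display name:

```shell
# Made-up sample shaped like the ServicePrincipals list response.
response='{"Resources":[
  {"displayName":"ci-cd-bot","applicationId":"aaaa-1111"},
  {"displayName":"reporting-bot","applicationId":"bbbb-2222"}
]}'

# Select one entry by display name and print its applicationId.
echo "$response" | jq -r '.Resources[] | select(.displayName == "ci-cd-bot") | .applicationId'
```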
To use Postman instead of curl to get the ID, do the following:

  1. Create a new HTTP request (File > New > HTTP Request).

  2. In the HTTP verb drop-down list, select GET.

  3. For Enter request URL, enter https://<databricks-instance-name>/api/2.0/preview/scim/v2/ServicePrincipals, where <databricks-instance-name> is your Databricks workspace instance name, for example dbc-a1b2345c-d6e7.cloud.databricks.com.

  4. On the Authorization tab, in the Type list, select Bearer Token.

  5. For Token, enter your Databricks personal access token for your workspace user.

  6. Click Send.

  7. In the response payload, copy the applicationId value for the service principal.

Step 2: Create the Databricks access token for the Databricks service principal

Use curl or Postman to create the Databricks access token for the Databricks service principal. In the following instructions, replace:

  • <application-id> with the applicationId value for the Databricks service principal.

  • <comment> with any comment to be associated with the Databricks access token. To not add a comment, remove the comment object.

  • 1209600 with the number of seconds that this Databricks access token is valid. This example specifies 14 days.

    Important

    This Databricks access token will no longer be valid after this time period expires, and any CI/CD platform that relies on this Databricks access token may stop working. To prevent this situation, before this time period expires, you must create a new Databricks access token and give it to the CI/CD platform.

Run the following command. Make sure the create-service-principal-token.json file is in the same directory where you run this command.

In the output of the command, copy the token_value value, as you will need it to set up your CI/CD platform.

Note

If you get a permission denied message, see Manage token permissions using the admin console to grant the Databricks service principal the Can Use permission to use the Databricks access token. Then run the command again.

curl -X POST \
${DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens \
--header "Content-type: application/json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data @create-service-principal-token.json \
| jq .

create-service-principal-token.json:

{
  "application_id": "<application-id>",
  "comment": "<comment>",
  "lifetime_seconds": 1209600
}
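Rather than hard-coding 1209600, you can compute the lifetime with shell arithmetic and generate create-service-principal-token.json from it. This is a sketch; the application ID and comment below are made-up example values:

```shell
days=14
lifetime_seconds=$(( days * 24 * 60 * 60 ))   # 1209600 for 14 days

# Write the request body. Substitute your own applicationId and comment.
cat > create-service-principal-token.json <<EOF
{
  "application_id": "11111111-2222-3333-4444-555555555555",
  "comment": "Token for the CI/CD pipeline",
  "lifetime_seconds": ${lifetime_seconds}
}
EOF

# Fail here, before calling the API, if the file is not valid JSON.
jq -e .lifetime_seconds create-service-principal-token.json
```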
To use Postman instead of curl to create the Databricks access token, do the following:

  1. Create a new HTTP request (File > New > HTTP Request).

  2. In the HTTP verb drop-down list, select POST.

  3. For Enter request URL, enter https://<databricks-instance-name>/api/2.0/token-management/on-behalf-of/tokens, where <databricks-instance-name> is your Databricks workspace instance name, for example dbc-a1b2345c-d6e7.cloud.databricks.com.

  4. On the Authorization tab, in the Type list, select Bearer Token.

  5. For Token, enter your Databricks personal access token for your workspace user.

  6. On the Headers tab, add the Key and Value pair of Content-Type and application/json.

  7. On the Body tab, select raw and JSON.

  8. Enter the following body payload:

    {
      "application_id": "<application-id>",
      "comment": "<comment>",
      "lifetime_seconds": 1209600
    }
    
  9. Click Send.

    Note

    If you get a permission denied message, see Manage token permissions using the admin console to grant the Databricks service principal the Can Use permission to use the Databricks access token. Then click Send again.

  10. In the response payload, copy the token_value value, as you will need to add it to your script, app, or system.
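In a script, you would typically capture token_value without printing the secret. The response below is a made-up sample with the same shape as the token-management response:

```shell
# Made-up sample shaped like the on-behalf-of token response.
response='{"token_value":"dapi-example-secret","token_info":{"comment":"ci-cd"}}'

# Capture the secret into a variable instead of echoing it.
token="$(echo "$response" | jq -r '.token_value')"

# Log only non-sensitive facts about the token.
echo "Captured token for comment: $(echo "$response" | jq -r '.token_info.comment')"
```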

Use Terraform

Follow these instructions to use Terraform to create a Databricks service principal in your Databricks workspace and then create a Databricks access token for the Databricks service principal.

To use curl or Postman instead of Terraform, skip to Use curl or Postman.

Requirements

  • A Databricks personal access token to allow Terraform to call the Databricks APIs within the Databricks account. For details, see Authentication using Databricks personal access tokens.

  • The Databricks command-line interface (CLI), configured with your Databricks personal access token by running the databricks configure --token --profile <profile name> command to create a connection profile for this Databricks personal access token. For details, see the “Set up authentication” and “Connection profiles” sections in Databricks CLI.

  • The Terraform CLI. For details, see Download Terraform on the Terraform website.

Create the Databricks service principal and Databricks access token

  1. In your terminal, create an empty directory and then switch to it. Each separate set of Terraform configuration files must be in its own directory. For example:

    mkdir terraform_service_principal_demo && cd terraform_service_principal_demo
    
  2. In this empty directory, create a file named main.tf. Add the following content to this file, and then save the file.

    variable "databricks_account_id" {
      description = "The Databricks account ID for the Databricks workspace."
      type        = string
    }
    
    variable "databricks_connection_profile" {
      description = "The name of the Databricks connection profile to use."
      type        = string
    }
    
    variable "service_principal_display_name" {
      description = "The display name for the service principal."
      type        = string
    }
    
    variable "service_principal_access_token_lifetime" {
      description = "The lifetime of the service principal's access token, in seconds."
      type        = number
      default     = 3600
    }
    
    terraform {
      required_providers {
        databricks = {
          source = "databricks/databricks"
        }
      }
    }
    
    provider "databricks" {
      host       = "https://accounts.cloud.databricks.com"
      account_id = var.databricks_account_id
      profile    = var.databricks_connection_profile
    }
    
    resource "databricks_service_principal" "sp" {
      provider     = databricks
      display_name = var.service_principal_display_name
    }
    
    resource "databricks_permissions" "token_usage" {
      authorization    = "tokens"
      access_control {
        service_principal_name = databricks_service_principal.sp.application_id
        permission_level       = "CAN_USE"
      }
    }
    
    resource "databricks_obo_token" "this" {
      depends_on       = [ databricks_permissions.token_usage ]
      application_id   = databricks_service_principal.sp.application_id
      comment          = "Personal access token on behalf of ${databricks_service_principal.sp.display_name}"
      lifetime_seconds = var.service_principal_access_token_lifetime
    }
    
    output "service_principal_name" {
      value = databricks_service_principal.sp.display_name
    }
    
    output "service_principal_id" {
      value = databricks_service_principal.sp.application_id
    }
    
    output "service_principal_access_token" {
      value     = databricks_obo_token.this.token_value
      sensitive = true
    }
    

    Note

    To add this service principal to Databricks workspace groups, and to add Databricks workspace entitlements to this service principal, see databricks_service_principal on the Terraform website.

  3. In the same directory, create a file named terraform.tfvars. Add the following content to this file, replacing the following values, and then save the file:

    • Replace the databricks_account_id value with the Databricks account ID for your workspace.

      Tip

      To use an environment variable instead of the terraform.tfvars file for this value, set an environment variable named TF_VAR_databricks_account_id (the TF_VAR_ prefix followed by the exact, case-sensitive variable name) to the Databricks account ID for your workspace, and remove the databricks_account_id line from terraform.tfvars. Leave the variable declaration in main.tf in place; Terraform reads TF_VAR_ environment variables to populate it.

    • Replace the databricks_connection_profile value with the name of your connection profile from the requirements.

      Tip

      To use an environment variable instead of the terraform.tfvars file for this value, set an environment variable named TF_VAR_databricks_connection_profile (the TF_VAR_ prefix followed by the exact, case-sensitive variable name) to the name of your connection profile from the requirements, and remove the databricks_connection_profile line from terraform.tfvars. Leave the variable declaration in main.tf in place; Terraform reads TF_VAR_ environment variables to populate it.

    • Replace the service_principal_display_name value with a display name for the service principal.

    • Replace the service_principal_access_token_lifetime value with the number of seconds for the lifetime of the access token for the service principal.

      Tip

      To use the default lifetime value of 3600 seconds, remove service_principal_access_token_lifetime from the terraform.tfvars file.

    databricks_account_id                   = "<Databricks account ID, such as 00000000-0000-0000-0000-000000000000>"
    databricks_connection_profile           = "<Databricks connection profile name>"
    service_principal_display_name          = "<Service principal display name>"
    service_principal_access_token_lifetime = 3600
    
  4. Initialize the working directory containing the main.tf file by running the terraform init command. For more information, see Command: init on the Terraform website.

    terraform init
    
  5. Apply the changes required to reach the desired state of the configuration by running the terraform apply command. For more information, see Command: apply on the Terraform website.

    terraform apply
    
  6. To get the service principal’s access token, run terraform output -raw service_principal_access_token in the working directory containing the main.tf file. (The output is marked sensitive, so terraform apply does not print it.) Alternatively, see the value of outputs.service_principal_access_token.value in the terraform.tfstate file in the same directory.