Service principals for Databricks automation
A service principal is an identity created for use with automated tools and systems including scripts, apps, and CI/CD platforms.
As a security best practice, Databricks recommends that you give automated tools and systems access to Databricks resources through a Databricks service principal and its Databricks access token, instead of through your Databricks workspace user and that user's Databricks personal access token. Benefits of this approach include the following:
You can grant and restrict access to Databricks resources for a Databricks service principal independently of a user. For instance, this allows you to prohibit a Databricks service principal from acting as an admin in your Databricks workspace while still allowing other specific users in your workspace to continue to act as admins.
Users do not have to expose their own access tokens to automated tools and systems.
You can temporarily disable or permanently delete a Databricks service principal without impacting other users. For instance, this allows you to pause or remove access from a Databricks service principal that you suspect is being used in a malicious way.
If a user leaves your organization, you can remove that user without impacting any Databricks service principal.
To create a Databricks service principal and a Databricks access token for it, you use these tools and APIs:
You create a Databricks service principal in your workspace with the SCIM API 2.0 (ServicePrincipals) for workspaces. To call this API, you can use tools such as curl or Postman, or you can use Terraform. You cannot use the Databricks user interface.
You create a Databricks access token for a Databricks service principal with the Token Management API 2.0. To call this API, you can use tools such as curl or Postman, or you can use Terraform. You cannot use the Databricks user interface.
This article describes how to:
Create a Databricks service principal in your Databricks workspace.
Create a Databricks access token for the Databricks service principal.
Use curl or Postman
Follow these instructions to use curl or Postman to create a Databricks service principal in your Databricks workspace and then create a Databricks access token for the Databricks service principal.
To use Terraform instead of curl or Postman, skip to Use Terraform.
Requirements
You must be a workspace admin.
A Databricks personal access token for your Databricks workspace user. This enables you to call the Databricks APIs.
If you want to call the Databricks APIs with Postman, you can define Postman variables for your Databricks workspace instance name (for example, dbc-a1b2345c-d6e7.cloud.databricks.com) and for your Databricks personal access token. You can then use these variables instead of entering the values in every Postman example in this article.
If you want to call the Databricks APIs with curl, note that this article’s curl examples use two environment variables:
- DATABRICKS_HOST represents your Databricks workspace instance URL, for example https://dbc-a1b2345c-d6e7.cloud.databricks.com.
- DATABRICKS_TOKEN represents your Databricks personal access token for your workspace user.
To set these environment variables, do the following:
On Unix, Linux, and macOS, run the following commands to set the environment variables for only the current terminal session. To set them for all terminal sessions, add the commands to your shell’s startup file and then restart your terminal. Replace the example values with your own values.
export DATABRICKS_HOST="https://dbc-a1b2345c-d6e7.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi1234567890b2cd34ef5a67bc8de90fa12b"
On Windows, run the following commands to set the environment variables for only the current Command Prompt session. Replace the example values with your own values. Do not wrap the values in quotation marks, because set stores the quotation marks as part of the value.
set DATABRICKS_HOST=https://dbc-a1b2345c-d6e7.cloud.databricks.com
set DATABRICKS_TOKEN=dapi1234567890b2cd34ef5a67bc8de90fa12b
To set the environment variables for all Command Prompt sessions, run the following commands and then restart your Command Prompt. Replace the example values with your own values.
setx DATABRICKS_HOST "https://dbc-a1b2345c-d6e7.cloud.databricks.com"
setx DATABRICKS_TOKEN "dapi1234567890b2cd34ef5a67bc8de90fa12b"
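Every curl example in this article builds its URL as ${DATABRICKS_HOST}/api/..., so a missing scheme or a trailing slash in DATABRICKS_HOST causes confusing failures. The following function is a small pre-flight check for these two variables. It is a hypothetical helper that is not part of any Databricks tool, and it assumes a POSIX-style shell such as bash.

```shell
# Hypothetical helper: fail fast if the variables used by the curl examples
# are missing or malformed. Returns 0 when both look usable, 1 otherwise.
require_databricks_env() {
  if [ -z "${DATABRICKS_HOST:-}" ] || [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "error: set DATABRICKS_HOST and DATABRICKS_TOKEN first" >&2
    return 1
  fi
  # The examples append /api/... directly, so the host needs the https://
  # scheme and must not end with a slash. Check the trailing slash first.
  case "$DATABRICKS_HOST" in
    https://*/)
      echo "error: remove the trailing slash from DATABRICKS_HOST" >&2
      return 1 ;;
    https://*)
      ;;
    *)
      echo "error: DATABRICKS_HOST must start with https://" >&2
      return 1 ;;
  esac
  return 0
}
```

For example, run require_databricks_env && echo "ready" after setting the variables and before trying the commands below.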
If you want to call the Databricks APIs with curl, also note the following:
- This article’s curl examples use shell command formatting for Unix, Linux, and macOS. For the Windows Command shell, replace \ with ^, and replace ${...} with %...%.
- You can use a tool such as jq to format the JSON output of curl for easier reading and querying. This article’s curl examples use jq to format the JSON output.
- If you work with multiple Databricks workspaces, you can use a .netrc file instead of constantly changing the DATABRICKS_HOST and DATABRICKS_TOKEN variables. If you use a .netrc file, modify this article’s curl examples as follows:
  - Change curl -X to curl --netrc -X.
  - Replace ${DATABRICKS_HOST} with your Databricks workspace instance URL, for example https://dbc-a1b2345c-d6e7.cloud.databricks.com.
  - Remove the line --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
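If you switch to a .netrc file, you need one entry per workspace. The function below prints an entry for the workspace currently in DATABRICKS_HOST and DATABRICKS_TOKEN. It is a hypothetical helper. It assumes the common .netrc layout for Databricks personal access tokens, in which the login is the literal word token and the password is the token itself. curl --netrc matches the entry by host name only, so the function strips the https:// scheme.

```shell
# Hypothetical helper: print a .netrc entry for the current workspace.
# Assumes DATABRICKS_HOST (with https://) and DATABRICKS_TOKEN are set.
netrc_entry() {
  local host
  host="${DATABRICKS_HOST#https://}"   # .netrc entries use the bare host name
  host="${host%/}"                     # tolerate a trailing slash
  printf 'machine %s\nlogin token\npassword %s\n' "$host" "$DATABRICKS_TOKEN"
}
```

For example, run netrc_entry >> ~/.netrc && chmod 600 ~/.netrc. The chmod keeps the token readable only by you.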
Create a Databricks service principal
If you already have a Databricks service principal available, skip ahead to the next section to create a Databricks access token for the Databricks service principal.
Note
The following instructions create a service principal at the Databricks workspace level. Databricks also automatically synchronizes the new service principal to the related Databricks account (see How do admins assign users to workspaces?).
You can use tools such as curl and Postman to add the Databricks service principal to your Databricks workspace. In the following instructions, replace:
- <display-name> with a display name for the Databricks service principal.
- The entitlements array with any additional entitlements for the Databricks service principal. This example grants the Databricks service principal the ability to create clusters. Workspace access and Databricks SQL access are granted to the Databricks service principal by default.
- <group-id> with the group ID of any group in your Databricks workspace that you want the Databricks service principal to belong to. It can be easier to set access permissions on groups than on each Databricks service principal individually.
  - To add more groups, add each group ID to the groups array.
  - To get a group ID, call Get groups.
  - To create a group, manage groups with the user interface or call the Create group API.
  - To add access permissions to a group, see Manage groups for user interface options or call the Permissions API 2.0.
  - To keep the Databricks service principal out of all groups, remove the groups array.
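Instead of editing create-service-principal.json by hand, you can generate it. The function below is a hypothetical helper that prints the same request body shown in this section. It takes a display name and zero or more group IDs. jq does the JSON escaping, and when no group IDs are given, the groups array is left out, as described in the steps above. It keeps the allow-cluster-create entitlement from the example. It assumes jq 1.6 or later, which provides --args and $ARGS.positional.

```shell
# Hypothetical helper: print a create-service-principal.json body.
# Usage: make_sp_payload <display-name> [<group-id> ...]
make_sp_payload() {
  local name="$1"
  shift
  # Remaining arguments become $ARGS.positional (jq 1.6+).
  jq -n --arg name "$name" '
    {
      displayName: $name,
      entitlements: [ { value: "allow-cluster-create" } ],
      schemas: [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ],
      active: true
    }
    + (if ($ARGS.positional | length) > 0
       then { groups: [ $ARGS.positional[] | { value: . } ] }
       else {} end)
  ' --args "$@"
}
```

For example, make_sp_payload "<display-name>" "<group-id>" > create-service-principal.json writes the file that the following curl command reads.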
Run the following command. Make sure the create-service-principal.json file is in the same directory where you run this command.
In the output of the command, copy the applicationId value. You will need it to create a Databricks access token for the Databricks service principal.
curl -X POST \
${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals \
--header "Content-type: application/scim+json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data @create-service-principal.json \
| jq .
create-service-principal.json:
{
"displayName": "<display-name>",
"entitlements": [
{
"value": "allow-cluster-create"
}
],
"groups": [
{
"value": "<group-id>"
}
],
"schemas": [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ],
"active": true
}
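In a script, you usually want the applicationId in a variable rather than on screen. The function below is a hypothetical wrapper around the same POST request. It prints only the applicationId. --fail makes curl return a non-zero status on HTTP errors, so a failed request is not passed to jq as if it were a valid response. It assumes that DATABRICKS_HOST and DATABRICKS_TOKEN are set and that create-service-principal.json is in the current directory.

```shell
# Hypothetical helper: create the service principal and print its applicationId.
create_service_principal() {
  local response
  response="$(curl --silent --show-error --fail -X POST \
    "${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals" \
    --header "Content-type: application/scim+json" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --data @create-service-principal.json)" || return 1
  # Extract just the field needed by the token step.
  printf '%s' "$response" | jq -r '.applicationId'
}
```

For example, APPLICATION_ID="$(create_service_principal)" keeps the value for the next section.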
Create a new HTTP request (File > New > HTTP Request).
In the HTTP verb drop-down list, select POST.
For Enter request URL, enter https://<databricks-instance-name>/api/2.0/preview/scim/v2/ServicePrincipals, where <databricks-instance-name> is your Databricks workspace instance name, for example dbc-a1b2345c-d6e7.cloud.databricks.com.
On the Authorization tab, in the Type list, select Bearer Token.
For Token, enter your Databricks personal access token for your workspace user.
On the Headers tab, add the Key and Value pair of Content-Type and application/scim+json.
On the Body tab, select raw and JSON.
Enter the following body payload:
{ "displayName": "<display-name>", "entitlements": [ { "value": "allow-cluster-create" } ], "groups": [ { "value": "<group-id>" } ], "schemas": [ "urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal" ], "active": true }
Click Send.
In the response payload, copy the applicationId value. You will need it to create a Databricks access token for the Databricks service principal.
Create a Databricks access token for a Databricks service principal
Note
The following steps generate a Databricks personal access token for a service principal assigned to a Databricks workspace. This personal access token can be used by the service principal for automation only within the specified Databricks workspace. You cannot use service principals for Databricks account-level automation. If you attempt to generate a personal access token for a service principal at the Databricks account level, the attempt will fail.
Step 1: Get the ID for the Databricks service principal
If you already have the ID for the Databricks service principal, skip ahead to Step 2.
You can use tools such as curl and Postman to get the ID for the Databricks service principal. To get the ID, do the following:
Run the following command. In the output of the command, copy the applicationId value for the Databricks service principal.
curl -X GET \
${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
| jq .
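If your workspace has many service principals, reading the full list is tedious. The function below is a hypothetical helper that calls the same GET endpoint. It then prints the applicationId of each service principal whose displayName exactly matches its argument. It assumes the SCIM list response puts its entries in a Resources array, and it tolerates a response without that array. It reads only the first page of results, so it may miss entries in a large workspace that returns paginated results.

```shell
# Hypothetical helper: print applicationId values for a given display name.
# Usage: application_ids_by_name <display-name>
application_ids_by_name() {
  local response
  response="$(curl --silent --show-error --fail -X GET \
    "${DATABRICKS_HOST}/api/2.0/preview/scim/v2/ServicePrincipals" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}")" || return 1
  # .Resources[]? yields nothing (instead of an error) when the list is empty.
  printf '%s' "$response" | jq -r --arg name "$1" \
    '.Resources[]? | select(.displayName == $name) | .applicationId'
}
```

For example, application_ids_by_name "<display-name>" prints one ID per line. More than one line means that several service principals share that display name.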
Create a new HTTP request (File > New > HTTP Request).
In the HTTP verb drop-down list, select GET.
For Enter request URL, enter https://<databricks-instance-name>/api/2.0/preview/scim/v2/ServicePrincipals, where <databricks-instance-name> is your Databricks workspace instance name, for example dbc-a1b2345c-d6e7.cloud.databricks.com.
On the Authorization tab, in the Type list, select Bearer Token.
For Token, enter your Databricks personal access token for your workspace user.
Click Send.
In the response payload, copy the applicationId value for the Databricks service principal.
Step 2: Create the Databricks access token for the Databricks service principal
Use curl or Postman to create the Databricks access token for the Databricks service principal. In the following instructions, replace:
- <application-id> with the applicationId value for the Databricks service principal.
- <comment> with any comment to associate with the Databricks access token. To omit the comment, remove the comment field.
- 1209600 with the number of seconds that this Databricks access token is valid. This example specifies 14 days.
Important
This Databricks access token is no longer valid after this time period expires, and any CI/CD platform that relies on it may stop working. To prevent this, create a new Databricks access token and give it to the CI/CD platform before the time period expires.
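The lifetime_seconds value is the number of days multiplied by 86,400 (24 × 60 × 60) seconds. For example, 14 × 86,400 = 1,209,600, which is the value used in this article. The function below is a small hypothetical helper for that conversion. It rejects anything that is not a whole number of days.

```shell
# Hypothetical helper: convert a token lifetime in whole days to seconds.
lifetime_seconds_from_days() {
  case "$1" in
    ''|*[!0-9]*) echo "error: expected a whole number of days" >&2; return 1 ;;
  esac
  echo $(( $1 * 24 * 60 * 60 ))
}
```

For example, lifetime_seconds_from_days 14 prints 1209600, and lifetime_seconds_from_days 90 prints 7776000.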
Run the following command. Make sure the create-service-principal-token.json file is in the same directory where you run this command.
In the output of the command, copy the token_value value. You will need it to set up your CI/CD platform.
Note
If you get a permission denied message, see Manage token permissions using the admin console to grant the Databricks service principal the Can Use permission to use the Databricks access token. Then run the command again.
curl -X POST \
${DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens \
--header "Content-type: application/json" \
--header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
--data @create-service-principal-token.json \
| jq .
create-service-principal-token.json:
{
"application_id": "<application-id>",
"comment": "<comment>",
"lifetime_seconds": 1209600
}
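The two functions below are hypothetical helpers. make_token_payload builds the token request body; it omits the comment field when no comment is given, as described above. create_service_principal_token sends that body to the same on-behalf-of endpoint and prints only token_value. The request body is piped to curl with --data @-, so no create-service-principal-token.json file is needed. These helpers assume jq is installed and that DATABRICKS_HOST and DATABRICKS_TOKEN are set.

```shell
# Hypothetical helper: print the token request body.
# Usage: make_token_payload <application-id> <lifetime-seconds> [<comment>]
make_token_payload() {
  jq -n --arg id "$1" --argjson life "$2" --arg comment "${3:-}" '
    { application_id: $id, lifetime_seconds: $life }
    + (if $comment == "" then {} else { comment: $comment } end)'
}

# Hypothetical helper: create the token and print only its token_value.
create_service_principal_token() {
  local response
  response="$(make_token_payload "$@" | curl --silent --show-error --fail -X POST \
    "${DATABRICKS_HOST}/api/2.0/token-management/on-behalf-of/tokens" \
    --header "Content-type: application/json" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --data @-)" || return 1
  printf '%s' "$response" | jq -r '.token_value'
}
```

For example, SP_TOKEN="$(create_service_principal_token "$APPLICATION_ID" 1209600 "CI token")". Treat the printed value like a password.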
Create a new HTTP request (File > New > HTTP Request).
In the HTTP verb drop-down list, select POST.
For Enter request URL, enter https://<databricks-instance-name>/api/2.0/token-management/on-behalf-of/tokens, where <databricks-instance-name> is your Databricks workspace instance name, for example dbc-a1b2345c-d6e7.cloud.databricks.com.
On the Authorization tab, in the Type list, select Bearer Token.
For Token, enter your Databricks personal access token for your workspace user.
On the Headers tab, add the Key and Value pair of Content-Type and application/json.
On the Body tab, select raw and JSON.
Enter the following body payload:
{ "application_id": "<application-id>", "comment": "<comment>", "lifetime_seconds": 1209600 }
Click Send.
Note
If you get a permission denied message, see Manage token permissions using the admin console to grant the Databricks service principal the Can Use permission to use the Databricks access token. Then click Send again.
In the response payload, copy the token_value value. You will need to add it to your script, app, or system.
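The notes above point to the admin console for granting the Can Use token permission. The same grant can probably be made with the Permissions API 2.0. The sketch below assumes that the token permissions endpoint is /api/2.0/permissions/authorization/tokens and that it accepts an access_control_list entry with service_principal_name set to the applicationId. This matches how the Terraform configuration later in this article grants CAN_USE. It uses PATCH, which is intended to add to the existing token permissions rather than replace them. Check the Permissions API reference before you rely on it.

```shell
# Hypothetical helper: print a request body granting CAN_USE on tokens.
token_usage_payload() {
  jq -n --arg sp "$1" \
    '{ access_control_list: [ { service_principal_name: $sp, permission_level: "CAN_USE" } ] }'
}

# Hypothetical helper: send the grant for the given applicationId.
# PATCH (not PUT) so that existing token permissions are kept.
grant_token_usage() {
  token_usage_payload "$1" | curl --silent --show-error --fail -X PATCH \
    "${DATABRICKS_HOST}/api/2.0/permissions/authorization/tokens" \
    --header "Content-type: application/json" \
    --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    --data @-
}
```

For example, run grant_token_usage "$APPLICATION_ID" and then retry the token request.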
Use Terraform
Follow these instructions to use Terraform to create a Databricks service principal in your Databricks workspace and then create a Databricks access token for the Databricks service principal.
To use curl or Postman instead of Terraform, skip to Use curl or Postman.
Requirements
A Databricks personal access token to allow Terraform to call the Databricks APIs within the Databricks account. For details, see Databricks personal access tokens.
The Databricks command-line interface (CLI), configured with your Databricks personal access token. To create a connection profile for this token, run the databricks configure --token --profile <profile name> command. For details, see the “Set up authentication” and “Connection profiles” sections in Databricks CLI setup & documentation.
The Terraform CLI. For details, see Download Terraform on the Terraform website.
Create the Databricks service principal and Databricks access token
In your terminal, create an empty directory and then switch to it. Each separate set of Terraform configuration files must be in its own directory. For example:
mkdir terraform_service_principal_demo && cd terraform_service_principal_demo
In this empty directory, create a file named main.tf. Add the following content to this file, and then save the file.
Warning
The following content contains the statement authorization = "tokens". There can be only one authorization = "tokens" permissions resource per Databricks workspace. Otherwise, there will be permanent configuration drift. After you apply the following changes, any user who had CAN_USE or CAN_MANAGE permission before, and no longer has either one, loses access to token-based authentication. That user's active tokens are deleted (revoked) immediately.
Note
The following content creates a service principal at the Databricks workspace level. The following content also automatically synchronizes the service principal to the related Databricks account (see How do admins assign users to workspaces?). To create a service principal at the Databricks account level instead, see the “Creating service principal in AWS Databricks account” section of databricks_service_principal Resource in the Databricks Terraform provider documentation.
The following content also uses the databricks_obo_token resource to generate a Databricks personal access token for a service principal assigned to a Databricks workspace. This personal access token can be used by the service principal for automation only within the specified Databricks workspace. You cannot use service principals for Databricks account-level automation. If you attempt to generate a personal access token for a service principal at the Databricks account level, the attempt will fail.
variable "databricks_account_id" {
  description = "The Databricks account ID for the Databricks workspace."
  type        = string
}
variable "databricks_connection_profile" {
  description = "The name of the Databricks connection profile to use."
  type        = string
}
variable "service_principal_display_name" {
  description = "The display name for the service principal."
  type        = string
}
variable "service_principal_access_token_lifetime" {
  description = "The lifetime of the service principal's access token, in seconds."
  type        = number
  default     = 3600
}
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}
provider "databricks" {
  host       = "https://accounts.cloud.databricks.com"
  account_id = var.databricks_account_id
  profile    = var.databricks_connection_profile
}
resource "databricks_service_principal" "sp" {
  provider     = databricks
  display_name = var.service_principal_display_name
}
resource "databricks_permissions" "token_usage" {
  authorization = "tokens"
  access_control {
    service_principal_name = databricks_service_principal.sp.application_id
    permission_level       = "CAN_USE"
  }
}
resource "databricks_obo_token" "this" {
  depends_on       = [databricks_permissions.token_usage]
  application_id   = databricks_service_principal.sp.application_id
  comment          = "Personal access token on behalf of ${databricks_service_principal.sp.display_name}"
  lifetime_seconds = var.service_principal_access_token_lifetime
}
output "service_principal_name" {
  value = databricks_service_principal.sp.display_name
}
output "service_principal_id" {
  value = databricks_service_principal.sp.application_id
}
output "service_principal_access_token" {
  value     = databricks_obo_token.this.token_value
  sensitive = true
}
Note
To add this service principal to Databricks workspace groups, and to add Databricks workspace entitlements to this service principal, see databricks_service_principal on the Terraform website.
In the same directory, create a file named terraform.tfvars. Add the following content to this file, replace the following values, and then save the file:
- Replace the databricks_account_id value with the Databricks account ID for your workspace.
  Tip
  To use an environment variable instead of the terraform.tfvars file for this value, set an environment variable named DATABRICKS_ACCOUNT_ID to the Databricks account ID for your workspace. Then remove three things: the databricks_account_id variable from main.tf, the account_id reference in the databricks provider in main.tf, and the databricks_account_id line from terraform.tfvars.
- Replace the databricks_connection_profile value with the name of your connection profile from the requirements.
  Tip
  To use an environment variable instead of the terraform.tfvars file for this value, set an environment variable named DATABRICKS_CONFIG_PROFILE to the name of your connection profile from the requirements. Then remove three things: the databricks_connection_profile variable from main.tf, the profile reference in the databricks provider in main.tf, and the databricks_connection_profile line from terraform.tfvars.
- Replace the service_principal_display_name value with a display name for the service principal.
- Replace the service_principal_access_token_lifetime value with the lifetime of the service principal's access token, in seconds.
  Tip
  To use the default lifetime of 3600 seconds, remove service_principal_access_token_lifetime from the terraform.tfvars file.
databricks_account_id                   = "<Databricks account ID, such as 00000000-0000-0000-0000-000000000000>"
databricks_connection_profile           = "<Databricks connection profile name>"
service_principal_display_name          = "<Service principal display name>"
service_principal_access_token_lifetime = 3600
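If you create terraform.tfvars from a script, a quick sanity check can catch a mistyped account ID before terraform apply runs. The function below is a hypothetical helper. It assumes the account ID uses the UUID-style format of the placeholder above and that the lifetime is a whole number of seconds. It does not escape double quotes in the profile name or display name, so it assumes those values contain none.

```shell
# Hypothetical helper: print terraform.tfvars content after basic checks.
# Usage: tfvars_content <account-id> <profile> <display-name> [<lifetime-seconds>]
tfvars_content() {
  local account_id="$1" profile="$2" display_name="$3" lifetime="${4:-3600}"
  if ! printf '%s\n' "$account_id" |
      grep -Eq '^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$'; then
    echo "error: account ID should look like 00000000-0000-0000-0000-000000000000" >&2
    return 1
  fi
  case "$lifetime" in
    ''|*[!0-9]*) echo "error: lifetime must be a whole number of seconds" >&2; return 1 ;;
  esac
  printf 'databricks_account_id = "%s"\n' "$account_id"
  printf 'databricks_connection_profile = "%s"\n' "$profile"
  printf 'service_principal_display_name = "%s"\n' "$display_name"
  printf 'service_principal_access_token_lifetime = %s\n' "$lifetime"
}
```

For example, tfvars_content "<account-id>" "<profile name>" "<display name>" 7200 > terraform.tfvars writes the file only after the checks pass.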
Initialize the working directory that contains the main.tf file by running the terraform init command. For more information, see Command: init on the Terraform website.
terraform init
Apply the changes needed to reach the desired state of the configuration by running the terraform apply command. For more information, see Command: apply on the Terraform website.
terraform apply
To get the service principal’s access token, see the value of outputs.service_principal_access_token.value in the terraform.tfstate file. This file is in the working directory that contains the main.tf file.
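Instead of opening terraform.tfstate by hand, you can ask Terraform for the output value. The sketch below assumes Terraform 0.14 or later, where terraform output -raw prints a single string output without quotes, including outputs marked sensitive. The function is a hypothetical helper. Run it in the working directory that contains main.tf. It exports the token as DATABRICKS_TOKEN, so the curl examples earlier in this article then run as the service principal instead of as your workspace user.

```shell
# Hypothetical helper: load the service principal's token from Terraform output.
use_service_principal_token() {
  local token
  token="$(terraform output -raw service_principal_access_token)" || return 1
  if [ -z "$token" ]; then
    echo "error: empty token; check that terraform apply succeeded" >&2
    return 1
  fi
  # From here on, curl examples that use DATABRICKS_TOKEN authenticate
  # as the service principal.
  export DATABRICKS_TOKEN="$token"
}
```

Because this prints nothing, the token does not end up in your terminal scrollback. Avoid echoing DATABRICKS_TOKEN afterward.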