Authentication with Google ID tokens
To authenticate to Databricks REST APIs, use Databricks OAuth, our recommended solution across all cloud platforms. OAuth 2.0 is the standard protocol that Databricks uses for delegated authorization. It provides secure access to REST APIs and supports mechanisms such as refresh tokens to maintain access over time.
You can also use Google ID tokens to authenticate, primarily on Google Cloud. These tokens follow open standards such as OpenID Connect (OIDC).
Databricks REST APIs support only Google-issued OIDC tokens, which are commonly known as Google ID tokens. To reduce confusion, the rest of this article uses the term Google ID token instead of OIDC token.
If you bring your own identity provider to configure single sign-on to Databricks, you can't authenticate with Google ID tokens.
This article describes how to authenticate to Databricks REST APIs with Google ID tokens, including how to create the required Google Cloud service accounts and generate tokens for them.
Service account setup and authentication flow
You can use a single Google ID token for account-level APIs or workspace-level APIs, but you can't use it for both purposes. The steps for setting up tokens for workspace-level and account-level APIs are mostly the same. The important differences are called out in the instructions.
For production environments, Databricks recommends using two Google Cloud service accounts:
- SA-1: Runs your workloads and fetches tokens.
- SA-2: Holds permissions to your Databricks and GCP resources.
Grant SA-1 permission to impersonate SA-2 in order to call Databricks REST APIs.
With this impersonation model, one team can manage workload security and another team can manage resource security. Because you only grant the impersonation permissions as needed, this approach offers security and flexibility to your organization.
For development or testing, you can simplify the setup by using one of the following options:
- Use your Google user account to impersonate SA-2. You must have the roles/iam.serviceAccountTokenCreator role.
- Use a single service account to act as both SA-1 and SA-2.
Credentials passthrough
A few Databricks REST API methods require credentials passthrough. To call these methods, in addition to the Google ID token, you must also pass a Google OAuth access token with the cloud-platform scope in an HTTP header. The Databricks server uses the Google OAuth access token to call Google Cloud APIs on behalf of the caller.
Databricks doesn't validate or preserve the access token.
To determine whether credentials passthrough is required for an operation, refer to the API documentation for each API operation. These APIs require the X-Databricks-GCP-SA-Access-Token HTTP header in the request.
Step 1: Create two service accounts
- Create two new Google Cloud service accounts. Follow the instructions in the Google article Creating a service account. To use the Google Cloud Console, go to the Service Accounts page and choose a Google Cloud project to create them in. The Google Cloud project in which you create these service accounts does not need to match the project that you use for your Databricks workspace, nor do the new service accounts need to use the same Google Cloud project as each other.
  - Token-creating service account (SA-1): This service account automates creation of tokens for the main service account. These tokens will be used to call Databricks REST APIs. Google documentation calls this SA-1.
  - Main service account for Databricks REST APIs (SA-2): This service account acts as a principal (the automation user) for Databricks REST APIs and automated workflows. Google documentation calls this SA-2.
Save the email addresses of both service accounts for use in later steps.
- Create a service account key for your token-creating service account (SA-1) and save it to a local file called SA-1-key.json.
  - From the Google Cloud Console Service Accounts page, click the email address for SA-1.
  - Go to the Keys tab.
  - Click Add key > Create new key.
  - For the key type, choose JSON.
  - Click Create.
  - The web page downloads a key file to your browser. Move that file to your local working directory and rename it SA-1-key.json.
For additional instructions, see the Google article Creating service account keys.
- Grant your token-creating service account (SA-1) the Service Account Token Creator Role on your main service account (SA-2).
  - From the Google Cloud Console Service Accounts page, click the email address for SA-2.
  - Go to the Permissions tab.
  - Click Principals with access.
  - Click Grant access.
  - In the New Principals field, paste the email address for your token-creating SA (SA-1).
  - In the Role field, choose Service Account Token Creator.
  - Click Save.
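The console steps in Step 1 can also be scripted with the Google Cloud CLI. The following is a minimal sketch rather than an official procedure. The function name setup_service_accounts and the service account IDs token-creator-sa and databricks-api-sa are hypothetical, and the sketch assumes both accounts live in one project, which this article does not require.

```shell
# Sketch: script Step 1 with the gcloud CLI instead of the console.
# The function name and the service account IDs are illustrative assumptions.
setup_service_accounts() {
  local project="$1"   # Google Cloud project ID that will hold both accounts
  local sa1="token-creator-sa@${project}.iam.gserviceaccount.com"
  local sa2="databricks-api-sa@${project}.iam.gserviceaccount.com"

  # Create the token-creating (SA-1) and main (SA-2) service accounts.
  gcloud iam service-accounts create token-creator-sa --project="$project" || return 1
  gcloud iam service-accounts create databricks-api-sa --project="$project" || return 1

  # Download a JSON key for SA-1 to the local working directory.
  gcloud iam service-accounts keys create SA-1-key.json \
    --iam-account="$sa1" || return 1

  # Grant SA-1 the Service Account Token Creator role on SA-2.
  gcloud iam service-accounts add-iam-policy-binding "$sa2" \
    --member="serviceAccount:${sa1}" \
    --role="roles/iam.serviceAccountTokenCreator"
}

# Usage (hypothetical project ID):
#   setup_service_accounts my-project
```

For the development option in which your own user account impersonates SA-2, the same binding command would take a member of the form user:<your-email> instead of serviceAccount:<SA-1-email>.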
Step 2: Create a Google ID token
Databricks recommends using the Google Cloud CLI (gcloud) to generate ID tokens to call Databricks REST APIs.
The generated ID token expires in one hour. You must finish all remaining steps within that time. If the token expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google ID token.
- Install the Google Cloud CLI on your machine. See the Google article on installing the gcloud tool.
- Generate ID tokens for your main service account by running the following commands.
  - Replace <SA-1-key-json> with the path to your SA-1 key file in JSON format.
  - Replace <SA-2-email> with SA-2's email address.
  - Replace <audience> as follows based on your use case:
    - For workspace-level APIs, replace with your workspace URL, which has the form https://999999999992360.0.gcp.databricks.com. Every workspace has a different unique workspace URL. To call APIs on multiple workspaces, create multiple Google ID tokens with different audience values.
    - For account-level APIs, replace with https://accounts.gcp.databricks.com. Different accounts all share the same audience value.

Run the following commands for use with production systems:

gcloud auth login --cred-file=<SA-1-key-json>
gcloud auth print-identity-token \
--impersonate-service-account="<SA-2-email>" \
--include-email \
--audiences="<audience>"

For non-production use, if you use your user account to impersonate SA-2, use these commands:

gcloud auth login
gcloud auth print-identity-token \
--impersonate-service-account="<SA-2-email>" \
--audiences="<audience>" --include-email

For non-production use, if you use one service account for both SA-1 and SA-2, use these commands with the service account's key JSON file:

gcloud auth login --cred-file=<SA-key-json>
gcloud auth print-identity-token --audiences="<audience>"
- Save the long line at the end of the output to a file called google-id-token-sa-2.txt.
The command outputs text similar to the following:
WARNING: This command is using service account impersonation. All API calls will be executed as [<SA-2-email>].
eyJhba7s86dfa9s8f6a99das7fa68s7d6...N8s67f6saa78sa8s7dfiLlA
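Because a Google ID token is a JSON Web Token, you can inspect its claims to confirm that the audience matches the API you plan to call and to see when the token expires. The following is a minimal sketch: the function name jwt_payload is hypothetical, the sketch assumes a base64 command that accepts -d (GNU coreutils or a recent macOS), and it only decodes the token without verifying its signature.

```shell
# Sketch: print the claims (the JSON payload) of a Google ID token.
# The function name is hypothetical. This decodes only; it does not
# verify the token's signature.
jwt_payload() {
  local segment
  # A JWT is three base64url segments joined by dots; the second holds the claims.
  # Map the base64url alphabet back to standard base64.
  segment="$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')"
  # base64url drops the trailing '=' padding, so restore it before decoding.
  while [ $(( ${#segment} % 4 )) -ne 0 ]; do
    segment="${segment}="
  done
  printf '%s' "$segment" | base64 -d
}

# Usage:
#   jwt_payload "$(cat google-id-token-sa-2.txt)"
```

In the decoded JSON, the aud claim should equal the audience you passed to gcloud, and exp is the expiry time in seconds since the Unix epoch.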
Step 3: Create a Google OAuth access token (only for APIs that require credentials passthrough)
This step is required only to call APIs that require credentials passthrough. To determine whether credentials passthrough is required for an operation, refer to the API documentation for each API operation.
The request to generate an access token includes a lifetime field that defines how long the access token is valid. If you only need the token to be active for five minutes, set it to 300s (300 seconds). The following example uses 3600s, which represents one hour.
- You must finish all remaining steps within that time limit. If the time expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google OAuth access token.
- By default, an hour (3600s) is the maximum duration you can set for the lifetime field. To extend this limit, contact Google customer support and request an exception.
- Run the following command. Replace <SA-2-email> with the service account email address for SA-2. For non-production use or testing, if you are using a single service account or using a user account to impersonate a service account, replace <SA-2-email> with the email address for the service account.

gcloud auth print-access-token --impersonate-service-account=<SA-2-email>
- Save the long line at the end of the output to a file called access-token-sa-2.txt.
The command outputs text similar to the following:
WARNING: This command is using service account impersonation. All API calls will be executed as [<SA-2-email>].
eyJhba7s86dfa9s8f6a99das7fa68s7d6...N8s67f6saa78sa8s7dfiLlA
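Instead of copying the token by hand, you can redirect it straight into the file. The following is a sketch under two assumptions that you should confirm against your gcloud version: that gcloud auth print-access-token accepts a --lifetime flag when used with impersonation, and that the impersonation warning is written to stderr so that only the token reaches the file. The function name save_access_token is hypothetical.

```shell
# Sketch: request an access token for SA-2 and save it to a file.
# Assumptions: the --lifetime flag is available in your gcloud version, and
# the impersonation warning goes to stderr rather than stdout.
save_access_token() {
  local sa2_email="$1"
  local lifetime="${2:-3600s}"                   # for example 300s for five minutes
  local out_file="${3:-access-token-sa-2.txt}"

  gcloud auth print-access-token \
    --impersonate-service-account="$sa2_email" \
    --lifetime="$lifetime" > "$out_file"
}

# Usage:
#   save_access_token "<SA-2-email>" 300s
```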
Step 4: Add the main service account as a user
To use Google ID tokens to call Databricks workspace-level APIs or account-level APIs, add the main service account (SA-2) as a user in the appropriate Databricks environment:
- For workspace-level APIs, add SA-2 to the workspace as a user.
- For account-level APIs, add SA-2 in the account console as a user.
You must generate a separate Google ID token for each API type because the audience field changes based on the base URL. For details, see Create a Google ID token.
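As an illustration of the one-token-per-audience rule, the following sketch generates one Google ID token file for each audience you pass in, using the same gcloud flags as Step 2. The function name generate_id_tokens and the output file naming scheme are hypothetical, and the sketch assumes you have already run gcloud auth login as described in Step 2.

```shell
# Sketch: write one Google ID token file per audience.
# The function name and file naming scheme are illustrative assumptions.
generate_id_tokens() {
  local sa2_email="$1"
  local out_dir="$2"
  shift 2
  local audience safe_name

  for audience in "$@"; do
    # Turn the audience URL into a file-name-safe string.
    safe_name="$(printf '%s' "$audience" | sed -e 's|^https://||' -e 's|[^A-Za-z0-9.-]|_|g')"
    gcloud auth print-identity-token \
      --impersonate-service-account="$sa2_email" \
      --include-email \
      --audiences="$audience" > "${out_dir}/google-id-token-${safe_name}.txt" || return 1
  done
}

# Usage: one account-level token and one workspace-level token.
#   generate_id_tokens "<SA-2-email>" . \
#     https://accounts.gcp.databricks.com "<workspace-URL>"
```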
Databricks uses SA-2 as the caller identity in API requests. You don’t need to add the token-creating service account (SA-1) to Databricks. SA-1 only impersonates SA-2 to generate tokens and doesn’t interact with Databricks directly.
Workspace APIs
To authenticate workspace-level APIs with the Google ID token, follow the steps on the Workspace admin settings tab in Add users to your account. Enter the email address of your main service account (SA-2) when prompted.
Optionally add any required group memberships and Databricks access control settings for your new service account, depending on which REST APIs you plan to call and the data objects you want to use. See Groups and Access control lists.
Account-level APIs
To authenticate account-level APIs with the Google ID token, follow the steps on the Account console tab in Add users to your account. Enter the email address of your main service account (SA-2) when prompted. Set the name to clearly reflect the service account’s purpose.
Step 5: Call a Databricks API
The tokens you need to provide during REST API authentication vary depending on your planned usage: account-level APIs or workspace-level APIs. You cannot use one Google ID token to access both types of APIs because the audiences value differs when you create the Google ID token.
The following HTTP headers are used for Databricks authentication with Google ID tokens.
HTTP header name | Description | Which types of APIs require it? |
---|---|---|
Authorization | Google ID token for SA-2 as a bearer token. Syntax is Bearer <google-id-token> | All APIs |
X-Databricks-GCP-SA-Access-Token | Google OAuth access token for SA-2. | Account-level APIs only |
Example workspace-level API request
To call a Databricks REST API for a workspace, pass a Google ID token in the Authorization HTTP header with the following syntax:
Authorization: Bearer <google-id-token>
The token you provide must have the following attributes:
- The workspace you access must match the workspace URL that you provided when you created the token. See Step 2: Create a Google ID token.
- The service account that is impersonated (SA-2) must be a user of the workspace. See Workspace APIs.
The following example calls a workspace-level API to list clusters.
- Replace <google-id-token> with the Google ID token you saved in file google-id-token-sa-2.txt.
- Replace <workspace-URL> with your base workspace URL, which has a form similar to https://1234567890123456.7.gcp.databricks.com.
curl \
-X GET \
--header 'Authorization: Bearer <google-id-token>' \
<workspace-URL>/api/2.0/clusters/list
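To avoid pasting the token into the command line, you can read it from the file you saved in Step 2. The following is a minimal sketch; the function name list_clusters is hypothetical, and the curl options --fail, --silent, and --show-error are added here only so that HTTP errors produce a nonzero exit status.

```shell
# Sketch: call the clusters list API with the Google ID token read from a file.
# The function name is hypothetical.
list_clusters() {
  local workspace_url="$1"
  local token_file="${2:-google-id-token-sa-2.txt}"
  local token

  # Strip any trailing newline or carriage return from the saved token.
  token="$(tr -d '\r\n' < "$token_file")" || return 1

  curl --fail --silent --show-error \
    -X GET \
    --header "Authorization: Bearer ${token}" \
    "${workspace_url}/api/2.0/clusters/list"
}

# Usage:
#   list_clusters "<workspace-URL>"
```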
Example account-level API request for an API that doesn't use credential passthrough
The following example calls the Account API to get a list of workspaces.
- Replace <google-id-token> with the Google ID token you saved in file google-id-token-sa-2.txt.
- Replace <account-id> with your account ID. To find your account ID:
  - As an account admin, go to the Databricks account console.
  - Click the down arrow next to your username in the upper right corner.
  - In the dropdown menu you can copy your Account ID.
curl \
-X GET \
--header 'Authorization: Bearer <google-id-token>' \
https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces
Example account-level API request with credential passthrough
The following example calls the Account API to get a list of workspaces.
- Replace <google-id-token> with the Google ID token you saved in file google-id-token-sa-2.txt.
- Replace <access-token-sa-2> with the SA-2 access token that you saved in file access-token-sa-2.txt.
- Replace <account-id> with your account ID. To find your account ID:
  - As an account admin, go to the Databricks account console.
  - In the upper right, click the user profile icon.
  - In the popup that appears, copy the account ID by clicking the icon to the right of the ID.
curl \
-X GET \
--header 'Authorization: Bearer <google-id-token>' \
--header 'X-Databricks-GCP-SA-Access-Token: <access-token-sa-2>' \
https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces
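The two account-level examples differ only in the extra passthrough header, so one helper can cover both. The following is a minimal sketch; the function name list_workspaces is hypothetical. It reads both tokens from the files saved in Step 2 and Step 3 and adds the X-Databricks-GCP-SA-Access-Token header only when an access token file is supplied.

```shell
# Sketch: list workspaces through the Account API, with optional
# credentials passthrough. The function name is hypothetical.
list_workspaces() {
  local account_id="$1"
  local id_token_file="$2"
  local access_token_file="${3:-}"   # optional; enables credentials passthrough
  local id_token access_token

  id_token="$(tr -d '\r\n' < "$id_token_file")" || return 1
  # Reuse the positional parameters as the list of curl header arguments.
  set -- --header "Authorization: Bearer ${id_token}"
  if [ -n "$access_token_file" ]; then
    access_token="$(tr -d '\r\n' < "$access_token_file")" || return 1
    set -- "$@" --header "X-Databricks-GCP-SA-Access-Token: ${access_token}"
  fi

  curl --fail --silent --show-error -X GET "$@" \
    "https://accounts.gcp.databricks.com/api/2.0/accounts/${account_id}/workspaces"
}

# Usage:
#   list_workspaces "<account-id>" google-id-token-sa-2.txt
#   list_workspaces "<account-id>" google-id-token-sa-2.txt access-token-sa-2.txt
```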