
Authentication with Google ID tokens

To authenticate to Databricks REST APIs, use Databricks OAuth, our recommended solution across all cloud platforms. OAuth 2.0 is the standard protocol that Databricks uses for delegated authorization. It provides secure access to REST APIs and supports mechanisms such as refresh tokens to maintain access over time.

You can also use Google ID tokens to authenticate, primarily on Google Cloud. These tokens follow open standards such as OpenID Connect (OIDC).

Databricks REST APIs support only Google-issued OIDC tokens, which are commonly known as Google ID tokens. To reduce confusion, the rest of this article uses the term Google ID token rather than OIDC token.

note

If you bring your own identity provider to configure single sign-on to Databricks, you can't authenticate with Google ID tokens.

This article describes how to create the required Google Cloud service accounts, generate tokens for those accounts, and use the tokens to authenticate to Databricks REST APIs.

Service account setup and authentication flow

You can use a single Google ID token for account-level APIs or workspace-level APIs, but you can't use it for both purposes. The steps for setting up tokens for workspace-level and account-level APIs are mostly the same. The important differences are called out in the instructions.

For production environments, Databricks recommends using two Google Cloud service accounts:

  • SA-1: Runs your workloads and fetches tokens.
  • SA-2: Holds permissions to your Databricks and GCP resources.

Grant SA-1 permission to impersonate SA-2 in order to call Databricks REST APIs.

With this impersonation model, one team can manage workload security and another team can manage resource security. Because you only grant the impersonation permissions as needed, this approach offers security and flexibility to your organization.

For development or testing, you can simplify the setup by using one of the following options:

  • Use your Google user account to impersonate SA-2. You must have the roles/iam.serviceAccountTokenCreator role.
  • Use a single service account to act as both SA-1 and SA-2.

Credentials passthrough

A few Databricks REST API methods require credentials passthrough. To call these methods, you must pass a Google OAuth access token with the cloud-platform scope in an HTTP header, in addition to the Google ID token. The Databricks server uses the Google OAuth access token to call Google Cloud APIs on behalf of the caller.

Databricks doesn't validate or preserve the access token.

important

To determine whether credentials passthrough is required for an operation, refer to the API documentation for each API operation. These APIs require the X-Databricks-GCP-SA-Access-Token HTTP header in the request.

Step 1: Create two service accounts

  1. Create two new Google Cloud service accounts. Follow the instructions in the Google article Creating a service account. To use the Google Cloud Console, go to the Service Accounts page and choose the Google Cloud project to create them in. The project in which you create these service accounts doesn't need to match the project that you use for your Databricks workspace, and the two service accounts don't need to be in the same project as each other.

    • Token-creating service account (SA-1): This service account automates creation of tokens for the main service account. These tokens will be used to call Databricks REST APIs. Google documentation calls this SA-1.
    • Main service account for Databricks REST APIs (SA-2): This service account acts as a principal (the automation user) for Databricks REST APIs and automated workflows. Google documentation calls this SA-2.

    Save the email address for both service accounts for use in later steps.

  2. Create a service account key for your token-creating service account (SA-1) and save it to a local file called SA-1-key.json.

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-1.
    2. Go to the Keys tab.
    3. Click Add key > Create new key.
    4. For the key type, choose JSON.
    5. Click Create.
    6. Your browser downloads the key file. Move that file to your local working directory and rename it SA-1-key.json.

    For additional instructions, see the Google article Creating service account keys.

  3. Grant your token-creating service account (SA-1) the Service Account Token Creator Role on your main service account (SA-2).

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-2.
    2. Go to the Permissions tab.
    3. Click Principals with access.
    4. Click Grant access.
    5. In the New Principals field, paste the email address for your token-creating SA (SA-1).
    6. In the Role field, choose Service Account Token Creator.
    7. Click Save.
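If you prefer the command line, the console steps above can be sketched with the gcloud CLI. This is a minimal sketch rather than an official procedure: the project ID my-gcp-project, the service account names databricks-token-creator and databricks-api-main, the DRY_RUN switch, and the run and setup_service_accounts functions are hypothetical, and for simplicity both service accounts are created in the same project. With DRY_RUN=1 the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical values: replace with your own project and account names.
PROJECT_ID="${PROJECT_ID:-my-gcp-project}"
SA1_NAME="${SA1_NAME:-databricks-token-creator}"  # SA-1: creates tokens
SA2_NAME="${SA2_NAME:-databricks-api-main}"       # SA-2: calls Databricks

# User-managed service accounts get email addresses of this form.
SA1_EMAIL="${SA1_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"
SA2_EMAIL="${SA2_NAME}@${PROJECT_ID}.iam.gserviceaccount.com"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_service_accounts() {
  # Step 1.1: create SA-1 and SA-2.
  run gcloud iam service-accounts create "$SA1_NAME" --project="$PROJECT_ID"
  run gcloud iam service-accounts create "$SA2_NAME" --project="$PROJECT_ID"

  # Step 1.2: create a JSON key for SA-1 and save it as SA-1-key.json.
  run gcloud iam service-accounts keys create SA-1-key.json \
    --iam-account="$SA1_EMAIL"

  # Step 1.3: grant SA-1 the Service Account Token Creator role on SA-2.
  run gcloud iam service-accounts add-iam-policy-binding "$SA2_EMAIL" \
    --member="serviceAccount:${SA1_EMAIL}" \
    --role="roles/iam.serviceAccountTokenCreator" \
    --project="$PROJECT_ID"
}
```

For example, run DRY_RUN=1 setup_service_accounts to review the commands, then setup_service_accounts to execute them. Treat SA-1-key.json as a secret, because anyone who holds it can act as SA-1.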

Step 2: Create a Google ID token

Databricks recommends using the Google Cloud CLI (gcloud) to generate ID tokens to call Databricks REST APIs.

important

The generated ID token expires in one hour. You must finish all remaining steps within that time. If the token expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google ID token.

  1. Install the Google Cloud CLI on your machine. See the Google article on installing the gcloud tool.

  2. Generate ID tokens for your main service account by running the following commands.

    • Replace <SA-1-key-json> with the path to your SA-1 key file in JSON format.
    • Replace <SA-2-email> with SA-2's email address.
    • Replace <audience> as follows based on your use case:
      • For workspace-level APIs, replace it with your workspace URL, which has the form https://999999999992360.0.gcp.databricks.com. Each workspace has its own unique URL, so to call APIs on multiple workspaces, create one Google ID token per workspace, each with a different audience value.
      • For account-level APIs, replace it with https://accounts.gcp.databricks.com. All accounts share this audience value.

    Run the following commands for use with production systems:

    Bash
    gcloud auth login --cred-file=<SA-1-key-json>

    gcloud auth print-identity-token \
    --impersonate-service-account="<SA-2-email>" \
    --include-email \
    --audiences="<audience>"

    For non-production use, if you use your user account to impersonate SA-2, use these commands:

    Bash
    gcloud auth login

    gcloud auth print-identity-token \
    --impersonate-service-account="<SA-2-email>" \
    --audiences="<audience>" --include-email

    For non-production use, if you use one service account for both SA-1 and SA-2, use these commands with the service account's key JSON file:

    Bash
    gcloud auth login --cred-file=<SA-key-json>

    gcloud auth print-identity-token --audiences="<audience>"

  3. Save the token, which is the long line at the end of the output, to a file called google-id-token-sa-2.txt.

    The command outputs text similar to the following:

    WARNING: This command is using service account impersonation. All API calls will be executed as [<SA-2-email>].

    eyJhba7s86dfa9s8f6a99das7fa68s7d6...N8s67f6saa78sa8s7dfiLlA
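Instead of copying the token by hand, you can redirect standard output to the file: gcloud writes the impersonation WARNING line to standard error and the token to standard output. The sketch below does that and also decodes the token's payload so that you can check its aud (audience) and exp (expiry, in Unix seconds) claims before calling Databricks. It assumes bash and GNU coreutils (cut, tr, base64 -d); the function names are hypothetical, and decoding a token this way doesn't verify its signature.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Generate an ID token for SA-2 and write only the token to a file.
# The impersonation WARNING goes to standard error, not to the file.
save_id_token() {
  local sa2_email="$1" audience="$2"
  local out_file="${3:-google-id-token-sa-2.txt}"
  gcloud auth print-identity-token \
    --impersonate-service-account="$sa2_email" \
    --include-email \
    --audiences="$audience" > "$out_file"
}

# Decode the payload (the second dot-separated part) of a JWT.
# This does NOT verify the signature; use it only to inspect a token.
jwt_payload() {
  local payload
  # Convert base64url characters to standard base64 characters.
  payload="$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')"
  # base64url drops "=" padding; restore it before decoding.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Print a top-level string or number claim, such as aud or exp.
jwt_claim() {
  local token="$1" name="$2"
  jwt_payload "$token" |
    sed -nE "s/.*\"${name}\"[[:space:]]*:[[:space:]]*\"?([^\",}]*).*/\1/p"
}

# Print how many seconds remain before the token expires.
jwt_seconds_left() {
  local exp
  exp="$(jwt_claim "$1" exp)"
  echo $(( exp - $(date +%s) ))
}
```

For example, after save_id_token "<SA-2-email>" "<audience>", running jwt_claim "$(cat google-id-token-sa-2.txt)" aud should print the audience you requested, and a non-positive result from jwt_seconds_left means you need to generate a new token.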

Step 3: Create a Google OAuth access token (only for APIs that require credentials passthrough)

note

This step is required only to call APIs that require credentials passthrough. To determine whether credentials passthrough is required for an operation, refer to the API documentation for each API operation.

The request to generate an access token can include a lifetime field that defines how long the access token is valid. If you only need the token to be active for five minutes, set the lifetime to 300s (300 seconds). The following example uses the default lifetime of 3600s, which represents one hour.

important
  • You must finish all remaining steps within that time limit. If the time expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google OAuth access token.
  • By default, an hour (3600s) is the maximum duration you can set for the lifetime field. To extend this limit, contact Google customer support and request an exception.

  1. Run the following command. Replace <SA-2-email> with the service account email address for SA-2. For non-production use or testing, if you use a single service account or use your user account to impersonate a service account, replace <SA-2-email> with the email address of that service account.

    Bash
    gcloud auth print-access-token --impersonate-service-account="<SA-2-email>"

  2. Save the token, which is the long line at the end of the output, to a file called access-token-sa-2.txt.

    The command outputs text similar to the following:

    WARNING: This command is using service account impersonation. All API calls will be executed as [<SA-2-email>].

    eyJhba7s86dfa9s8f6a99das7fa68s7d6...N8s67f6saa78sa8s7dfiLlA
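The gcloud command above returns a token with the default one-hour lifetime. To request a shorter lifetime, such as 300s, one option is to call the generateAccessToken method of Google's IAM Service Account Credentials API directly; its request body has scope and lifetime fields, and its JSON response returns the token in accessToken. The sketch below assumes you are already signed in to gcloud as SA-1 (for example, with gcloud auth login --cred-file=<SA-1-key-json>) and that curl is installed; the function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

CLOUD_PLATFORM_SCOPE="https://www.googleapis.com/auth/cloud-platform"

# Build the generateAccessToken request body for a lifetime in seconds.
# By default, 3600 seconds is the maximum lifetime.
access_token_request_body() {
  local lifetime_seconds="$1"
  if (( lifetime_seconds < 1 || lifetime_seconds > 3600 )); then
    echo "lifetime must be between 1 and 3600 seconds" >&2
    return 1
  fi
  printf '{"scope":["%s"],"lifetime":"%ss"}' \
    "$CLOUD_PLATFORM_SCOPE" "$lifetime_seconds"
}

# Ask the IAM Service Account Credentials API for an access token for SA-2.
# The caller (SA-1) needs the Service Account Token Creator role on SA-2.
request_access_token() {
  local sa2_email="$1" lifetime_seconds="${2:-3600}"
  local body
  body="$(access_token_request_body "$lifetime_seconds")"
  curl --silent --show-error --fail \
    -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -d "$body" \
    "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/${sa2_email}:generateAccessToken"
}
```

For example, request_access_token "<SA-2-email>" 300 prints a JSON object whose accessToken value you can save to access-token-sa-2.txt, along with an expireTime value that shows when the token stops working.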

Step 4: Add the main service account as a user

To use Google ID tokens to call Databricks workspace-level APIs or account-level APIs, add the main service account (SA-2) as a user in the appropriate Databricks environment, as described in the following sections.

You must generate a separate Google ID token for each API type because the audience value depends on the base URL. For details, see Step 2: Create a Google ID token.

note

Databricks uses SA-2 as the caller identity in API requests. You don’t need to add the token-creating service account (SA-1) to Databricks. SA-1 only impersonates SA-2 to generate tokens and doesn’t interact with Databricks directly.

Workspace APIs

To call workspace-level APIs with a Google ID token, follow the steps on the Workspace admin settings tab in Add users to your account. Enter the email address of your main service account (SA-2) when prompted.

Depending on which REST APIs you plan to call and which data objects you want to use, add the group memberships and Databricks access control settings that your new service account needs. See Groups and Access control lists.

Account-level APIs

To call account-level APIs with a Google ID token, follow the steps on the Account console tab in Add users to your account. Enter the email address of your main service account (SA-2) when prompted, and set a name that clearly reflects the service account’s purpose.

Step 5: Call a Databricks API

The tokens you need to provide for REST API authentication depend on whether you call account-level APIs or workspace-level APIs. You can't use one Google ID token for both types of APIs, because the audience value that you set when you create the token differs between them.

Databricks authentication with Google ID tokens uses the following HTTP headers.

HTTP header name | Description | Which types of APIs require it?
--- | --- | ---
Authorization | Google ID token for SA-2 as a bearer token. Syntax is Authorization: Bearer <token>. | All APIs
X-Databricks-GCP-SA-Access-Token | Google OAuth access token for SA-2. | Only account-level APIs that require credentials passthrough
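The header rules in the table can be sketched as a small bash function that always sends the Authorization header and adds X-Databricks-GCP-SA-Access-Token only when you pass an access token. The function name databricks_get and the CURL override variable are hypothetical; the tokens are the ones saved in the earlier steps.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Send a GET request to a Databricks REST API on Google Cloud.
#   $1: full API URL
#   $2: Google ID token for SA-2 (always sent as a bearer token)
#   $3: Google OAuth access token for SA-2 (optional; only for APIs
#       that require credentials passthrough)
# Set CURL=echo to print the arguments instead of sending the request.
databricks_get() {
  local url="$1" id_token="$2" access_token="${3:-}"
  local -a headers=(--header "Authorization: Bearer ${id_token}")
  if [[ -n "$access_token" ]]; then
    headers+=(--header "X-Databricks-GCP-SA-Access-Token: ${access_token}")
  fi
  "${CURL:-curl}" --silent --show-error --fail "${headers[@]}" "$url"
}
```

For example: databricks_get "https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces" "$(cat google-id-token-sa-2.txt)" "$(cat access-token-sa-2.txt)". Reading the tokens from files, rather than pasting them into the command, keeps the token text out of your shell history.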

Example workspace-level API request

To call a Databricks REST API for a workspace, pass a Google ID token in the Authorization HTTP header with the following syntax:

Authorization: Bearer <google-id-token>

The token you provide must meet the following requirements:

  • The URL of the workspace you call must match the workspace URL that you set as the audience when you created the token. See Step 2: Create a Google ID token.
  • The impersonated service account (SA-2) must be a user of the workspace. See Workspace APIs.

The following example calls a workspace-level API to list clusters.

  • Replace <google-id-token> with the Google ID token you saved in file google-id-token-sa-2.txt.
  • Replace <workspace-URL> with your base workspace URL, which has a form similar to https://1234567890123456.7.gcp.databricks.com.

Bash
curl \
-X GET \
--header 'Authorization: Bearer <google-id-token>' \
'<workspace-URL>/api/2.0/clusters/list'

Example account-level API request without credentials passthrough

The following example calls the Account API to get a list of workspaces.

  • Replace <google-id-token> with the Google ID token you saved in file google-id-token-sa-2.txt.
  • Replace <account-id> with your account ID. To find your account ID:
    1. As an account admin, go to the Databricks account console.
    2. Click the down arrow next to your username in the upper right corner.
    3. In the dropdown menu, copy your account ID.

Bash
curl \
-X GET \
--header 'Authorization: Bearer <google-id-token>' \
'https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces'

Example account-level API request with credentials passthrough

The following example calls the Account API to get a list of workspaces.

  • Replace <google-id-token> with the Google ID token you saved in file google-id-token-sa-2.txt.

  • Replace <access-token-sa-2> with the SA-2 access token that you saved in file access-token-sa-2.txt.

  • Replace <account-id> with your account ID. To find your account ID:

    1. As an account admin, go to the Databricks account console.
    2. In the upper right, click the user profile icon.
    3. In the popup that appears, copy the account ID by clicking the icon to the right of the ID.

Bash
curl \
-X GET \
--header 'Authorization: Bearer <google-id-token>' \
--header 'X-Databricks-GCP-SA-Access-Token: <access-token-sa-2>' \
'https://accounts.gcp.databricks.com/api/2.0/accounts/<account-id>/workspaces'