Authenticate to workspace APIs with a Google ID token

Preview

This feature is in Public Preview.

Databricks provides workspace-level REST APIs. To authenticate an API request, you have two options:

  • Databricks personal access token.

  • OpenID Connect (OIDC) token. This feature is provided as a Public Preview.

OpenID Connect (OIDC) is an open standard for authentication. OIDC 1.0 is a simple identity layer on top of the OAuth 2.0 protocol. It allows applications to verify the identity of users based on authentication that is performed by an OAuth authorization server. Applications can also get basic profile information about users from OIDC tokens. By default, OIDC tokens expire after one hour.
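
An OIDC ID token is a JSON Web Token (JWT), so you can inspect its claims, such as the audience (aud) and the expiry time (exp, in seconds since the Unix epoch), by decoding its payload. The following is a minimal sketch rather than an official tool. The function name jwt_payload is hypothetical, the sketch assumes a base64 command that accepts -d (as in GNU coreutils), and it only decodes the token without verifying its signature.

```shell
#!/usr/bin/env bash
# Print the JSON payload (claims) of a JWT such as a Google ID token.
# Assumption: `base64 -d` is available (GNU coreutils).
# This only decodes the token; it does NOT verify the signature.
jwt_payload() {
  local token="$1" payload pad
  # A JWT has three dot-separated parts: header.payload.signature
  payload="$(printf '%s' "$token" | cut -d '.' -f 2)"
  # Convert the base64url alphabet to the standard base64 alphabet.
  payload="$(printf '%s' "$payload" | tr '_-' '/+')"
  # Restore the '=' padding that base64url encoding omits.
  pad=$(( (4 - ${#payload} % 4) % 4 ))
  while [ "$pad" -gt 0 ]; do
    payload="${payload}="
    pad=$((pad - 1))
  done
  printf '%s' "$payload" | base64 -d
  printf '\n'
}
```

Calling jwt_payload with a token string as its only argument prints the token's claims as JSON, which is a quick way to check which audience a token was issued for and when it expires.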

Important

Databricks REST APIs support only Google-issued OIDC tokens, which are commonly known as Google ID tokens. To reduce confusion, the rest of this article uses the term Google ID token rather than OIDC token.

This article describes how to authenticate to Databricks REST APIs using Google ID tokens and Google OAuth access tokens, how to create the required Google Cloud service accounts, and how to generate tokens for these accounts. A single Google ID token can be used for either account-level APIs or workspace-level APIs, but not for both. Most steps are the same for workspace-level and account-level APIs, and the instructions call out the important differences.

For a production environment, Databricks recommends that you use two service accounts to work with Databricks REST APIs.

  • Create one service account (SA-1) to run your workloads.

  • Create another service account (SA-2) to hold permissions to your Databricks and Google Cloud resources.

  • Grant SA-1 permission to impersonate SA-2 to call Databricks REST APIs.

With this impersonation model, one team can manage workload security and another team can manage resource security. Because you only grant the impersonation permissions as needed, this approach offers security and flexibility to your organization.

This article describes in detail how to perform these steps for production use. You can adapt these instructions for non-production use and testing using one of the following strategies:

  • Use your Google user account to impersonate SA-2. The user account must have the role roles/iam.serviceAccountTokenCreator.

  • Use one Google Cloud service account for both SA-1 and SA-2.

Step 1: Create two service accounts

  1. Create two new Google Cloud service accounts. Follow the instructions in the Google article Creating a service account. To use the Google Cloud Console, go to the Service Accounts page and choose a Google Cloud project to create them in. The Google Cloud project in which you create these service accounts does not need to match the project that you use for your Databricks workspace, nor do the new service accounts need to use the same Google Cloud project as each other.

    • Token-creating service account (SA-1): This service account automates creation of tokens for the main service account. These tokens will be used to call Databricks REST APIs. Google documentation calls this SA-1.

    • Main service account for Databricks REST APIs (SA-2): This service account acts as a principal (the automation user) for Databricks REST APIs and automated workflows. Google documentation calls this SA-2.

    Save the email address for both service accounts for use in later steps.

  2. Create a service account key for your token-creating service account (SA-1) and save it to a local file called SA-1-key.json.

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-1.

    2. Click the KEYS tab.

    3. Click ADD KEY.

    4. Ensure that JSON (the default) is selected.

    5. Click CREATE.

    6. Your browser downloads a key file. Move that file to your local working directory and rename it SA-1-key.json.

    For additional instructions, see the Google article Creating service account keys.

  3. Grant your token-creating service account (SA-1) the Service Account Token Creator Role on your main service account (SA-2). Follow the instructions in the Google article Direct request permissions.

    1. From the Google Cloud Console Service Accounts page, click the email address for SA-2.

    Important

    In Google Cloud Console, be sure to edit your main SA (SA-2), not your token-creating SA (SA-1).

    2. Click PERMISSIONS.

    3. Click GRANT ACCESS.

    4. In the New Principals field, paste the email address for your token-creating SA (SA-1).

    5. In the Role field, choose Service Account Token Creator Role.

    6. Click SAVE.
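
If you prefer the command line to the Google Cloud Console, the console steps above can be sketched with gcloud as follows. This is an illustration under stated assumptions, not a Databricks-documented procedure. The helper names run and setup_service_accounts and the DRY_RUN variable are hypothetical, and the sketch assumes both service accounts are created in the same Google Cloud project, which this article does not require. By default it only prints the commands. Set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: command-line equivalent of Step 1.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Arguments (all placeholders you choose): project ID, SA-1 name, SA-2 name.
# Assumption: both service accounts live in the same project.
setup_service_accounts() {
  local project="$1" sa1_name="$2" sa2_name="$3"
  local sa1_email="${sa1_name}@${project}.iam.gserviceaccount.com"
  local sa2_email="${sa2_name}@${project}.iam.gserviceaccount.com"

  # 1. Create the token-creating (SA-1) and main (SA-2) service accounts.
  run gcloud iam service-accounts create "$sa1_name" --project="$project"
  run gcloud iam service-accounts create "$sa2_name" --project="$project"

  # 2. Create a JSON key for SA-1 and save it as SA-1-key.json.
  run gcloud iam service-accounts keys create SA-1-key.json \
    --iam-account="$sa1_email"

  # 3. On SA-2, grant SA-1 the Service Account Token Creator role.
  run gcloud iam service-accounts add-iam-policy-binding "$sa2_email" \
    --member="serviceAccount:${sa1_email}" \
    --role="roles/iam.serviceAccountTokenCreator"
}
```

For example, setup_service_accounts my-project token-creator databricks-main prints four gcloud commands that you can review before running them with DRY_RUN=0. The project and account names in that call are made up for illustration.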

Step 2: Create a Google ID token

Databricks recommends using the Google Cloud CLI (gcloud) to generate ID tokens to call Databricks REST APIs.

Important

The generated ID token expires in one hour. You must finish all remaining steps within that time. If the token expires before you complete the later steps, such as calling Databricks APIs, you must repeat this step to generate a new Google ID token.

  1. Install the Google Cloud CLI on your machine. See the Google article on installing the gcloud tool.

  2. Generate ID tokens for your main service account by running the following commands.

    • Replace <SA-1-key-json> with the path to your SA-1 key file in JSON format.

    • Replace <SA-2-email-address> with SA-2’s email address.

    • Replace <workspace-url> with your workspace URL, which has the form https://<numbers>.<digit>.gcp.databricks.com, for example https://999999999992360.0.gcp.databricks.com.

    Run the following commands for use with production systems:

    gcloud auth login --cred-file=<SA-1-key-json>
    
    gcloud auth print-identity-token --impersonate-service-account="<SA-2-email-address>" --include-email --audiences="<workspace-url>"
    

    For non-production use, if you use your user account to impersonate SA-2, use these commands:

    gcloud auth login
    
    gcloud auth print-identity-token --impersonate-service-account="<SA-2-email-address>" --include-email --audiences="<workspace-url>"
    

    For non-production use, if you use one service account for both SA-1 and SA-2, use these commands with the service account’s key JSON file:

    gcloud auth login --cred-file=<SA-key-json>
    
    gcloud auth print-identity-token --audiences="<workspace-url>"
    
  3. Save the long line at the end of the output to a file called google-id-token-sa-2.txt.

    The command outputs text similar to the following:

    WARNING: This command is using service account impersonation. All API calls will be executed as [<SA-2-email-address>].
    
    eyJhba7s86dfa9s8f6a99das7fa68s7d6...N8s67f6saa78sa8s7dfiLlA
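
Rather than copying the token by hand, you can pipe the command output through a small filter that keeps only the last non-empty line. The following is a sketch under assumptions: the function names last_nonempty_line and save_id_token are hypothetical, and the approach relies on the token being the final non-empty line of the output, as in the sample above.

```shell
#!/usr/bin/env bash
# Print the last non-empty line of stdin, with surrounding whitespace removed.
last_nonempty_line() {
  awk 'NF { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, ""); line = $0 } END { print line }'
}

# Read command output on stdin and save only the token line to a file.
# The default file name follows the one used in this article.
save_id_token() {
  local out_file="${1:-google-id-token-sa-2.txt}"
  # umask 077 so a newly created token file is readable only by its owner.
  ( umask 077; last_nonempty_line > "$out_file" )
}
```

A possible usage is to append `| save_id_token` to the gcloud auth print-identity-token command from the previous step. The filter gives the same result whether or not the impersonation warning reaches the pipe.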
    

Step 3: Add the service account as a workspace user

To authenticate to workspace APIs with the Google ID token, use the workspace admin console to add your main service account (SA-2) as if it were a user email address.

  1. As a workspace admin, go to the admin console.

  2. Follow the instructions in Add users to a workspace and use your main service account’s email address when prompted to provide it in the admin console.

  3. Optionally add any group memberships that might be required for your new service account based on which Databricks REST APIs you plan to call and the data objects that you want to use. See Manage groups.

  4. Optionally add Databricks access control settings for that user. See Enable access control.

Step 4: Call a Databricks API

To call a Databricks REST API for a workspace, pass a Google ID token in the Authorization HTTP header with the following syntax:

Authorization: Bearer <google-id-token>

The token you provide must be a Google ID token generated as described in Step 2, with your workspace URL as its audience.

The following example calls a workspace-level API to list clusters.

  • Replace <google-id-token> with the Google ID token you saved in the file google-id-token-sa-2.txt.

  • Replace <workspace-URL> with your base workspace URL, which has a form similar to https://1234567890123456.7.gcp.databricks.com.

curl \
  -X GET \
  --header 'Authorization: Bearer <google-id-token>' \
  <workspace-URL>/api/2.0/clusters/list
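
To avoid pasting the token into each command, the same request can be wrapped in a small function that reads the token from the saved file. This is a sketch under assumptions, not part of any Databricks tooling: the function name call_workspace_api is hypothetical, and the default file name is the one used earlier in this article.

```shell
#!/usr/bin/env bash
# Call a workspace-level REST API path using a Google ID token stored in a file.
# Arguments: workspace URL, API path, optional token file.
call_workspace_api() {
  local workspace_url="$1" api_path="$2"
  local token_file="${3:-google-id-token-sa-2.txt}"
  local token

  if [ ! -s "$token_file" ]; then
    echo "Token file '$token_file' is missing or empty; repeat Step 2." >&2
    return 1
  fi

  # Strip whitespace and newlines so the header carries a single clean token.
  token="$(tr -d ' \t\r\n' < "$token_file")"

  # --fail makes curl exit non-zero on HTTP errors such as 401 or 403,
  # which can indicate an expired token or a missing workspace user.
  curl --silent --show-error --fail \
    -X GET \
    --header "Authorization: Bearer ${token}" \
    "${workspace_url%/}${api_path}"
}
```

For example, call_workspace_api "<workspace-URL>" /api/2.0/clusters/list sends the same request as the curl command above. If the call fails with an authentication error more than an hour after you generated the token, generate a new token as described in Step 2.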