Enable workload identity federation for GitLab CI/CD

Databricks OAuth token federation, also known as OpenID Connect (OIDC), allows your automated workloads running outside of Databricks to securely access Databricks without the need for Databricks secrets. See Authenticate access to Databricks using OAuth token federation.

To enable workload identity federation for GitLab CI/CD:

  1. Create a federation policy
  2. Configure the GitLab YAML file

After you enable workload identity federation, the Databricks SDKs and the Databricks CLI automatically fetch workload identity tokens from GitLab CI/CD and exchange them for Databricks OAuth tokens.
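To make the automatic exchange less opaque, here is a minimal sketch of what the SDKs and CLI do on your behalf, written as an OAuth 2.0 token exchange (RFC 8693) with curl. The function names `token_exchange_body` and `exchange_token` are hypothetical. The `/oidc/v1/token` endpoint path and the `all-apis` scope are assumptions based on the Databricks OAuth token federation documentation, so verify them against that page. The environment variables are the ones configured later on this page.

```shell
# Build the form-encoded body for an OAuth 2.0 token exchange (RFC 8693).
# $1: service principal client ID, $2: GitLab OIDC token (a JWT).
# Hypothetical helper; JWTs and the URNs below need no extra URL encoding.
token_exchange_body() {
  printf 'grant_type=%s&subject_token_type=%s&scope=all-apis&client_id=%s&subject_token=%s' \
    'urn:ietf:params:oauth:grant-type:token-exchange' \
    'urn:ietf:params:oauth:token-type:jwt' \
    "$1" "$2"
}

# Exchange the GitLab token for a Databricks OAuth token.
# Assumes the endpoint path /oidc/v1/token; requires network access.
exchange_token() {
  curl --fail --silent --request POST "${DATABRICKS_HOST}/oidc/v1/token" \
    --data "$(token_exchange_body "$DATABRICKS_CLIENT_ID" "$DATABRICKS_OIDC_TOKEN")"
}
```

In normal use you do not need to call anything like this yourself. It is only meant to show which values from the GitLab job end up in the exchange request.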

Create a federation policy

First, create a workload identity federation policy. For instructions, see Configure a service principal federation policy. For GitLab CI/CD, set the following values:

  • Group: The name of your GitLab group. For example, if your project URL is https://gitlab.com/databricks-inc/data-platform, then the group is databricks-inc.
  • Project: The name of the single GitLab project to allow, such as data-platform.
  • Ref type: The kind of Git reference represented in the sub (subject) claim of your token. This can be Branch, Tag, or Merge request.
  • Issuer URL: The GitLab instance URL that issues the OIDC token.
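  • Subject: The expected sub value in the OIDC token. GitLab builds this value by concatenating values taken from the job context, such as the project path, the ref type, and the ref.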
  • Audiences: The expected aud value in the OIDC token. Configure this in your job’s id_tokens: block. Databricks recommends setting it to your Databricks account ID.
  • Subject claim: (Optional) The JWT claim that contains the workload identity (sub) value from the OIDC token. For GitLab, leave the field as sub, which encodes the project, branch, tag, or merge request that triggered the pipeline.
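When the policy values and the token do not match, it can help to inspect the claims GitLab actually issued. The following is a minimal debugging sketch that decodes the payload segment of a JWT so you can compare iss, aud, and sub with your policy. The function name `jwt_payload` is hypothetical, the sketch does not verify the signature, and it assumes a `base64` command that accepts `-d` (GNU coreutils and BusyBox do).

```shell
# Print the decoded payload (second dot-separated segment) of a JWT.
# Debugging aid only: the signature is NOT verified.
jwt_payload() {
  # Take the payload segment and map base64url characters back to base64.
  payload=$(printf '%s' "$1" | cut -d '.' -f 2 | tr '_-' '/+')
  # Restore the '=' padding that base64url encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}

# Example inside a GitLab job, assuming the token is in DATABRICKS_OIDC_TOKEN:
#   jwt_payload "$DATABRICKS_OIDC_TOKEN"
```

Avoid printing the raw token itself in job logs. The decoded payload contains claims only, but treat it with care as well.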

For example, the following Databricks CLI command creates a federation policy for the Databricks service principal with the numeric ID 5581763342009999:

Bash
databricks account service-principal-federation-policy create 5581763342009999 --json '{
  "oidc_policy": {
    "issuer": "https://gitlab.com/example-group",
    "audiences": [
      "https://gitlab.com/example-group"
    ],
    "subject": "project_path:my-group/my-project:..."
  }
}'
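If you create policies for several projects, you may prefer to generate the JSON from variables instead of editing it by hand. The following is a minimal sketch, assuming the issuer, audience, and subject values contain no characters that need JSON escaping (such as double quotes or backslashes). The function name `policy_json` is hypothetical and is not part of the Databricks CLI.

```shell
# Print an oidc_policy JSON document for a federation policy.
# $1: issuer URL, $2: audience, $3: subject.
# Hypothetical helper: values are inserted verbatim, with no JSON escaping.
policy_json() {
  printf '{"oidc_policy":{"issuer":"%s","audiences":["%s"],"subject":"%s"}}' \
    "$1" "$2" "$3"
}

# Usage with the command shown above (placeholders are yours to fill in):
#   databricks account service-principal-federation-policy create 5581763342009999 \
#     --json "$(policy_json "$ISSUER" "$AUDIENCE" "$SUBJECT")"
```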

Configure the GitLab YAML file

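Next, modify the GitLab configuration file. Supply your Databricks account ID, workspace URL, and service principal client ID through the databricks-account-id, databricks-host, and databricks-client-id pipeline inputs.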

In addition to setting the following environment variables, store the GitLab OIDC token in the DATABRICKS_OIDC_TOKEN environment variable. Alternatively, store the token in a custom environment variable and set DATABRICKS_OIDC_TOKEN_ENV to the name of that variable.

  • DATABRICKS_AUTH_TYPE: env-oidc
  • DATABRICKS_HOST: Your Databricks workspace URL
  • DATABRICKS_CLIENT_ID: The service principal (application) ID

YAML
spec:
  inputs:
    # Specify your Databricks account ID, workspace hostname, and service principal OAuth client ID.
    databricks-account-id:
    databricks-host:
    databricks-client-id:
    # See https://docs.gitlab.com/ci/inputs/#define-input-parameters-with-specinputs for more on pipeline input variables.
---
stages:
  - my_script_using_wif

variables:
  DATABRICKS_AUTH_TYPE: env-oidc
  DATABRICKS_HOST: $[[ inputs.databricks-host ]]
  DATABRICKS_CLIENT_ID: $[[ inputs.databricks-client-id ]]

my_script_using_wif:
  id_tokens:
    DATABRICKS_OIDC_TOKEN:
      aud: $[[ inputs.databricks-account-id ]]
  stage: my_script_using_wif
  image: ubuntu:latest
  before_script:
    - apt-get update -y
    - apt-get install -y curl unzip
    - curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
  script:
    - databricks current-user me
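
A missing variable tends to surface as a generic authentication error, so a pre-flight check at the start of the job's script section can make failures easier to diagnose. The following is a minimal sketch, assuming the default DATABRICKS_OIDC_TOKEN variable name is used rather than a custom one set through DATABRICKS_OIDC_TOKEN_ENV. The function name `check_wif_env` is hypothetical.

```shell
# Verify that the environment variables described above are present.
# Returns 0 when everything is set, 1 otherwise (names are listed on stderr).
check_wif_env() {
  missing=""
  for name in DATABRICKS_HOST DATABRICKS_CLIENT_ID DATABRICKS_OIDC_TOKEN; do
    # Indirectly read the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  # The auth type must be exactly env-oidc for this setup.
  if [ "${DATABRICKS_AUTH_TYPE:-}" != "env-oidc" ]; then
    missing="$missing DATABRICKS_AUTH_TYPE"
  fi
  if [ -n "$missing" ]; then
    echo "missing or invalid:$missing" >&2
    return 1
  fi
}
```

For example, a script entry such as `check_wif_env && databricks current-user me` would stop before calling the CLI when the configuration is incomplete. The check reports variable names only and never prints the token value.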