Enable workload identity federation for GitLab CI/CD
Databricks OAuth token federation, which is based on OpenID Connect (OIDC), lets automated workloads that run outside of Databricks access Databricks securely without Databricks secrets. See Authenticate access to Databricks using OAuth token federation.
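To enable workload identity federation for GitLab CI/CD, create a federation policy and then configure the GitLab YAML file, as described in the following sections.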
After you enable workload identity federation, the Databricks SDKs and the Databricks CLI automatically fetch workload identity tokens from GitLab CI/CD and exchange them for Databricks OAuth tokens.
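The SDKs and CLI perform this exchange for you. For troubleshooting, the following shell sketch shows roughly what the exchange looks like as an OAuth 2.0 token exchange (RFC 8693) request against the workspace's /oidc/v1/token endpoint. It is not necessarily the exact request the SDKs send. It assumes that DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_OIDC_TOKEN are already set (the GitLab job configured later on this page sets them) and that curl is installed. The function name exchange_gitlab_token is hypothetical.

```shell
# Hypothetical helper: manually exchange a GitLab ID token for a Databricks
# OAuth token. With DATABRICKS_AUTH_TYPE=env-oidc, the SDKs and CLI do this for you.
# Assumes DATABRICKS_HOST, DATABRICKS_CLIENT_ID, and DATABRICKS_OIDC_TOKEN are set.
exchange_gitlab_token() {
  # Strip any trailing slash so the endpoint path is joined cleanly.
  local host="${DATABRICKS_HOST%/}"
  # RFC 8693 token exchange: the GitLab JWT is the subject token,
  # and the service principal's client ID identifies who is federating.
  curl --fail --silent --show-error \
    --request POST "${host}/oidc/v1/token" \
    --data-urlencode "client_id=${DATABRICKS_CLIENT_ID}" \
    --data-urlencode "subject_token=${DATABRICKS_OIDC_TOKEN}" \
    --data-urlencode "subject_token_type=urn:ietf:params:oauth:token-type:jwt" \
    --data-urlencode "grant_type=urn:ietf:params:oauth:grant-type:token-exchange" \
    --data-urlencode "scope=all-apis"
}
```

If the request succeeds, the JSON response contains an access_token field that can be sent as a bearer token to Databricks REST APIs.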
Create a federation policy
First, create a workload identity federation policy. For instructions, see Configure a service principal federation policy. For GitLab CI/CD, set the following values:
- Group: The name of your GitLab group. For example, if your project URL is https://gitlab.com/databricks-inc/data-platform, then the group is databricks-inc.
- Project: The name of the single GitLab project to allow, such as data-platform.
- Ref type: The kind of Git reference represented in the sub (subject) claim of your token. This can be Branch, Tag, or Merge request.
- Issuer URL: The URL of the GitLab instance that issues the OIDC token, such as https://gitlab.com.
- Subject: A concatenation of values taken from the job context. GitLab formats it as project_path:{group}/{project}:ref_type:{type}:ref:{branch_name}.
- Audiences: The expected aud value in the OIDC token. Configure this in your job's id_tokens: block. Databricks recommends setting it to your Databricks account ID.
- Subject claim: (Optional) The JWT claim that contains the workload identity (sub) value from the OIDC token. For GitLab, leave the field as sub, which encodes the project, branch, tag, or merge request that triggered the pipeline.
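To see which Subject value a given job will present, the following shell sketch rebuilds it from GitLab's predefined CI/CD variables (CI_PROJECT_PATH, CI_COMMIT_REF_NAME, CI_COMMIT_TAG). It assumes GitLab's documented project_path:{group}/{project}:ref_type:{type}:ref:{branch_name} format and covers only branch and tag pipelines; merge request pipelines are not handled. The function name gitlab_expected_subject is hypothetical.

```shell
# Hypothetical helper: print the sub claim the current GitLab job's ID token
# is expected to carry, so it can be copied into the federation policy.
# Assumes the format project_path:{group}/{project}:ref_type:{type}:ref:{ref}.
gitlab_expected_subject() {
  # CI_COMMIT_TAG is set only in tag pipelines; otherwise treat the ref as a branch.
  local ref_type=branch
  if [ -n "${CI_COMMIT_TAG:-}" ]; then
    ref_type=tag
  fi
  printf 'project_path:%s:ref_type:%s:ref:%s\n' \
    "$CI_PROJECT_PATH" "$ref_type" "$CI_COMMIT_REF_NAME"
}
```

For the example project above, a pipeline on the main branch would print project_path:databricks-inc/data-platform:ref_type:branch:ref:main.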
For example, the following Databricks CLI command creates a federation policy for the Databricks service principal with numeric ID 5581763342009999:
databricks account service-principal-federation-policy create 5581763342009999 --json '{
  "oidc_policy": {
    "issuer": "https://gitlab.com",
    "audiences": [
      "<databricks-account-id>"
    ],
    "subject": "project_path:my-group/my-project:..."
  }
}'
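To keep the issuer, audience, and subject in the CLI command consistent with the GitLab job, you could build the policy body from variables. The following shell sketch is one way to do that; the function name federation_policy_json is hypothetical, and it assumes that none of the values contain double quotes or backslashes (true for URLs, numeric account IDs, and GitLab subjects), because it does no JSON escaping.

```shell
# Hypothetical helper: print an oidc_policy JSON body for
# "databricks account service-principal-federation-policy create ... --json".
# Performs no JSON escaping: values must not contain '"' or '\'.
federation_policy_json() {
  local issuer="$1"    # GitLab instance URL, such as https://gitlab.com
  local audience="$2"  # must match the aud value in the job's id_tokens: block
  local subject="$3"   # such as project_path:my-group/my-project:ref_type:branch:ref:main
  printf '{"oidc_policy": {"issuer": "%s", "audiences": ["%s"], "subject": "%s"}}\n' \
    "$issuer" "$audience" "$subject"
}
```

The output can then be passed to the CLI command above with --json "$(federation_policy_json https://gitlab.com <databricks-account-id> <subject>)".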
Configure the GitLab YAML file
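Next, modify the GitLab CI/CD configuration file. When you run the pipeline, set the databricks-account-id, databricks-host, and databricks-client-id inputs to your Databricks account ID, your Databricks workspace URL, and your service principal's OAuth client ID.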
In addition to setting the following environment variables, store the GitLab ID token in the DATABRICKS_OIDC_TOKEN environment variable. Alternatively, store the token in a custom environment variable and set DATABRICKS_OIDC_TOKEN_ENV to that variable's name.
- DATABRICKS_AUTH_TYPE: env-oidc
- DATABRICKS_HOST: Your Databricks workspace URL
- DATABRICKS_CLIENT_ID: The service principal (application) ID
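The token lookup described above can be illustrated with the following shell sketch. It mirrors the documented behavior (use the variable named by DATABRICKS_OIDC_TOKEN_ENV if set, otherwise DATABRICKS_OIDC_TOKEN) and is not the SDKs' actual source; the function name resolve_oidc_token is hypothetical.

```shell
# Illustrative only: resolve the OIDC token the way the text describes.
# DATABRICKS_OIDC_TOKEN_ENV, if set, names the variable that holds the token;
# otherwise DATABRICKS_OIDC_TOKEN is read directly.
resolve_oidc_token() {
  local token_var="${DATABRICKS_OIDC_TOKEN_ENV:-DATABRICKS_OIDC_TOKEN}"
  # printenv sees exported variables, which is how GitLab passes
  # id_tokens and CI/CD variables to the job.
  if ! printenv "$token_var"; then
    echo "error: environment variable $token_var is not set" >&2
    return 1
  fi
}
```

For example, if the job stores its ID token in a variable named MY_GITLAB_TOKEN, setting DATABRICKS_OIDC_TOKEN_ENV=MY_GITLAB_TOKEN makes the lookup read that variable instead.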
spec:
  inputs:
    # Specify your Databricks account ID, workspace hostname, and service principal OAuth client ID.
    databricks-account-id:
    databricks-host:
    databricks-client-id:
# See https://docs.gitlab.com/ci/inputs/#define-input-parameters-with-specinputs for more on pipeline input variables.
---
stages:
  - my_script_using_wif

variables:
  DATABRICKS_AUTH_TYPE: env-oidc
  DATABRICKS_HOST: $[[ inputs.databricks-host ]]
  DATABRICKS_CLIENT_ID: $[[ inputs.databricks-client-id ]]

my_script_using_wif:
  id_tokens:
    DATABRICKS_OIDC_TOKEN:
      aud: $[[ inputs.databricks-account-id ]]
  stage: my_script_using_wif
  image: ubuntu:latest
  before_script:
    - apt-get update -y
    - apt-get install -y curl unzip
    - curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh
  script:
    - databricks current-user me
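If authentication fails, it can help to compare the token's iss, aud, and sub claims with the federation policy. The following shell sketch decodes only the payload segment of a JWT (never the signature) and assumes GNU coreutils base64, as in the ubuntu:latest image; the function name jwt_payload is hypothetical.

```shell
# Troubleshooting sketch: print the JSON claims (payload) of a JWT such as
# the GitLab ID token, so iss, aud, and sub can be checked against the policy.
# Assumes GNU coreutils base64 (base64 -d).
jwt_payload() {
  local payload
  # A JWT is header.payload.signature; take the middle segment and convert
  # base64url characters back to standard base64 ('_' to '/', '-' to '+').
  payload=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url drops '=' padding; restore it so base64 -d accepts the input.
  case $(( ${#payload} % 4 )) in
    2) payload="${payload}==" ;;
    3) payload="${payload}=" ;;
  esac
  printf '%s' "$payload" | base64 -d
}
```

In the GitLab job, you could define this function in the same script step that calls jwt_payload "$DATABRICKS_OIDC_TOKEN".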