Configure Git credentials & connect a remote repo to Databricks

This article describes how to configure your Git credentials in Databricks so that you can connect a remote repo using Databricks Git folders (formerly Repos).

For a list of supported Git providers (cloud and on-premises), read Supported Git providers.

GitHub and GitHub AE

The following information applies to GitHub and GitHub AE users.

Why use the Databricks GitHub App instead of a PAT?

If you use a hosted GitHub account, Databricks Git folders lets you authenticate with the Databricks GitHub App instead of a personal access token (PAT). Using the GitHub App provides the following benefits over PATs:

  • It uses OAuth 2.0 for user authentication. OAuth 2.0 repo traffic is encrypted for strong security.

  • It is easier to integrate (see the steps below) and does not require individual tracking of tokens.

  • Token renewal is handled automatically.

  • The integration can be scoped to specific attached Git repos, allowing you more granular control over access.

Important

As per standard OAuth 2.0 integration, Databricks stores a user’s access and refresh tokens; all other access control is handled by GitHub. Access and refresh tokens follow GitHub’s default expiry rules: access tokens expire after 8 hours, which minimizes risk in the event of a credential leak, and refresh tokens have a 6-month lifetime if unused. Linked credentials expire after 6 months of inactivity, after which the user must reconfigure them.

You can optionally encrypt Databricks tokens using customer-managed keys (CMK).
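
The lifetimes in the note above can be turned into a small calculation. The following is a minimal sketch only: the function names are hypothetical, and "6 months" is approximated as 183 days because the exact day count is not stated here.

```python
from datetime import datetime, timedelta, timezone

# Access token lifetime taken from the note above.
ACCESS_TOKEN_LIFETIME = timedelta(hours=8)
# Assumption: "6 months" is approximated as 183 days for illustration only.
INACTIVITY_LIMIT = timedelta(days=183)


def access_token_expiry(issued_at: datetime) -> datetime:
    """Return the time at which an access token issued at `issued_at` expires."""
    return issued_at + ACCESS_TOKEN_LIFETIME


def credential_lapsed(last_used: datetime, now: datetime) -> bool:
    """Return True if a linked credential has been inactive longer than the limit."""
    return now - last_used > INACTIVITY_LIMIT


issued = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
print(access_token_expiry(issued).isoformat())
```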

Connect to a GitHub repo using a personal access token

In GitHub, follow these steps to create a personal access token that allows access to your repositories:

  1. In the upper-right corner of any page, click your profile photo, then click Settings.

  2. Click Developer settings.

  3. Click the Personal access tokens tab.

  4. Click the Generate new token button.

  5. Enter a token description.

  6. Select the repo scope and workflow scope, and click the Generate token button. The workflow scope is needed if your repository has GitHub Actions workflows.

  7. Copy the token to your clipboard. You enter this token in Databricks under User Settings > Linked accounts.

To use single sign-on, see Authorizing a personal access token for use with SAML single sign-on.
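
Before entering the token in Databricks, you may want to confirm that it carries the repo and workflow scopes selected above. The sketch below assumes a classic PAT on github.com, where the GitHub REST API reports granted scopes in the X-OAuth-Scopes response header. The function names are hypothetical, and the API base URL differs for GitHub AE and other hosted variants.

```python
import urllib.request

# Scopes selected in the steps above.
REQUIRED_SCOPES = {"repo", "workflow"}


def missing_scopes(scopes_header, required=REQUIRED_SCOPES):
    """Return the required scopes absent from an X-OAuth-Scopes header value."""
    granted = {s.strip() for s in (scopes_header or "").split(",") if s.strip()}
    return set(required) - granted


def fetch_scopes_header(token, api_url="https://api.github.com/user"):
    """Call the GitHub API with the token and return the X-OAuth-Scopes header.

    Assumption: api_url points at github.com; adjust it for other GitHub hosts.
    """
    request = urllib.request.Request(
        api_url,
        headers={
            "Authorization": f"token {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return response.headers.get("X-OAuth-Scopes", "")
```

For example, missing_scopes(fetch_scopes_header(my_token)) returns an empty set when both scopes are present, where my_token is a placeholder for the token you copied. Fine-grained tokens use permissions rather than scopes, so this check applies only to classic PATs.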

Connect to a GitHub repo using a fine-grained personal access token

In GitHub, follow these steps to create a fine-grained PAT that allows access to your repositories:

  1. In the upper-right corner of any page, click your profile photo, then click Settings.

  2. Click Developer settings.

  3. Click the Fine-grained tokens tab in the left-hand pane.

  4. Click the Generate new token button in the upper-right of the page to open the New fine-grained personal access token page.

  5. Configure your new fine-grained token from the following settings:

    • Token name: Provide a unique token name. Write it down somewhere so you don’t forget or lose it!

    • Expiration: Select the time period for token expiry. The default is “30 days”.

    • Description: Add some short text describing the purpose of the token.

    • Resource owner: The default is your current GitHub ID. You can also set it to another account ID or to a GitHub organization.

    • Under Repository access, choose the access scope for your token. As a best practice, select only those repositories that you will be using for Git folder version control.

    • Under Permissions, configure the specific access levels granted by this token for the repositories and account you will work with. For more details on the permission groups, read Permissions required for fine-grained personal access tokens in the GitHub documentation.

  6. Click the Generate token button.

  7. Copy the token to your clipboard. You enter this token in Databricks under User Settings > Linked accounts.
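
As an alternative to the User Settings page, a token from either procedure above can likely be registered programmatically. The sketch below assumes the Databricks Git Credentials REST API at /api/2.0/git-credentials with the fields git_provider, git_username, and personal_access_token, and the provider value "gitHub"; verify these against the Databricks REST API reference for your workspace. The function names and argument values are hypothetical.

```python
import json
import urllib.request


def build_git_credential_payload(provider, username, token):
    """Build the request body for registering a Git credential.

    Assumption: field names follow the Databricks Git Credentials API.
    """
    return {
        "git_provider": provider,
        "git_username": username,
        "personal_access_token": token,
    }


def create_git_credential(workspace_url, databricks_token, payload):
    """POST the payload to the workspace and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The payload builder is kept separate from the network call so that the request body can be inspected or logged (with the token redacted) before anything is sent.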

GitLab

In GitLab, follow these steps to create a personal access token that allows access to your repositories:

  1. From GitLab, click your user icon in the upper-left corner of the screen and select Preferences.

  2. Click Access Tokens in the sidebar.

  3. Click Add new token in the Personal Access Tokens section of the page.

  4. Enter a name for the token.

  5. Select the specific scopes to provide access by checking the boxes for your desired permission levels. For more details on the scope options, read the GitLab documentation on PAT scopes.

  6. Click Create personal access token.

  7. Copy the token to your clipboard. Enter this token in Databricks under User Settings > Linked accounts.

See the GitLab documentation to learn more about how to create and manage personal access tokens.

GitLab also provides support for fine-grained access using “Project Access Tokens”. You can use Project Access Tokens to scope access to a GitLab project. For more details, read GitLab’s documentation on Project Access Tokens.
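
To check a GitLab personal access token before entering it in Databricks, one option is to ask GitLab to describe the token. The sketch below assumes the GitLab REST API endpoint /api/v4/personal_access_tokens/self, authenticated with the PRIVATE-TOKEN header, and a response containing active, revoked, and scopes fields. The function names are hypothetical, and the required scopes are passed in by the caller because they depend on the permissions you chose in step 5.

```python
import json
import urllib.request


def token_self_request(token, gitlab_url="https://gitlab.com"):
    """Build a request asking GitLab to describe the token that authenticates it."""
    url = f"{gitlab_url.rstrip('/')}/api/v4/personal_access_tokens/self"
    return urllib.request.Request(url, headers={"PRIVATE-TOKEN": token})


def fetch_token_info(token, gitlab_url="https://gitlab.com"):
    """Send the request and return the parsed JSON description of the token."""
    with urllib.request.urlopen(token_self_request(token, gitlab_url)) as response:
        return json.load(response)


def token_is_usable(token_info, required_scopes):
    """Return True if the token is active, not revoked, and has every required scope.

    This is a plain subset check; it does not model broader scopes that may
    imply narrower ones.
    """
    granted = set(token_info.get("scopes", []))
    return bool(
        token_info.get("active", False)
        and not token_info.get("revoked", False)
        and set(required_scopes) <= granted
    )
```

For a self-managed GitLab instance, pass its base URL as gitlab_url.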

AWS CodeCommit

In AWS CodeCommit, follow these steps to create an HTTPS Git credential that allows access to your repositories:

  1. In AWS CodeCommit, create HTTPS Git credentials that allow access to your repositories. See the AWS CodeCommit documentation. The associated IAM user must have “read” and “write” permissions for the repository.

  2. Record the password. You enter this password in Databricks under User Settings > Linked accounts.
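
The “read” and “write” permissions mentioned in step 1 can be expressed as an IAM policy. The sketch below is one possible minimal policy, assuming that the codecommit:GitPull and codecommit:GitPush actions are sufficient for your use. The function name and the region, account ID, and repository name are placeholders.

```python
import json


def codecommit_git_policy(region, account_id, repo_name):
    """Return an IAM policy document allowing Git pull and push on one repository."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # GitPull covers read access; GitPush covers write access.
                "Action": ["codecommit:GitPull", "codecommit:GitPush"],
                "Resource": f"arn:aws:codecommit:{region}:{account_id}:{repo_name}",
            }
        ],
    }


# Placeholder values for illustration only.
print(json.dumps(codecommit_git_policy("us-east-1", "111122223333", "my-repo"), indent=2))
```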

Azure DevOps Services

Connect to an Azure DevOps repo using a token

The following steps show you how to connect a Databricks repo to an Azure DevOps repo when they aren’t in the same Microsoft Entra ID tenancy.

The service endpoint for Microsoft Entra ID must be accessible from both the private and public subnets of the Databricks workspace. For more information, see VPC peering.

Get an access token for the repository in Azure DevOps:

  1. Go to dev.azure.com, and then sign in to the DevOps organization containing the repository you want to connect Databricks to.

  2. In the upper-right side, click the User Settings icon and select Personal Access Tokens.

  3. Click + New Token.

  4. Enter information into the form:

    1. Name the token.

    2. Select the organization name, which is the repo name.

    3. Set an expiration date.

    4. Choose the required scope, such as Full access.

  5. Copy the access token displayed.

  6. Enter this token in Databricks under User Settings > Linked accounts.

  7. In Git provider username or email, enter the email address you use to log in to the DevOps organization.
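
To confirm that the token works before relying on it in Databricks, you can build a request against the Azure DevOps REST API. The sketch below assumes HTTP Basic authentication with an empty username and the PAT as the password, and the Git repositories endpoint with api-version=7.0; check the Azure DevOps documentation for the version your organization supports. The function names are hypothetical.

```python
import base64
import urllib.request


def azure_devops_auth_header(pat):
    """Return a Basic auth header value with an empty username and the PAT as password."""
    raw = f":{pat}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def list_repos_request(organization, pat):
    """Build (but do not send) a request listing Git repositories in the organization."""
    url = f"https://dev.azure.com/{organization}/_apis/git/repositories?api-version=7.0"
    return urllib.request.Request(
        url, headers={"Authorization": azure_devops_auth_header(pat)}
    )
```

Passing the returned request to urllib.request.urlopen sends it; an authentication error at that point suggests the token or its scope needs another look.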

For more information about Azure DevOps personal access tokens, see the Azure DevOps documentation.

Bitbucket

Note

Databricks does not support Bitbucket Repository Access Tokens or Project Access Tokens.

In Bitbucket, follow these steps to create an app password that allows access to your repositories:

  1. Go to Bitbucket Cloud and create an app password that allows access to your repositories. See the Bitbucket Cloud documentation.

  2. Record the password in a secure manner.

  3. In Databricks, enter this password under User Settings > Linked accounts.
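
An app password is used together with your Bitbucket username. As a quick check, the sketch below builds the HTTP Basic authentication header for a call to the Bitbucket Cloud REST API. It assumes the current-user endpoint at https://api.bitbucket.org/2.0/user, and the function name is hypothetical.

```python
import base64

# Assumption: Bitbucket Cloud endpoint that returns the authenticated user.
BITBUCKET_USER_URL = "https://api.bitbucket.org/2.0/user"


def bitbucket_auth_headers(username, app_password):
    """Return request headers carrying Basic auth for a username and app password."""
    raw = f"{username}:{app_password}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
```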

Other Git providers

If your Git provider is not listed, selecting “GitHub” and entering the PAT you obtained from your Git provider often works, but this is not guaranteed.