Set up Databricks Git folders (Repos)

Learn how to set up Databricks Git folders (formerly Repos) for version control. After you set up Git folders in your Databricks workspace, you can perform common Git operations such as clone, checkout, commit, push, pull, and branch management from the Databricks UI. You can also view diffs of your changes as you develop notebooks and files in Databricks.

Configure user settings

Databricks Git folders use a personal access token (PAT) or an equivalent credential to authenticate with your Git provider for operations such as clone, push, and pull. To use Git folders, you must first add your Git PAT and Git provider username to Databricks. See Configure Git credentials & connect a remote repo to Databricks.

You can clone public remote repositories without Git credentials (a personal access token and a username). To modify a public remote repository or to clone or modify a private remote repository, you must have a Git provider username and PAT with Write (or greater) permissions for the remote repository.

Git folders are enabled by default. For more details on enabling or disabling Git folder support, see Enable or disable the Databricks Git folder feature.

Add or edit Git credentials in Databricks

Important

Databricks Git folders support just one Git credential per user, per workspace.

  1. Select the down arrow next to the account name at the top right of your screen, and then select User Settings.

  2. Select the Linked accounts tab.

  3. If you’re adding credentials for the first time, follow the on-screen instructions.

    If you have previously entered credentials, click Config > Edit and go to the next step.

  4. In the Git provider drop-down, select the provider name.

  5. Enter your Git provider username or email.

  6. In the Token field, add a personal access token (PAT) or other credentials from your Git provider. For details, see Configure Git credentials & connect a remote repo to Databricks.

    Important

    Databricks recommends that you set an expiration date for all personal access tokens.

    For Azure DevOps, Git integration does not support Microsoft Entra ID (formerly Azure Active Directory) tokens. You must use an Azure DevOps personal access token. See Connect to Azure DevOps project using a DevOps token.

    If your organization has SAML SSO enabled in GitHub, authorize your personal access token for SSO.

  7. Click Save.

You can also save a Git PAT and username to Databricks by using the Databricks Git Credentials API.
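The same credential can be stored programmatically. The following Python sketch only builds the HTTP request with the standard library and does not send it. It assumes the Git Credentials REST endpoint /api/2.0/git-credentials with the fields git_provider, git_username, and personal_access_token, authenticated with a Databricks token. The host, token, and username values are placeholders, and "gitHub" is one example provider identifier; verify all of these against the API reference before relying on them.

```python
import json
import urllib.request


def build_git_credential_request(
    host: str,
    databricks_token: str,
    git_provider: str,
    git_username: str,
    git_pat: str,
) -> urllib.request.Request:
    """Build, but do not send, a request that stores a Git credential.

    Assumes the Git Credentials REST endpoint /api/2.0/git-credentials.
    """
    body = {
        "git_provider": git_provider,       # provider identifier, e.g. "gitHub"
        "git_username": git_username,       # Git provider username or email
        "personal_access_token": git_pat,   # the Git provider's PAT, not a Databricks token
    }
    return urllib.request.Request(
        url=f"https://{host}/api/2.0/git-credentials",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Authenticates the call to Databricks itself.
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values only.
request = build_git_credential_request(
    host="<workspace-host>",
    databricks_token="<databricks-token>",
    git_provider="gitHub",
    git_username="<git-username>",
    git_pat="<git-pat>",
)
# urllib.request.urlopen(request) would send it.
```

Because Git folders support only one Git credential per user per workspace, changing an existing credential is an update of that credential (the API reference describes a PATCH on the credential's ID) rather than a second create.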

Network connectivity between Databricks Git folders and a Git provider

Git folders need network connectivity to your Git provider to function. Usually this is over the internet and works out of the box. However, you might have set up additional restrictions on your Git provider to control access. For example, you might have an IP allow list in place, or you might host your own on-premises Git server using a service such as GitHub Enterprise (GHE), Bitbucket Server (BBS), or GitLab Self-Managed. Depending on your network hosting and configuration, your Git server might not be accessible over the internet.


Security features in Git folders

Databricks Git folders have several security features. The following sections describe how to set up and use them:

  • Encrypted Git credentials

  • A Git URL allow list

  • Workspace access control

  • Audit logging

  • Secrets detection

Bring your own key: Encrypt Git credentials

You can use AWS Key Management Service to encrypt a Git personal access token (PAT) or other Git credential. Using a key from an encryption service is referred to as a customer-managed key (CMK) or bring your own key (BYOK).

For more information, see Customer-managed keys for managed services.

Restrict usage to URLs in an allow list

A workspace admin can limit which remote repositories users can clone from and commit and push to. This helps prevent exfiltration of your code: for example, if you turn on allow list restrictions, users cannot push code to an arbitrary repository. You can also prevent users from using unlicensed code by restricting clone operations to a list of allowed repositories.

To set up an allow list:

  1. Go to the Admin Settings page.

  2. Click the Workspace admin tab (it is open by default).

  3. In the Development section, choose an option from Git URL allow list permission:

    • Disabled (no restrictions): There are no checks against the allow list.

    • Restrict Clone, Commit & Push to Allowed Git Repositories: Clone, commit, and push operations are allowed only for repository URLs in the allow list.

    • Only Restrict Commit & Push to Allowed Git Repositories: Commit and push operations are allowed only for repository URLs in the allow list. Clone and pull operations are not restricted.

  4. Click the Edit button next to Git URL allow list: Empty list and enter a comma-separated list of URL prefixes.

  5. Click Save.

Note

  • The list you save overwrites the existing set of saved URL prefixes.

  • It can take up to 15 minutes for the changes to take effect.
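To make the difference between the three options concrete, the following Python sketch models the check. It is not the Databricks implementation: it assumes plain string-prefix matching against the saved, comma-separated URL prefixes, and it checks only the operations each option names (clone, commit, push). Operations an option does not name, such as pull, are not checked in this sketch. The example URLs are hypothetical.

```python
from enum import Enum


class AllowListMode(Enum):
    """The three options under Git URL allow list permission."""
    DISABLED = "Disabled (no restrictions)"
    RESTRICT_CLONE_COMMIT_PUSH = "Restrict Clone, Commit & Push to Allowed Git Repositories"
    RESTRICT_COMMIT_PUSH = "Only Restrict Commit & Push to Allowed Git Repositories"


def parse_allow_list(raw: str) -> list[str]:
    """Split the comma-separated setting into URL prefixes, dropping blanks."""
    return [prefix.strip() for prefix in raw.split(",") if prefix.strip()]


def is_allowed(url: str, operation: str, mode: AllowListMode, prefixes: list[str]) -> bool:
    """Return True if `operation` on the repository at `url` is permitted."""
    if mode is AllowListMode.DISABLED:
        return True  # no checks against the allow list
    checked = {"commit", "push"}
    if mode is AllowListMode.RESTRICT_CLONE_COMMIT_PUSH:
        checked.add("clone")
    if operation not in checked:
        return True  # operations the selected option doesn't name are not checked here
    # Assumed: a plain string-prefix comparison against each saved prefix.
    return any(url.startswith(prefix) for prefix in prefixes)


# Hypothetical allow list as an admin might enter it.
prefixes = parse_allow_list("https://github.com/my-org/, https://gitlab.example.com/team/")
```

Under this model, "Only Restrict Commit & Push" still lets users clone https://github.com/other/repo.git, but refuses a push to it; "Restrict Clone, Commit & Push" refuses the clone as well.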

Allow access to all repositories

To disable an existing allow list and allow access to all repositories:

  1. Go to the Admin Settings page.

  2. Click the Workspace admin tab.

  3. In the Development section, under Git URL allow list permission, select Disabled (no restrictions).

Control access for a repo in your workspace

Note

Access control is available only in the Premium plan or above.

Set permissions for a repo to control access. Permissions for a repo apply to all content in that repo. You can assign five permission levels: NO PERMISSIONS, CAN READ, CAN RUN, CAN EDIT, and CAN MANAGE.

For more details on Git folder permissions, see Git folder ACLs.
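The following Python sketch treats the five levels as cumulative, each including the access of the levels below it, and shapes a set of grants into an access_control_list body of the kind sent to the Databricks Permissions API (for repos, a request to /api/2.0/permissions/repos/{repo_id}). The cumulative ordering, the endpoint, the underscore spelling of the level names, and expressing NO PERMISSIONS by leaving the user out are all assumptions to check against Git folder ACLs and the API reference; the user names are hypothetical.

```python
from enum import IntEnum


class RepoPermission(IntEnum):
    """The five repo permission levels, ordered from least to most access (assumed cumulative)."""
    NO_PERMISSIONS = 0
    CAN_READ = 1
    CAN_RUN = 2
    CAN_EDIT = 3
    CAN_MANAGE = 4


def satisfies(held: RepoPermission, required: RepoPermission) -> bool:
    """True if the held level includes the required one under the assumed ordering."""
    return held >= required


def acl_body(grants: dict[str, RepoPermission]) -> dict:
    """Shape grants into an access_control_list request body.

    Assumption: NO_PERMISSIONS is expressed by leaving the user out of the list.
    """
    return {
        "access_control_list": [
            {"user_name": user, "permission_level": level.name}
            for user, level in grants.items()
            if level is not RepoPermission.NO_PERMISSIONS
        ]
    }


body = acl_body({
    "alice@example.com": RepoPermission.CAN_EDIT,       # hypothetical users
    "bob@example.com": RepoPermission.NO_PERMISSIONS,
})
```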

(Optional) Set up a proxy for enterprise Git servers

If your company uses an on-premises enterprise Git service, such as GitHub Enterprise or Azure DevOps Server, you can use the Databricks Git Server Proxy to connect your Databricks workspaces to the repos it serves.

Audit logging

When audit logging is enabled, audit events are logged when you interact with a Git folder. For example, an audit event is logged when you create, update, or delete a Git folder, when you list all Git folders associated with a workspace, and when you sync changes between your Git folder and the remote Git repo.

Secrets detection

Git folders scan code for AWS access key IDs that begin with the prefix AKIA and warn the user before committing.
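The exact detection rule isn't documented here. As an illustration only, the following Python sketch applies the widely used AWS access key ID format (AKIA followed by 16 uppercase letters or digits, 20 characters in total) to the files in a pending commit. AKIAIOSFODNN7EXAMPLE is the placeholder key ID from AWS documentation, not a real credential, and the file names are hypothetical.

```python
import re

# AKIA plus 16 uppercase letters or digits; word boundaries avoid matching
# inside longer tokens. Assumed format, not the Databricks scanner's rule.
ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(source: str) -> list[str]:
    """Return every substring of `source` that looks like an AKIA key ID."""
    return ACCESS_KEY_ID.findall(source)


def scan_for_access_key_ids(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file path to the suspected key IDs in it, keeping only files with hits."""
    hits = {}
    for path, text in files.items():
        found = find_access_key_ids(text)
        if found:
            hits[path] = found
    return hits


hits = scan_for_access_key_ids({
    "config.py": 'aws_key = "AKIAIOSFODNN7EXAMPLE"',
    "main.py": "print('hi')",
})
```

A non-empty result is the point at which a tool like this would warn before the commit goes through.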

Use a repo config file

You can control whether outputs are committed for each notebook in your repo by adding a .databricks/commit_outputs file, which you must create manually.

In this file, specify the notebooks whose outputs you want to include, using patterns similar to .gitignore patterns.

Patterns for a repo config file

The file contains positive and negative file path patterns. File path patterns include the notebook file extension, such as .ipynb.

  • Positive patterns enable outputs inclusion for matching notebooks.

  • Negative patterns disable outputs inclusion for matching notebooks.

Patterns are evaluated in order for all notebooks. Invalid paths or paths not resolving to .ipynb notebooks are ignored.

To include outputs from a notebook at the path folder/innerfolder/notebook.ipynb, use any of the following patterns:

**/*
folder/**
folder/innerfolder/note*

To exclude outputs for a notebook, make sure that none of the positive patterns match it, or add a negative pattern at the right position in the configuration file, that is, after any positive pattern that matches the same notebook. Negative (exclude) patterns start with !:

!folder/innerfolder/*.ipynb
!folder/**/*.ipynb
!**/notebook.ipynb
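The evaluation rules above can be sketched in Python. This is an illustration, not the Databricks implementation: it assumes .gitignore-style matching in which "**/" matches zero or more directories, "**" matches anything, "*" and "?" stay within one path segment, and the last matching pattern decides whether a notebook's outputs are included. Under those assumptions it reproduces the include and exclude examples above.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a gitignore-style path pattern into an anchored regex (assumed semantics)."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")        # anything, including "/"
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")     # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")      # one character within a segment
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(parts) + "$")


def include_outputs(notebook_path: str, patterns: list[str]) -> bool:
    """Decide whether outputs are committed for one notebook.

    Patterns are checked top to bottom and the last match wins.
    Paths that are not .ipynb notebooks are ignored.
    """
    if not notebook_path.endswith(".ipynb"):
        return False
    include = False  # no matching positive pattern means outputs are excluded
    for line in patterns:
        line = line.strip()
        if not line:
            continue
        negative = line.startswith("!")
        pattern = line[1:] if negative else line
        if glob_to_regex(pattern).match(notebook_path):
            include = not negative
    return include
```

With this ordering, placing !**/notebook.ipynb after **/* excludes folder/innerfolder/notebook.ipynb, while placing it before **/* has no effect because the later positive pattern overrides it.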

Move Git folder to trash (delete)

To delete a Git folder from your workspace:

  1. Right-click the Git folder, and then select Move to trash.

  2. In the dialog box, type the name of the Git folder you want to delete. Then, click Confirm & move to trash.
