Set up Git integration with Databricks Repos

Set up your Databricks workspace and a Git repo to use Databricks Repos capabilities. Once you set up Databricks Repos, you can run notebooks or access project files and libraries stored in a remote Git repo.

Note

  • Databricks recommends that you set an expiration date for all personal access tokens.

  • If you are using GitHub AE and you have enabled GitHub allow lists, you must add Databricks control plane NAT IPs to the allow list. Use the IP for the region that the Databricks workspace is in.
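If you create tokens programmatically, the Databricks Token API (`POST /api/2.0/token/create`) accepts a `lifetime_seconds` field that sets the expiration. The following is a minimal sketch that only builds the request and does not send it. The workspace URL, the 90-day lifetime, and the helper name `build_token_request` are hypothetical.

```python
import json
import urllib.request

# Hypothetical workspace URL; replace with your own.
WORKSPACE_URL = "https://example.cloud.databricks.com"

SECONDS_PER_DAY = 24 * 60 * 60


def build_token_request(auth_token: str, days_valid: int, comment: str) -> urllib.request.Request:
    """Build, but do not send, a Token API request for a token that expires.

    auth_token is an existing credential used to authenticate the call.
    """
    if days_valid <= 0:
        raise ValueError("days_valid must be positive so that the token expires")
    payload = {
        "lifetime_seconds": days_valid * SECONDS_PER_DAY,  # token expires after this many seconds
        "comment": comment,
    }
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}/api/2.0/token/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {auth_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_token_request("<existing-token>", days_valid=90, comment="Databricks Repos")
# Sending it, for example with urllib.request.urlopen(request), returns JSON whose
# "token_value" field holds the new token.
```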

Configure user settings for Git integration

  1. Click Settings in your Databricks workspace and select User Settings from the menu.

  2. On the User Settings page, go to the Git Integration tab.

  3. Follow the instructions for integration with your Git provider.

    For Azure DevOps, Git integration does not support Azure Active Directory tokens. You must use an Azure DevOps personal access token.

  4. If your organization has SAML SSO enabled in GitHub, ensure that you have authorized your personal access token for SSO.
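The Git Integration tab stores a Git username and personal access token for you. If you prefer to script this step, the Git Credentials API (`POST /api/2.0/git-credentials`) accepts the same information. The sketch below only builds the request. The workspace URL and helper name are hypothetical, and the provider identifiers listed are the ones we believe the API accepts, so check them against the API reference before relying on them.

```python
import json
import urllib.request

# Hypothetical workspace URL; replace with your own.
WORKSPACE_URL = "https://example.cloud.databricks.com"

# Provider identifiers as we understand the Git Credentials API; verify before use.
GIT_PROVIDERS = {
    "gitHub",
    "gitHubEnterprise",
    "bitbucketCloud",
    "bitbucketServer",
    "gitLab",
    "gitLabEnterpriseEdition",
    "azureDevOpsServices",
    "awsCodeCommit",
}


def build_git_credential_request(databricks_token, provider, git_username, git_pat):
    """Build, but do not send, a request that stores Git credentials in the workspace."""
    if provider not in GIT_PROVIDERS:
        raise ValueError(f"unrecognized Git provider: {provider!r}")
    if not git_pat:
        raise ValueError("a Git provider personal access token is required")
    payload = {
        "git_provider": provider,
        "git_username": git_username,
        # For azureDevOpsServices this must be an Azure DevOps personal access
        # token; Azure Active Directory tokens are not supported.
        "personal_access_token": git_pat,
    }
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}/api/2.0/git-credentials",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_git_credential_request("<databricks-token>", "gitHub", "octocat", "<github-pat>")
```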

To learn more about service principals on Databricks, see Service principals for Databricks automation. For information about service principals and CI/CD, see Service principals for CI/CD.

Use a service principal and Databricks Repos

For CI/CD workflows, you can set up:

  • A service principal with Databricks.

  • Git repository credentials for automation.

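To set up service principals and then add Git provider credentials:

  1. Create a service principal by using the Service principals API.

  2. Create a Databricks access token for the service principal by using the Token management API.

  3. Add your Git provider credentials to your workspace by using the service principal's Databricks access token and the Git Credentials API.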

To call these three APIs, you can use tools such as curl, Postman, or Terraform. You cannot use the Databricks user interface.
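The service principal setup can be scripted from any HTTP client. The following sketch builds, without sending, the requests for the first two calls: creating a service principal through the SCIM API (`POST /api/2.0/preview/scim/v2/ServicePrincipals`) and creating an expiring token on its behalf through the Token Management API (`POST /api/2.0/token-management/on-behalf-of/tokens`). The workspace URL, display name, and helper names are hypothetical, and endpoint paths and required fields can differ by cloud and API version, so treat them as assumptions to verify.

```python
import json
import urllib.request

# Hypothetical workspace URL; replace with your own.
WORKSPACE_URL = "https://example.cloud.databricks.com"


def _post(path: str, payload: dict, token: str) -> urllib.request.Request:
    """Build, but do not send, an authenticated JSON POST request."""
    return urllib.request.Request(
        url=f"{WORKSPACE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


def create_service_principal_request(admin_token: str, display_name: str) -> urllib.request.Request:
    """Step 1: create a service principal through the SCIM API."""
    payload = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "displayName": display_name,
        "active": True,
    }
    return _post("/api/2.0/preview/scim/v2/ServicePrincipals", payload, admin_token)


def create_obo_token_request(admin_token: str, application_id: str, days_valid: int) -> urllib.request.Request:
    """Step 2: create an expiring token on behalf of the service principal."""
    payload = {
        "application_id": application_id,  # the service principal's application ID
        "lifetime_seconds": days_valid * 24 * 60 * 60,
        "comment": "CI/CD token for Databricks Repos",
    }
    return _post("/api/2.0/token-management/on-behalf-of/tokens", payload, admin_token)


# Step 3 is a POST to /api/2.0/git-credentials authenticated with the
# service principal's token from step 2, not with the admin token.
sp_request = create_service_principal_request("<admin-token>", "repos-ci")
token_request = create_obo_token_request("<admin-token>", "<application-id>", days_valid=30)
```

Authenticating the Git Credentials call with the service principal's own token is what ties the stored Git credentials to the service principal rather than to the admin who ran the script.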

Enable support for arbitrary files in Databricks Repos

To work with non-notebook files in Databricks Repos, you must be running Databricks Runtime 8.4 or above. If you are running Databricks Runtime 11.0 or above, support for arbitrary files is enabled by default.
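The version rule above can be written as a small check. This is a hypothetical helper, not a Databricks API. It accepts version strings such as "10.4 LTS". On a cluster the version string may be available from the DATABRICKS_RUNTIME_VERSION environment variable, which is an assumption worth verifying on your runtime.

```python
import re


def files_in_repos_support(runtime_version: str) -> str:
    """Classify a Databricks Runtime version string such as "10.4 LTS" or "11.3"."""
    match = re.match(r"\s*(\d+)\.(\d+)", runtime_version)
    if match is None:
        raise ValueError(f"cannot parse runtime version: {runtime_version!r}")
    version = (int(match.group(1)), int(match.group(2)))
    if version < (8, 4):
        return "not supported"
    if version < (11, 0):
        # 8.4 through 10.x: an admin must turn on Files in Repos.
        return "supported once an admin enables Files in Repos"
    return "enabled by default"


for v in ["8.3", "8.4", "10.4 LTS", "11.0", "12.2 LTS"]:
    print(v, "->", files_in_repos_support(v))
```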

Preview

This feature is in Public Preview.

In addition to syncing notebooks with a remote Git repository, you can sync any type of file your project requires, such as:

  • .py files

  • data files in .csv or .json format

  • .yaml configuration files

You can import and read these files within a Databricks repo. You can also view and edit plain text files in the UI.

If support for this feature is not enabled, you still see non-notebook files in your repo, but you cannot work with them.
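As a sketch of reading such files from a notebook, the helper below loads a JSON configuration file and a CSV data file using paths relative to the notebook's folder. The file names `config.json` and `data/sample.csv` are hypothetical. The demo at the bottom builds a throwaway folder so the code also runs outside Databricks.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_repo_files(repo_dir=None):
    """Read a JSON config and a CSV data file that live in the repo.

    With Files in Repos enabled, a notebook's working directory is its own folder
    under /Workspace/Repos, so the default (the current directory) resolves
    relative paths inside the repo. The file names are hypothetical.
    """
    base = Path(repo_dir) if repo_dir is not None else Path.cwd()
    config = json.loads((base / "config.json").read_text())
    with open(base / "data" / "sample.csv", newline="") as f:
        rows = list(csv.DictReader(f))
    return config, rows


if __name__ == "__main__":
    # Local demo: build a throwaway folder that mimics the repo layout.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "data").mkdir()
        (Path(tmp) / "config.json").write_text('{"env": "dev"}')
        (Path(tmp) / "data" / "sample.csv").write_text("id,name\n1,alpha\n")
        print(load_repo_files(tmp))
```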

Enable Files in Repos

An admin can enable this feature as follows:

  1. Go to the Admin Console.

  2. Click the Workspace Settings tab.

  3. In the Repos section, click the Files in Repos toggle.

After the feature has been enabled, you must restart your cluster and refresh your browser before you can work with non-notebook files in Repos.

Additionally, the first time you access a repo after Files in Repos is enabled, you must open the Git dialog. The dialog indicates that you must perform a pull operation to sync non-notebook files in the repo. Select Agree and Pull to sync files. If there are any merge conflicts, another dialog appears giving you the option of discarding your conflicting changes or pushing your changes to a new branch.

Confirm Files in Repos is enabled

You can run the command %sh pwd in a notebook inside a repo to check whether Files in Repos is enabled.

  • If Files in Repos is not enabled, the response is /databricks/driver.

  • If Files in Repos is enabled, the response is /Workspace/Repos/<path to notebook directory>.
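In a Python notebook you can perform the same check without a shell cell, assuming the Python process reports the same working directory that %sh pwd prints. The helper name is hypothetical; passing an explicit path makes it easy to test outside Databricks.

```python
import os


def files_in_repos_enabled(cwd=None) -> bool:
    """Python version of the %sh pwd check.

    Returns True when the working directory is under /Workspace/Repos,
    which is what you see when Files in Repos is enabled.
    """
    directory = cwd if cwd is not None else os.getcwd()
    return directory.startswith("/Workspace/Repos/")


print(files_in_repos_enabled())  # run this cell in a notebook inside a repo
```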

Control access to Databricks Repos

Manage permissions

When you create a repo, you have Can Manage permission. This lets you perform Git operations or modify the remote repository. You can clone public remote repositories without Git credentials (personal access token and username). To modify a public remote repository, or to clone or modify a private remote repository, you must have a Git provider username and personal access token with read and write permissions for the remote repository.
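The credential rule in the previous paragraph reduces to a small decision table. The function below is only an illustration of that rule, with "modify" standing for operations such as commit and push; it is not part of any Databricks API.

```python
def needs_git_credentials(repo_is_private: bool, operation: str) -> bool:
    """Return True if a Git username and personal access token are required.

    operation is "clone" or "modify" (for example, commit and push).
    """
    if operation not in {"clone", "modify"}:
        raise ValueError(f"unknown operation: {operation!r}")
    if repo_is_private:
        return True  # private repos need credentials even to clone
    return operation == "modify"  # public repos can be cloned anonymously


for private in (False, True):
    for op in ("clone", "modify"):
        print(f"private={private}, {op}: credentials needed = {needs_git_credentials(private, op)}")
```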

Use allow lists

An admin can limit which remote repositories users can clone, commit, and push to. This helps provide security for code projects and helps prevent usage of unlicensed code.

  1. Go to the Admin Console.

  2. Click the Workspace Settings tab.

  3. In the Repos section, choose an option from Repos Git Allow List:

    • Disabled (no restrictions): There are no checks against the allow list.

    • Restrict clone, commit & push to allowed Git repositories: Clone, commit, and push operations are allowed only for repository URLs in the allow list.

    • Only restrict commit & push to allowed Git repositories: Commit and push operations are allowed only for repository URLs in the allow list. Clone and pull operations are not restricted.

  4. In the Repos Git URL Allow List field (shown as Empty list when nothing is saved), enter a comma-separated list of URL prefixes.

  5. Click Save.

To allow access to all repositories, choose Disabled (no restrictions).
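The three allow-list options can be modeled as follows. This is a hypothetical sketch of the behavior described above, not Databricks code. It uses plain string prefix matching, and Databricks' actual matching rules (for example, case sensitivity or trailing slashes) may differ. Only clone, commit, and push are modeled.

```python
from enum import Enum
from typing import List


class AllowListMode(Enum):
    DISABLED = "Disabled (no restrictions)"
    RESTRICT_CLONE_COMMIT_PUSH = "Restrict clone, commit & push to allowed Git repositories"
    RESTRICT_COMMIT_PUSH = "Only restrict commit & push to allowed Git repositories"


def parse_allow_list(text: str) -> List[str]:
    """Split the comma-separated field value into URL prefixes."""
    return [prefix.strip() for prefix in text.split(",") if prefix.strip()]


def is_allowed(mode: AllowListMode, prefixes: List[str], repo_url: str, operation: str) -> bool:
    """Return True if the operation on repo_url is permitted under the given mode."""
    if operation not in {"clone", "commit", "push"}:
        raise ValueError(f"operation not modeled here: {operation!r}")
    if mode is AllowListMode.DISABLED:
        return True  # no checks against the allow list
    restricted = {"commit", "push"}
    if mode is AllowListMode.RESTRICT_CLONE_COMMIT_PUSH:
        restricted.add("clone")
    if operation not in restricted:
        return True
    return any(repo_url.startswith(prefix) for prefix in prefixes)


prefixes = parse_allow_list("https://github.com/my-org/, https://gitlab.com/my-team/")
```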

Note

  • The list you save overwrites the existing set of saved URL prefixes.

  • It may take about 15 minutes for changes to take effect.

Secrets detection

Databricks Repos scans code for access key IDs that begin with the prefix AKIA and warns the user before committing.
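Databricks does not document its exact scanner here. The following hypothetical check shows the kind of pattern such a scan looks for: AWS access key IDs are 20 characters long, made up of AKIA followed by 16 uppercase letters or digits. You could run a check like this yourself before committing.

```python
import re

# "AKIA" followed by 16 uppercase letters or digits, as a whole word (20 characters total).
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_access_key_ids(source: str):
    """Return (line number, match) pairs for strings that look like AWS access key IDs."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            hits.append((lineno, match.group()))
    return hits


# AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
sample = 'x = 1\naws_key = "AKIAIOSFODNN7EXAMPLE"\n'
for lineno, key in find_access_key_ids(sample):
    print(f"line {lineno}: possible access key ID {key}")
```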

Terraform integration

You can manage Databricks Repos in a fully automated setup by using the Databricks Terraform provider and its databricks_repo resource:

resource "databricks_repo" "this" {
  url = "https://github.com/user/demo.git"
}