Git integration with Databricks Repos

Databricks Repos is a visual Git client and API in Databricks. It supports common Git operations such as cloning a repository, committing and pushing, pulling, branch management, and visual comparison of diffs when committing.

Within Repos you can develop code in notebooks or other files and follow data science and engineering code development best practices using Git for version control, collaboration, and CI/CD.

What can you do with Databricks Repos?

Databricks Repos provides source control for data and AI projects by integrating with Git providers.

In Databricks Repos, you can use Git functionality to:

  • Clone, push to, and pull from a remote Git repository.

  • Create and manage branches for development work, including merging, rebasing, and resolving conflicts.

  • Create notebooks, and edit notebooks and other files.

  • Visually compare differences upon commit.

For step-by-step instructions, see Clone a Git repo & other common Git operations. Databricks Repos also has an API that you can integrate with your CI/CD pipeline. For example, you can programmatically update a Databricks repo so that it always has the most recent version of the code. For information about best practices for code development using Databricks Repos, see CI/CD techniques with Git and Databricks Repos.
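As a rough illustration of the programmatic update mentioned above, the following Python sketch builds and sends a request to the Repos API that checks out a branch in an existing repo, which also pulls the latest commit on that branch. It assumes the `PATCH /api/2.0/repos/{repo_id}` endpoint with a `branch` field and a personal access token sent as a bearer token. The function names, the workspace URL, the token, and the repo ID are hypothetical placeholders, not values from this page.

```python
import json
import urllib.request


def build_update_repo_request(host, token, repo_id, branch):
    """Build (but do not send) a PATCH request that checks out `branch`.

    Assumption: the Repos API exposes PATCH /api/2.0/repos/{repo_id} and
    accepts a JSON body with a "branch" field.
    """
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_repo(host, token, repo_id, branch):
    """Send the request and return the parsed JSON response."""
    request = build_update_repo_request(host, token, repo_id, branch)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical values: replace with your workspace URL, token, and repo ID.
    req = build_update_repo_request(
        "https://example.cloud.databricks.com", "<personal-access-token>", 123, "main"
    )
    print(req.get_method(), req.full_url)
```

A CI/CD job could call `update_repo` after each merge to the main branch so that the repo in the workspace tracks the latest code. Building the request separately from sending it keeps the logic easy to check without network access.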

Supported Git providers

Databricks supports the following Git providers:

  • GitHub

  • Bitbucket Cloud

  • GitLab

  • Azure DevOps

  • AWS CodeCommit

  • GitHub AE

See Get a Git access token & connect a remote repo to Databricks.

Databricks Repos also supports integration with Bitbucket Server, GitHub Enterprise Server, and GitLab self-managed, provided the server is accessible from the internet. To integrate with a private Git server instance that is not accessible from the internet, contact your Databricks representative.

Support for arbitrary files in Databricks Repos is available in Databricks Runtime 8.4 and above. See What are workspace files?