Git integration with Databricks Repos

Databricks Repos is a visual Git client and API in Databricks. It supports common Git operations such as cloning a repository, committing and pushing, pulling, branch management, and visual comparison of diffs when committing.

Within Repos you can develop code in notebooks or other files and follow data science and engineering code development best practices using Git for version control, collaboration, and CI/CD.

What can you do with Databricks Repos?

Databricks Repos provides source control for data and AI projects by integrating with Git providers.

In Databricks Repos, you can use Git functionality to:

  • Clone, push to, and pull from a remote Git repository.

  • Create and manage branches for development work, including merging, rebasing, and resolving conflicts.

  • Create notebooks—including IPYNB notebooks—and edit them and other files.

  • Visually compare differences upon commit and resolve merge conflicts.

For step-by-step instructions, see Clone a Git repo & other common Git operations. Databricks Repos also has an API that you can integrate with your CI/CD pipeline. For example, you can programmatically update a Databricks repo so that it always has the most recent version of the code. For information about best practices for code development using Databricks Repos, see CI/CD techniques with Git and Databricks Repos.
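
The programmatic update mentioned above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: it assumes the Repos REST API exposes a `PATCH /api/2.0/repos/{repo_id}` endpoint that accepts a `branch` field and a personal access token sent as a bearer token, so check these details against the Repos API reference for your workspace. The function names, host, token, and repo ID are hypothetical placeholders.

```python
import json
import urllib.request


def build_repo_update_request(host, token, repo_id, branch):
    """Build (but do not send) a request that updates a repo to the head of `branch`.

    Assumption: the endpoint path and the `branch` field follow the
    Repos API 2.0; verify them against the official API reference.
    """
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_repo(host, token, repo_id, branch):
    """Send the update request and return the parsed JSON response."""
    request = build_repo_update_request(host, token, repo_id, branch)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A CI/CD job could call `update_repo(host, token, repo_id, "main")` after each merge so that the Databricks repo tracks the latest commit on that branch. Building the request separately from sending it keeps the payload easy to inspect and test without network access.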

For information on the kinds of notebooks supported in Databricks, see Export and import Databricks notebooks.

Supported Git providers

Databricks supports the following Git providers:

  • GitHub

  • Bitbucket Cloud

  • GitLab

  • Azure DevOps

  • AWS CodeCommit

  • GitHub AE

See Configure Git credentials & connect a remote repo to Databricks.

Databricks Repos also supports integration with Bitbucket Server, GitHub Enterprise Server, and GitLab self-managed, provided the server is internet-accessible. To integrate with a private Git server instance that is not internet-accessible, contact your Databricks representative.

Support for arbitrary files in Databricks Repos was introduced in Databricks Runtime 8.4. For production workloads that rely on arbitrary files in Databricks Repos and on workspace files, use Databricks Runtime 11.3 LTS or above. See What are workspace files?

Resources for Git integration

Use the Databricks CLI 2.0 for Git integration with Databricks:

Use the following reference docs: