Databricks Git folders concepts
Databricks Git folders is a visual Git client and API that integrates Git repositories within your workspace. Use Git folders to develop code in notebooks and files while applying software development best practices, using Git for version control, collaboration, and CI/CD. Git folders supports common Git operations such as cloning a repository, committing and pushing, pulling, managing branches, and visually comparing diffs when committing.
Git folders capabilities
Databricks Git folders provide source control for data and AI projects by integrating with Git providers.
Use Git functionality from your Databricks workspace to:
- Clone, push to, and pull from a remote Git repository.
- Create and manage branches for development work, including merging, rebasing, and resolving conflicts.
- Create and edit notebooks (including IPYNB notebooks) and other files.
- Visually compare differences upon commit and resolve merge conflicts.
For step-by-step instructions, see Run Git operations on Databricks Git folders.
Git folders API
Databricks Git folders provide an API that you can integrate with your CI/CD pipeline. For example, you can programmatically update a workspace Git folder so that it always has the most recent version of the code. For information about best practices for code development using Databricks Git folders, see CI/CD with Databricks Git folders.
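As an illustration of the update scenario above, the following is a minimal sketch that calls the Repos REST API endpoint `PATCH /api/2.0/repos/{repo_id}` with a `branch` field, which moves the Git folder to the latest commit on that branch. It uses only the Python standard library and assumes a personal access token for authentication; the host, token, and repo ID in the usage note are hypothetical placeholders. Check the Repos API reference for your workspace before relying on it.

```python
import json
import urllib.request


def build_update_request(host: str, token: str, repo_id: int, branch: str) -> urllib.request.Request:
    """Build a PATCH request that updates a Git folder to the head of `branch`."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            # Assumes token-based (bearer) authentication.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_git_folder(host: str, token: str, repo_id: int, branch: str = "main") -> dict:
    """Send the update request and return the parsed JSON response."""
    req = build_update_request(host, token, repo_id, branch)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In a CI job that runs after each merge, you might call `update_git_folder(os.environ["DATABRICKS_HOST"], os.environ["DATABRICKS_TOKEN"], 123456)`, where `123456` stands in for the ID of the Git folder to refresh. The Databricks SDK for Python wraps the same endpoint in `WorkspaceClient().repos.update`, which avoids hand-building requests.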
Git providers
A Git provider is a service that hosts a Git-based source control system. These platforms come in two main forms: a cloud service hosted by the vendor, or an on-premises service that your organization installs and manages on its own hardware. Many providers, including GitHub, Microsoft, GitLab, and Atlassian, offer both cloud SaaS and on-premises (often called “self-managed”) options.
Each Databricks Git folder is linked to a remote Git repository. Any of the cloud or on-premises Git providers listed in the following sections can host that repository.
When selecting a Git provider during configuration, make sure that you understand the differences between cloud (SaaS) and on-premises systems. Organizations often host self-managed providers behind a VPN, which can make them inaccessible from the public internet. These versions often include “Server” or “Self-Managed” in their names. If you’re unsure which one your organization uses, check your provider’s documentation or ask your organization's administrators.
If your cloud Git provider doesn’t appear in the supported provider list, choosing GitHub might work as a fallback, although this isn’t guaranteed.
If you're using GitHub as a provider and are still uncertain whether you're using the cloud or on-premises version, see About GitHub Enterprise Server in the GitHub docs.
Supported cloud Git providers
Databricks Git folders integrate with the following cloud-based Git providers:
- GitHub, GitHub Advanced Enterprise, and GitHub Enterprise Cloud
- Atlassian Bitbucket Cloud
- GitLab and GitLab Enterprise Edition
- Microsoft Azure DevOps (Azure Repos)
- AWS CodeCommit
Supported on-premises Git providers
Databricks Git folders integrate with the following on-premises Git providers:
- GitHub Enterprise Server
- Atlassian Bitbucket Server and Data Center
- GitLab Self-Managed
- Microsoft Azure DevOps Server: A workspace admin must explicitly allowlist the URL domain prefixes for your Microsoft Azure DevOps Server if the URL doesn't match dev.azure.com/* or visualstudio.com/*. See Restrict usage to URLs in an allowlist.
If you're integrating an on-premises Git repo that isn't accessible from the internet, you must also install a proxy for Git authentication requests within your company's VPN. See Set up private Git connectivity for Databricks Git folders (Repos).
To learn how to use access tokens with your Git provider, see Configure Git credentials & connect a remote repo to Databricks.
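After credentials are configured, cloning a repository into a Git folder can also be scripted. The sketch below builds a `POST /api/2.0/repos` request with the `url`, `provider`, and `path` fields of the Repos API. The mapping from the provider names listed on this page to the API's case-insensitive `provider` identifiers is an assumption for illustration: providers whose identifier is uncertain (such as Azure DevOps Server) are deliberately left out, and you should confirm the values against the Repos API reference. The repository URL and workspace path in the usage note are hypothetical.

```python
import json
import urllib.request

# Assumed mapping from provider names on this page to Repos API `provider`
# identifiers. Verify against the Repos API reference before use.
PROVIDER_IDS = {
    "GitHub": "gitHub",
    "GitHub Enterprise Server": "gitHubEnterprise",
    "Atlassian Bitbucket Cloud": "bitbucketCloud",
    "Atlassian Bitbucket Server and Data Center": "bitbucketServer",
    "GitLab": "gitLab",
    "GitLab Self-Managed": "gitLabEnterpriseEdition",
    "Microsoft Azure DevOps (Azure Repos)": "azureDevOpsServices",
    "AWS CodeCommit": "awsCodeCommit",
}


def build_clone_request(host: str, token: str, git_url: str,
                        provider_name: str, workspace_path: str) -> urllib.request.Request:
    """Build a POST request that clones `git_url` into a new Git folder."""
    try:
        provider = PROVIDER_IDS[provider_name]
    except KeyError:
        raise ValueError(f"no known Repos API provider id for {provider_name!r}") from None
    body = {"url": git_url, "provider": provider, "path": workspace_path}
    return urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/repos",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

For example, `build_clone_request(host, token, "https://github.com/example-org/example-repo.git", "GitHub", "/Repos/someone@example.com/example-repo")` produces the request, which you can then send with `urllib.request.urlopen`. Failing fast on unknown provider names keeps a typo from silently creating a folder with the wrong provider setting.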