Repos for Git integration

Note

Support for arbitrary files in Databricks Repos is now in Public Preview. For details, see Work with non-notebook files in a Databricks repo and Import Python and R modules.

To support best practices for data science and engineering code development, Databricks Repos provides repository-level integration with Git providers. You can develop code in a Databricks notebook and sync it with a remote Git repository. Databricks Repos lets you use Git functionality such as cloning a remote repo, managing branches, pushing and pulling changes, and visually comparing differences upon commit.

Databricks Repos also provides an API that you can integrate with your CI/CD pipeline. For example, you can programmatically update a Databricks repo so that it always has the most recent code version.

Databricks Repos provides security features such as allow lists to control access to Git repositories and detection of clear text secrets in source code.

When audit logging is enabled, audit events are logged when you interact with a Databricks repo. For example, an audit event is logged when you create, update, or delete a Databricks repo, when you list all Databricks repos associated with a workspace, and when you sync changes between your Databricks repo and the Git remote.

For more information about best practices for code development using Databricks repos, see Best practices for integrating repos with CI/CD workflows.

Requirements

Databricks supports these Git providers:

  • GitHub
  • Bitbucket
  • GitLab
  • Azure DevOps

The Git server must be accessible from Databricks. Databricks does not support private Git servers, such as Git servers behind a VPN.

Support for arbitrary files in Databricks Repos is available in Databricks Runtime 8.4 and above.

Configure your Git integration with Databricks

  1. Click the Settings icon in your Databricks workspace and select User Settings from the menu.

  2. On the User Settings page, go to the Git Integration tab.

  3. Follow the instructions for integration with GitHub, Bitbucket Cloud, GitLab, or Azure DevOps.

    For Azure DevOps, Git integration does not support Azure Active Directory tokens. You must use an Azure DevOps personal access token.

  4. If your organization has SAML SSO enabled in GitHub, ensure that you have authorized your personal access token for SSO.

Enable support for arbitrary files in Databricks Repos

Preview

This feature is in Public Preview.

In addition to syncing notebooks with a remote Git repository, Files in Repos lets you sync any type of file, such as .py files, data files in .csv or .json format, or .yaml configuration files. You can import and read these files within a Databricks repo. You can also view and edit plain text files in the UI.

If support for this feature is not enabled, you will still see non-notebook files in your repo, but you will not be able to work with them.

Requirements

To work with non-notebook files in Databricks Repos, you must be running Databricks Runtime 8.4 or above.

Enable Files in Repos

An admin can enable this feature as follows:

  1. Go to the Admin Console.
  2. Click the Workspace Settings tab.
  3. In the Advanced section, click the Files in Repos toggle.
  4. Click Confirm.
  5. Refresh your browser.

Additionally, the first time you access a repo after Files in Repos is enabled, you must open the Git dialog. The dialog indicates that you must perform a pull operation to sync non-notebook files in the repo. Select Agree and Pull to sync the files. If there are merge conflicts, another dialog gives you the option of discarding your conflicting changes or pushing your changes to a new branch.

Clone a remote Git repository

You can clone a remote Git repository and work on your notebooks or files in Databricks. You can create notebooks, edit notebooks and other files, and sync with the remote repository. You can also create new branches for your development work. For some tasks you must work in your Git provider, such as creating a PR, resolving conflicts, merging or deleting branches, or rebasing a branch.

  1. Click Repos in the sidebar.

  2. Click Add Repo.

  3. In the Add Repo dialog, click Clone remote Git repo and enter the repository URL. Select your Git provider from the drop-down menu, optionally change the name to use for the Databricks repo, and click Create. The contents of the remote repository are cloned to the Databricks repo.

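You can also clone a repository programmatically through the Repos API. The following is a minimal sketch rather than a definitive implementation; it assumes the Python requests library, and the workspace URL, token, organization, and path values are placeholders you would substitute.

import requests

# Placeholders; substitute your workspace URL and personal access token.
DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<databricks-personal-access-token>"

# POST /api/2.0/repos clones a remote repository into the given workspace path.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/repos",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "url": "https://github.com/<org>/<repo-name>.git",
        "provider": "gitHub",
        "path": "/Repos/<user-folder>/<repo-name>",
    },
)
resp.raise_for_status()
print(resp.json())  # The response includes the repo ID used by other endpoints.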

Work with notebooks in a Databricks repo

To create a new notebook or folder in a repo, click the down arrow next to the repo name, and select Create > Notebook or Create > Folder from the menu.


To move a notebook or folder in your workspace into a repo, navigate to the notebook or folder and select Move from the drop-down menu.


In the dialog, select the repo to which you want to move the object.


You can import a SQL or Python file as a single-cell Databricks notebook.

  • Add the comment line -- Databricks notebook source at the top of a SQL file.
  • Add the comment line # Databricks notebook source at the top of a Python file.
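
For example, a hypothetical Python file named hello.py containing the following imports as a single-cell Python notebook:

# Databricks notebook source
print("Hello from a file imported as a one-cell notebook.")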

Work with non-notebook files in a Databricks repo

This section covers how to add files to a repo and view and edit files.

Preview

This feature is in Public Preview.

Requirements

Databricks Runtime 8.4 or above.

Create a new file

The most common way to create a file in a repo is to clone a Git repository. You can also create a new file directly from the Databricks repo. Click the down arrow next to the repo name, and select Create > File from the menu.


Upload a file

To upload a file from your local system, click the down arrow next to the repo name, and select Upload File(s). You can drag files into the dialog or click browse to select files.


Edit a file

To edit a file in a repo, click the filename in the Repos browser. The file opens and you can edit it. Changes are saved automatically.

Access files in a repo programmatically

You can programmatically read small data files in a repo, such as .csv or .json files, directly from a notebook. You cannot programmatically create or edit files from a notebook.

import pandas as pd

# Read a CSV file in the repo using a path relative to the notebook.
df = pd.read_csv("./data/winequality-red.csv")
df

You can use Spark to access files in a repo. Spark requires absolute file paths for file data. The absolute file path for a file in a repo is file:/Workspace/Repos/<user_folder>/<repo_name>/file.
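
For example, a sketch of an absolute-path read; the user folder, repo name, and file name are placeholders you would substitute:

# Read a CSV file in a repo using the absolute file: path.
df = spark.read.format("csv").load(
    "file:/Workspace/Repos/<user_folder>/<repo_name>/my_data.csv"
)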

You can copy the absolute or relative path to a file in a repo from the drop-down menu next to the file.


The example below uses os.getcwd() to get the full path:

import os

# Build an absolute path from the notebook's current working directory.
spark.read.format("csv").load(f"file:{os.getcwd()}/my_data.csv")

Example notebook

This notebook shows examples of working with arbitrary files in repos.


Work with Python and R modules

Preview

This feature is in Public Preview.

Requirements

Databricks Runtime 8.4 or above.

Import Python and R modules

The repo root directory and the current working directory of your notebook are automatically added to the Python path. When you work in the repo root, you can import modules from the root directory and all subdirectories.

To import modules from another repo, you must add that repo to sys.path. For example:

import sys
import os

# Append the absolute workspace path of the other repo.
sys.path.append("/Workspace/Repos/<user-name>/<repo-name>")

# Alternatively, append a path relative to the current working directory.
sys.path.append(os.path.abspath(".."))

You import functions from a module in a repo just as you would from a module saved as a cluster library or notebook-scoped library.

In Python:

from sample import power
power.powerOfTwo(3)

In R:

source("sample.R")
power.powerOfTwo(3)

Autoreload for Python modules

While developing Python code, if you are editing multiple files, you can run the %autoreload 2 magic command in any cell to force a reload of all modules, as in the sketch below.
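
A minimal sketch, assuming an IPython-based notebook environment in which the autoreload extension must be loaded before the magic is used:

# Load the autoreload extension, then reload all modules before running code.
%load_ext autoreload
%autoreload 2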

Sync with a remote Git repository

To sync with Git, use the Git dialog. The Git dialog lets you pull changes from your remote Git repository and commit and push your changes. You can also change the branch you are working on or create a new branch.

Important

Git operations that pull in upstream changes clear the notebook state. For more information, see Incoming changes clear the notebook state.

Open the Git dialog

You can access the Git dialog from a notebook or from the repos browser.

  • From a notebook, click the button at the top left of the notebook that identifies the current Git branch.

  • From the Repos browser, click the button to the right of the repo name.


    You can also click the down arrow next to the repo name, and select Git… from the menu.


Pull changes from the remote Git repository

To pull changes from the remote Git repository, click Pull in the Git dialog. Notebooks and other files are updated automatically to the latest version in your remote repository.

A message appears if there are merge conflicts. Databricks recommends that you resolve the merge conflict using your Git provider interface.

Commit and push changes to the remote Git repository

When you have added new notebooks or files, or made changes to existing notebooks or files, the Git dialog highlights the changes.


Add a required Summary of the changes, and click Commit & Push to push these changes to the remote Git repository.

If you don’t have permission to commit to the default branch, such as main, create a new branch and use your Git provider interface to create a pull request (PR) to merge it into the default branch.

Note

  • Results are not included with a notebook commit. All results are cleared before the commit is made.
  • If there are merge conflicts, Databricks recommends that you create a new branch, commit and push your changes to that branch, work in that branch, and resolve the merge conflict using your Git provider interface.

Create a new branch

You can create a new branch based on an existing branch from the Git dialog.


Control access to Databricks Repos

Manage permissions

When you create a repo, you have Can Manage permission. This lets you perform Git operations or modify the remote repository. You can clone public remote repositories without Git credentials (personal access token and username). To modify a public remote repository, or to clone or modify a private remote repository, you must have a Git provider username and personal access token with read and write permissions for the remote repository.

Use allow lists

An admin can limit which remote repos users can commit and push to.

  1. Go to the Admin Console.
  2. Click the Workspace Settings tab.
  3. In the Advanced section, click the Enable Repos Git URL Allow List toggle.
  4. Click Confirm.
  5. In the field next to Repos Git URL Allow List: Empty list, enter a comma-separated list of URL prefixes.
  6. Click Save.

Users can only commit and push to Git repositories whose URLs start with one of the prefixes you specify (for example, https://github.com/my-org/). The default setting is “Empty list”, which disables access to all repositories. To allow access to all repositories, disable Enable Repos Git URL Allow List.

Note

  • The list you save overwrites the existing set of saved URL prefixes.
  • It may take about 15 minutes for changes to take effect.

Secrets detection

Repos scans code for access key IDs that begin with the prefix AKIA and warns the user before committing.

Repos API

The Repos API update endpoint allows you to update a repo to the latest version of a specific Git branch or to a tag. This enables you to update the repo before you run a job against a notebook in the repo. For details, see Repos API 2.0.
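
For example, the following minimal sketch calls the update endpoint with the Python requests library; the workspace URL, token, repo ID, and branch name are placeholders you would substitute:

import requests

# Placeholders; substitute your workspace URL, token, and repo ID.
DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<databricks-personal-access-token>"
REPO_ID = "<repo-id>"

# PATCH /api/2.0/repos/{id} checks out the latest commit on the given branch.
resp = requests.patch(
    f"{DATABRICKS_HOST}/api/2.0/repos/{REPO_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"branch": "main"},
)
resp.raise_for_status()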

Best practices for integrating repos with CI/CD workflows

This section includes best practices for integrating Databricks repos with your CI/CD workflow.


Admin workflow

Databricks Repos has user-level folders and non-user top-level folders. User-level folders are created automatically when users first clone a remote repository. You can think of repos in user folders as “local checkouts” that are individual for each user and where users make changes to their code.

Set up top-level repo folders

Admins can create non-user top-level folders. The most common use case for these top-level folders is to create Dev, Staging, and Production folders that contain repos at the appropriate versions or branches for development, staging, and production. For example, if your company uses the main branch for production, the Production folder would contain repos set to the main branch.

Typically permissions on these top-level folders are read-only for all non-admin users within the workspace.

Set up Git automation to update repos on merge

To ensure that repos are always at the latest version, you can set up Git automation to call the Repos API. In your Git provider, set up automation that, after every successful merge of a PR into the main branch, calls the Repos API endpoint on the appropriate repo in the Production folder to bring that repo to the latest version.

For example, on GitHub this can be achieved with GitHub Actions.

User workflow

To start a workflow, clone your remote repository into a user folder. A best practice is to create a new feature branch, or select a previously created branch, for your work, instead of directly committing and pushing changes to the main branch. You can make changes, commit, and push changes in that branch. When you are ready to merge your code, create a pull request and follow the review and merge processes in Git.

Production job workflow

You can point jobs directly to notebooks in repos. When a job kicks off a run, it uses the current version of the code in the repo.

If the automation is set up as described in Admin workflow, every successful merge calls the Repos API to update the repo. As a result, jobs that are configured to run code from a repo always use the latest version available when the job run was created.

Migration tips

Preview

This feature is in Public Preview.

If you are using %run commands to make Python or R functions defined in a notebook available to another notebook, or are installing custom .whl files on a cluster, consider including those custom modules in a Databricks repo. In this way, you can keep your notebooks and other code modules in sync, ensuring that your notebook always uses the correct version.

Migrate from %run commands

%run commands let you include one notebook within another and are often used to make supporting Python or R code available to a notebook. In this example, a notebook named power.py includes the code below.

# This code is in a notebook named "power.py".
def n_to_mth(n, m):
    print(n, "to the", m, "th power is", n**m)

You can then make functions defined in power.py available to a different notebook with a %run command:

# This notebook uses a %run command to access the code in "power.py".
%run ./power
n_to_mth(3, 4)

Using Files in Repos, you can directly import the module that contains the Python code and run the function.

from power import n_to_mth
n_to_mth(3, 4)

Migrate from installing custom Python .whl files

You can install custom .whl files onto a cluster and then import them into a notebook attached to that cluster. For code that is frequently updated, this process is cumbersome and error-prone. Files in Repos lets you keep these Python files in the same repo with the notebooks that use the code, ensuring that your notebook always uses the correct version.

For more information about packaging Python projects, see this tutorial.

Limitations and FAQ

Incoming changes clear the notebook state

Git operations that alter the notebook source code result in the loss of the notebook state, including cell results, comments, revision history, and widgets. For example, Git pull can change the source code of a notebook. In this case, Databricks repos must overwrite the existing notebook to import the changes. Git commit and push or creating a new branch do not affect the notebook source code, so the notebook state is preserved in these operations.

What happens if a job starts running on a notebook while a Git operation is in progress?

At any point while a Git operation is in progress, some notebooks in the Repo may have been updated while others have not. This can cause unpredictable behavior.

For example, suppose notebook A calls notebook Z using a %run command. If a job running during a Git operation starts the most recent version of notebook A, but notebook Z has not yet been updated, the %run command in notebook A might start the older version of notebook Z. During the Git operation, the notebook states are not predictable and the job might fail or run notebook A and notebook Z from different commits.

How can I run non-Databricks notebook files in a repo? For example, a .py file?

With Files in Repos enabled, you can import the file as a module and run its functions from a notebook, as described in Import Python and R modules.

Can I create top-level folders that are not user folders?

Yes, admins can create top-level folders to a single depth. Repos does not support additional folder levels.

How and where are the GitHub tokens stored in Databricks? Who would have access from Databricks?

  • The authentication tokens are stored in the Databricks control plane, and a Databricks employee can only gain access through a temporary credential that is audited.
  • Databricks logs the creation and deletion of these tokens, but not their usage. Databricks logs Git operations, which can be used to audit usage of the tokens by the Databricks application.
  • GitHub Enterprise audits token usage. Other Git services may also offer Git server auditing.

Does Repos support Git submodules?

No. You can clone a repo that contains Git submodules, but the submodule is not cloned.

Does Repos support SSH?

No, only HTTPS.

Does Repos support .gitignore files?

Yes. If you add a file to your repo and do not want it to be tracked by Git, create a .gitignore file or use one cloned from your remote repository and add the filename, including the extension.

.gitignore works only for files that are not already tracked by Git. If you add a file that is already tracked by Git to a .gitignore file, the file is still tracked by Git.

Can I pull the latest version of a repository from Git before running a job without relying on an external orchestration tool?

No. Typically you can integrate this as a pre-commit hook on the Git server so that every push to a branch (such as main or prod) updates the Production repo.

Can I pull in .ipynb files?

Yes. However, the file renders as raw JSON, not in notebook format.

Are there limits on the size of a repo or the number of files?

Databricks does not enforce a limit on the size of a repo. Working branches are limited to 200 MB. Individual files are limited to 10 MB.

Databricks recommends that the total number of notebooks and files in a repo not exceed 1000.

You may receive an error message if these limits are exceeded. You may also receive a timeout error on the initial clone of the repo, but the operation might complete in the background.

Does Repos support branch merging?

No. Databricks recommends that you create a pull request and merge through your Git provider.

Are the contents of Databricks repos encrypted?

The contents of repos are encrypted by Databricks using a platform-managed key. Encryption using Customer-managed keys for managed services is not supported.

Can I delete a branch from a Databricks repo?

No. To delete a branch, you must work in your Git provider.

Where is Databricks repo content stored?

The contents of a repo are temporarily cloned onto disk in the control plane. Databricks notebook files are stored in the control plane database just like notebooks in the main workspace. Non-notebook files may be stored on disk for up to 30 days.

How can I disable Repos in my workspace?

Follow these steps to disable Repos for Git in your workspace.

  1. Go to the Admin Console.
  2. Click the Workspace Settings tab.
  3. In the Advanced section, click the Repos toggle.
  4. Click Confirm.
  5. Refresh your browser.

Files in Repos limitations

Preview

This feature is in Public Preview.

  • Files in Repos is not compatible with Spark Streaming. To use Spark Streaming, you must disable Files in Repos on the cluster by setting the Spark configuration spark.databricks.enableWsfs to false.
  • Native file reads are supported in Python and R notebooks. Native file reads are not supported in Scala notebooks, but you can use Scala notebooks with DBFS as you do today.
  • The diff view in the Git dialog is not available for files.
  • Only text-encoded files are rendered in the UI. To view files in Databricks, the files must not be larger than 10 MB.
  • You cannot create or edit a file from your notebook.

Troubleshooting

Error message: Invalid credentials

Try the following:

  • Confirm that the settings in the Git integration tab (User Settings > Git Integration) are correct.

    • You must enter both your Git provider username and token. Legacy Git integrations did not require a username, so you may need to add a username to work with repos.
  • Confirm that you have selected the correct Git provider in the Add Repo dialog.

  • Ensure your personal access token or app password has the correct repo access.

  • If SSO is enabled on your Git provider, authorize your tokens for SSO.

  • Test your token with command line Git. Both of these options should work:

    git clone https://<username>:<personal-access-token>@github.com/<org>/<repo-name>.git
    
    git clone -c http.sslVerify=false -c http.extraHeader='Authorization: Bearer <personal-access-token>' https://agile.act.org/
    

Error message: Secure connection could not be established because of SSL problems

<link>: Secure connection to <link> could not be established because of SSL problems

This error occurs if your Git server is not accessible from Databricks. Private Git servers are not supported.

Timeout errors

Expensive operations such as cloning a large repo or checking out a large branch may hit timeout errors, but the operation might complete in the background. You can also try again later if the workspace was under heavy load at the time.

404 errors

If you get a 404 error when you try to open a non-notebook file, wait a few minutes and then try again. There is a delay of a few minutes between when the feature is enabled in the workspace and when the webapp picks up the configuration flag.

Resource not found errors after pulling non-notebook files into a Databricks repo

This error can occur if you are not using Databricks Runtime 8.4 or above. A cluster running Databricks Runtime 8.4 or above is required to work with non-notebook files in a repo.

Errors suggesting re-cloning

There was a problem with deleting folders. The repo could be in an inconsistent state and re-cloning is recommended.

This error indicates that a problem occurred while deleting folders from the repo. This could leave the repo in an inconsistent state, where folders that should have been deleted still exist. If this error occurs, Databricks recommends deleting and re-cloning the repo to reset its state.

Unable to set repo to most recent state. This may be due to force pushes overriding commit history on the remote repo. Repo may be out of sync and re-cloning is recommended.

This error indicates that the local and remote Git state have diverged. This can happen when a force push on the remote overrides recent commits that still exist on the local repo. Databricks does not support a hard reset within Repos and recommends deleting and re-cloning the repo if this error occurs.

My admin enabled Files in Repos, but expected files do not appear after cloning a remote repository or pulling files into an existing one

  • You must refresh your browser and restart your cluster to pick up the new configuration.
  • Your cluster must be running Databricks Runtime 8.4 or above.