This article describes how to clone a Git repo and perform other common Git operations with Databricks Repos.
If you clone a repo using the CLI through a cluster’s web terminal, the files won’t display in the Databricks UI.
You can also create a new repo in Databricks and add the remote Git repository URL later.
To create a new repo not linked to a remote Git repository, click the Add Repo button. Deselect Create repo by cloning a Git repository, enter a name for the repo, and then click Create Repo.
When you are ready to add the Git repository URL, click the down arrow next to the repo name in the workspace to open the Repo menu, and select Git… to open the Git dialog.
In the Git repo URL field, enter the URL for the remote repository and select your Git provider from the drop-down menu. Click Save.
Click Repos in the sidebar.
Click Add Repo.
In the Add Repo dialog, select Create repo by cloning a Git repository and enter the repository URL. Select your Git provider from the drop-down menu, optionally change the name to use for the Databricks repo, and click Create Repo. The contents of the remote repository are cloned to the Databricks repo.
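Under the hood, this UI clone corresponds to an ordinary `git clone` of the remote URL. The following sketch mimics the flow locally, using a file-based stand-in for the remote repository (all paths and names are illustrative; in Databricks you would supply your Git provider's HTTPS URL instead):

```shell
# Stand-in for the remote Git repository (normally a GitHub/GitLab/etc. URL)
REMOTE=$(mktemp -d)/origin.git
git init -q --bare -b main "$REMOTE"

# Seed the "remote" with one commit so the clone has content
SEED=$(mktemp -d)
git -C "$SEED" init -q -b main
git -C "$SEED" config user.name demo
git -C "$SEED" config user.email demo@example.com
echo "# demo" > "$SEED/README.md"
git -C "$SEED" add README.md
git -C "$SEED" commit -qm "initial commit"
git -C "$SEED" push -q "$REMOTE" main

# The equivalent of "Create repo by cloning a Git repository"
WORKDIR=$(mktemp -d)/my-repo
git clone -q "$REMOTE" "$WORKDIR"
```

After the clone, the working directory contains the remote repository's contents on its default branch, just as the Databricks repo does.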
At this stage, you have the option to clone only a subset of your repository's directories via Sparse checkout mode (see Sparse checkout). This is especially useful if your repository's size is beyond the Databricks supported limits.
You can access the Git dialog from a notebook or from the Databricks Repos browser.
From a notebook, click the button next to the name of the notebook that identifies the current Git branch.
From the Databricks Repos browser, click the button to the right of the repo name.
You can also click the down arrow next to the repo name, and select Git… from the menu.
To pull changes from the remote Git repository, use the Git dialog. Notebooks and other files are automatically updated to the latest version in your remote repository.
Git operations that pull in upstream changes clear the notebook state. For more information, see Incoming changes clear the notebook state.
To resolve a merge conflict, you must either discard conflicting changes or commit your changes to a new branch and then merge them into the original feature branch using a pull request.
If there is a merge conflict, the Repos UI shows a notice allowing you to cancel the pull or resolve the conflict. If you select Resolve conflict using PR, a dialog appears that lets you create a new branch and commit your changes to it.
When you click Commit to new branch, a notice appears with a link: Create a pull request to resolve merge conflicts. Click the link to open your Git provider.
In your Git provider, create the PR, resolve the conflicts, and merge the new branch into the original branch.
Return to the Repos UI. Use the Git dialog to pull changes from the Git repository to the original branch.
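The "commit to a new branch" path above corresponds to the following plain-Git sequence, sketched locally with illustrative file and branch names (in Databricks, the pull and the PR happen through the UI and your Git provider):

```shell
# Throwaway repository with one committed file
REPO=$(mktemp -d)
git -C "$REPO" init -q -b main
git -C "$REPO" config user.name demo
git -C "$REPO" config user.email demo@example.com
echo "original" > "$REPO/notebook.py"
git -C "$REPO" add notebook.py
git -C "$REPO" commit -qm "initial commit"

# Local edits that would conflict with an upstream pull
echo "local change" > "$REPO/notebook.py"

# Equivalent of "Commit to new branch": carry the edits onto a fresh branch
git -C "$REPO" switch -qc resolve-conflict
git -C "$REPO" commit -qam "save conflicting edits"
# A pull request would then merge resolve-conflict back into the original branch
```

The original branch is left untouched; the conflicting work lives on the new branch until the PR merges it.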
When you have added new notebooks or files, or made changes to existing notebooks or files, the Git dialog highlights the changes.
Add a required Summary of the changes, and click Commit & Push to push these changes to the remote Git repository.
If you don’t have permission to commit to the default branch, such as main, create a new branch and use your Git provider interface to create a pull request (PR) to merge it into the default branch.
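A local sketch of this branch-and-PR workflow in plain Git, with a file-based stand-in for the remote and illustrative branch names (the PR itself is created in the Git provider's UI):

```shell
# Stand-in remote where "main" is protected
ORIGIN=$(mktemp -d)/origin.git
git init -q --bare -b main "$ORIGIN"

WORK=$(mktemp -d)/work
git init -q -b main "$WORK"
git -C "$WORK" config user.name demo
git -C "$WORK" config user.email demo@example.com
git -C "$WORK" remote add origin "$ORIGIN"
echo base > "$WORK/file.txt"
git -C "$WORK" add file.txt
git -C "$WORK" commit -qm "base"
git -C "$WORK" push -q origin main

# No write access to main: commit to a feature branch instead
git -C "$WORK" switch -qc my-feature
echo change >> "$WORK/file.txt"
git -C "$WORK" commit -qam "my change"
git -C "$WORK" push -qu origin my-feature
# ...then open a PR from my-feature into main in the Git provider
```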
Results are not included with a notebook commit. All results are cleared before the commit is made.
For instructions on resolving merge conflicts, see Resolve merge conflicts.
You can switch to (check out) a different branch via the branch dropdown in the Git dialog.
This feature is in Public Preview.
In Databricks Repos, you can perform a Git reset within the Databricks UI. Git reset in Databricks Repos is equivalent to a `git reset --hard` operation: changes made to your local branch are also pushed to the remote.

With Git reset, you can reset a branch to a known good state. Use this when your local edits conflict with the upstream branch and you don’t mind losing those edits. Read more about `git reset --hard`.
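The local half of this operation is a standard `git reset --hard`, sketched below in a throwaway repository (names are illustrative; in Databricks Repos the reset additionally updates the remote branch):

```shell
# Throwaway repository with a known good commit
DEMO=$(mktemp -d)
git -C "$DEMO" init -q -b main
git -C "$DEMO" config user.name demo
git -C "$DEMO" config user.email demo@example.com
echo "known good" > "$DEMO/app.py"
git -C "$DEMO" add app.py
git -C "$DEMO" commit -qm "known good state"

# Local edits, staged and unstaged, that reset will discard
echo "broken edit" > "$DEMO/app.py"
git -C "$DEMO" add app.py
echo "more edits" >> "$DEMO/app.py"

# Discard everything since the last commit
git -C "$DEMO" reset --hard -q HEAD
```

After the reset, the file is back to its committed contents and the working tree is clean; the discarded edits are unrecoverable.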
To reset your local branch to the remote branch, follow these steps.
When you reset, you lose all uncommitted changes, staged and unstaged.
Select Reset from the kebab menu.
Select the branch to reset.
In this scenario, you reset your selected branch (for example, feature_a) to a different branch (for example, main). This process also resets the upstream (remote) branch feature_a to main.
If you have uncommitted changes, an alert warns “Your uncommitted changes will be lost.”
If you reset to another branch, Databricks runs a force push operation that resets the history of your current branch on remote.
Select Reset from the kebab menu.
Select the remote branch you want to reset to. In this example, you reset to the main branch.
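Resetting one branch to another and force-updating its remote history corresponds to this plain-Git sequence, sketched with a file-based stand-in remote and the branch names from the example above:

```shell
# Stand-in remote
UPSTREAM=$(mktemp -d)/origin.git
git init -q --bare -b main "$UPSTREAM"

LOCAL=$(mktemp -d)/work
git init -q -b main "$LOCAL"
git -C "$LOCAL" config user.name demo
git -C "$LOCAL" config user.email demo@example.com
echo base > "$LOCAL/f.txt"
git -C "$LOCAL" add f.txt
git -C "$LOCAL" commit -qm "base"
git -C "$LOCAL" remote add origin "$UPSTREAM"
git -C "$LOCAL" push -q origin main

# feature_a diverges from main
git -C "$LOCAL" switch -qc feature_a
echo divergent >> "$LOCAL/f.txt"
git -C "$LOCAL" commit -qam "divergent work"
git -C "$LOCAL" push -q origin feature_a

# Reset feature_a to main, then force-push to rewrite its remote history
git -C "$LOCAL" reset --hard -q main
git -C "$LOCAL" push -qf origin feature_a
```

The force push is what makes this destructive: the divergent commit disappears from the remote feature_a as well.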
This feature is in Public Preview.
Sparse checkout is a client-side setting that allows you to clone and work with only a subset of the remote repository’s directories in Databricks. This is especially useful if your repository’s size is beyond the Databricks supported limits.
You can use the Sparse Checkout mode when adding (cloning) a new repo.
In the Add Repo dialog, open Advanced.
Select Sparse checkout mode.
In the Cone patterns box, specify the cone checkout patterns you want. Separate multiple patterns by line breaks.
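For example, the Cone patterns box might contain the following (directory names are illustrative):

```
parent/child/grandchild
docs
```

Each line names one directory to include, following cone pattern semantics.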
At this time, you can’t disable sparse checkout for a repo in Databricks.
To understand how cone patterns work in sparse checkout mode, see the following diagram representing the remote repository structure.
If you select Sparse checkout mode but do not specify a cone pattern, the default cone pattern is applied. This includes only the files in root and no subdirectories, resulting in the following repo structure:
Setting the sparse checkout cone pattern as parent/child/grandchild results in all contents of the grandchild directory being recursively included. The files immediately in the /parent/child and root directories are also included. See the directory structure in the following diagram:
You can add multiple patterns separated by line breaks.
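The same behavior can be reproduced with Git's own cone-mode sparse checkout. This local sketch builds a stand-in remote with the parent/child/grandchild structure described above and checks out only that cone (all names are illustrative):

```shell
# Stand-in remote with root files and nested directories
SPARSE_REMOTE=$(mktemp -d)/origin.git
git init -q --bare -b main "$SPARSE_REMOTE"

SPARSE_SEED=$(mktemp -d)
git -C "$SPARSE_SEED" init -q -b main
git -C "$SPARSE_SEED" config user.name demo
git -C "$SPARSE_SEED" config user.email demo@example.com
mkdir -p "$SPARSE_SEED/parent/child/grandchild" "$SPARSE_SEED/parent/sibling"
echo root > "$SPARSE_SEED/root.txt"
echo deep > "$SPARSE_SEED/parent/child/grandchild/deep.txt"
echo skipped > "$SPARSE_SEED/parent/sibling/skip.txt"
git -C "$SPARSE_SEED" add -A
git -C "$SPARSE_SEED" commit -qm "initial structure"
git -C "$SPARSE_SEED" push -q "$SPARSE_REMOTE" main

# Cone-mode sparse checkout of parent/child/grandchild only
CLONE=$(mktemp -d)/clone
git clone -q --no-checkout "$SPARSE_REMOTE" "$CLONE"
git -C "$CLONE" sparse-checkout init --cone
git -C "$CLONE" sparse-checkout set parent/child/grandchild
git -C "$CLONE" checkout -q main
```

The root files and the full grandchild directory are materialized; the sibling directory outside the cone is not.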
Once a repo is created, the sparse checkout cone pattern can be edited from Settings > Advanced > Cone patterns.
Note the following behavior:
Removing a folder from the cone pattern removes it from Databricks if there are no uncommitted changes.
Adding a folder via editing the sparse checkout cone pattern adds it to Databricks without requiring an additional pull.
Sparse checkout patterns cannot be changed to remove a folder when there are uncommitted changes in that folder.
For example, a user edits a file in a folder and does not commit the changes, then tries to change the sparse checkout pattern to exclude this folder. In this case, the pattern is accepted, but the folder is not actually deleted. The user must revert the pattern to include the folder, commit the changes, and then reapply the new pattern.
You can’t disable sparse checkout for a repo that was created with Sparse Checkout mode enabled.
You can edit existing files and commit and push them from the Repos interface. When creating new folders or files, make sure they are included in the cone pattern you specified for that repo.

Including a new folder outside of the cone pattern results in an error during the commit and push operation. To fix the error, edit the cone pattern to include the new folder you are trying to commit and push.