GitHub Version Control¶
This guide describes how to set up version control for notebooks using GitHub.
Using the Databricks CLI and Workspace API¶
Although this document describes how to set up GitHub integration through the UI, we recommend that you integrate with Git through the Databricks CLI and Workspace API. Both let you script notebook import and export, which makes them more flexible than the UI workflow.
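As a rough illustration of the API route, the sketch below exports one notebook through the Workspace API `export` endpoint (`/api/2.0/workspace/export`) and writes it into a local Git checkout, where it can be committed with ordinary Git commands. The host name, token, notebook path, and destination file are placeholders, and the helper names are hypothetical. The sketch also assumes that you authenticate with a Databricks access token sent as a bearer token.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_export_request(host, token, notebook_path, fmt="SOURCE"):
    """Build a GET request for the Workspace API export endpoint."""
    query = urllib.parse.urlencode({"path": notebook_path, "format": fmt})
    url = f"{host.rstrip('/')}/api/2.0/workspace/export?{query}"
    # Assumption: a Databricks access token passed as a bearer token.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def decode_export_response(body):
    """Return the notebook source; the JSON `content` field is base64-encoded."""
    return base64.b64decode(json.loads(body)["content"]).decode("utf-8")


def export_notebook(host, token, notebook_path, dest_file):
    """Fetch a notebook and write it to a file inside a local Git checkout."""
    request = build_export_request(host, token, notebook_path)
    with urllib.request.urlopen(request) as response:
        source = decode_export_response(response.read())
    with open(dest_file, "w", encoding="utf-8") as handle:
        handle.write(source)
```

Calling `export_notebook(host, token, "/Users/someone@example.com/my-notebook", "my-notebook.py")` with your own values would leave a file that you can then `git add` and `git commit` as usual.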
Getting an Access Token¶
Go to GitHub and create a personal access token that allows access to your repositories:
From GitHub, access the menu on the upper right, next to your Gravatar, and select Settings.
Go to the Personal access tokens tab.
Click the Generate New Token button to create a new token.
Select the repo and public_repo permissions, and click the Generate Token button.
Copy the token to your clipboard. You will enter this token in Databricks in the next step.
See the GitHub documentation to learn more about how to create personal access tokens.
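Before pasting the token into Databricks, you may want to confirm that it carries the expected permissions. The sketch below is one way to do that, assuming a classic personal access token: GitHub reports the granted scopes of such a token in the `X-OAuth-Scopes` response header. The function names are hypothetical, and `fetch_token_scopes` makes a real network call to `https://api.github.com/user`.

```python
import urllib.request


def parse_scopes(header_value):
    """Split an X-OAuth-Scopes header value into a set of scope names."""
    return {part.strip() for part in (header_value or "").split(",") if part.strip()}


def missing_scopes(header_value, required=("repo",)):
    """Return the required scopes that the token does not grant, sorted."""
    return sorted(set(required) - parse_scopes(header_value))


def fetch_token_scopes(token):
    """Ask GitHub which scopes the token has (performs a network request)."""
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={"Authorization": f"token {token}", "User-Agent": "token-check"},
    )
    with urllib.request.urlopen(request) as response:
        return response.headers.get("X-OAuth-Scopes", "")
```

For example, `missing_scopes(fetch_token_scopes(my_token))` returns an empty list when the `repo` scope is present.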
Saving Your Access Token to Databricks¶
You can save your GitHub personal access token through the Account Settings page. Click the User icon on the top right of your screen and select Account Settings.
Go to the “Git Integration” section.
If you have previously entered credentials, click the “Change token or app password” button.
Paste your token into the Token or app password field, and click Save.
Save a Notebook to GitHub¶
While the changes that you make to your notebook are saved automatically to the history list, changes do not automatically persist to GitHub.
- Click Save Now to save your notebook to GitHub.
- Optionally, enter a message to describe your change.
- Make sure that Also Commit to GitHub is selected.
Revert/Update a Notebook to a Version from GitHub¶
Once you link a notebook, Databricks will sync your history with Git every time you re-open the history panel.
Versions that are synced to GitHub include the commit hash as part of the entry. Click Restore this version to revert or update your notebook to that version from GitHub.
You can work on an arbitrary branch of your repository with Databricks. You can even create new branches inside Databricks.
In order to create a branch:
- In the Revision History panel, click the Git Status bar to open the GitHub panel.
- Click the Branch dropdown.
- Enter your branch name.
- Select the Create Branch option at the bottom of the dropdown. The parent branch is indicated. You will always branch from your current selected branch.
You can also rebase your branch inside Databricks. The Rebase button displays if new commits are available in the parent branch. Currently we only allow rebasing on top of the default branch of the parent repository.
For example, assume that you are working on databricks/reference-apps. You fork it into your own account (for example, brkyvz) and start working on a branch called my-branch. If a new update gets pushed to databricks:master, then the Rebase button displays, and you will be able to pull the changes into your branch.
Rebasing works a little differently in Databricks. Assume the following branch structure:
After a rebase, the branch structure will look like:
What’s different here is that commits C5 and C6 are not applied on top of C4. Instead, they appear as local changes in your notebook. Any merge conflict will show up as follows:
You can then commit to GitHub once again using the Save Now button.
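The difference from a conventional Git rebase can be pictured with a small toy model. This is not Databricks code. It only encodes the behavior described above, and the branch layout is an assumption: a parent branch holding C1 through C4 and a working branch that forked at C2 and added C5 and C6.

```python
from dataclasses import dataclass, field


@dataclass
class Branch:
    commits: list  # committed history, oldest first
    local_changes: list = field(default_factory=list)  # uncommitted edits


def own_commits(branch, fork_point):
    """Commits made on the branch after it forked from the parent."""
    return branch.commits[branch.commits.index(fork_point) + 1:]


def conventional_rebase(parent, branch, fork_point):
    """Plain Git: replay the branch's own commits on top of the parent head."""
    replayed = [commit + "'" for commit in own_commits(branch, fork_point)]
    return Branch(commits=parent.commits + replayed)


def databricks_style_rebase(parent, branch, fork_point):
    """As described above: history follows the parent, own commits become local changes."""
    pending = branch.local_changes + own_commits(branch, fork_point)
    return Branch(commits=list(parent.commits), local_changes=pending)


# Assumed layout: the parent has C1..C4, the working branch forked at C2.
parent = Branch(["C1", "C2", "C3", "C4"])
mine = Branch(["C1", "C2", "C5", "C6"])
rebased = databricks_style_rebase(parent, mine, "C2")
```

In this model `rebased.commits` ends at C4 and `rebased.local_changes` holds C5 and C6, which is why a Save Now commit is needed afterwards to put those changes back into the branch history.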
What happens if someone branched off from my branch that I just rebased?
If your branch (for example, branch-a) was the base for another branch (branch-b), and you rebase, you need not worry! Once a user also rebases branch-b, everything will work out. The best practice in this situation is to use separate branches for separate notebooks.
Best Practices for Code Reviews¶
Databricks supports Git branching.
- You can link a notebook to your own fork and choose a branch.
- We recommend using separate branches for each notebook.
- Once you are happy with your changes, you can use the Create PR link in the Git Preferences dialog to take you to GitHub’s Pull Request page.
The Create PR link displays only if you’re not working on the default branch of the parent repository.
Integration with GitHub Enterprise is not officially supported.
If you receive errors related to syncing GitHub history, verify the following:
- You have initialized the repository on GitHub, and it isn’t empty. Try the URL that you entered and verify that it forwards to your GitHub repository.
- Your personal access token is active.
- If the repository is private, you must have at least read-level permissions (through GitHub) on the repository.
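Some of these checks can be scripted against the GitHub REST API. The sketch below is one possible approach, with hypothetical helper names: it turns the repository URL you entered into the matching `GET /repos/{owner}/{repo}` API URL and interprets the HTTP status and the `permissions` object that GitHub returns for authenticated requests. It does not detect an empty repository.

```python
import re


def repo_api_url(repo_url):
    """Translate a github.com repository URL into its REST API URL."""
    pattern = r"^https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?$"
    match = re.match(pattern, repo_url.strip())
    if match is None:
        raise ValueError(f"not a GitHub repository URL: {repo_url!r}")
    owner, name = match.groups()
    return f"https://api.github.com/repos/{owner}/{name}"


def diagnose(status, repo_json=None):
    """Map the result of GET /repos/{owner}/{repo} to a checklist hint."""
    if status == 401:
        return "token rejected: check that the personal access token is active"
    if status == 404:
        return "repository not found: check the URL and your read access"
    if status == 200:
        permissions = (repo_json or {}).get("permissions", {})
        if permissions.get("pull"):
            return "ok: repository is reachable and readable"
        return "no read permission reported for this token"
    return f"unexpected HTTP status {status}"
```

GitHub answers with 404 for a private repository that the token cannot read, so that status covers both a mistyped URL and missing permissions.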