CI/CD workflows with Git integration and Databricks Repos
Learn techniques for using Databricks Repos in CI/CD workflows. Integrating Git repos with Databricks Repos provides source control for project files.
The following figure shows an overview of the techniques and workflow.

Development flow
Databricks Repos have user-level folders and non-user top-level folders. User-level folders are created automatically when users first clone a remote repository. You can think of Databricks Repos in user folders as “local checkouts” that are individual to each user and where users make changes to their code.
In your user folder in Databricks Repos, clone your remote repository. A best practice is to create a new feature branch, or to select a previously created branch, for your work instead of committing and pushing changes directly to the main branch. You can make, commit, and push changes in that branch. When you are ready to merge your code, create a pull request and follow the review and merge processes in your Git provider.
Requirements
This workflow requires that you have already set up your Git integration.
Note
Databricks recommends that each developer work on their own feature branch. Sharing feature branches among developers can cause merge conflicts, which must be resolved using your Git provider. For information about how to resolve merge conflicts, see Resolve merge conflicts.
Collaborate in Repos
Clone your existing Git repository to your Databricks workspace.
Use the Repos UI to create a feature branch from the main branch. This example uses a single feature branch feature-b for simplicity. You can create and use multiple feature branches to do your work.
Make your modifications to Databricks notebooks and other files in the Repo.
Coworkers can now clone the Git repository into their own user folder.
Working on a new branch, a coworker makes changes to the notebooks and other files in the Repo.
The coworker commits and pushes their changes to the Git provider.
To merge changes from other branches or rebase the feature branch, you must use the Git command line or an IDE on your local system. Then, in the Repos UI, use the Git dialog to pull changes into the feature-b branch in the Databricks Repo.
When you are ready to merge your work to the main branch, use your Git provider to create a PR to merge the changes from feature-b.
In the Repos UI, pull changes to the main branch.
Production job workflow
Databricks Repos provides two options for running your production jobs:
Option 1: Provide a remote Git ref in the job definition, for example, a specific notebook in the main branch of a GitHub repository.
Option 2: Set up a production repo and use the Repos API to update it programmatically. Then run jobs against this Databricks repo.
Option 1: Run jobs using notebooks in a remote repo
Simplify the job definition process and keep a single source of truth by running a Databricks job using notebooks located in a remote Git repository. This Git reference can be a Git commit, tag, or branch, and you provide it in the job definition.
This approach prevents unintentional changes to your production job, for example when a user makes local edits in a production repo or switches branches. It also automates the CD step because you do not need to create a separate production repo in Databricks, manage permissions for it, and keep it updated.
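As an illustration, the following Python sketch creates such a job through the Jobs API 2.1 (POST /api/2.1/jobs/create). The workspace URL, token, repository URL, notebook path, and cluster ID are placeholders you would replace with your own values.

# Sketch: create a job that runs a notebook directly from a remote Git repo.
# Placeholder values: workspace URL, token, repo URL, notebook path, cluster ID.
import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<databricks-access-token>"

job_spec = {
    "name": "nightly-etl",
    "git_source": {
        "git_url": "https://github.com/user/demo.git",
        "git_provider": "gitHub",
        "git_branch": "main",  # or git_tag / git_commit to pin a release
    },
    "tasks": [
        {
            "task_key": "run_notebook",
            "notebook_task": {
                "notebook_path": "notebooks/etl",  # path relative to the repo root
                "source": "GIT",
            },
            "existing_cluster_id": "<cluster-id>",
        }
    ],
}

response = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
response.raise_for_status()
print(response.json())  # returns the new job_id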
Option 2: Set up a production repo and Git automation
In this option, you set up a production repo and Git automation to update Databricks Repos on merge.
Step 1: Set up top-level folders
The admin creates non-user top-level folders. The most common use case for these top-level folders is to create development, staging, and production folders that contain Databricks Repos checked out to the appropriate versions or branches for development, staging, and production. For example, if your company uses the main branch for production, the production folder would contain a repo that is checked out to the main branch.
Typically, permissions on these top-level folders are read-only for all non-admin users in the workspace. For such top-level folders, we recommend that you grant Can Edit and Can Manage permissions only to service principals, to avoid accidental edits to your production code by workspace users.
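For example, a hedged Python sketch like the following could grant those permissions through the Permissions API; it assumes repos are addressable at /api/2.0/permissions/repos/{repo_id}, and the host, token, repo ID, and service principal application ID are placeholders.

# Sketch: make a production repo read-only for workspace users while a
# service principal keeps Can Manage. All identifiers are placeholders.
import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<admin-access-token>"
repo_id = "<production-repo-id>"

acl = {
    "access_control_list": [
        {
            "service_principal_name": "<service-principal-application-id>",
            "permission_level": "CAN_MANAGE",
        },
        {
            "group_name": "users",
            "permission_level": "CAN_READ",
        },
    ]
}

response = requests.patch(
    f"{host}/api/2.0/permissions/repos/{repo_id}",
    headers={"Authorization": f"Bearer {token}"},
    json=acl,
)
response.raise_for_status()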

Step 2: Set up automated updates to Databricks Repos via the Repos API
In this step, use the Repos API to set up automation to update Databricks Repos upon a merge event.
To ensure that Databricks Repos are always at the latest version, you can set up Git automation to call the Repos API. In your Git provider, set up automation that, after every successful merge of a PR into the main branch, calls the Repos API endpoint on the appropriate repo in the production folder to pull changes and update that repo to the latest version.
For example, on GitHub this can be achieved with GitHub Actions.
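For instance, the merge-triggered workflow could run a small script along these lines. This is a sketch that calls the Repos API update endpoint (PATCH /api/2.0/repos/{repo_id}); the host, token, and repo ID are placeholders.

# Sketch: update the production repo to the latest commit of main after a merge.
# Intended to be invoked from CI (for example a GitHub Actions step).
import requests

host = "https://<your-workspace>.cloud.databricks.com"
token = "<service-principal-access-token>"
repo_id = "<production-repo-id>"

response = requests.patch(
    f"{host}/api/2.0/repos/{repo_id}",
    headers={"Authorization": f"Bearer {token}"},
    json={"branch": "main"},  # checks out main and pulls it to the latest commit
)
response.raise_for_status()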
Run jobs using a notebook in a Databricks Repo
You can point a job directly to a notebook in a Databricks Repo. When a job kicks off a run, it uses the current version of the code in the repo.
If the automation is set up as described in Option 2: Set up a production repo and Git automation, every successful merge calls the Repos API to update the repo. As a result, jobs that are configured to run code from a repo always use the latest version available when the job runs.
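In a job definition, such a task might look like the following sketch; the repo path under /Repos and the cluster ID are placeholders, and the task would go in the tasks list of a Jobs API payload like the one shown in Option 1.

# Sketch: a job task that points at a notebook inside a production Databricks Repo.
production_task = {
    "task_key": "run_production_notebook",
    "notebook_task": {
        # The repo that the merge automation keeps up to date.
        "notebook_path": "/Repos/Production/demo/notebooks/etl",
        "source": "WORKSPACE",
    },
    "existing_cluster_id": "<cluster-id>",
}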
Use a service principal with Databricks Repos
To run the workflows described above with service principals:
Create a service principal with Databricks.
Add Git credentials: your Git provider PAT for the service principal.
To set up service principals and then add Git provider credentials:
Create a Databricks service principal in your workspace with the SCIM API 2.0 (ServicePrincipals) for workspaces.
Create a Databricks access token for a Databricks service principal with the Token Management API 2.0.
Add your Git provider credentials to your workspace with your Databricks access token and the Git Credentials API 2.0.
To call these three APIs, you can use tools such as curl, Postman, or Terraform. You cannot use the Databricks user interface.
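As a hedged example, the three calls could look like the following Python sketch. The endpoint paths follow the SCIM 2.0, Token Management 2.0, and Git Credentials 2.0 APIs, and the host, admin token, Git username, and PAT are placeholders.

# Sketch: create a service principal, mint a token for it, then register
# Git credentials under that token. All secrets and names are placeholders.
import requests

host = "https://<your-workspace>.cloud.databricks.com"
admin_headers = {"Authorization": "Bearer <admin-access-token>"}

# 1. Create the service principal (SCIM API 2.0 ServicePrincipals).
sp = requests.post(
    f"{host}/api/2.0/preview/scim/v2/ServicePrincipals",
    headers=admin_headers,
    json={
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:ServicePrincipal"],
        "displayName": "ci-cd-service-principal",
    },
).json()

# 2. Create a Databricks access token on behalf of the service principal
#    (Token Management API 2.0).
obo = requests.post(
    f"{host}/api/2.0/token-management/on-behalf-of/tokens",
    headers=admin_headers,
    json={
        "application_id": sp["applicationId"],
        "comment": "CI/CD automation",
        "lifetime_seconds": 7776000,  # 90 days
    },
).json()
sp_token = obo["token_value"]

# 3. Add the Git provider PAT for the service principal (Git Credentials API 2.0).
response = requests.post(
    f"{host}/api/2.0/git-credentials",
    headers={"Authorization": f"Bearer {sp_token}"},
    json={
        "git_provider": "gitHub",
        "git_username": "<git-username>",
        "personal_access_token": "<git-provider-pat>",
    },
)
response.raise_for_status()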
To learn more about service principals on Databricks, see Service principals for Databricks automation. For information about service principals and CI/CD, see Service principals for CI/CD.
Terraform integration
You can also manage Databricks Repos in a fully automated setup using the Databricks Terraform provider and databricks_repo:
resource "databricks_repo" "this" {
url = "https://github.com/user/demo.git"
}