GitHub Actions
This feature is in Public Preview.
GitHub Actions allows you to automate your build, test, and deployment CI/CD pipeline. Databricks has developed the following GitHub Actions for your CI/CD workflows on GitHub. Add GitHub Actions YAML files to your repo's .github/workflows
directory.
This article covers GitHub Actions, which is developed by a third party. To contact the provider, see GitHub Actions Support.
GitHub Action | Description |
---|---|
A composite action that sets up the Databricks CLI in a GitHub Actions workflow. |
Deploy bundles with GitHub Actions
GitHub Actions can be used to deploy your Databricks Asset Bundles and trigger runs of your CI/CD workflows from your GitHub repositories. For information about other CI/CD features and best practices on Databricks, see CI/CD on Databricks and Best practices and recommended CI/CD workflows on Databricks.
Run a CI/CD workflow with a bundle that runs a pipeline update
The following example GitHub Actions YAML file triggers a test deployment that validates, deploys, and runs the specified job in the bundle within a pre-production target named “dev” as defined within a bundle configuration file.
This example requires that there is:
- A bundle configuration file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file's setting
working-directory: .
This bundle configuration file should define a Databricks workflow namedmy-job
and a target nameddev
. See Databricks Asset Bundle configuration. - A GitHub secret named
SP_TOKEN
, representing the Databricks access token for a Databricks service principal that is associated with the Databricks workspace to which this bundle is being deployed and run. See Encrypted secrets.
# This workflow validates, deploys, and runs the specified bundle
# within a pre-production target named "dev".
name: 'Dev deployment'
# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1
# Trigger this workflow whenever a pull request is opened against the repo's
# main branch or an existing pull request's head branch is updated.
on:
pull_request:
types:
- opened
- synchronize
branches:
- main
jobs:
# Used by the "pipeline_update" job to deploy the bundle.
# Bundle validation is automatically performed as part of this deployment.
# If validation fails, this workflow fails.
deploy:
name: 'Deploy bundle'
runs-on: ubuntu-latest
steps:
# Check out this repo, so that this workflow can access it.
- uses: actions/checkout@v3
# Download the Databricks CLI.
# See https://github.com/databricks/setup-cli
- uses: databricks/setup-cli@main
# Deploy the bundle to the "dev" target as defined
# in the bundle's settings file.
- run: databricks bundle deploy
working-directory: .
env:
DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
DATABRICKS_BUNDLE_ENV: dev
# Validate, deploy, and then run the bundle.
pipeline_update:
name: 'Run pipeline update'
runs-on: ubuntu-latest
# Run the "deploy" job first.
needs:
- deploy
steps:
# Check out this repo, so that this workflow can access it.
- uses: actions/checkout@v3
# Use the downloaded Databricks CLI.
- uses: databricks/setup-cli@main
# Run the Databricks workflow named "my-job" as defined in the
# bundle that was just deployed.
- run: databricks bundle run my-job --refresh-all
working-directory: .
env:
DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
DATABRICKS_BUNDLE_ENV: dev
You may also want to trigger production deployments. The following GitHub Actions YAML file can exist in the same repo as the preceding file. This file validates, deploys, and runs the specified bundle within a production target named “prod” as defined within a bundle configuration file.
# This workflow validates, deploys, and runs the specified bundle
# within a production target named "prod".
name: 'Production deployment'
# Ensure that only a single job or workflow using the same concurrency group
# runs at a time.
concurrency: 1
# Trigger this workflow whenever a pull request is pushed to the repo's
# main branch.
on:
push:
branches:
- main
jobs:
deploy:
name: 'Deploy bundle'
runs-on: ubuntu-latest
steps:
# Check out this repo, so that this workflow can access it.
- uses: actions/checkout@v3
# Download the Databricks CLI.
# See https://github.com/databricks/setup-cli
- uses: databricks/setup-cli@main
# Deploy the bundle to the "prod" target as defined
# in the bundle's settings file.
- run: databricks bundle deploy
working-directory: .
env:
DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
DATABRICKS_BUNDLE_ENV: prod
# Validate, deploy, and then run the bundle.
pipeline_update:
name: 'Run pipeline update'
runs-on: ubuntu-latest
# Run the "deploy" job first.
needs:
- deploy
steps:
# Check out this repo, so that this workflow can access it.
- uses: actions/checkout@v3
# Use the downloaded Databricks CLI.
- uses: databricks/setup-cli@main
# Run the Databricks workflow named "my-job" as defined in the
# bundle that was just deployed.
- run: databricks bundle run my-job --refresh-all
working-directory: .
env:
DATABRICKS_TOKEN: ${{ secrets.SP_TOKEN }}
DATABRICKS_BUNDLE_ENV: prod
Run a CI/CD workflow that builds a JAR and deploys a bundle
If you have a Java-based ecosystem, your GitHub Action will need to build and upload a JAR before deploying the bundle. The following example GitHub Actions YAML file triggers a deployment that builds and uploads a JAR to a volume, then validates and deploys the bundle to a production target named "prod" as defined within the bundle configuration file. It compiles a Java-based JAR, but the compilation steps for a Scala-based project are similar.
This example requires that there is:
- A bundle configuration file at the root of the repository, which is explicitly declared through the GitHub Actions YAML file's setting
working-directory: .
- A
DATABRICKS_TOKEN
environment variable that represents the Databricks access token that is associated with the Databricks workspace to which this bundle is being deployed and run. - A
DATABRICKS_HOST
environment variable that represents the Databricks host workspace.
name: Build JAR and deploy with bundles
on:
pull_request:
branches:
- main
push:
branches:
- main
jobs:
build-test-upload:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Set up Java
uses: actions/setup-java@v4
with:
java-version: '17' # Specify the Java version used by your project
distribution: 'temurin' # Use a reliable JDK distribution
- name: Cache Maven dependencies
uses: actions/cache@v4
with:
path: ~/.m2/repository
key: ${{ runner.os }}-maven-${{ hashFiles('**/pom.xml') }}
restore-keys: |
${{ runner.os }}-maven-
- name: Build and test JAR with Maven
run: mvn clean verify # Use verify to ensure tests are run
- name: Databricks CLI Setup
uses: databricks/setup-cli@v0.9.0 # Pin to a specific version
- name: Upload JAR to a volume
env:
DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }} # Add host for clarity
run: |
databricks fs cp target/my-app-1.0.jar dbfs:/Volumes/artifacts/my-app-${{ github.sha }}.jar --overwrite
validate:
needs: build-test-upload
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Databricks CLI Setup
uses: databricks/setup-cli@v0.9.0
- name: Validate bundle
env:
DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
run: databricks bundle validate
deploy:
needs: validate
if: github.event_name == 'push' && github.ref == 'refs/heads/main' # Only deploy on push to main
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Databricks CLI Setup
uses: databricks/setup-cli@v0.9.0
- name: Deploy bundle
env:
DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
run: databricks bundle deploy --target prod