GitHub connector

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

The managed GitHub connector in Lakeflow Connect allows you to ingest data from GitHub into Databricks.

What to know before you start

Topic

Why it matters

Databricks user persona

The workflow depends on your Databricks user persona:

  • Single-user: An admin user creates a Unity Catalog connection and an ingestion pipeline.
  • Multi-user: An admin user creates a connection that non-admin users can then use to create pipelines.

Authentication method

The steps to create a connection depend on the authentication method you choose.

Interface

The steps to create a pipeline depend on the interface.

Ingestion frequency

The pipeline schedule depends on your latency and cost requirements.

Common patterns

Depending on your ingestion needs, the pipeline might use configurations like history tracking, column selection, and row filtering. Supported configurations vary by connector. See Feature availability.
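To illustrate the ingestion frequency trade-off, the following minimal sketch converts a target refresh interval into a Quartz cron expression and wraps it in a job definition that triggers a pipeline. The `schedule`, `quartz_cron_expression`, `timezone_id`, and `pipeline_task` field names are assumed to follow the Databricks Jobs API; the job name, task key, and pipeline ID are hypothetical placeholders. Nothing here is specific to the GitHub connector, so verify the payload against the current Jobs reference before using it.

```python
def quartz_every_n_hours(n: int) -> str:
    """Return a Quartz cron expression that fires every n hours, on the hour."""
    if n < 1 or 24 % n != 0:
        raise ValueError("n must be a positive divisor of 24 for evenly spaced runs")
    # Quartz field order: seconds minutes hours day-of-month month day-of-week
    if n == 24:
        # Once a day at midnight; an hour increment of 24 is out of range in Quartz.
        return "0 0 0 * * ?"
    return f"0 0 0/{n} * * ?"


def build_job_settings(pipeline_id: str, every_n_hours: int) -> dict:
    """Build a job definition that runs an existing pipeline on a schedule.

    Field names are assumed from the Databricks Jobs API; the job name and
    task key are hypothetical.
    """
    return {
        "name": "github-ingestion-schedule",  # hypothetical job name
        "schedule": {
            "quartz_cron_expression": quartz_every_n_hours(every_n_hours),
            "timezone_id": "UTC",
        },
        "tasks": [
            {
                "task_key": "run_github_ingestion",  # hypothetical task key
                "pipeline_task": {"pipeline_id": pipeline_id},
            }
        ],
    }
```

A shorter interval lowers latency but triggers more pipeline runs, which generally raises cost. For example, `build_job_settings("<pipeline-id>", 6)` schedules four runs per day, while an interval of 24 schedules one.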

Start ingesting from GitHub

The following table summarizes the end-to-end GitHub ingestion flow, based on user type:

User

Steps

Admin

  1. Configure GitHub to enable authentication from Databricks. See Configure OAuth U2M for GitHub ingestion.
  2. Either:
    • Use Catalog Explorer to create a connection to GitHub so that non-admins can create pipelines. See GitHub.
    • Use the data ingestion UI to create a connection and a pipeline at the same time. See Ingest data from GitHub.

Non-admin

Use any supported interface to create a pipeline from an existing connection. See Ingest data from GitHub.
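For the API-based path, creating a pipeline from an existing connection can be sketched as building a pipeline specification like the one below. This is a sketch under stated assumptions, not the connector's documented payload: the `ingestion_definition`, `connection_name`, `objects`, and `table` fields are assumed from the general shape of managed ingestion pipeline specs, and the connection name, source schema, source table names, and destination names are hypothetical. See Ingest data from GitHub and Supported data for the actual values.

```python
import json


def build_ingestion_pipeline_spec(
    name,
    connection_name,
    source_schema,
    source_tables,
    destination_catalog,
    destination_schema,
):
    """Build a pipeline spec dict that ingests the given source tables.

    The field names are assumptions modeled on managed ingestion pipeline
    specs; confirm them against the connector's own documentation.
    """
    objects = [
        {
            "table": {
                "source_schema": source_schema,
                "source_table": table,
                "destination_catalog": destination_catalog,
                "destination_schema": destination_schema,
            }
        }
        for table in source_tables
    ]
    return {
        "name": name,
        "ingestion_definition": {
            "connection_name": connection_name,  # connection created by an admin
            "objects": objects,
        },
    }


spec = build_ingestion_pipeline_spec(
    name="github_ingestion_pipeline",          # hypothetical
    connection_name="my_github_connection",    # hypothetical
    source_schema="my-org",                    # hypothetical
    source_tables=["issues", "pull_requests"], # hypothetical table names
    destination_catalog="main",                # hypothetical
    destination_schema="github_raw",           # hypothetical
)
print(json.dumps(spec, indent=2))
```

Keeping the spec as plain data makes it easy to review in version control before submitting it through whichever supported interface you use.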

Feature availability

| Feature | Availability |
| --- | --- |
| UI-based pipeline authoring | Supported |
| API-based pipeline authoring | Supported |
| Declarative Automation Bundles | Supported |
| Incremental ingestion | Partially supported. Some tables support incremental ingestion. Other tables require a full refresh. See Supported data. |
| Unity Catalog governance | Supported |
| Lakeflow Jobs | Supported |
| SCD type 2 | Supported |
| Column selection and deselection | Supported |
| API-based row filtering | Not supported |
| Automated schema evolution: New and deleted columns | Not supported |
| Automated schema evolution: Data type changes | Not supported |
| Automated schema evolution: Column renames | Not supported |
| Automated schema evolution: New tables | Not supported |
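When planning a pipeline, it can help to check the features you intend to rely on against the availability table above. The following minimal sketch transcribes that table into a dictionary and reports any requested feature that is not fully supported. The function name and the "Unknown" status are illustrative choices, and because the connector is in Beta, the statuses might change.

```python
# Statuses transcribed from the feature availability table.
FEATURE_AVAILABILITY = {
    "UI-based pipeline authoring": "Supported",
    "API-based pipeline authoring": "Supported",
    "Declarative Automation Bundles": "Supported",
    "Incremental ingestion": "Partially supported",
    "Unity Catalog governance": "Supported",
    "Lakeflow Jobs": "Supported",
    "SCD type 2": "Supported",
    "Column selection and deselection": "Supported",
    "API-based row filtering": "Not supported",
    "Automated schema evolution: New and deleted columns": "Not supported",
    "Automated schema evolution: Data type changes": "Not supported",
    "Automated schema evolution: Column renames": "Not supported",
    "Automated schema evolution: New tables": "Not supported",
}


def features_needing_attention(requested):
    """Return {feature: status} for requested features that are not fully supported.

    Features missing from the table are reported as "Unknown".
    """
    report = {}
    for feature in requested:
        status = FEATURE_AVAILABILITY.get(feature, "Unknown")
        if status != "Supported":
            report[feature] = status
    return report


plan = ["SCD type 2", "Incremental ingestion", "API-based row filtering"]
print(features_needing_attention(plan))
```

In this example, the plan can use SCD type 2 as is, needs a per-table check for incremental ingestion, and cannot rely on API-based row filtering.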

Authentication methods

| Authentication method | Availability |
| --- | --- |
| OAuth U2M | Supported |
| OAuth M2M | Not supported |
| OAuth (manual refresh token) | Not supported |
| Basic authentication (username/password) | Not supported |
| Basic authentication (API key) | Not supported |
| Basic authentication (service account JSON key) | Not supported |