GitHub connector
This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.
The managed GitHub connector in Lakeflow Connect allows you to ingest data from GitHub into Databricks.
What to know before you start
| Topic | Why it matters |
|---|---|
| | The workflow depends on your Databricks user persona (admin or non-admin). |
| | The steps to create a connection depend on the authentication method you choose. |
| | The steps to create a pipeline depend on the interface. |
| | The pipeline schedule depends on your latency and cost requirements. |
| | Depending on your ingestion needs, the pipeline might use configurations like history tracking, column selection, and row filtering. Supported configurations vary by connector. See Feature availability. |
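As a rough illustration of how configurations like history tracking and column selection can surface in API-based pipeline authoring, the following Python sketch builds a pipeline-creation request body for a single source table. It assumes the `ingestion_definition` / `objects` / `table_configuration` request shape used by Lakeflow Connect managed connectors, including the `scd_type` and `include_columns` fields. Verify these field names against the Pipelines API reference before relying on them. The function name and every pipeline, connection, schema, table, and column name below are hypothetical placeholders. GitHub-specific source names are not listed here (see Supported data), and row filtering is not shown.

```python
import json
from typing import Optional

# Assumed values for the history-tracking setting (SCD type 1 overwrites, SCD type 2 keeps history).
VALID_SCD_TYPES = {"SCD_TYPE_1", "SCD_TYPE_2"}


def build_ingestion_pipeline_spec(
    pipeline_name: str,
    connection_name: str,
    source_schema: str,
    source_table: str,
    destination_catalog: str,
    destination_schema: str,
    scd_type: str = "SCD_TYPE_1",
    include_columns: Optional[list] = None,
) -> dict:
    """Return a sketch of a pipeline-creation payload for one source table.

    Field names follow an assumed Lakeflow Connect request shape; check them
    against the Pipelines API reference.
    """
    if scd_type not in VALID_SCD_TYPES:
        raise ValueError(f"unsupported scd_type: {scd_type!r}")

    table_config = {"scd_type": scd_type}
    if include_columns:
        # Column selection: ingest only the listed columns (assumed field name).
        table_config["include_columns"] = list(include_columns)

    return {
        "name": pipeline_name,
        "ingestion_definition": {
            # Name of an existing Unity Catalog connection to GitHub (hypothetical).
            "connection_name": connection_name,
            "objects": [
                {
                    "table": {
                        "source_schema": source_schema,
                        "source_table": source_table,
                        "destination_catalog": destination_catalog,
                        "destination_schema": destination_schema,
                        "table_configuration": table_config,
                    }
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical placeholder names throughout.
    spec = build_ingestion_pipeline_spec(
        pipeline_name="github_ingestion",
        connection_name="my_github_connection",
        source_schema="my_source_schema",
        source_table="my_source_table",
        destination_catalog="main",
        destination_schema="github_raw",
        scd_type="SCD_TYPE_2",
        include_columns=["id", "title", "state"],
    )
    print(json.dumps(spec, indent=2))
```

Keeping the payload as a plain dictionary makes it easy to inspect or serialize before you submit it through whichever supported interface you use.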
Start ingesting from GitHub
The following table summarizes the end-to-end GitHub ingestion flow, based on user type:
| User | Steps |
|---|---|
| Admin | |
| Non-admin | Use any supported interface to create a pipeline from an existing connection. See Ingest data from GitHub. |
Feature availability
| Feature | Availability |
|---|---|
| UI-based pipeline authoring | |
| API-based pipeline authoring | |
| Declarative Automation Bundles | |
| Incremental ingestion | Some tables support incremental ingestion. Other tables require a full refresh. See Supported data. |
| Unity Catalog governance | |
| Lakeflow Jobs | |
| SCD type 2 | |
| Column selection and deselection | |
| API-based row filtering | |
| Automated schema evolution: New and deleted columns | |
| Automated schema evolution: Data type changes | |
| Automated schema evolution: Column renames | |
| Automated schema evolution: New tables | |
Authentication methods
| Authentication method | Availability |
|---|---|
| OAuth U2M | |
| OAuth M2M | |
| OAuth (manual refresh token) | |
| Basic authentication (username/password) | |
| Basic authentication (API key) | |
| Basic authentication (service account JSON key) | |