GitHub connector

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

The managed GitHub connector in Lakeflow Connect allows you to ingest data from GitHub into Databricks.

Feature availability

Feature	Availability
UI-based pipeline authoring	Supported
API-based pipeline authoring	Supported
Declarative Automation Bundles	Supported
Incremental ingestion	Partially supported. Some tables support incremental ingestion; other tables require a full refresh. See Supported data.
Unity Catalog governance	Supported
Lakeflow Jobs	Supported
SCD type 2	Supported
Column selection and deselection	Supported
API-based row filtering	Not supported
Automated schema evolution: New and deleted columns	Not supported
Automated schema evolution: Data type changes	Not supported
Automated schema evolution: Column renames	Not supported
Automated schema evolution: New tables	Not supported

Authentication method	Availability
OAuth U2M	Supported
OAuth M2M	Not supported
OAuth (manual refresh token)	Not supported
Basic authentication (username/password)	Not supported
Basic authentication (API key)	Not supported
Basic authentication (service account JSON key)	Not supported

Topic	Why it matters
Databricks user persona	The workflow depends on your Databricks user persona. Single-user: An admin user creates a Unity Catalog connection and an ingestion pipeline. Multi-user: An admin user creates a connection that non-admin users then use to create pipelines.
Authentication method	The steps to create a connection depend on the authentication method you choose.
Interface	The steps to create a pipeline depend on the interface.
Ingestion frequency	The pipeline schedule depends on your latency and cost requirements.
Common patterns	Depending on your ingestion needs, the pipeline might use configurations like history tracking, column selection, and row filtering. Supported configurations vary by connector. See Feature availability.
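
Because supported configurations vary by connector, it can help to check a planned pipeline against the Feature availability table before you author it. The following is a minimal sketch only, not part of any Databricks API: `FEATURE_AVAILABILITY` and `unsupported_features` are hypothetical names introduced for illustration, and the dictionary just transcribes the Feature availability table on this page. Treating unlisted features as unsupported is also an assumption of this sketch.

```python
# Hypothetical helper: transcribes the Feature availability table for the
# GitHub connector. This is NOT a Databricks API; names are illustrative.
FEATURE_AVAILABILITY = {
    "UI-based pipeline authoring": "Supported",
    "API-based pipeline authoring": "Supported",
    "Declarative Automation Bundles": "Supported",
    "Incremental ingestion": "Partially supported",
    "Unity Catalog governance": "Supported",
    "Lakeflow Jobs": "Supported",
    "SCD type 2": "Supported",
    "Column selection and deselection": "Supported",
    "API-based row filtering": "Not supported",
    "Automated schema evolution: New and deleted columns": "Not supported",
    "Automated schema evolution: Data type changes": "Not supported",
    "Automated schema evolution: Column renames": "Not supported",
    "Automated schema evolution: New tables": "Not supported",
}


def unsupported_features(planned):
    """Return the planned features that the table marks as not supported.

    Assumption: a feature missing from the table is treated as unsupported.
    """
    return [
        feature
        for feature in planned
        if FEATURE_AVAILABILITY.get(feature, "Not supported") == "Not supported"
    ]


if __name__ == "__main__":
    planned = [
        "SCD type 2",
        "Column selection and deselection",
        "API-based row filtering",
    ]
    # Lists the planned features this connector cannot provide.
    print(unsupported_features(planned))
```

In this example, history tracking (SCD type 2) and column selection pass the check, while API-based row filtering is flagged, which matches the table.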

The following table summarizes the end-to-end GitHub ingestion flow, based on user type:

User	Steps
Admin	1. Configure GitHub to enable authentication from Databricks. See Configure OAuth U2M for GitHub ingestion. 2. Do one of the following: Use Catalog Explorer to create a connection to GitHub so that non-admins can create pipelines (see Create a GitHub connection), or use the data ingestion UI to create a connection and a pipeline at the same time (see Ingest data from GitHub).
Non-admin	Use any supported interface to create a pipeline from an existing connection. See Ingest data from GitHub.
