Use a registered community connector
This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.
This page shows how to use a registered community connector to ingest data from a supported source into Databricks. To create a custom connector for a source that isn't supported yet, see Create a custom connector.
Requirements
- A Databricks workspace with Unity Catalog enabled
- A connection for the source you want to ingest, or permissions to create a connection (to check which connections already exist, see the sketch after this list)
- Write access to a catalog and schema for the ingested tables
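If you're unsure whether a usable connection already exists, one option is to list the Unity Catalog connections you can access with the Databricks SDK for Python. This is an optional sketch, not part of the connector itself; it assumes the databricks-sdk package is installed and that default authentication is configured.

```python
from databricks.sdk import WorkspaceClient

# Uses your default Databricks authentication (for example, a configured CLI profile).
w = WorkspaceClient()

# List Unity Catalog connections and show their names and types.
for conn in w.connections.list():
    print(conn.name, conn.connection_type)
```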
Create an ingestion pipeline
To use a registered community connector:
- In the sidebar of your Databricks workspace, click + New > Add or upload data, then select the source under Community connectors.
- Click + Create connection or select an existing connection, then click Next.
- For Pipeline name, enter a name for the pipeline.
- For Event log location, enter a catalog name and a schema name. Databricks stores the pipeline event log here. Ingested tables are also written here by default.
- For Root path, enter your workspace path (for example, /Workspace/Users/<your-email>/connectors). Databricks clones and stores the connector source code here.
- Click Create pipeline.
- In the pipeline editor, open `ingest.py` and update the `objects` field to include the tables you want to ingest. For example:

  ```python
  from databricks.labs.community_connector.pipeline import ingest

  pipeline_spec = {
      "connection_name": "my_stripe_connection",  # Required: UC connection name
      "objects": [
          {"table": {"source_table": "charges"}},
          {"table": {"source_table": "customers",
                     "destination_table": "stripe_customers"}},
      ],
  }

  ingest(spark, pipeline_spec)
  ```
- Run the pipeline manually or schedule it. To trigger a run programmatically instead, see the sketch below.
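As one way to trigger a run outside the UI, the sketch below starts an on-demand update through the Databricks SDK for Python. It assumes the databricks-sdk package is installed and uses a placeholder pipeline ID that you would replace with the ID of the pipeline you just created.

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Placeholder: replace with the ID of the ingestion pipeline created above.
pipeline_id = "<your-pipeline-id>"

# Start an on-demand update of the pipeline.
update = w.pipelines.start_update(pipeline_id=pipeline_id)
print(f"Started update {update.update_id}")
```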
Pipeline configuration options
You can configure the following options in `ingest.py`:

| Option | Description |
|---|---|
| `connection_name` | Required. The name of the connection that stores authentication credentials for the source. |
| `objects` | Required. A list of tables to ingest. Each entry has the format `{"table": {"source_table": "<table-name>"}}` and can optionally include `destination_table` to rename the ingested table. |
| | The catalog where ingested tables are written. Defaults to the catalog set during pipeline creation. |
| | The schema where ingested tables are written. Defaults to the schema set during pipeline creation. |
| | The slowly changing dimension (SCD) strategy to apply to ingested tables. |
| | Override the default primary keys for a table. Provide a list of column names. |
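As a rough sketch of how these options fit together in `ingest.py`, the example below repeats the confirmed `connection_name` and `objects` fields and marks the remaining overrides as commented-out, hypothetical key names; check the `ingest.py` generated for your pipeline for the exact spellings.

```python
from databricks.labs.community_connector.pipeline import ingest

pipeline_spec = {
    "connection_name": "my_stripe_connection",  # Required: UC connection name
    "objects": [
        {"table": {"source_table": "charges"}},
        {"table": {"source_table": "customers",
                   "destination_table": "stripe_customers"}},
    ],
    # The catalog/schema, SCD strategy, and primary-key overrides from the table above
    # are also set in this dict. The key names below are hypothetical placeholders;
    # verify them against the ingest.py generated for your pipeline.
    # "destination_catalog": "main",
    # "destination_schema": "stripe",
}

ingest(spark, pipeline_spec)
```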