
Use a registered community connector

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

This page shows how to use a registered community connector to ingest data from a supported source into Databricks. To create a custom connector for a source that isn't supported yet, see Create a custom connector.

Requirements

  • A Databricks workspace with Unity Catalog enabled
  • A connection for the source you want to ingest, or permissions to create a connection
  • Write access to a catalog and schema for the ingested tables (example grants are sketched after this list)

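If you need the write access listed above, an admin can grant it with Unity Catalog GRANT statements. The following is a minimal sketch; main, ingest, and user@example.com are placeholder names for your catalog, schema, and principal.

    Python
    # Placeholder catalog, schema, and principal names throughout.
    spark.sql("GRANT USE CATALOG ON CATALOG main TO `user@example.com`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA main.ingest TO `user@example.com`")
    spark.sql("GRANT CREATE TABLE ON SCHEMA main.ingest TO `user@example.com`")
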
Create an ingestion pipeline

To use a registered community connector:

  1. In the sidebar of your Databricks workspace, click + New > Add or upload data, then select the source under Community connectors.

  2. Click + Create connection or select an existing connection, then click Next.

  3. For Pipeline name, enter a name for the pipeline.

  4. For Event log location, enter a catalog name and a schema name. Databricks stores the pipeline event log here. Ingested tables are also written here by default.

  5. For Root path, enter your workspace path (for example, /Workspace/Users/<your-email>/connectors). Databricks clones and stores the connector source code here.

  6. Click Create pipeline.

  7. In the pipeline editor, open ingest.py and update the objects field to include the tables you want to ingest. For example:

    Python
    from databricks.labs.community_connector.pipeline import ingest

    pipeline_spec = {
        "connection_name": "my_stripe_connection",  # Required: UC connection name
        "objects": [
            {"table": {"source_table": "charges"}},
            {"table": {"source_table": "customers",
                       "destination_table": "stripe_customers"}},
        ],
    }

    ingest(spark, pipeline_spec)
  8. Run the pipeline manually or schedule it. To trigger a run programmatically instead, see the sketch below.
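
As one way to trigger runs outside the UI, the following sketch uses the Databricks SDK for Python to start a pipeline update. The pipeline ID is a placeholder; copy the real one from your pipeline's settings.

    Python
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # Picks up credentials from the environment or a config profile

    # Placeholder ID: find yours in the pipeline's settings page or URL.
    w.pipelines.start_update(pipeline_id="<your-pipeline-id>")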

Pipeline configuration options

You can configure the following options in ingest.py:

  • connection_name: Required. The name of the connection that stores authentication credentials for the source.
  • objects: Required. A list of tables to ingest. Each entry has the format {"table": {"source_table": "..."}}. You can also specify an optional destination_table inside the table object.
  • destination_catalog: The catalog where ingested tables are written. Defaults to the catalog set during pipeline creation.
  • destination_schema: The schema where ingested tables are written. Defaults to the schema set during pipeline creation.
  • scd_type: The slowly changing dimension (SCD) strategy: SCD_TYPE_1, SCD_TYPE_2, or APPEND_ONLY. Defaults to SCD_TYPE_1.
  • primary_keys: Overrides the default primary keys for a table. Provide a list of column names.
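
Putting these options together, a fuller pipeline_spec might look like the sketch below. Treat the placement of scd_type and primary_keys inside each table object as an assumption based on the per-table wording above, and check the ingest.py generated in your root path for the exact shape your connector expects.

    Python
    from databricks.labs.community_connector.pipeline import ingest

    pipeline_spec = {
        "connection_name": "my_stripe_connection",
        "destination_catalog": "main",    # Placeholder catalog
        "destination_schema": "stripe",   # Placeholder schema
        "objects": [
            # Assumed per-table placement for scd_type and primary_keys
            {"table": {"source_table": "charges",
                       "scd_type": "APPEND_ONLY"}},
            {"table": {"source_table": "customers",
                       "destination_table": "stripe_customers",
                       "primary_keys": ["id"]}},
        ],
    }

    ingest(spark, pipeline_spec)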