
Use a registered community connector

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.

This page shows how to use a registered community connector to ingest data from a supported source into Databricks. To create a custom connector for a source that isn't supported yet, see Create a custom connector.

Requirements

  • A Databricks workspace with Unity Catalog enabled
  • A connection for the source you want to ingest, or permissions to create a connection
  • Write access to a catalog and schema for the ingested tables (example grants are sketched after this list)

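If you need the write access listed above, an admin can grant it with Unity Catalog GRANT statements. The following is a minimal sketch; main, ingest, and user@example.com are placeholder names for your catalog, schema, and principal.

    Python
    # Placeholder catalog, schema, and principal names throughout.
    spark.sql("GRANT USE CATALOG ON CATALOG main TO `user@example.com`")
    spark.sql("GRANT USE SCHEMA ON SCHEMA main.ingest TO `user@example.com`")
    spark.sql("GRANT CREATE TABLE ON SCHEMA main.ingest TO `user@example.com`")
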
Create an ingestion pipeline

To use a registered community connector:

  1. In the sidebar of your Databricks workspace, click + New > Add or upload data, then select the source under Community connectors.

  2. Click + Create connection or select an existing connection, then click Next.

  3. For Pipeline name, enter a name for the pipeline.

  4. For Event log location, enter a catalog name and a schema name. Databricks stores the pipeline event log here. Ingested tables are also written here by default.

  5. For Root path, enter your workspace path (for example, /Workspace/Users/<your-email>/connectors). Databricks clones and stores the connector source code here.

  6. Click Create pipeline.

  7. In the pipeline editor, open ingest.py and update the objects field to include the tables you want to ingest. For example:

    Python
    from databricks.labs.community_connector.pipeline import ingest

    pipeline_spec = {
        "connection_name": "my_stripe_connection",  # Required: UC connection name
        "objects": [
            {"table": {"source_table": "charges"}},
            {"table": {"source_table": "customers",
                       "destination_table": "stripe_customers"}},
        ],
    }

    ingest(spark, pipeline_spec)
  8. Run the pipeline manually or schedule it. To trigger a run programmatically instead, see the sketch below.
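
As one way to trigger runs outside the UI, the following sketch uses the Databricks SDK for Python to start a pipeline update. The pipeline ID is a placeholder; copy the real one from your pipeline's settings.

    Python
    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()  # Picks up credentials from the environment or a config profile

    # Placeholder ID: find yours in the pipeline's settings page or URL.
    w.pipelines.start_update(pipeline_id="<your-pipeline-id>")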

Pipeline configuration options

You can configure the following options in ingest.py:

  • connection_name: Required. The name of the connection that stores authentication credentials for the source.
  • objects: Required. A list of tables to ingest. Each entry has the format {"table": {"source_table": "..."}}. You can also specify an optional destination_table inside the table object.
  • destination_catalog: The catalog where ingested tables are written. Defaults to the catalog set during pipeline creation.
  • destination_schema: The schema where ingested tables are written. Defaults to the schema set during pipeline creation.
  • scd_type: The slowly changing dimension (SCD) strategy: SCD_TYPE_1, SCD_TYPE_2, or APPEND_ONLY. Defaults to SCD_TYPE_1.
  • primary_keys: Overrides the default primary keys for a table. Provide a list of column names.
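
Putting these options together, a fuller pipeline_spec might look like the sketch below. Treat the placement of scd_type and primary_keys inside each table object as an assumption based on the per-table wording above, and check the ingest.py generated in your root path for the exact shape your connector expects.

    Python
    from databricks.labs.community_connector.pipeline import ingest

    pipeline_spec = {
        "connection_name": "my_stripe_connection",
        "destination_catalog": "main",    # Placeholder catalog
        "destination_schema": "stripe",   # Placeholder schema
        "objects": [
            # Assumed per-table placement for scd_type and primary_keys
            {"table": {"source_table": "charges",
                       "scd_type": "APPEND_ONLY"}},
            {"table": {"source_table": "customers",
                       "destination_table": "stripe_customers",
                       "primary_keys": ["id"]}},
        ],
    }

    ingest(spark, pipeline_spec)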