
Confluence connector limitations

The Confluence connector is in Beta.

This article lists limitations and considerations for ingesting data from Confluence using Databricks Lakeflow Connect.

General SaaS connector limitations

The limitations in this section apply to all SaaS connectors in Lakeflow Connect.

  • When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
  • When a source table is deleted, the destination table is not automatically deleted; you must delete it manually. This differs from the behavior of Lakeflow Spark Declarative Pipelines.
  • During source maintenance periods, Databricks might not be able to access your data.
  • If a source table name conflicts with an existing destination table name, the pipeline update fails.
  • Multi-destination pipeline support is API-only.
  • You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
  • Column-level selection and deselection are API-only.
  • If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table (see the sketch after this list).
  • Databricks can't ingest two or more tables with the same name in the same pipeline, even if they come from different source schemas.
  • The connector assumes that the cursor columns in the source system are monotonically increasing.
  • With SCD type 1 enabled, deletes don't produce an explicit delete event in the change data feed. For auditable deletions, use SCD type 2 if the connector supports it. For details, see Example: SCD type 1 and SCD type 2 processing with CDF source data.
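
For example, a full refresh of a single table can be triggered through the pipeline update API. The following is a minimal sketch using the Databricks Python SDK; the pipeline ID and table name are placeholders for your own values.

```python
# Minimal sketch: trigger a full refresh of one table in an ingestion
# pipeline, for example to backfill a newly selected column.
# Assumes the databricks-sdk package and workspace credentials in the
# environment; the pipeline ID and table name are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Fully refresh only the `pages` table; other tables in the pipeline
# continue to ingest incrementally.
w.pipelines.start_update(
    pipeline_id="<your-pipeline-id>",
    full_refresh_selection=["pages"],
)
```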

Connector-specific limitations

The limitations in this section are specific to the Confluence connector.

Supported data

The connector only ingests the following tables from Confluence:

  • pages
  • spaces
  • labels
  • classification_levels
  • blogposts
  • attachments

Deployment mode

The connector only supports Confluence Cloud.

ACL ingestion

The connector does not currently support ingesting Confluence ACLs. Similarly, the connector does not trigger reingestion when the ACLs of a data source change.

Pipelines

UI-based pipeline authoring isn't supported. You must use the Databricks CLI, APIs, SDKs, or Databricks Asset Bundles to create pipelines.
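
For example, you can create a pipeline by calling the Pipelines REST API directly. The sketch below is a minimal Python example; the workspace URL, token, connection name, and catalog, schema, and table names are placeholders, and the payload shape is based on the Lakeflow Connect ingestion pipeline definition. It also illustrates the API-only destination-table rename and column selection mentioned under the general limitations.

```python
# Minimal sketch: create a Confluence ingestion pipeline with the
# Pipelines REST API. The workspace URL, token, connection name, and
# catalog/schema/table names are hypothetical placeholders.
import requests

WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<your-personal-access-token>"

payload = {
    "name": "confluence-ingestion",
    "ingestion_definition": {
        "connection_name": "my_confluence_connection",
        "objects": [
            {
                "table": {
                    "source_schema": "confluence",
                    "source_table": "pages",
                    "destination_catalog": "main",
                    "destination_schema": "ingested",
                    # Renaming the destination table makes this an
                    # API-only pipeline (no further UI editing).
                    "destination_table": "confluence_pages",
                    "table_configuration": {
                        # Column-level selection is also API-only.
                        "include_columns": ["id", "title", "body"],
                    },
                }
            }
        ],
    },
}

resp = requests.post(
    f"{WORKSPACE_URL}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json()["pipeline_id"])
```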

Content ingestion

  • Attachment files aren't ingested. Only attachment metadata (filename, size, content type, upload date) is included in the ingested data.
  • Page comments aren't ingested. Only page content and metadata are included.
  • Archived spaces aren't ingested. Only active spaces are included in the ingestion pipeline.

API rate limits

  • The connector is subject to Confluence API rate limits. If you exceed them, the pipeline might slow down or fail temporarily. The connector automatically retries with exponential backoff (illustrated after this list).
  • Databricks recommends scheduling pipeline runs during off-peak hours to minimize the impact of rate limits.
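
The retries are handled by the connector itself, so no user action is required. For illustration only, the following sketch shows the general exponential-backoff-with-jitter pattern; it is not the connector's implementation, and `request_fn` is a hypothetical callable.

```python
# Illustrative sketch of exponential backoff with jitter, the general
# pattern used when a request hits an API rate limit (HTTP 429).
# This is not the connector's actual implementation.
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    for attempt in range(max_retries):
        response = request_fn()
        if response.status_code != 429:  # not rate limited
            return response
        # Sleep base_delay * 2^attempt, plus random jitter to avoid
        # synchronized retries.
        delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)
    raise RuntimeError("Rate limit retries exhausted")
```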

Authentication

Performance considerations

  • Initial pipeline runs (full snapshots) might take longer for large Confluence instances with many pages.
  • Incremental ingestion performance depends on the number of pages modified since the last run.
  • Large pages with extensive content or many attachments might take longer to ingest.

Changes not reflected in the cursor

The following limitations apply to changes that are not reflected in the cursor:

  • For incrementally ingested tables, the connector supports soft deletes (for example, records that are moved to the trash in Confluence). However, it doesn't support hard deletes (for example, records that are purged in Confluence). To reflect hard deletes, you must run a full refresh of the pipeline (see the sketch after this list).

    When a space is deleted, all of its pages and attachments are hard deleted. Therefore, these deletions are not reflected in the destination tables. However, when a parent page is soft deleted, all of its child pages and attachments are deleted in the destination tables.

  • Archived content for incremental tables is not supported.

  • When a page or a blogpost is moved from one space to another or from one parent to another, the corresponding spaceId is not updated.

  • Restored records: If you restore a page or a blogpost after deleting it in the source, the connector does not reingest it.
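
A full refresh of the entire pipeline can be triggered the same way as the per-table refresh shown earlier. A minimal sketch with the Databricks Python SDK, with a placeholder pipeline ID:

```python
# Minimal sketch: full refresh of the whole pipeline so that hard
# deletes (for example, purged records) are reflected downstream.
# The pipeline ID is a hypothetical placeholder.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

w.pipelines.start_update(
    pipeline_id="<your-pipeline-id>",
    full_refresh=True,  # re-snapshot every table in the pipeline
)
```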