Confluence connector FAQs

This page answers frequently asked questions about the Confluence connector in Databricks Lakeflow Connect.

General managed connector FAQs

For FAQs that apply to all Lakeflow Connect managed connectors, see Managed connector FAQs.

Connector-specific FAQs

The answers in this section are specific to the Confluence connector.

How does the connector pull data from Confluence?

The Confluence connector uses the Confluence REST API to retrieve page content, metadata, and attachments from your Confluence spaces.

Can I ingest specific pages or entire spaces?

No. You can't limit ingestion to specific pages or spaces. The connector ingests all pages.

How does the connector handle page hierarchy?

The connector maintains the hierarchical structure of pages within a space. Parent-child relationships between pages are preserved in the ingested data.
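As an illustration of how preserved parent-child relationships might be used downstream, the following Python sketch rebuilds each page's full path from a flat list of rows. The column names `id`, `parent_id`, and `title` and the sample rows are hypothetical, chosen only for this example. Check the connector's schema reference for the actual field names.

```python
from typing import Optional

# Hypothetical rows; the real column names in the ingested table may differ.
pages = [
    {"id": "1", "parent_id": None, "title": "Engineering"},
    {"id": "2", "parent_id": "1", "title": "Runbooks"},
    {"id": "3", "parent_id": "2", "title": "On-call"},
]


def page_path(page_id: str, by_id: dict) -> str:
    """Walk parent links up to the root and return an 'A / B / C' path."""
    parts = []
    seen = set()  # guards against cycles in malformed data
    current: Optional[str] = page_id
    while current is not None and current in by_id and current not in seen:
        seen.add(current)
        row = by_id[current]
        parts.append(row["title"])
        current = row["parent_id"]
    return " / ".join(reversed(parts))


by_id = {row["id"]: row for row in pages}
paths = {row["id"]: page_path(row["id"], by_id) for row in pages}
print(paths["3"])
```

For large spaces, the same walk could be expressed as a recursive query in SQL instead of in-memory Python.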

Does the connector support incremental ingestion?

Yes, for some tables. The connector ingests pages, blogposts, and attachments incrementally.
It ingests spaces, labels, and classification_levels using snapshots. This means that it overwrites the data in these tables on each pipeline run.

For information about the resulting schemas, see Schemas.
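The difference between the two modes can be sketched conceptually in Python. This is only an illustration of the general idea, not the connector's implementation. The function names, the `id` key, and the sample rows are assumptions made for the example.

```python
def apply_incremental(dest: dict, changed_rows: list) -> dict:
    """Upsert only the changed rows; untouched rows stay in place."""
    merged = dict(dest)
    for row in changed_rows:
        merged[row["id"]] = row
    return merged


def apply_snapshot(dest: dict, full_rows: list) -> dict:
    """Replace the destination with the latest full copy of the source."""
    return {row["id"]: row for row in full_rows}


# Incremental: only the modified page arrives, and the other page is kept.
pages = {"1": {"id": "1", "title": "Home"}, "2": {"id": "2", "title": "Roadmap"}}
pages = apply_incremental(pages, [{"id": "2", "title": "Roadmap 2025"}])

# Snapshot: the whole table is rewritten from the current source contents.
labels = {"a": {"id": "a", "name": "draft"}, "b": {"id": "b", "name": "old"}}
labels = apply_snapshot(labels, [{"id": "a", "name": "draft"}])

print(sorted(pages), sorted(labels))
```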

How are Confluence attachments handled?

Attachment metadata (filename, size, content type, upload date) is ingested. The actual attachment files aren't ingested by default. If you need to ingest attachment content, contact Databricks support.

What happens if a page is deleted in Confluence?

When you use SCD type 2, deleted pages are tracked and marked with a deletion timestamp in the destination table. With SCD type 1, the page is removed from the destination table.
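The following Python sketch models the two behaviors on an in-memory table to make the contrast concrete. It is a conceptual illustration only. The `deleted_at` column name and the helper functions are hypothetical and don't reflect the connector's actual schema or code.

```python
from datetime import datetime, timezone


def delete_scd1(table: dict, page_id: str) -> None:
    """SCD type 1: the deleted page's row is removed."""
    table.pop(page_id, None)


def delete_scd2(table: dict, page_id: str, when: datetime) -> None:
    """SCD type 2: the row is kept and stamped with a deletion time."""
    if page_id in table:
        table[page_id]["deleted_at"] = when  # hypothetical column name


scd1 = {"42": {"title": "Old page"}}
scd2 = {"42": {"title": "Old page", "deleted_at": None}}
when = datetime(2025, 1, 15, tzinfo=timezone.utc)

delete_scd1(scd1, "42")
delete_scd2(scd2, "42", when)

print(scd1)
print(scd2["42"])
```

With SCD type 2, downstream queries that should see only current pages need to filter out rows that carry a deletion timestamp.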

Can I ingest archived spaces?

The connector only ingests active spaces. Archived spaces aren't included in the ingestion pipeline.

What permissions does the connector require?

The Confluence user account must have read access to the spaces and pages you want to ingest. Databricks recommends using a dedicated service account with appropriate permissions. For more information, see Configure OAuth U2M for Confluence ingestion.

How does the connector handle page formatting?

Page content is ingested in Confluence storage format, which is an XHTML-based format. You can parse this content in downstream processing to extract plain text or convert to other formats. See Confluence Storage Format in the Confluence documentation.
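As one possible approach to downstream parsing, the following Python sketch extracts plain text from a storage-format string using only the standard library's `html.parser`. The sample markup, the `storage_to_text` helper, and the set of block-level tags are assumptions for illustration. Real pages can contain macros, CDATA sections, and other constructs that need additional handling.

```python
from html.parser import HTMLParser


class StorageFormatText(HTMLParser):
    """Collects text nodes and drops all tags, including ac: macro tags."""

    # Closing one of these tags ends a line in the extracted text.
    BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "tr", "div"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_endtag(self, tag: str) -> None:
        if tag in self.BLOCK_TAGS:
            self.chunks.append("\n")

    def handle_data(self, data: str) -> None:
        self.chunks.append(data)


def storage_to_text(body: str) -> str:
    """Return the text content of a storage-format body, one block per line."""
    parser = StorageFormatText()
    parser.feed(body)
    parser.close()
    text = "".join(parser.chunks)
    # Collapse runs of whitespace and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)


sample = (
    "<h1>Release notes</h1>"
    "<p>Fixed the <strong>login</strong> bug &amp; updated docs.</p>"
    '<ac:structured-macro ac:name="info"><ac:rich-text-body>'
    "<p>Restart required.</p>"
    "</ac:rich-text-body></ac:structured-macro>"
)
print(storage_to_text(sample))
```

A tolerant HTML parser is used here rather than a strict XML parser because storage-format fragments use namespace prefixes such as `ac:` without declaring them, which strict XML parsers reject.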

Are page comments ingested?

The connector doesn't ingest page comments.

Can I filter pages by label or tag?

No, the connector ingests all pages.
