Microsoft SharePoint connector limitations
The Microsoft SharePoint connector is in Beta.
This page lists limitations and considerations for ingestion from Microsoft SharePoint using Databricks Lakeflow Connect.
General SaaS connector limitations
The limitations in this section apply to all SaaS connectors in Lakeflow Connect.
- When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
- When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This differs from the behavior of Lakeflow Declarative Pipelines.
- During source maintenance periods, Databricks might not be able to access your data.
- If a source table name conflicts with an existing destination table name, the pipeline update fails.
- Multi-destination pipeline support is API-only.
- You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
- Column-level selection and deselection are API-only.
- If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table (see the sketch after this list).
- Databricks can't ingest two or more tables with the same name in the same pipeline, even if they come from different source schemas.
- Managed ingestion pipelines aren't supported for workspaces in AWS GovCloud regions (FedRAMP High).
- Managed ingestion pipelines aren't supported for FedRAMP Moderate workspaces in the `us-east-2` or `us-west-1` regions.
- The connector assumes that the cursor columns in the source system are monotonically increasing.
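
To backfill historical data for a newly selected column, start a pipeline update with a full refresh of the affected table. The following is a minimal sketch using the Databricks SDK for Python; the pipeline ID and table name are placeholders, and the exact name format expected in `full_refresh_selection` depends on your pipeline configuration.

```python
# Minimal sketch: trigger a full refresh for a single table in an ingestion
# pipeline so that newly selected columns are backfilled with historical data.
# The pipeline ID and table name are placeholders.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads credentials from the environment or a config profile

w.pipelines.start_update(
    pipeline_id="<pipeline-id>",                    # placeholder pipeline ID
    full_refresh_selection=["my_schema.my_table"],  # placeholder table to fully refresh
)
```

A full refresh re-ingests the entire table, so consider running it during a low-traffic window if the table is large.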
Connector-specific limitations
- The SharePoint connector only supports files that are 100 MB or smaller. The metadata for files larger than 100 MB is ingested, but the file content is not downloaded.
- Ingesting file-level access control lists (ACLs) and other custom metadata from SharePoint is not supported.
- Ingesting files that are linked to a different SharePoint document library is not supported.
- Individual file selection and deselection within a drive are not supported. The connector ingests all of the files in a drive.
- The utils provided for downstream usage are limited to single-user clusters. However, single-user clusters can't access streaming tables created by other users. Therefore, each downstream user must create their own ingestion pipeline. You can modify the utils to make them work on serverless and shared clusters, but this can impact performance. See Examples.
- Some fields (for example, `quickXorHash` and `mimeType`) are not supported for all file formats. Even in these cases, file download and other metadata ingestion should work.
- Databricks recommends running ingestion at most once per hour.
- The connector is API-only. The Databricks UI isn't supported. For an example of creating a pipeline through the REST API, see the sketch below.
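
Because the connector is API-only, you create and manage the ingestion pipeline by calling the pipelines REST API (or an equivalent SDK). The sketch below posts a pipeline spec to the standard `/api/2.0/pipelines` endpoint; the connection name, object specification, and destination names inside `ingestion_definition` are illustrative assumptions rather than the exact SharePoint schema, so check the connector reference for the actual fields.

```python
# Minimal sketch: create a SharePoint ingestion pipeline through the REST API.
# The endpoint and auth pattern are the standard Databricks pipelines API; the
# ingestion_definition fields below are illustrative assumptions, not the exact
# SharePoint object spec.
import os
import requests

host = os.environ["DATABRICKS_HOST"]    # for example, https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]  # personal access token (placeholder auth method)

payload = {
    "name": "sharepoint-ingestion-pipeline",
    "ingestion_definition": {
        "connection_name": "my_sharepoint_connection",  # assumed Unity Catalog connection name
        "objects": [
            {
                # Illustrative object spec; SharePoint drives may use different field names.
                "table": {
                    "source_schema": "my_site",
                    "source_table": "my_drive",
                    "destination_catalog": "main",
                    "destination_schema": "ingested_sharepoint",
                }
            }
        ],
    },
}

resp = requests.post(
    f"{host}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # the response includes the new pipeline's ID
```

After the pipeline exists, subsequent operations such as starting updates or editing the table list also go through the API, because the UI isn't supported for this connector.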