Microsoft SharePoint connector limitations

This page lists limitations and considerations for ingestion from Microsoft SharePoint using Databricks Lakeflow Connect.

Connector-specific limitations

  • For unstructured (BINARYFILE) ingestion, the connector supports only files that are 100 MB or smaller. The metadata for files larger than 100 MB is ingested, but the file content is not downloaded. There is no file size limit for structured file formats.

  • Unstructured (BINARYFILE) ingestion supports only SCD_TYPE_1 storage mode. Structured ingestion (CSV, JSON, XML, EXCEL, and other formats) supports only APPEND_ONLY storage mode. SCD type 2 is not supported. To configure the storage mode, set storage_mode in table_configuration. Setting the scd_type field instead results in an error.

  • The connector does not support individual file selection. It ingests all files in a configured folder, drive, subsite, or site.

  • The connector does not support ingesting file-level access control lists (ACLs) from SharePoint.

  • The connector does not support ingesting files linked to a different SharePoint document library.

  • Databricks recommends running ingestion no more than once per hour.

  • The utils provided for downstream usage work only on single-user clusters. Because single-user clusters cannot access streaming tables created by other users, each downstream user must create their own ingestion pipeline. You can modify the utils to work on serverless and shared clusters, but this can affect performance. See File access examples.

  • Some fields (for example, quick_xor_hash, mime_type) are not supported for all file formats. File download and other metadata ingestion continue to work in these cases.
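
The storage mode and file size rules above can be checked before a pipeline is created. The following is a minimal sketch in plain Python, not part of Lakeflow Connect: the function names are hypothetical, table_configuration is modeled as an ordinary dictionary, every format other than BINARYFILE is assumed to be structured, and 100 MB is assumed to mean 100 * 1024 * 1024 bytes (the limitation does not state which definition of MB applies).

```python
# Illustrative pre-flight checks for the limitations listed above.
# All names here are hypothetical; none of them belong to a Databricks API.

# Assumption: 1 MB = 1024 * 1024 bytes.
MAX_BINARYFILE_BYTES = 100 * 1024 * 1024


def expected_storage_mode(file_format: str) -> str:
    """Return the only storage mode allowed for the given file format."""
    # Assumption: any format other than BINARYFILE counts as structured.
    if file_format.upper() == "BINARYFILE":
        return "SCD_TYPE_1"
    return "APPEND_ONLY"


def check_table_configuration(file_format: str, table_configuration: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    if "scd_type" in table_configuration:
        problems.append("scd_type is not accepted; set storage_mode instead")
    expected = expected_storage_mode(file_format)
    actual = table_configuration.get("storage_mode")
    if actual is not None and actual != expected:
        problems.append(
            f"{file_format} ingestion supports only {expected}, got {actual}"
        )
    return problems


def content_is_downloaded(file_format: str, size_bytes: int) -> bool:
    """Report whether file content (not just metadata) is expected to be ingested."""
    if file_format.upper() != "BINARYFILE":
        return True  # structured formats have no size limit
    return size_bytes <= MAX_BINARYFILE_BYTES
```

For example, check_table_configuration("CSV", {"storage_mode": "SCD_TYPE_1"}) reports one problem, and content_is_downloaded("BINARYFILE", 150 * 1024 * 1024) returns False, which corresponds to the metadata-only case described in the first limitation.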