Microsoft SharePoint connector reference

Preview

The Microsoft SharePoint connector is in Beta.

This page contains reference material for the Microsoft SharePoint connector in Databricks Lakeflow Connect.

Ingested data format

The ingested data lands in the following format. A site in SharePoint maps to a schema in Databricks. A drive in the SharePoint site maps to a table in the destination schema.

Field	Type	Description
`file_id`	`String`	The unique SharePoint identifier of the file.
`file_metadata`	`Struct`	Contains generic file metadata: `name` (`string`): The name of the file, as it appears in SharePoint. `size_in_bytes` (`bigint`): The size of the file. `created_timestamp` (`timestamp`): The timestamp at which the file was created in SharePoint. `last_modified_timestamp` (`timestamp`): The timestamp at which the file was last modified in SharePoint.
`source_metadata`	`Struct`	Contains SharePoint-specific metadata for the file: `site_id` (`string`): The SharePoint site identifier. `drive_id` (`string`): The SharePoint drive identifier. `file_folder_path` (`string`): The file path of the file in SharePoint (for example, `/drives/d1/root:/folder1`). `quick_xor_hash` (`string`): A custom hash provided by Microsoft that can be used to validate that your downloaded content is accurate. This value can be `NULL` (for example, if the format does not support hashing). See Code Snippets: QuickXorHash Algorithm in the Microsoft documentation. `mime_type` (`string`): The MIME type (format) of the file. `web_url` (`string`): A link to the file in SharePoint.
`content`	`Struct`	Contains file content. Databricks does not recommend accessing this struct directly. Instead, access it using the UDFs in Downstream RAG use case.
`sequence_id`	`Long`	A sequencing key for ordering different versions of the same file.
`is_deleted`	`Boolean`	Ignore this column. The value is always `false`. If you need to identify deleted files, Databricks recommends enabling SCD type 2 and using the `__END_AT` column.
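
Because `sequence_id` orders different versions of the same file, a common downstream step is to keep only the newest row per `file_id`. The following is a minimal sketch of that logic in plain Python, assuming rows shaped like the table above. The sample rows, their values, and the helper name `latest_versions` are hypothetical and only for illustration; in a real pipeline the same logic would typically be expressed as a query over the destination table.

```python
from typing import Any, Dict, Iterable

Row = Dict[str, Any]


def latest_versions(rows: Iterable[Row]) -> Dict[str, Row]:
    """Return, for each file_id, the row with the highest sequence_id."""
    latest: Dict[str, Row] = {}
    for row in rows:
        current = latest.get(row["file_id"])
        # Keep this row if it is the first one seen for the file,
        # or if it is a newer version than the one already stored.
        if current is None or row["sequence_id"] > current["sequence_id"]:
            latest[row["file_id"]] = row
    return latest


# Hypothetical sample rows; only a subset of the documented fields is shown.
rows = [
    {
        "file_id": "file-a",
        "sequence_id": 1,
        "file_metadata": {"name": "report.docx", "size_in_bytes": 1024},
    },
    {
        "file_id": "file-a",
        "sequence_id": 3,
        "file_metadata": {"name": "report.docx", "size_in_bytes": 2048},
    },
    {
        "file_id": "file-b",
        "sequence_id": 2,
        "file_metadata": {"name": "notes.txt", "size_in_bytes": 300},
    },
]

latest = latest_versions(rows)
for file_id, row in sorted(latest.items()):
    print(file_id, row["sequence_id"], row["file_metadata"]["size_in_bytes"])
```

The sketch compares `sequence_id` values only within the same `file_id`, since the table describes the column as a key for ordering versions of the same file, not for ordering across files.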
