Troubleshoot Microsoft SharePoint ingestion
The Microsoft SharePoint connector is in Beta.
This page describes common issues with the Microsoft SharePoint connector in Databricks Lakeflow Connect and how to resolve them.
General pipeline troubleshooting
If a pipeline fails during execution, click the step that failed and check whether the error message explains the cause of the failure.
You can also check and download the cluster logs from the pipeline details page by clicking Update details in the right-hand panel, then clicking Logs. Scan the logs for errors or exceptions.
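Scanning a long log by eye is error-prone, so a small script can help narrow it down. The following is a minimal sketch, assuming you have downloaded a log as a text file. The marker strings in `ERROR_PATTERN` and the sample log lines are illustrative assumptions, not the connector's documented log format.

```python
import re

# Assumed markers for problem lines; adjust them to the log format you actually see.
ERROR_PATTERN = re.compile(r"ERROR|FATAL|Exception|Traceback")

def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that contain an error marker."""
    matches = []
    for number, line in enumerate(lines, start=1):
        if ERROR_PATTERN.search(line):
            matches.append((number, line.rstrip("\n")))
    return matches

# Hypothetical sample lines, used only to show the output shape.
sample_log = [
    "INFO Starting update\n",
    "ERROR Failed to fetch drive items\n",
    "java.lang.IllegalStateException: token expired\n",
    "INFO Shutting down\n",
]
for number, text in find_error_lines(sample_log):
    print(f"{number}: {text}")
```

To scan a downloaded file instead, pass an open file handle to `find_error_lines`, since iterating over a file yields its lines.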
Limit access to SharePoint files
To restrict the SharePoint files that the connector can access, create a dedicated Entra ID user with restricted SharePoint permissions and authenticate to SharePoint with that account.
Background: The connector doesn’t support app-only access (M2M OAuth). In this model, the app authenticates as itself with a service principal, and an admin grants it access to an entire SharePoint site, including all files in that site. Instead, the connector uses delegated access (U2M OAuth). In this model, the connector acts on behalf of a Microsoft Entra ID user and can only access files that the user has permission to view.
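The difference between the two models shows up in the token request itself. The sketch below contrasts the request parameters for each model on the Microsoft identity platform token endpoint. It is an illustration only, not the connector's implementation, and the function names are hypothetical.

```python
GRAPH = "https://graph.microsoft.com"

def app_only_token_params(client_id, client_secret):
    """App-only (M2M): the app authenticates as itself; no user is involved."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "client_credentials",
        # App-only requests ask for whatever an admin has granted the app.
        "scope": f"{GRAPH}/.default",
    }

def delegated_token_params(client_id, client_secret, refresh_token):
    """Delegated (U2M): the token is tied to the user who signed in."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",
        # The refresh token was issued when a specific user consented.
        "refresh_token": refresh_token,
        "scope": f"{GRAPH}/Files.Read.All offline_access",
    }
```

Because a delegated token carries the user's identity, file access is limited to what that user can view, which is why a dedicated, restricted user account narrows what the connector can read.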
Authentication errors
If you encounter OAuth errors, run the following code to confirm that your refresh token works as expected:
# Fill in these values
refresh_token = ""
tenant_id = ""
client_id = ""
client_secret = ""
site_id = ""
# Get an access token
import requests
# Token endpoint
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
scopes = ["Files.Read.All"]
scope = " ".join(f"https://graph.microsoft.com/{s}" for s in scopes)
scope += " offline_access"
# Parameters for the request
token_params = {
    "client_id": client_id,
    "client_secret": client_secret,
    "grant_type": "refresh_token",
    "refresh_token": refresh_token,
    "scope": scope
}
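# Send a POST request to the token endpoint
response = requests.post(token_url, data=token_params)
token_response = response.json()
access_token = token_response.get("access_token")
# Display the response; a failed request returns "error" and "error_description" instead of a token
print(token_response)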
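If no access token comes back, the `error` and `error_description` fields in the response body usually point to the cause. The helper below is a minimal sketch for reading them. The function name and hint text are illustrative, and the hints cover only two standard OAuth 2.0 error codes.

```python
# General hints for two standard OAuth 2.0 error codes (not an exhaustive list).
HINTS = {
    "invalid_grant": "the refresh token may be expired, revoked, or issued to a different client",
    "invalid_client": "check that client_id and client_secret are correct",
}

def explain_token_response(payload):
    """Summarize a parsed token-endpoint response (a dict) in one line."""
    if payload.get("access_token"):
        return "OK: access token returned"
    error = payload.get("error", "unknown_error")
    description = payload.get("error_description", "")
    # Keep only the first line; descriptions can span several lines.
    first_line = description.splitlines()[0] if description else "no description provided"
    hint = HINTS.get(error)
    summary = f"{error}: {first_line}"
    return f"{summary} (hint: {hint})" if hint else summary

# Hypothetical payload, used only to show the output shape.
print(explain_token_response({
    "error": "invalid_grant",
    "error_description": "Sample description\r\nTrace ID: sample",
}))
```

To use it with the request above, pass the parsed response body, for example `explain_token_response(response.json())`.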
If the request succeeds, the response includes an access token. You can then check whether the access token can list all the drives in your SharePoint site:
# List all drives
url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives"
# Authorization header with access token
headers = {
    "Authorization": f"Bearer {access_token}",
    "Accept": "application/json"
}
# Send a GET request to list the drives in the site
requests.get(url, headers=headers).json()
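A successful Microsoft Graph response returns the drives in a `value` array, while a failed one returns an `error` object with `code` and `message` fields. The following sketch summarizes either case. The function name and the sample payloads are hypothetical.

```python
def summarize_drives_response(payload):
    """Summarize a parsed response (a dict) from the list-drives request."""
    if "error" in payload:
        err = payload["error"]
        return f"Request failed: {err.get('code')}: {err.get('message')}"
    drives = payload.get("value", [])
    if not drives:
        # A valid token with no visible drives suggests a permissions problem.
        return "Request succeeded, but no drives are visible to this user"
    names = ", ".join(d.get("name", "<unnamed>") for d in drives)
    return f"Found {len(drives)} drive(s): {names}"

# Hypothetical payloads, used only to show the output shape.
print(summarize_drives_response({"value": [{"id": "sample-id", "name": "Documents"}]}))
print(summarize_drives_response({"error": {"code": "accessDenied", "message": "Access denied"}}))
```

An empty result with a valid token typically means the authenticated user lacks permission on the site, so confirm that user's SharePoint permissions before changing the OAuth setup.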