Troubleshoot Microsoft SharePoint ingestion

This page describes common issues with the Microsoft SharePoint connector in Databricks Lakeflow Connect and how to resolve them.

General pipeline troubleshooting

If a pipeline fails during a run, click the failed step and check whether the error message gives enough detail to identify the cause.

Check and download the cluster logs from the pipeline details page by clicking Update details in the right-hand pane, then clicking Logs. Scan the logs for errors or exceptions.
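
Scanning a large downloaded log by hand can be slow. The following is a minimal sketch, assuming the log was saved locally as a plain-text file. The helper names `find_error_lines` and `scan_log_file` and the list of failure markers are illustrative assumptions, not something the connector defines.

```python
import re
from pathlib import Path

# Markers that commonly indicate failures in logs (an assumption;
# adjust to match what your logs actually contain).
ERROR_PATTERN = re.compile(r"ERROR|Exception|Traceback|Caused by", re.IGNORECASE)


def find_error_lines(lines):
    """Return (line_number, text) pairs for lines that match ERROR_PATTERN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


def scan_log_file(path):
    """Scan a downloaded log file and return its matching lines."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return find_error_lines(handle)
```

For example, `scan_log_file("driver_log.txt")` (a hypothetical file name) returns the numbered lines worth reading first.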

Restrict access to SharePoint files

To restrict the SharePoint files that the connector can access, create a dedicated Microsoft Entra ID user with restricted SharePoint permissions and authenticate to SharePoint with that account. Because the connector uses delegated access (U2M OAuth), it acts on behalf of a Microsoft Entra ID user and can only access files that the user has permission to view.

Authentication errors

If you encounter OAuth errors, run the following code to confirm that your refresh token is working as expected:

Python
import requests

# Fill in these values
refresh_token = ""
tenant_id = ""
client_id = ""
client_secret = ""
site_id = ""


# Get an access token


# Token endpoint
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


# Request the Microsoft Graph scopes, plus offline_access
scopes = ["Sites.Read.All"]
scope = " ".join(f"https://graph.microsoft.com/{s}" for s in scopes)
scope += " offline_access"


# Parameters for the request
token_params = {
    "client_id": client_id,
    "client_secret": client_secret,
    "grant_type": "refresh_token",
    "refresh_token": refresh_token,
    "scope": scope,
}


# Send a POST request to the token endpoint
response = requests.post(token_url, data=token_params)
token_response = response.json()


# A failed request returns "error" and "error_description" instead of a token
access_token = token_response.get("access_token")
if access_token is None:
    print(token_response)

# If you received an access token, check whether it can list all the drives in your SharePoint site.
# List all drives
url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives"


# Authorization header with access token
headers = {
    "Authorization": f"Bearer {access_token}",
    "Accept": "application/json",
}


# Send a GET request to list the drives (document libraries) in the site
requests.get(url, headers=headers).json()
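
The two JSON responses above can be hard to read at a glance. The sketch below summarizes each one, assuming the token endpoint reports failures with the standard OAuth 2.0 `error` and `error_description` fields and that Microsoft Graph reports failures as an `error` object with `code` and `message`. The function names `describe_token_response` and `describe_graph_response` are hypothetical helpers, not part of the connector.

```python
def describe_token_response(payload):
    """Summarize the JSON body returned by the token endpoint."""
    if payload.get("access_token"):
        return "OK: access token issued"
    # OAuth 2.0 error responses carry `error` and, optionally, `error_description`.
    error = payload.get("error", "unknown_error")
    description = payload.get("error_description", "no description provided")
    return f"FAILED: {error}: {description}"


def describe_graph_response(payload):
    """Summarize the JSON body returned by the list-drives request."""
    error = payload.get("error")
    if isinstance(error, dict):
        # Graph reports failures as {"error": {"code": ..., "message": ...}}.
        return f"FAILED: {error.get('code', 'unknown')}: {error.get('message', '')}"
    drives = payload.get("value", [])
    names = ", ".join(drive.get("name", "?") for drive in drives)
    return f"OK: {len(drives)} drive(s): {names}"
```

For example, `print(describe_token_response(response.json()))` reports whether the refresh token was accepted, and `print(describe_graph_response(requests.get(url, headers=headers).json()))` reports whether the access token can see the drives in the site.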