Troubleshoot Microsoft SharePoint ingestion

Preview

The Microsoft SharePoint connector is in Beta.

This page describes common issues with the Microsoft SharePoint connector in Databricks Lakeflow Connect and how to resolve them.

General pipeline troubleshooting

If a pipeline fails during execution, click the step that failed and check whether the error message gives enough information to identify the cause.

View pipeline event logs in the UI

You can also check and download the cluster logs from the pipeline details page by clicking Update details in the right-hand panel, then clicking Logs. Scan the logs for errors or exceptions.

View pipeline update details in the UI
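
Once the logs are downloaded, a short script can help with scanning them. The following is a minimal sketch, assuming the log was saved as a plain-text file. The names `find_error_lines` and `scan_log_file` and the keyword list are illustrative assumptions, not part of Databricks or the connector.

```python
import re
from pathlib import Path

# Assumption: these keywords are a guess at what marks a failure in the log.
# Adjust the pattern to match the log format you actually see.
ERROR_PATTERN = re.compile(r"ERROR|FATAL|Exception|Traceback")


def find_error_lines(log_text, context=1):
    """Return (line_number, text) pairs for matching lines.

    Each match is followed by up to `context` lines after it, so that the
    first line of a stack trace is kept with the error that produced it.
    """
    lines = log_text.splitlines()
    hits = []
    seen = set()
    for i, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            for j in range(i, min(i + context + 1, len(lines))):
                if j not in seen:
                    seen.add(j)
                    hits.append((j + 1, lines[j]))
    return hits


def scan_log_file(path):
    """Print matching lines from a downloaded log file (hypothetical path)."""
    text = Path(path).read_text(errors="replace")
    for number, line in find_error_lines(text):
        print(f"{number}: {line}")
```

For example, `scan_log_file("driver_log.txt")` would print each matching line with its line number, where `driver_log.txt` stands in for whatever file you downloaded.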

Authentication errors

If you encounter OAuth errors, run the following code to confirm that your refresh token works as intended:

Python
# Fill in these values
refresh_token = ""
tenant_id = ""
client_id = ""
client_secret = ""
site_id = ""


# Get an access token
import requests


# Token endpoint
token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"


# Build the space-separated scope string, including offline_access
scopes = ["Files.Read.All"]
scope = " ".join(f"https://graph.microsoft.com/{s}" for s in scopes)
scope += " offline_access"


# Parameters for the request
token_params = {
    "client_id": client_id,
    "client_secret": client_secret,
    "grant_type": "refresh_token",
    "refresh_token": refresh_token,
    "scope": scope,
}


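# Send a POST request to the token endpoint
response = requests.post(token_url, data=token_params)
token_response = response.json()

# On failure, the response carries "error" and "error_description" fields
if "error" in token_response:
    print(token_response["error"], token_response.get("error_description"))

access_token = token_response.get("access_token")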

If the request succeeds, the response contains an access token. You can then check whether the access token can list all the drives in your SharePoint site:

Python
# List all drives
url = f"https://graph.microsoft.com/v1.0/sites/{site_id}/drives"


# Authorization header with access token
headers = {
    "Authorization": f"Bearer {access_token}",
    "Accept": "application/json",
}


# Send a GET request to list the drives in the site
requests.get(url, headers=headers).json()
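
The raw JSON from these two requests can be hard to read at a glance. The following is a minimal sketch of two helpers that summarize it. It relies on the Microsoft identity platform returning `error` and `error_description` fields when a token request fails, and on Microsoft Graph wrapping errors in an `error` object with `code` and `message`. The function names `explain_token_response` and `explain_drives_response` are hypothetical and not part of any library.

```python
def explain_token_response(payload):
    """Summarize a parsed JSON body returned by the token endpoint."""
    if "access_token" in payload:
        return "OK: access token issued"
    error = payload.get("error", "unknown_error")
    description = payload.get("error_description", "")
    # Keep only the first line; the description can span several lines.
    first_line = description.splitlines()[0] if description else ""
    return f"Token request failed: {error}. {first_line}".strip()


def explain_drives_response(payload):
    """Summarize a parsed JSON body returned by GET /sites/{site_id}/drives."""
    if "error" in payload:
        err = payload["error"]
        return f"Graph request failed: {err.get('code')}: {err.get('message')}"
    # A successful listing returns the drives under the "value" key.
    names = [drive.get("name") for drive in payload.get("value", [])]
    return f"Found {len(names)} drive(s): {names}"
```

As a usage example, `print(explain_token_response(response.json()))` after the token request and `print(explain_drives_response(requests.get(url, headers=headers).json()))` after the drives request would each print a one-line summary. A failed token request usually points to an expired or revoked refresh token or to incorrect client credentials, while a failed drives request with a valid token suggests missing permissions on the site.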