Configure manual token refresh authentication for Microsoft SharePoint ingestion
The Microsoft SharePoint connector is in Beta.
This page describes how to configure manual token refresh authentication for Microsoft SharePoint ingestion into Databricks.
Which authentication methods are supported?
The SharePoint connector supports the following OAuth methods:
- U2M authentication (recommended)
- Manual token refresh authentication
Databricks recommends U2M (user-to-machine) authentication because the refresh token is obtained for you automatically, so you don't have to compute it yourself. U2M also simplifies the process of granting the Entra ID client access to your SharePoint files and is more secure.
Step 1: Get the SharePoint site ID
- Visit the desired SharePoint site in your browser.
- Append /_api/site/id to the URL.
- Press Enter.
- Make a note of the site ID in the response.
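The URL from the steps above can also be built programmatically. The following is a minimal sketch; the function name site_id_url and the site URL https://contoso.sharepoint.com/sites/marketing are hypothetical placeholders, not values from this page.

```python
def site_id_url(site_url: str) -> str:
    """Return the URL that shows the site ID for a SharePoint site."""
    # Drop any trailing slash so the path isn't doubled up.
    return site_url.rstrip("/") + "/_api/site/id"


# Hypothetical site URL; replace it with your own.
print(site_id_url("https://contoso.sharepoint.com/sites/marketing/"))
```

Open the printed URL in a browser where you're already signed in to SharePoint.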
Step 2: Get SharePoint drive names (optional)
If you want to ingest all of the drives and documents in your SharePoint site, then you can skip this step. However, if you only want to ingest a subset of the drives, then you need to collect their names.
You can find the drive names in the left-hand menu. There is a default drive called Documents in each site. Your organization might have additional drives. For example, a site's drives might include doclib1, subsite1doclib1, and more.
Some drives might be hidden from the list. The drive creator can configure this in the drive settings. In this case, hidden drives might be visible in the Site contents section.
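If you already have a Microsoft Graph access token that can read the site, the drive names can also be listed programmatically. The sketch below is an illustration under that assumption and is not part of the connector setup: it calls the Microsoft Graph GET /sites/{site-id}/drives endpoint, and the helper names build_drives_request and list_drive_names are hypothetical. It reads only the first page of results.

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_drives_request(site_id: str, access_token: str) -> urllib.request.Request:
    """Build a GET request for the drives (document libraries) of a site."""
    return urllib.request.Request(
        f"{GRAPH_BASE}/sites/{site_id}/drives",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def list_drive_names(site_id: str, access_token: str) -> list[str]:
    """Return the drive names from the first page of the Graph response."""
    request = build_drives_request(site_id, access_token)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return [drive["name"] for drive in payload.get("value", [])]
```

Compare the returned names with the left-hand menu and the Site contents section before you rely on them.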
Step 3: Create a client in Microsoft Entra ID
This step creates a client that can access the SharePoint files.
- In the Microsoft Azure portal (portal.azure.com), click Microsoft Entra ID. You might have to search for “Microsoft Entra ID”.
- In the left-hand menu, under the Manage section, click App Registrations.
- Click New registration.
- In the Register an application form, specify the following:
  - Whether you want other tenants to access this application.
  - The redirect URL that you want to use to get the authentication code. Specify one of the following:
    - The redirect URL for your own server
    - https://127.0.0.1 (Even if you don't have a server running on https://127.0.0.1, the app tries to redirect to that page. The code is in the resulting URL, and the URL is in the following format: https://127.0.0.1:5000/oauth2redirect?code=<code>)
- Click Register. You're redirected to the app details page.
- Make a note of the following values:
  - Application (client) ID
  - Directory (tenant) ID
- Next to Client credentials, click Add a certificate or secret.
- Click + New client secret.
- Add a description.
- Click Add. The updated list of client secrets displays.
- Copy the client secret value and store it securely. After you leave the page, you can't view the client secret value again.
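The application (client) ID and directory (tenant) ID are GUIDs, so a quick format check can catch copy-and-paste mistakes before you use them in later steps. This is an optional sketch; the helper name is_guid and the sample values are hypothetical.

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True if value parses as a GUID, such as a client or tenant ID."""
    try:
        uuid.UUID(value.strip())
    except ValueError:
        return False
    return True


# Hypothetical values; replace them with the IDs from the app details page.
print(is_guid("11111111-2222-3333-4444-555555555555"))
print(is_guid("not-a-guid"))
```

The check applies only to the two IDs, not to the client secret value.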
Step 4: Grant the client access to SharePoint files
The client requires the following two permissions to access your SharePoint files:
- files.read.all
- offline_access
Together, these give the client read access to all files in all sites that you have access to, and let it obtain a refresh token. If you're comfortable with this, do the following:
- Run the following Python code in a new notebook. Modify the tenant_id, the client_id, and the redirect_url.

      import requests
      from urllib.parse import urlencode

      # This is the application (client) ID (obtained from the App Registration page).
      client_id = ""

      # This is the directory (tenant) ID (obtained from the App Registration page).
      tenant_id = ""

      # A redirect URL is used in OAuth to redirect users to an
      # application after they grant permission to access their account.
      # In this setup, the authentication code is sent to this URL.
      redirect_url = ""

      # ================= Do not edit code below this line. ==================

      # Authorization endpoint URL
      authorization_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize"

      # Scopes are space-separated. offline_access is requested so that
      # a refresh token is issued.
      scopes = ["https://graph.microsoft.com/Files.Read.All", "offline_access"]

      # Construct the authorization request URL.
      auth_params = {
          "client_id": client_id,
          "redirect_uri": redirect_url,
          "response_type": "code",
          "scope": " ".join(scopes),
      }
      # urlencode escapes the redirect URL and the spaces between scopes.
      auth_url = authorization_url + "?" + urlencode(auth_params)
      print(auth_url)

  The output is a URL.
- In your browser, visit the URL from the code output.
- Sign in using your Microsoft 365 credentials.
- In the Permissions requested dialog box, review the requested permissions. Confirm that they include files.read.all and offline_access and that you’re comfortable granting them.
- Confirm that the email ID at the top of the page matches the email that you use to access SharePoint data.
- Confirm that the client name matches the application that you registered in the previous step.
- Click Accept. You're redirected to the redirect URL that you specified. The authentication code is the value of the code parameter in the resulting URL.
- Make a note of the authentication code.
- If you don’t get the Permissions requested prompt, append &prompt=consent to the URL from the code output and visit it again.
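Copying the authentication code by hand from the address bar is error-prone, so it can help to parse it out of the redirected URL instead. The following is a minimal sketch that assumes the URL format shown in Step 3; the function name extract_auth_code and the sample URL are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def extract_auth_code(redirected_url: str) -> str:
    """Return the value of the code query parameter from the redirected URL."""
    query = parse_qs(urlparse(redirected_url).query)
    # OAuth reports a failed authorization through an error parameter.
    if "error" in query:
        raise ValueError(f"Authorization failed: {query['error'][0]}")
    if "code" not in query:
        raise ValueError("No code parameter found in the URL.")
    return query["code"][0]


# Hypothetical redirected URL; paste the one from your browser's address bar.
print(extract_auth_code("https://127.0.0.1:5000/oauth2redirect?code=abc123&session_state=xyz"))
```

Because parse_qs also decodes percent-encoded characters, the returned value can be pasted into the next step as is.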
Step 5: Get the refresh token
This step fetches the refresh token to complete authorization.
- Paste the following Python code into a new cell in the notebook from the previous step. Set the client_secret and the code. The cell reuses the tenant_id, the client_id, and the redirect_url from the previous cell.

      # Use the authentication code to get the refresh token.

      # This is the client secret (obtained from the Certificates & secrets page).
      client_secret = ""

      # This is the authentication code from the redirect URL in the previous step.
      code = ""

      # ================= Do not edit code below this line. ==================

      token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

      # Request parameters
      token_params = {
          "client_id": client_id,
          "client_secret": client_secret,
          "code": code,
          "grant_type": "authorization_code",
          "redirect_uri": redirect_url,
          "scope": "profile openid email https://graph.microsoft.com/Files.Read.All offline_access",
      }

      # Send POST request to token endpoint
      response = requests.post(token_url, data=token_params)

      # Handle response
      if response.status_code == 200:
          token_data = response.json()
          refresh_token = token_data.get("refresh_token")
          print("Refresh Token:", refresh_token)
      else:
          print("Error:", response.status_code, response.text)
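A refresh token is exchanged for a new access token at the same token endpoint by using the refresh_token grant type. The following sketch only illustrates that exchange, in case you want to verify that the token from this step works. The helper name build_refresh_request and the placeholder values are hypothetical, and the connector's own refresh handling might differ.

```python
def build_refresh_request(tenant_id, client_id, client_secret, refresh_token):
    """Return the token endpoint URL and form payload for a refresh request."""
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    payload = {
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "scope": "https://graph.microsoft.com/Files.Read.All offline_access",
    }
    return token_url, payload


# Hypothetical placeholder values; in the notebook, pass the real variables.
url, data = build_refresh_request("<tenant-id>", "<client-id>", "<secret>", "<refresh-token>")
print(url)
# In the notebook, send it with: requests.post(url, data=data)
```

A successful response contains an access_token field. Treat both tokens like passwords and avoid leaving them in notebook output.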
Next steps
- Create a connection to store the authentication details that you've obtained.
- Create an ingestion pipeline.