Configure manual token refresh authentication for Microsoft SharePoint ingestion

Preview

The Microsoft SharePoint connector is in Beta.

This page describes how to configure manual token refresh authentication for Microsoft SharePoint ingestion into Databricks.

Which authentication methods are supported?

The SharePoint connector supports the following OAuth methods:

  • U2M authentication (recommended)
  • Manual token refresh authentication

Databricks recommends U2M because the refresh token is obtained and handled for you automatically, so you don't have to compute it yourself. U2M also simplifies granting the Entra ID client access to your SharePoint files and is more secure.

Step 1: Get the SharePoint site ID

  1. Visit the desired SharePoint site in your browser.
  2. Append /_api/site/id to the URL.
  3. Press Enter, then make a note of the site ID that the page returns.
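
The URL described in the steps above can also be built in code. The following is a minimal sketch; the helper name site_id_url and the contoso site URL are hypothetical and used only for illustration.

```python
def site_id_url(site_url: str) -> str:
    """Append the /_api/site/id path to a SharePoint site URL."""
    # Strip any trailing slash so the result doesn't contain "//".
    return site_url.rstrip("/") + "/_api/site/id"


# Hypothetical site URL, for illustration only.
print(site_id_url("https://contoso.sharepoint.com/sites/Marketing/"))
```

Open the printed URL in a browser where you're signed in to SharePoint, as in the manual steps.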

Step 2: Get SharePoint drive names (optional)

If you want to ingest all of the drives and documents in your SharePoint site, then you can skip this step. However, if you only want to ingest a subset of the drives, then you need to collect their names.

You can find the drive names in the left-hand menu. There is a default drive called Documents in each site. Your organization might have additional drives. For example, the drives in the following screenshot include doclib1, subsite1doclib1, and more.

View SharePoint drives

Some drives might be hidden from the list. The drive creator can configure this in the drive settings. In this case, hidden drives might be visible in the Site contents section.

View hidden SharePoint drives

Step 3: Create a client in Microsoft Entra ID

This step creates a client that can access the SharePoint files.

  1. In the Microsoft Azure portal (portal.azure.com), click Microsoft Entra ID. You might have to search for “Microsoft Entra ID”.

    Azure portal: Entra ID card

  2. In the left-hand menu, under the Manage section, click App Registrations.

  3. Click New registration.

    New registration button for Entra ID app

  4. In the Register an application form, specify the following:

    • Whether you want other tenants to access this application.

    • The redirect URL that you want to use to get the authentication code. Specify one of the following:

      • The redirect URL for your own server
      • https://127.0.0.1 (Even if you don't have a server running on https://127.0.0.1, the app tries to redirect to that page. The code is in the resulting URL, and the URL is in the following format: https://127.0.0.1:5000/oauth2redirect?code=<code>)

    Register an application form

    You're redirected to the app details page.

    OAuth application details page

  5. Make a note of the following values:

    • Application (client) ID
    • Directory (tenant) ID
  6. Next to Client credentials, click Add a certificate or secret.

  7. Click + New client secret.

    + New client secret button

  8. Add a description.

  9. Click Add.

    The updated list of client secrets displays.

  10. Copy the client secret value and store it securely. After you leave the page, you can't access the client secret.

Step 4: Grant the client access to SharePoint files

The client requires the following two permissions to access your SharePoint files:

  • files.read.all
  • offline_access

These give the client access to all files in all sites that you have access to. If you're comfortable with this, do the following:

  1. Run the following Python code in a new notebook. Modify the tenant_id, the client_id, and the redirect_url.

    Python
    import requests  # Used in the next step to request the refresh token.
    from urllib.parse import urlencode

    # This is the application (client) ID (obtained from the App Registration page).
    client_id = ""

    # This is the tenant ID (obtained from the App Registration page).
    tenant_id = ""

    # A redirect URL is used in OAuth to redirect users to an
    # application after they grant permission to access their account.
    # In this setup, the authentication code is sent to this server.
    redirect_url = ""

    # ================= Do not edit code below this line. ==================

    # Authorization endpoint URL
    authorization_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/authorize"

    # Scopes are space-separated within the single scope parameter.
    scopes = ["Files.Read.All"]
    scope = " ".join(f"https://graph.microsoft.com/{s}" for s in scopes)
    scope += " offline_access"

    # Construct the authorization request URL.
    auth_params = {
        "client_id": client_id,
        "redirect_uri": redirect_url,
        "response_type": "code",
        "scope": scope,
    }

    # URL-encode the parameters so that spaces and slashes in the values are preserved.
    auth_url = authorization_url + "?" + urlencode(auth_params)

    print(auth_url)

    The output is a URL.

  2. In your browser, visit the URL from the code output.

  3. Sign in using your Microsoft 365 credentials.

  4. In the Permissions requested dialog box, review the requested permissions. Confirm that they include files.read.all and offline_access and that you’re comfortable granting them.

  5. Confirm that the email ID at the top of the page matches the email that you use to access SharePoint data.

  6. Confirm that the client name matches the credentials from the previous step.

    Microsoft permissions requested dialog box

  7. Click Accept.

    You're redirected to the redirect URL that you specified. The authentication code is the value of the code parameter in the resulting URL.

  8. Make a note of the authentication code.

  9. If you don't get the Permissions requested prompt, append &prompt=consent to the URL and visit it again.
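
Copying the authentication code out of the browser address bar by hand is error-prone. The following is a minimal sketch that extracts it with Python's standard library. The helper name extract_auth_code and the example URL values are hypothetical; the sketch assumes the code arrives in the code query parameter, as in the redirect URL format shown in Step 3.

```python
from urllib.parse import urlparse, parse_qs


def extract_auth_code(redirected_url: str) -> str:
    """Return the value of the code query parameter from a redirect URL."""
    query = parse_qs(urlparse(redirected_url).query)
    # OAuth authorization servers report failures through an error parameter.
    if "error" in query:
        raise ValueError(f"Authorization failed: {query['error'][0]}")
    codes = query.get("code")
    if not codes:
        raise ValueError("No code parameter found in the URL.")
    return codes[0]


# Hypothetical redirect URL, for illustration only.
example_url = "https://127.0.0.1:5000/oauth2redirect?code=abc123&state=xyz"
example_code = extract_auth_code(example_url)
print(example_code)
```

Paste the full URL from the address bar in place of example_url. parse_qs also decodes any percent-encoded characters in the code.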

Step 5: Get the refresh token

This step fetches the refresh token to complete authorization.

  1. Paste the following Python code into a new cell in the notebook from the previous step. Set client_secret to the client secret from Step 3 and code to the authentication code from Step 4. The tenant_id, client_id, and redirect_url values and the requests import are reused from the previous cell.

    Python
    # Use the authentication code to get the refresh token.

    # This is the client secret (obtained from the Credentials and Secrets page)
    client_secret = ""

    # This is the authentication code (obtained from the redirect URL in the previous step)
    code = ""

    # ================= Do not edit code below this line. ==================

    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"

    # Request parameters
    token_params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",
        "redirect_uri": redirect_url,
        "scope": "profile openid email https://graph.microsoft.com/Files.Read.All offline_access",
    }

    # Send POST request to token endpoint
    response = requests.post(token_url, data=token_params)

    # Handle response
    if response.status_code == 200:
        token_data = response.json()
        refresh_token = token_data.get("refresh_token")
        print("Refresh Token:", refresh_token)
    else:
        print("Error:", response.status_code, response.text)
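
To check that the refresh token works, you can exchange it for an access token. This page doesn't describe that step, so treat the following as a sketch under assumptions: it uses the standard OAuth 2.0 refresh_token grant against the same token endpoint, and the helper name build_refresh_request and the placeholder values are hypothetical. The sketch only builds the request and doesn't send it.

```python
def build_refresh_request(tenant_id, client_id, client_secret, refresh_token):
    """Build the token endpoint URL and form data for a refresh_token grant."""
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    data = {
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
        "grant_type": "refresh_token",
        # Same scopes that were requested when the refresh token was issued.
        "scope": "https://graph.microsoft.com/Files.Read.All offline_access",
    }
    return token_url, data


# Placeholder values, for illustration only.
url, data = build_refresh_request(
    "my-tenant-id", "my-client-id", "my-client-secret", "my-refresh-token"
)
print(url)
print(sorted(data))
```

To send the request, pass the result to requests.post(url, data=data). A 200 response with an access_token field suggests that the refresh token is valid.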

Next steps

  1. Create a connection to store the authentication details that you've obtained.
  2. Create an ingestion pipeline.