To enable provisioning to Databricks using Azure Active Directory (Azure AD), you must create an enterprise application for each Databricks workspace.
Your Azure AD account must be a Premium edition account, and you must be a global administrator for that account to enable provisioning.
Your Databricks account must have the Premium plan (or, for customers who subscribed to Databricks before March 3, 2020, the Operational Security package).
In the following examples, replace <databricks-instance> with the workspace URL of your Databricks deployment.
Generate a personal access token in Databricks and copy it. You provide this token to Azure AD in a subsequent step.
Generate this token as a Databricks admin who will not be managed by the Azure AD enterprise application. A Databricks admin user who is managed by this enterprise application can be deprovisioned using Azure AD, which would cause your SCIM provisioning integration to be disabled.
In your Azure portal, go to Azure Active Directory > Enterprise Applications.
Click + New Application above the application list. Under Add from the gallery, search for and select Azure Databricks SCIM Provisioning Connector.
Enter a Name for the application and click Add. Use a name that will help administrators find it, like
Under the Manage menu, click Provisioning.
From the Provisioning Mode drop-down, select Automatic.
Enter the Tenant URL:
Replace <databricks-instance> with the workspace URL of your Databricks deployment. See Get workspace, cluster, notebook, model, and job identifiers.
In the Secret Token field, enter the Databricks personal access token that you generated in step 1.
Click Test Connection and wait for the message that confirms that the credentials are authorized to enable provisioning.
Optionally, enter a notification email to receive notifications of critical errors with SCIM provisioning.
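If Test Connection fails, it can help to check the token outside the Azure portal. The following is a minimal sketch that only builds the request; it does not send it. The function name `build_scim_request` is hypothetical, and the `/api/2.0/preview/scim/v2/Users` path is an assumption about the Databricks SCIM API that you should confirm against the Tenant URL you entered.

```python
import urllib.request


def build_scim_request(instance: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request against the SCIM Users endpoint.

    `instance` is the workspace host name, for example the value you use
    for <databricks-instance>. The path below is an assumption.
    """
    url = f"https://{instance}/api/2.0/preview/scim/v2/Users"
    return urllib.request.Request(
        url,
        headers={
            # Azure AD sends the Secret Token as a bearer token.
            "Authorization": f"Bearer {token}",
            "Accept": "application/scim+json",
        },
        method="GET",
    )


if __name__ == "__main__":
    # Hypothetical host and token, for illustration only.
    req = build_scim_request("example.cloud.databricks.com", "my-token")
    print(req.get_method(), req.full_url)
    # To send the request, call urllib.request.urlopen(req).
    # An HTTP 401 or 403 error would suggest the token is invalid or
    # that its owner is not a workspace admin.
```

Keeping request construction separate from sending makes the check easy to inspect before any credentials leave your machine.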
Go to Manage > Provisioning and, under Settings, set the Scope to Sync only assigned users and groups.
This option syncs only users and groups assigned to the enterprise application, and is our recommended approach.
Azure Active Directory does not support the automatic provisioning of nested groups to Databricks. It is only able to read and provision users that are immediate members of the explicitly assigned group. As a workaround, you should explicitly assign (or otherwise scope in) the groups that contain the users who need to be provisioned. For more information, see this FAQ.
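To make the nested-group limitation concrete, here is a small sketch that models group membership as a plain dictionary. The data model and the function names `direct_users` and `groups_to_assign` are hypothetical and are not part of any Azure AD or Databricks API. The sketch shows which users are reached when only the top-level group is assigned, and which groups would need to be assigned explicitly to cover everyone.

```python
from typing import Dict, List, Set


def direct_users(group: str, members: Dict[str, List[str]]) -> Set[str]:
    """Users that are immediate members of `group`.

    A member counts as a group if it is itself a key in `members`.
    Only these immediate users are provisioned when `group` is assigned.
    """
    return {m for m in members.get(group, []) if m not in members}


def groups_to_assign(root: str, members: Dict[str, List[str]]) -> List[str]:
    """Return `root` plus every group nested under it, sorted by name.

    Assigning each of these groups explicitly is the workaround for the
    lack of nested-group provisioning.
    """
    seen: Set[str] = set()
    stack = [root]
    while stack:
        group = stack.pop()
        if group in seen:
            continue  # guards against cyclic group membership
        seen.add(group)
        stack.extend(m for m in members.get(group, []) if m in members)
    return sorted(seen)


if __name__ == "__main__":
    # Made-up directory layout for illustration.
    directory = {
        "data-team": ["alice", "engineers"],
        "engineers": ["bob", "ml"],
        "ml": ["carol"],
    }
    print(direct_users("data-team", directory))
    print(groups_to_assign("data-team", directory))
```

In this made-up layout, assigning only `data-team` would provision `alice` but not `bob` or `carol`; assigning all three groups covers every user.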
To start the synchronization of users and groups from Azure AD to Databricks, toggle Provisioning Status on.
Test your provisioning setup:
- Go to Manage > Users and groups.
- Add some users and groups. Click Add user, select the users and groups, and click the Assign button.
- Wait a few minutes and check that the users and groups have been added to your Databricks workspace.
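The final check can also be scripted. The sketch below assumes you have already fetched a SCIM user listing from the workspace as JSON text, and that it follows the standard SCIM list format with a `Resources` array whose entries carry `userName`. The function name `missing_users` and the sample payload are hypothetical.

```python
import json
from typing import Set


def missing_users(scim_list_response: str, expected: Set[str]) -> Set[str]:
    """Return the expected user names that are absent from a SCIM listing."""
    payload = json.loads(scim_list_response)
    present = {
        resource["userName"]
        for resource in payload.get("Resources", [])
        if "userName" in resource
    }
    return expected - present


if __name__ == "__main__":
    # Made-up response body for illustration.
    body = json.dumps(
        {
            "totalResults": 2,
            "Resources": [
                {"userName": "alice@example.com"},
                {"userName": "bob@example.com"},
            ],
        }
    )
    print(missing_users(body, {"alice@example.com", "carol@example.com"}))
```

An empty result means every assigned user has arrived; a non-empty one may simply mean the sync has not run yet.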
Any additional users and groups that you add and assign will automatically be provisioned when Azure AD schedules the next sync.
Do not assign the Databricks admin whose secret token (bearer token) was used to set up this enterprise application.
- Users and groups that existed in Databricks prior to enabling provisioning exhibit the following behavior upon provisioning sync:
  - Are merged if they also exist in this Azure AD enterprise application.
  - Are ignored if they don’t exist in this Azure AD enterprise application.
- User permissions that are assigned individually and are duplicated through membership in a group remain after the group membership is removed for the user.
- Users removed from a Databricks workspace directly, using the Databricks Admin console:
  - Lose access to that Databricks workspace but may still have access to other Databricks workspaces.
  - Will not be synced again using Azure AD provisioning, even if they remain in the enterprise application.
- The initial Azure AD sync is triggered immediately after you turn on provisioning. Subsequent syncs are triggered every 20-40 minutes, depending on the number of users and groups in the application. See Provisioning summary report in the Azure AD documentation.
- The “admins” group is a reserved group in Databricks and cannot be removed.
- Groups cannot be renamed in Databricks; do not attempt to rename them in Azure AD.
- You can use the Databricks Groups API or the Groups UI to get a list of members of any Databricks group.
- You cannot update Databricks usernames and email addresses.
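As an illustration of listing group members through the Groups API, the sketch below builds the request URL and parses a response body. It assumes the `GET /api/2.0/groups/list-members` endpoint with a `group_name` query parameter, and a response containing a `members` array of `user_name` and `group_name` entries; verify both against the Groups API reference before relying on them. The function names are hypothetical.

```python
import json
from typing import List, Tuple
from urllib.parse import urlencode


def list_members_url(instance: str, group_name: str) -> str:
    """Build the URL for listing the members of one group (assumed endpoint)."""
    query = urlencode({"group_name": group_name})
    return f"https://{instance}/api/2.0/groups/list-members?{query}"


def parse_members(body: str) -> Tuple[List[str], List[str]]:
    """Split a list-members response into (users, groups), each sorted."""
    members = json.loads(body).get("members", [])
    users = sorted(m["user_name"] for m in members if "user_name" in m)
    groups = sorted(m["group_name"] for m in members if "group_name" in m)
    return users, groups


if __name__ == "__main__":
    print(list_members_url("example.cloud.databricks.com", "admins"))
    # Made-up response body for illustration.
    sample = '{"members": [{"user_name": "alice@example.com"}, {"group_name": "ml"}]}'
    print(parse_members(sample))
```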
Users and groups do not sync
The Databricks admin user whose personal access token Azure AD uses to connect may have lost admin status, or the token may no longer be valid. Log in to the Databricks Admin console as that user and confirm that the user is still an admin and that the access token is still valid.
Another possibility is that you are trying to sync nested groups, which are not supported by Azure AD automatic provisioning. See this FAQ.
After initial sync the users and groups are not syncing
After the initial sync, Azure AD does not sync immediately upon changes to user and group assignments. It schedules a sync with the application after a delay (depending on the number of users and groups). You can go to Manage > Provisioning for the enterprise application and select Clear current state and restart synchronization to initiate an immediate sync.
Azure AD provisioning service IP range not accessible
The Azure AD provisioning service operates under particular IP ranges. If you need to restrict network access, find the IP addresses for AzureActiveDirectory in this IP range file and add them to your application’s allowlist to allow traffic flow from the Azure AD provisioning service to your application. For more information, see IP Ranges.
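Extracting the relevant ranges from the IP range file can be scripted. The sketch below assumes the file uses the Azure service tags JSON layout, with a top-level `values` array whose entries have a `name` and a `properties.addressPrefixes` list. The function names and the sample data are hypothetical; the sample uses documentation-only address ranges, not real Azure AD ranges.

```python
import ipaddress
import json
from typing import List


def service_tag_prefixes(service_tags_json: str, tag: str = "AzureActiveDirectory") -> List[str]:
    """Return the address prefixes listed for one service tag, or [] if absent."""
    data = json.loads(service_tags_json)
    for entry in data.get("values", []):
        if entry.get("name") == tag:
            return list(entry.get("properties", {}).get("addressPrefixes", []))
    return []


def is_allowed(ip: str, prefixes: List[str]) -> bool:
    """Check whether `ip` falls inside any of the given CIDR prefixes."""
    addr = ipaddress.ip_address(ip)
    for prefix in prefixes:
        network = ipaddress.ip_network(prefix, strict=False)
        # Compare only like with like: IPv4 against IPv4, IPv6 against IPv6.
        if network.version == addr.version and addr in network:
            return True
    return False


if __name__ == "__main__":
    # Made-up file content for illustration.
    sample = json.dumps(
        {
            "values": [
                {
                    "name": "AzureActiveDirectory",
                    "properties": {"addressPrefixes": ["192.0.2.0/24", "2001:db8::/32"]},
                }
            ]
        }
    )
    prefixes = service_tag_prefixes(sample)
    print(prefixes)
    print(is_allowed("192.0.2.10", prefixes))
```

The published ranges change over time, so a script like this would need to be rerun whenever a new version of the file is released.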