Single sign-on (SSO) enables you to authenticate your users using your organization’s identity provider. If your identity provider supports the SAML 2.0 protocol, you can use Databricks SSO to integrate with your identity provider.
You can choose one of two ways to give users in your organization access to Databricks:
- Add users through your identity provider and enable auto user creation. If the user’s account does not already exist, a new account will be provisioned for them upon login.
- Manually add users in Databricks as described in Manage users and disable auto user creation. If the user’s account does not already exist in Databricks, they cannot log in.
Another way to provision users to Databricks is to use the REST APIs for System for Cross-domain Identity Management (SCIM), which is typically used in conjunction with SSO. SCIM is a standard for automating the exchange of user identity information between identity domains for automatic provisioning and deprovisioning of users. See Provision users and groups using SCIM.
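As a minimal sketch of what a SCIM provisioning call looks like, the following builds the JSON body for a user-creation request. The endpoint path reflects the Databricks SCIM API; the user name and display name are hypothetical placeholders, and in practice the request must be sent to your workspace URL with admin credentials.

```python
import json

# Databricks SCIM API path for user provisioning
SCIM_USERS_ENDPOINT = "/api/2.0/preview/scim/v2/Users"


def build_scim_create_user(user_name, display_name=None):
    """Build the JSON body for a SCIM user-creation request."""
    body = {
        "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
        "userName": user_name,  # the user's email address in Databricks
    }
    if display_name:
        body["displayName"] = display_name
    return json.dumps(body)


# Hypothetical user for illustration only
payload = build_scim_create_user("new.user@example.com", "New User")
```

A deprovisioning call is the mirror image: a DELETE request against the same endpoint with the user's SCIM ID appended.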
Go to the Admin Console and select the SSO tab.
Go to your identity provider and create a Databricks application with the information in the Databricks SAML URL field.
You can read the instructions on how to set this up for:
- AWS single sign-on (SSO)
- Microsoft Windows Active Directory
- Google Workspace (formerly GSuite) single sign-on (SSO v1.0)
- Google Workspace (formerly GSuite) single sign-on (SSO v2.0)
- Okta single sign-on (SSO)
- OneLogin single sign-on (SSO)
- Ping Identity single sign-on (SSO)
The process is similar for any identity provider that supports SAML 2.0.
In the Provide the information from the identity provider field, paste the SSO information from your identity provider.
If you want to enable automatic user creation, select Allow auto user creation.
If you are configuring Access S3 buckets using IAM credential passthrough with SAML 2.0 federation, select Allow IAM role entitlement auto sync.
Click Enable SSO.
(Optional) Configure password permissions using password access control.
This feature is in Public Preview.
Access control is available only in the Premium plan (or, for customers who subscribed to Databricks before March 3, 2020, the Operational Security package). If you do not have that plan, your users experience the default sign-in behavior with SSO.
By default, all admin users can sign in to Databricks using either SSO or their username and password, and all API users can authenticate to the Databricks REST APIs using their username and password. As an admin, you can limit admin users’ and API users’ ability to authenticate with their username and password by configuring permissions using password access control.
There are two permission levels for passwords: No permissions and Can use. Can use grants more abilities to administrators than to non-admin users. This table lists the abilities for each permission.
|Ability|No permissions|Non-admin with Can use|Admin with Can use|
|---|---|---|---|
|Can authenticate to the API using password| |x|x|
|Can authenticate to the Databricks UI using password| | |x|
If a non-admin user with No permissions attempts to make a REST API call using a password, authentication fails. Admins should ensure that personal access tokens are enabled for the workspace and instruct users to use them to authenticate to the APIs.
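A personal access token is sent as a bearer token in the Authorization header. This sketch builds the headers for such a call; the token value is a hypothetical placeholder, and a real call would also need your workspace URL.

```python
def pat_auth_headers(token):
    """HTTP headers for authenticating to the Databricks REST API
    with a personal access token instead of a username and password."""
    return {"Authorization": f"Bearer {token}"}


# Placeholder token for illustration; a real token comes from
# User Settings > Access Tokens in the workspace UI.
headers = pat_auth_headers("dapi-example-token")
```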
Admin users with Can use permission see the Admin Log In tab on the sign-in page. They can choose to use that tab to log in to Databricks with username and password.
Admin users with No permissions do not see this tab and must log in using SSO.
This section describes how to manage permissions using the UI. You can also use the Permissions API.
Go to the Admin Console.
Select the Access Control tab.
Next to Password Access Control, click the Configure button.
In the Permission Settings dialog, assign password permission to users and groups, including the admins group, using the drop-down menu next to the user or group name.
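For the API route, the following sketches the request body for the password-permissions endpoint of the Permissions API. The endpoint path and the CAN_USE permission level reflect the Databricks Permissions API; the group name shown is the built-in admins group, and a real request would be a PATCH against your workspace URL.

```python
import json

# Permissions API path for password access control
PASSWORD_PERMISSIONS_ENDPOINT = "/api/2.0/permissions/authorization/passwords"


def build_password_acl(group_name, permission_level="CAN_USE"):
    """Build the request body that grants a password permission
    level to a group (e.g. CAN_USE for the admins group)."""
    return json.dumps({
        "access_control_list": [
            {"group_name": group_name, "permission_level": permission_level}
        ]
    })


body = build_password_acl("admins")
```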
Once SSO is enabled, you will see the single sign-on option on the sign-in page.
The default behavior is as follows:
- Non-admin users must sign in to Databricks using SSO; they cannot sign in using their username and password.
- Admin users are granted the Can use password access control permission and can sign in with either SSO or their username and password.
- API users are granted the Can use password access control permission and can use their username and password to make REST API calls (although you should encourage them to use personal access tokens instead).
If your Databricks account is on the Premium plan (or, for customers who subscribed to Databricks before March 3, 2020, the Operational Security package), you can limit admin users’ and API users’ ability to authenticate with their username and password by configuring permissions using password access control.
If a user’s current email address (username) with Databricks is the same as in the identity provider, then the migration will be automatic (as long as auto-user creation is enabled) and you can skip this step.
If a user’s email address with the identity provider is different from the one with Databricks, a new user based on the identity provider email address will appear in Databricks when they log in. Because non-admin users can no longer log in with their old email address and password, they will not be able to access the files in their existing Users folder.
We recommend the following steps to migrate files from the old Users folder to the new one:
An admin can remove the old user. This marks the user’s folder as defunct, and the folder is listed after those of all active users in the workspace folder list. All notebooks and libraries in it remain accessible to admins, and all clusters and jobs created by the user remain as is. If the user had any other ACLs set, enabling SSO causes those to reset, and the admin must manually set those ACLs for the new user.
The admin can then move the old user’s folder into the new one as shown in the following figure.