Set up single sign-on

Single sign-on (SSO) enables you to authenticate your users using your organization’s identity provider. If your identity provider supports the SAML 2.0 protocol, you can use Databricks SSO to integrate with your identity provider.

You can give users in your organization access to Databricks in one of two ways:

  • Add users through your identity provider and enable auto user creation. If the user’s account does not already exist, a new account will be provisioned for them upon login.
  • Manually add users in Databricks, as described in Manage users, and disable auto user creation. If the user’s account does not already exist in Databricks, they cannot log in.

Note

Another way to provision users to Databricks is to use the REST APIs for System for Cross-domain Identity Management (SCIM), which is typically used in conjunction with SSO. SCIM is a standard for automating the exchange of user identity information between identity domains for automatic provisioning and deprovisioning of users.
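
As a rough illustration of the kind of identity data SCIM exchanges, the following sketch builds a minimal SCIM 2.0 user resource. The schema URN and the userName and displayName attributes come from the SCIM core schema (RFC 7643). The helper name build_scim_user is hypothetical, and the Databricks endpoint and the attributes it accepts are not shown here, so check the SCIM API reference before sending anything.

```python
import json

# Core SCIM 2.0 user schema URN, defined by RFC 7643.
SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def build_scim_user(email, display_name=None):
    """Build a minimal SCIM 2.0 user resource (hypothetical helper).

    The email address is used as the SCIM userName.
    """
    user = {"schemas": [SCIM_USER_SCHEMA], "userName": email}
    if display_name:
        user["displayName"] = display_name
    return user


payload = build_scim_user("jane.doe@example.com", "Jane Doe")
print(json.dumps(payload, indent=2))
```

An identity provider with SCIM support typically generates and sends resources like this for you. Building one by hand is mostly useful for testing.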

Requirements for different SSO versions

On Databricks, the SSO software is either version 2.0 or 1.0.

  • Workspaces on the E2 version of the Databricks platform always use SSO version 2.0. All new Databricks accounts and most existing accounts are now E2. Therefore, most workspaces use SSO version 2.0.

    SSO version 2.0 requires the SAML response to be cryptographically signed by the identity provider. For some identity providers, this requires additional configuration.

  • Workspaces on the ST version of the Databricks platform default to SSO version 1.0, but your Databricks representative can upgrade a workspace to SSO version 2.0. After a workspace is upgraded to SSO version 2.0, or an ST workspace is upgraded to E2, the SSO SAML endpoint URL for the workspace changes, and you need to update it in your IdP.

    With SSO version 1.0, a signed SAML response is optional but is highly recommended.

If you are not sure which account type you have, contact your Databricks representative.

Enable single sign-on authentication

  1. Go to the Admin Console and select the SSO tab.

  2. Go to your identity provider and create a Databricks application with the information in the Databricks SAML URL field.

    SAML URL

    Setup instructions are available for specific identity providers. The process is similar for any identity provider that supports SAML 2.0.

  3. In the Provide the information from the identity provider field, paste in the information from the Databricks SSO application that you created in your identity provider.

  4. If you want to enable automatic user creation, select Allow auto user creation.

  5. If you are configuring Access S3 buckets using IAM credential passthrough with SAML 2.0 federation, select Allow IAM role entitlement auto sync.

    SSO tab
  6. Click Enable SSO.

  7. (Optional) Configure password permissions using password access control.

(Optional) Configure password access control

Note

Access control is available only in the Premium plan (or, for customers who subscribed to Databricks before March 3, 2020, the Operational Security package). If you do not have that plan, your users experience the default sign-in behavior with SSO.

By default, all admin users can sign in to Databricks using either SSO or their username and password, and all API users can authenticate to the Databricks REST APIs using their username and password. As an admin, you can limit admin users’ and API users’ ability to authenticate with their username and password by configuring permissions using password access control.

There are two permission levels for passwords: No permissions and Can Use. Can Use grants more abilities to administrators than to non-admin users. This table lists the abilities for each permission.

Ability | No permissions | Non-admin with Can Use | Admin with Can Use
Can authenticate to the API using password | | x | x
Can authenticate to the Databricks UI using password | | | x

If a non-admin user with no permissions attempts to make a REST API call using a password, authentication will fail. Admins should ensure that personal access tokens are enabled for the workspace and instruct their users to use them for authentication to the APIs.
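
The permission table above can be restated as a small lookup. This is only a sketch of the documented rules, not Databricks code. The function name and the dictionary keys are made up for illustration.

```python
def password_abilities(permission, is_admin):
    """Return the password sign-in abilities for a user.

    permission is "No permissions" or "Can Use", matching the table above.
    """
    can_use = permission == "Can Use"
    return {
        # Any user with Can Use can authenticate to the API with a password.
        "api_with_password": can_use,
        # Only admins with Can Use can sign in to the UI with a password.
        "ui_with_password": can_use and is_admin,
    }


for permission in ("No permissions", "Can Use"):
    for is_admin in (False, True):
        print(permission, is_admin, password_abilities(permission, is_admin))
```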

Admin users with Can Use permission see the Admin Log In tab on the sign-in page. They can choose to use that tab to log in to Databricks with username and password.

SSO admin login tab

Admins with no permissions do not see this tab and must log in using SSO.

Configure password permission

Note

This section describes how to manage permissions using the UI. You can also use the Permissions API 2.0.

  1. Go to the Admin Console.
  2. Click the Workspace Settings tab.
  3. Next to Password Usage, click Permission Settings.
  4. In the Permission Settings dialog, assign password permission to users and groups using the drop-down menu next to the user or group. You can also configure permissions for the Admins group.
  5. Click Save.
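
For the Permissions API 2.0 route mentioned in the note, the following sketch builds (but does not send) a request that grants Can Use to one user. Several details are assumptions that you should verify against the Permissions API reference: the endpoint path, the permission level name CAN_USE, and the request body shape. The workspace URL, token, and helper name are placeholders.

```python
import json
import urllib.request

# Placeholders: replace with your workspace URL and a personal access token.
WORKSPACE_URL = "https://example.cloud.databricks.com"
TOKEN = "dapi-example-token"

# Assumption: password permissions live at this Permissions API 2.0 path.
PASSWORD_PERMISSIONS_PATH = "/api/2.0/permissions/authorization/passwords"


def build_grant_request(user_name):
    """Build a PATCH request that grants Can Use to user_name (not sent)."""
    body = {
        "access_control_list": [
            # Assumption: the API spells the Can Use level as CAN_USE.
            {"user_name": user_name, "permission_level": "CAN_USE"}
        ]
    }
    return urllib.request.Request(
        url=WORKSPACE_URL + PASSWORD_PERMISSIONS_PATH,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={
            "Authorization": "Bearer " + TOKEN,
            "Content-Type": "application/json",
        },
    )


request = build_grant_request("jane.doe@example.com")
print(request.get_method(), request.full_url)
```

To send the request, pass it to urllib.request.urlopen after confirming the path and body against the API reference.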

Sign-in process

Once SSO is enabled, you will see the single sign-on option on the sign-in page.

SSO login

The default behavior is as follows:

  • Non-admin users must sign in to Databricks using SSO; they cannot sign in using their username and password.
  • Admin users are granted the Can Use password access control permission and can sign in with either SSO or their username and password.
  • API users are granted the Can Use password access control permission and can use their username and password to make REST API calls (although you should encourage them to use personal access tokens instead).

If your Databricks account is on the Premium plan (or, for customers who subscribed to Databricks before March 3, 2020, the Operational Security package), you can limit admin users’ and API users’ ability to authenticate with their username and password by configuring permissions using password access control.

Migrate existing users to SSO

Note

If a user’s current email address (username) with Databricks is the same as in the identity provider, then the migration is automatic (as long as auto user creation is enabled) and you can skip this step.

If a user’s email address with the identity provider is different from the one with Databricks, then a new user based on the identity provider email appears in Databricks when they log in. Because non-admin users can no longer log in with their old email address and password, they cannot access the files in their existing Users folder.

We recommend the following steps to migrate files from their old Users folder to their new Users folder:

  1. An admin can remove the old user. This marks the user’s folder as defunct, and the folder is listed after the folders of all active users in the workspace folder list. All notebooks and libraries remain accessible to admins. All clusters and jobs created by the user remain as is. If the user had any other ACLs set, enabling SSO causes those to reset, and the admin must manually set those ACLs for the new user.

  2. The admin can then move the old user’s folder into the new one as shown in the following figure.

    SSO move

Troubleshooting

Databricks recommends installing the SAML Tracer extension for Chrome or Firefox. SAML Tracer allows you to examine the SAML request sent from Databricks to the IdP and the SAML response sent from the IdP to Databricks.

If you can’t solve your problem using the following troubleshooting tips, contact your Databricks representative.

Verify the SSO version and URL

  1. In Databricks, go to the SSO tab of the Admin Console.
  2. Verify the SSO version, which is in parentheses next to Single Sign-On.
  3. Verify the Databricks SAML URL. You provide this URL to the IdP. This URL changes if you upgrade your workspace to the E2 version of the Databricks platform or if your workspace is upgraded to SSO version 2.

Verify that the SAML response is signed

Follow these steps with SAML Tracer installed in your browser.

  1. In an incognito window, open SAML Tracer by going to Tools > SAML Tracer.
  2. Go to the Databricks workspace and attempt to log in using SSO.
  3. In SAML Tracer, go to the Response tab.
    • If the response is signed, the <saml2p:Response> element has a child <ds:Signature> element. If the response is not signed, configure your IdP to sign the SAML response. Follow the link for your IdP in Requirements for different SSO versions.
    • If the assertion is signed, the <saml:Assertion> element, which is a child of the <saml2p:Response> element, has a child <ds:Signature> element.
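
If you save the SAML response XML from SAML Tracer, the same check can be scripted. The following is a minimal sketch that uses only the Python standard library and the standard SAML 2.0 and XML Signature namespace URIs. It reports whether Signature elements are present and does not verify that the signatures are valid. The function name and the sample document are illustrative.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs. The prefixes used in the document itself may differ.
NS = {
    "saml": "urn:oasis:names:tc:SAML:2.0:assertion",
    "ds": "http://www.w3.org/2000/09/xmldsig#",
}


def signature_status(response_xml):
    """Report whether the response and its assertion carry a Signature element."""
    root = ET.fromstring(response_xml)
    # A path without "//" matches direct children only.
    response_signed = root.find("ds:Signature", NS) is not None
    assertions = root.findall("saml:Assertion", NS)
    assertion_signed = any(
        assertion.find("ds:Signature", NS) is not None for assertion in assertions
    )
    return {"response_signed": response_signed, "assertion_signed": assertion_signed}


# Illustrative skeleton of a response where both levels are signed.
SAMPLE = """<saml2p:Response xmlns:saml2p="urn:oasis:names:tc:SAML:2.0:protocol"
    xmlns:saml2="urn:oasis:names:tc:SAML:2.0:assertion"
    xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
  <ds:Signature/>
  <saml2:Assertion><ds:Signature/></saml2:Assertion>
</saml2p:Response>"""

print(signature_status(SAMPLE))
```

Matching is done by namespace URI, so the check works whether the IdP emits the prefix saml, saml2, or something else.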

Verify the case sensitivity of the user email address

In Databricks, the user email address is case sensitive. Consider a user whose email address is john.doe@example.com in Databricks and John.Doe@example.com in the IdP. Logging in to Databricks will fail in one of the following ways:

  • If automatic user creation is enabled in Databricks, a new user is created and the user is logged in using those credentials instead of the existing one.
  • If automatic user creation is disabled, login fails with an error like “We encountered an error logging you in. Databricks support has been alerted and will begin looking into this issue right away.”

To fix this problem, you can delete the existing user in Databricks. If automatic user creation is enabled, ask the user to log in again using SSO. Otherwise, recreate the user, making sure the email address exactly matches the record in the IdP. After the user exists in Databricks, you must unarchive and reassign any assets owned by the previous user to the new one.
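
To spot this problem before users hit it, you can compare exported email lists from both systems. The sketch below assumes you already have the two lists as plain Python lists. How you export them is up to you, and the function name is hypothetical.

```python
def find_case_mismatches(databricks_emails, idp_emails):
    """Return (databricks_email, idp_email) pairs that differ only by letter case."""
    # Index IdP addresses by their case-folded form for case-insensitive lookup.
    idp_by_folded = {email.casefold(): email for email in idp_emails}
    mismatches = []
    for email in databricks_emails:
        idp_email = idp_by_folded.get(email.casefold())
        if idp_email is not None and idp_email != email:
            mismatches.append((email, idp_email))
    return mismatches


databricks_users = ["john.doe@example.com", "jane.roe@example.com"]
idp_users = ["John.Doe@example.com", "jane.roe@example.com"]
print(find_case_mismatches(databricks_users, idp_users))
```

Each returned pair is a user whose Databricks record needs to be recreated with the exact address stored in the IdP.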