Identity best practices

This article provides an opinionated perspective on how to best configure identity in Databricks. It includes a guide on how to migrate to identity federation, which enables you to manage all of your users, groups, and service principals in the Databricks account.

For an overview of the Databricks identity model, see Databricks identities and roles.

Configure users, service principals, and groups

There are three types of Databricks identity:

  • Users: User identities recognized by Databricks and represented by email addresses.

  • Service principals: Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.

  • Groups: Groups simplify identity management, making it easier to assign access to workspaces, data, and other securable objects.

Databricks recommends creating service principals to run production jobs or modify production data. If all processes that act on production data run with service principals, interactive users do not need any write, delete, or modify privileges in production. This eliminates the risk of a user overwriting production data by accident.

It is best practice to assign access to workspaces and access-control policies in Unity Catalog to groups, instead of to users individually. All Databricks identities can be assigned as members of groups, and members inherit permissions that are assigned to their group.
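The group-based approach above can be sketched as a small helper that builds Unity Catalog GRANT statements for groups rather than for individual users. This is a minimal sketch under stated assumptions: the group names `data-analysts` and `prod-etl-principals` and the table `main.sales.orders` are hypothetical, and the statements are only built as strings, not executed.

```python
def grant_statements(securable, grants):
    """Build one GRANT statement per principal.

    `grants` maps a principal name (ideally a group) to a list of privileges.
    """
    statements = []
    for principal, privileges in sorted(grants.items()):
        privilege_list = ", ".join(privileges)
        # Backticks quote principal names that contain hyphens or spaces.
        statements.append(
            f"GRANT {privilege_list} ON TABLE {securable} TO `{principal}`"
        )
    return statements


# Hypothetical names: interactive users get read-only access through a group,
# while a group holding the production service principals can also modify data.
plan = grant_statements(
    "main.sales.orders",
    {
        "data-analysts": ["SELECT"],
        "prod-etl-principals": ["SELECT", "MODIFY"],
    },
)
for statement in plan:
    print(statement)
```

Because access is attached to the groups, moving a person or service principal in or out of a group changes their access without touching any grant.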

The following are the administrative roles for managing Databricks:

  • Account admins can manage your Databricks account-level configurations including identity, billing, cloud resources, and the creation of workspaces and Unity Catalog metastores.

  • Workspace admins can add users to a Databricks workspace, assign them the workspace admin role, and manage access to objects and functionality in the workspace, such as the ability to create clusters and change job ownership.

  • Metastore admins can manage privileges for all securable objects within a metastore, such as who can create catalogs or query a table. The account admin who creates the Unity Catalog metastore becomes the initial metastore admin.

Databricks recommends limiting the number of account admins in each account and the number of workspace admins in each workspace. It is a best practice to transfer the metastore admin role to a group. See (Recommended) Transfer ownership of your metastore to a group.

Sync users and groups from your identity provider to your Databricks account

Databricks recommends using SCIM provisioning to sync users and groups automatically from your identity provider to your Databricks account. SCIM streamlines onboarding a new employee or team by using your identity provider to create users and groups in Databricks and give them the proper level of access. When a user leaves your organization or no longer needs access to Databricks, admins can terminate the user in your identity provider and that user’s account will also be removed from Databricks. This ensures a consistent offboarding process and prevents unauthorized users from accessing sensitive data.
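Conceptually, provisioning keeps the set of Databricks users aligned with the set of users in the identity provider. The following is an illustration of that reconciliation idea only, not a description of how any particular SCIM connector is implemented; the function name and the email addresses are hypothetical.

```python
def plan_sync(idp_users, databricks_users):
    """Compare the identity provider's users with the Databricks account.

    Returns (to_create, to_remove) as sorted lists of lowercase email addresses.
    """
    idp = {email.lower() for email in idp_users}
    existing = {email.lower() for email in databricks_users}
    to_create = sorted(idp - existing)   # onboarded in the identity provider
    to_remove = sorted(existing - idp)   # terminated in the identity provider
    return to_create, to_remove


to_create, to_remove = plan_sync(
    idp_users=["ana@example.com", "Bo@example.com"],
    databricks_users=["bo@example.com", "cy@example.com"],
)
print("create:", to_create)
print("remove:", to_remove)
```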

You should aim to synchronize all of the users and groups that intend to use Databricks to the account console rather than individual workspaces. This way you only need to configure one SCIM provisioning application to keep all identities consistent across all workspaces in the account. If you already have workspace-level SCIM provisioning set up for workspaces, you should set up account-level SCIM provisioning and turn off the workspace-level SCIM provisioner. See Upgrade to identity federation.

Account-level SCIM diagram

Specific users, groups and service principals can then be assigned from the account to specific workspaces within Databricks using identity federation.
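For readers who want to see what account-level provisioning traffic looks like, here is a minimal sketch of SCIM 2.0 user and group payloads and of an account-level endpoint URL. The schema URNs come from the SCIM core schema. The host `accounts.cloud.databricks.com`, the URL layout, the account ID, and the member ID are assumptions for illustration and should be checked against the SCIM API reference for your cloud. Nothing is sent over the network.

```python
import json

SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
SCIM_GROUP_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Group"


def account_scim_url(account_host, account_id, resource):
    """Build an account-level SCIM URL (assumed layout; verify for your cloud)."""
    return f"https://{account_host}/api/2.0/accounts/{account_id}/scim/v2/{resource}"


def scim_user(email, display_name):
    """SCIM payload for one user, identified by email address."""
    return {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": email,
        "displayName": display_name,
        "active": True,
    }


def scim_group(display_name, member_ids):
    """SCIM payload for a group whose members are referenced by ID."""
    return {
        "schemas": [SCIM_GROUP_SCHEMA],
        "displayName": display_name,
        "members": [{"value": member_id} for member_id in member_ids],
    }


# Hypothetical account ID and member ID.
url = account_scim_url("accounts.cloud.databricks.com", "ACCOUNT_ID", "Groups")
payload = scim_group("data-analysts", ["1001"])
print(url)
print(json.dumps(payload, indent=2))
```

With a single account-level endpoint there is one place to send these payloads, which is what lets one provisioning application serve every workspace.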

Configure single sign-on

Single sign-on (SSO) enables you to authenticate your users using your organization’s identity provider. Databricks recommends configuring SSO for greater security and improved usability. You must configure SSO on the account and on individual workspaces. You should configure SSO to the same identity provider for your account and all workspaces in your account. Preferably, use OIDC for SSO configuration at the account level to ensure support of authentication features.

Once SSO is configured on your workspaces, you should configure password access control. Password access control enables you to restrict users from authenticating to REST APIs with their username and password. Instead, users must authenticate to REST APIs using personal access tokens. Databricks recommends that you do not grant the Can Use permission on passwords to any workspace users. For more information on personal access tokens, see Manage personal access tokens.
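As an illustration of token-based REST authentication, the sketch below builds (but does not send) a request that carries a personal access token as a bearer token instead of a username and password. The workspace URL and the token value are placeholders; in practice the token would be read from a secret store or environment variable rather than written in code.

```python
import urllib.request


def build_request(workspace_url, path, token):
    """Build a GET request authenticated with a personal access token."""
    request = urllib.request.Request(
        f"{workspace_url.rstrip('/')}{path}", method="GET"
    )
    # The token travels in the Authorization header; no password is involved.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Placeholder workspace URL and token, for illustration only.
request = build_request(
    "https://example.cloud.databricks.com/",
    "/api/2.0/clusters/list",
    "dapi-EXAMPLE-TOKEN",
)
print(request.get_method(), request.full_url)
```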

Enable identity federation

Identity federation enables you to configure users, service principals, and groups in the account console, and then assign those identities access to specific workspaces. This simplifies Databricks administration and data governance.

With identity federation, you configure Databricks users, service principals, and groups once in the account console, rather than repeating configuration separately in each workspace. This both reduces friction in onboarding a new team to Databricks and enables you to maintain one SCIM provisioning application with your identity provider to the Databricks account, instead of a separate SCIM provisioning application for each workspace. Once users, service principals, and groups are added to the account, you can assign them permissions on workspaces. You can only assign account-level identities access to workspaces that are enabled for identity federation.

Account-level identity diagram

To enable a workspace for identity federation, see How do admins enable identity federation on a workspace?

Identity federation is enabled at the workspace level, and you can have a combination of identity-federated and non-identity-federated workspaces. For workspaces that are not enabled for identity federation, workspace admins manage their workspace users, service principals, and groups entirely within the scope of the workspace (the legacy model). They cannot use the account console or account-level APIs to assign users from the account to these workspaces, but they can use any of the workspace-level interfaces. Whenever a new user or service principal is added to a workspace using workspace-level interfaces, that user or service principal is synchronized to the account level. This enables you to have one consistent set of users and service principals in your account.

However, when a group is added to a non-identity-federated workspace using workspace-level interfaces, that group is a workspace-local group and is not added to the account. Account groups can be created only by account admins using account-level interfaces. You should aim to use account groups rather than workspace-local groups. Workspace-local groups cannot be granted access-control policies in Unity Catalog or permissions to other workspaces.
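The rules in the two paragraphs above can be written down as a tiny decision function. This is only a model of the behavior described here for workspaces without identity federation; the function names and the string labels are hypothetical.

```python
def scope_after_workspace_add(identity_type):
    """Where an identity ends up when it is added through workspace-level
    interfaces on a workspace that is not enabled for identity federation."""
    if identity_type in ("user", "service_principal"):
        return "account"          # synchronized to the account level
    if identity_type == "group":
        return "workspace-local"  # stays local; not added to the account
    raise ValueError(f"unknown identity type: {identity_type!r}")


def can_receive_unity_catalog_grants(scope):
    """Workspace-local groups cannot be granted Unity Catalog access."""
    return scope == "account"


for kind in ("user", "service_principal", "group"):
    scope = scope_after_workspace_add(kind)
    print(kind, scope, can_receive_unity_catalog_grants(scope))
```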

Upgrade to identity federation

If you are enabling identity federation on an existing workspace, do the following:

  1. Migrate workspace-level SCIM provisioning to the account level

    If you have workspace-level SCIM provisioning set up for your workspace, you should set up account-level SCIM provisioning and turn off the workspace-level SCIM provisioner. Workspace-level SCIM will continue to create and update workspace-local groups. Databricks recommends using account groups instead of workspace-local groups to take advantage of centralized workspace assignment and data access management using Unity Catalog. For more information about how to disable workspace-level SCIM, see Migrate workspace-level SCIM provisioning to the account level.

  2. Convert workspace-local groups to account groups

    Databricks recommends converting your existing workspace-local groups to account groups. See Migrate workspace-local groups to account groups for instructions. After you migrate the group to the account, you need to grant the new account group access to workspaces, objects, and functionality in the workspace for the group members to maintain their access.
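Because a migrated group has to be granted its access again, it can help to record what the workspace-local group had before migrating. The sketch below turns such a record into a re-grant checklist. It is a planning aid under stated assumptions, not the migration procedure itself: the group name, the object identifiers, and the data layout are hypothetical.

```python
def regrant_plan(account_group, local_group_grants):
    """List the grants to re-create for the new account group.

    `local_group_grants` maps a workspace object to the permission level
    the workspace-local group held on it.
    """
    plan = []
    for securable, permission in sorted(local_group_grants.items()):
        plan.append(
            {
                "principal": account_group,
                "securable": securable,
                "permission": permission,
            }
        )
    return plan


# Hypothetical record of what the workspace-local group could access.
local_grants = {
    "jobs/nightly-load": "CAN_MANAGE",
    "clusters/shared-etl": "CAN_ATTACH_TO",
}
plan = regrant_plan("data-engineers", local_grants)
for step in plan:
    print(step)
```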

Assign groups workspace permissions

Now that identity federation is enabled on your workspace, you can assign the users, service principals, and groups in your account permissions on that workspace. Databricks recommends that you assign groups permissions to workspaces, instead of assigning workspace permissions to users individually. All Databricks identities can be assigned as members of groups, and members inherit permissions that are assigned to their group.

Add workspace permissions
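Workspace assignment can also be scripted. The sketch below builds (but does not send) a request that assigns a group to a workspace through an account-level workspace assignment endpoint. The URL layout, the `USER` and `ADMIN` permission values, the host, and all IDs are assumptions to verify against the account API reference for your cloud; how the bearer token is obtained is out of scope here.

```python
import json
import urllib.request


def workspace_assignment_request(
    account_host, account_id, workspace_id, principal_id, permission, access_token
):
    """Build a PUT request that assigns one principal to one workspace."""
    if permission not in ("USER", "ADMIN"):
        raise ValueError(f"unsupported permission: {permission!r}")
    url = (
        f"https://{account_host}/api/2.0/accounts/{account_id}"
        f"/workspaces/{workspace_id}/permissionassignments"
        f"/principals/{principal_id}"
    )
    body = json.dumps({"permissions": [permission]}).encode("utf-8")
    request = urllib.request.Request(url, data=body, method="PUT")
    request.add_header("Authorization", f"Bearer {access_token}")
    request.add_header("Content-Type", "application/json")
    return request


# Hypothetical IDs; the principal here is meant to be an account group.
request = workspace_assignment_request(
    "accounts.cloud.databricks.com",
    "ACCOUNT_ID",
    "1234567890",
    "1001",
    "USER",
    "EXAMPLE-ACCESS-TOKEN",
)
print(request.get_method(), request.full_url)
```

Assigning the group's ID rather than each member's ID keeps the assignment in line with the group-first recommendation above.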

Learn more