The Databricks SQL security model is based on the well-established security model in SQL databases that allows setting fine-grained access
permissions using standard SQL statements,
REVOKE. The following figure illustrates how data is secured.
The yellow group of users has been granted permission to query Table 1 and View 1. The files and directories of these tables are organized in different data sets. In the example, Table 1 data is managed in Dataset 1 and Dataset 2 and View 1 data is in Dataset A.
When you are getting started, or for simple scenarios such as when data is accessed in only a single workspace, you can store files and directories directly in the Databricks filesystem (DBFS). These files and directories are said to be managed. You don’t need to configure extra credentials to manage access to managed data.
However, files and directories are usually stored in cloud storage. These files and directories are said to be unmanaged. To access unmanaged data in cloud storage, Databricks authenticates to the cloud provider using cloud storage credentials. In the following figure, Databricks authenticates to the cloud provider using Credential 1 for Dataset 1 and Dataset 2 and Credential 2 for Dataset A. Databricks administrators configure Databricks to use the correct credentials to access data on cloud storage. For more information, see Cloud storage access overview later in this article. Users of tables and views generally do not see or use credentials directly.
Users and groups are typically managed in an identity provider (IdP) and synchronized to Databricks, for example using SCIM.
The following sections provide overviews of how to manage access to data using Databricks table access control, configure access to cloud storage, and manage users and groups.
Databricks table access control enables you to secure the following objects. We refer to these as securable objects.
CATALOG: controls access to the entire data catalog.
DATABASE: controls access to a database.
TABLE: controls access to a managed or external table.
VIEW: controls access to SQL views.
ANY FILE: controls access to the underlying filesystem. Users granted access to
ANY FILEcan bypass the restrictions put on the catalog, databases, tables, and views by reading from the file system directly.
Only Databricks administrators and object owners can grant access to securable objects. A user who creates a database, table, or view in Databricks SQL or in Data Science & Engineering using a cluster enabled for table access control becomes its owner. The owner is granted all privileges and can grant privileges to other users. If an object does not have an owner, an administrator can set object ownership. The following table summarizes the available roles and the objects for which each role can grant privileges.
|Role||Can grant access privileges for|
|Databricks administrator||All objects in the catalog and the underlying filesystem.|
|Catalog owner||All objects in the catalog.|
|Database owner||All objects in the database.|
|Table owner||Only the table.|
For more information, see Data object privileges.
If you are using Databricks managed tables you do not need to configure access to cloud storage.
To query data in cloud storage, Databricks administrators configure access from Databricks SQL to cloud storage using IAM roles and instance profiles, as illustrated in the following figure. In the figure, an IAM role mapped to an instance profile is used to access Cloud Storage 1 and Cloud Storage 2.
Configuring access to cloud storage includes three steps:
- Create an instance profile or reuse an existing instance profile.
- Give the instance profile access to the AWS S3 buckets.
- Configure Databricks SQL endpoints with an instance profile that enables access to data in cloud storage.
Databricks administrators manage users and groups in a Data Science & Engineering workspace. For Databricks SQL, Databricks recommends using groups instead of users, as it makes it easier to administer data access privileges. There are two options to set up groups:
Synchronize Identity Provider (IdP) groups to Databricks using the SCIM API. Databricks recommends this option, as this keeps the source of truth of identity in your IdP while still making it available to Databricks. In the figure, the users and groups from Azure Active Directory are synchronized to Databricks SQL and used in
GRANTstatements to secure database objects such as tables and views.
Create groups in Databricks using SQL, the UI, or API.
Databricks SQL administrators can be configuring a new workspace for access to data objects or may be using a Data Science & Engineering workspace that has already configured access to data objects. The following articles describe the detailed steps for configuring access to cloud storage and mapping Data Science & Engineering security models to Databricks SQL.