Configure access to cloud storage

This article describes how Databricks SQL administrators configure a new workspace for access to data objects.

Note

  • If you use only Databricks managed tables, you do not need to configure access to cloud storage.
  • Databricks SQL endpoints all share the same cloud storage access credentials.

To configure data access for Databricks SQL, follow the steps in this section:

Step 1: Create or reuse an instance profile

Databricks recommends setting up a new instance profile with access to all S3 buckets that should be accessed from Databricks SQL.

A Databricks administrator performs one of the following steps in the AWS console:

  • Create an instance profile to access an S3 bucket. Skip this step if you want to reuse an existing instance profile.
  • To reuse an existing instance profile, copy its ARN from IAM Service > Roles, in the summary of the role you want to reuse.

Step 2: Grant instance profile access to S3 buckets

A Databricks administrator performs the following steps in the AWS console.

  1. Create a bucket policy for a target S3 bucket. Repeat this step for every bucket you want to access from Databricks SQL.
  2. Note the IAM role used to create the Databricks deployment.
  3. Add the S3 IAM role to the EC2 policy.
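
As a rough illustration of the policies these steps refer to, the following sketch shows what a bucket policy for one target bucket might look like. The values in angle brackets are placeholders, not real identifiers, and the list of S3 actions is an assumption: trim it to the operations your users actually need (for example, read-only access may not need the put and delete actions).

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<aws-account-id>:role/<iam-role-for-s3-access>"
      },
      "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
      "Resource": "arn:aws:s3:::<s3-bucket-name>"
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<aws-account-id>:role/<iam-role-for-s3-access>"
      },
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::<s3-bucket-name>/*"
    }
  ]
}
```

Adding the S3 IAM role to the EC2 policy typically means letting the deployment role pass the S3 role to EC2 instances. Assuming the same placeholder names, the statement added to the deployment role's policy could look like this:

```json
{
  "Effect": "Allow",
  "Action": "iam:PassRole",
  "Resource": "arn:aws:iam::<aws-account-id>:role/<iam-role-for-s3-access>"
}
```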

Step 3: Configure Databricks SQL to use the instance profile for data access

A Databricks administrator performs the first step in the Data Science & Engineering workspace admin console and the remaining steps in the Databricks SQL admin console:

  1. Add the instance profile to Databricks.

  2. Click Settings at the bottom of the sidebar and select SQL Admin Console.

  3. Click the SQL Endpoint Settings tab.

  4. In the Instance Profile drop-down, select an instance profile. If no profiles are listed, click Configure to open the Databricks admin console in a new tab and add one there.

  5. Click Save.

Step 4: Define data access privileges using table access control

A Databricks administrator or data object owner performs this step in the Databricks SQL query editor. They grant privileges to users or groups by issuing GRANT (Databricks SQL) statements.

For each group of users, assign permissions to objects. It is common to do this at the database level. This could be as simple as an administrator or owner issuing the following command in Databricks SQL:

GRANT USAGE, SELECT, READ_METADATA ON DATABASE sales TO `analysts`

This command gives the analysts group read access to the sales database. Privileges are inherited, so granting read permission on the database allows read access to all the tables and views stored in it, including any objects added to the database in the future. For a detailed explanation of the privileges that can be granted to users and groups, see Privileges.
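
The statements below are a minimal sketch of how such a grant might be checked or adjusted afterwards. The `interns` group and the `sales.orders` table are hypothetical names used only for illustration.

```sql
-- List the privileges the analysts group currently holds on the sales database.
SHOW GRANTS `analysts` ON DATABASE sales;

-- Hypothetical: give a second group read access to a single table only.
-- USAGE on the database is still needed before SELECT on the table can be used.
GRANT USAGE ON DATABASE sales TO `interns`;
GRANT SELECT ON TABLE sales.orders TO `interns`;

-- Remove a privilege that is no longer needed.
REVOKE SELECT ON DATABASE sales FROM `analysts`;
```

Granting at the table level instead of the database level avoids the inheritance described above, at the cost of having to issue a new grant for each table added later.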

Step 5: (Optional) Set owner

A Databricks administrator performs this step in a notebook in a Data Science & Engineering workspace.

Administrators set owners using ALTER TABLE (Databricks SQL) statements. The simplest option is to set the owner to a group of admins. Alternatively, to enable a delegated security model, you can assign a different owner to each database, so that each owner can manage permissions on the objects in that database.
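
As a sketch of both options, the statements below set owners on a table and on databases. The group names, the `marketing` database, and the `sales.orders` table are hypothetical, and the example assumes that ALTER DATABASE accepts the same OWNER TO clause as ALTER TABLE.

```sql
-- Simplest option: a single admins group owns the table.
ALTER TABLE sales.orders OWNER TO `admins`;

-- Delegated model (assumed syntax): each database gets its own owning group,
-- which can then manage permissions on the objects in that database.
ALTER DATABASE sales OWNER TO `sales-admins`;
ALTER DATABASE marketing OWNER TO `marketing-admins`;
```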