Enable data access configuration

This article describes the data access configurations performed by Databricks administrators for all SQL warehouses using the UI.

Note

If your workspace is enabled for Unity Catalog, you don’t need to perform the steps in this article. Unity Catalog supports SQL warehouses by default.

Databricks recommends using Unity Catalog volumes or external locations to connect to cloud object storage instead of instance profiles. Unity Catalog simplifies the security and governance of your data by providing a central place to administer and audit data access across multiple workspaces in your account. See What is Unity Catalog? and Recommendations for using external locations.

To configure all SQL warehouses using the REST API, see SQL Warehouses API.

Important

Changing these settings restarts all running SQL warehouses.

For a general overview of how to enable access to data, see Access control overview.

Requirements

  • You must be a Databricks workspace admin to configure settings for all SQL warehouses.

Configure a SQL warehouse to use an instance profile

To configure all warehouses to use an AWS instance profile when accessing AWS storage:

  1. Click your username in the top bar of the workspace and select Admin Settings from the drop-down.

  2. Click the Compute tab.

  3. Click Manage next to SQL warehouses.

  4. In the Instance Profile drop-down, select an instance profile.

  5. Click Save.

Warning

  • If a user does not have permission to use the instance profile, all warehouses the user creates will fail to start.

  • If the instance profile is not valid, all SQL warehouses will become unhealthy.

You can also configure an instance profile using the Databricks Terraform provider and databricks_sql_global_config.

You can also edit the Data Access Configuration textbox entries directly.

Configure data access properties for SQL warehouses

To configure all warehouses with data access properties, such as when you use an external metastore instead of the legacy Hive metastore:

  1. Click your username in the top bar of the workspace and select Admin Settings from the drop-down.

  2. Click the Compute tab.

  3. Click Manage next to SQL warehouses.

  4. In the Data Access Configuration textbox, specify key-value pairs containing metastore properties.

    Important

    To set a Spark configuration property to the value of a secret without exposing the secret value to Spark, set the value to {{secrets/<secret-scope>/<secret-name>}}. Replace <secret-scope> with the secret scope and <secret-name> with the secret name. The value must start with {{secrets/ and end with }}. For more information about this syntax, see Syntax for referencing secrets in a Spark configuration property or environment variable.

  5. Click Save.

You can also configure data access properties using the Databricks Terraform provider and databricks_sql_global_config.

Supported properties

  • For an entry that ends with *, all properties within that prefix are supported.

    For example, spark.sql.hive.metastore.* indicates that both spark.sql.hive.metastore.jars and spark.sql.hive.metastore.version are supported, and any other properties that start with spark.sql.hive.metastore.

  • For properties whose values contain sensitive information, you can store the sensitive information in a secret and set the property’s value to the secret name using the following syntax: secrets/<secret-scope>/<secret-name>.

The following properties are supported for SQL warehouses:

  • spark.databricks.hive.metastore.glueCatalog.enabled

  • spark.databricks.delta.catalog.update.enabled false

  • spark.sql.hive.metastore.* (spark.sql.hive.metastore.jars and spark.sql.hive.metastore.jars.path are unsupported for serverless SQL warehouses.)

  • spark.sql.warehouse.dir

  • spark.hadoop.aws.region

  • spark.hadoop.datanucleus.*

  • spark.hadoop.fs.*

  • spark.hadoop.hive.*

  • spark.hadoop.javax.jdo.option.*

  • spark.hive.*

  • spark.hadoop.aws.glue.*

For more information about how to set these properties, see External Hive metastore and AWS Glue data catalog.

Confirm or set up an AWS instance profile to use with your serverless SQL warehouses

If you already use an instance profile with Databricks SQL, the role associated with the instance profile needs a Databricks Serverless compute trust relationship statement so that serverless SQL warehouses can use it.

You might not need to modify the role depending on how and when your instance profile was created because it might already have the trust relationship. If the instance profile was created in the following ways, it likely has the trust relationship statement:

  • After June 24, 2022, your instance profile was created as part of creating a Databricks workspace by using AWS Quickstart.

  • After June 24, 2022, someone in your organization followed the steps in the Databricks article to create the instance profile manually.

This section describes how to confirm or update that the role associated with the instance profile has the trust relationship statement. That enables your serverless SQL warehouses to use the role to access your S3 buckets.

Important

To perform these steps, you must be a Databricks workspace admin to confirm which instance profile your workspace uses for Databricks SQL. You must also be an AWS account administrator to check the role’s trust relationship policy or make necessary changes. If you are not both of these admin types, contact the appropriate admins in your organization to complete these steps.

  1. In the admin settings page, click the Compute tab, then click Manage next to SQL warehouses.

  2. Look in the Data Security section for the Instance Profile field. Confirm whether your workspace is configured to use an AWS instance profile for Databricks SQL to connect to AWS S3 buckets other than your root bucket.

    • If you use an instance profile, its name is visible in the Instance Profile field. Make a note of it for the next step.

    • If the field value is None, you are not using an instance profile to access S3 buckets other than your workspace’s root bucket. Setup is complete.

  3. Confirm whether your instance profile name matches the associated role name.

    1. In the AWS console, go to the IAM service’s Roles tab. It lists all the IAM roles in your account.

    2. Click the role with the name that matches the instance profile name in the Databricks SQL admin settings in the Data Security section for the Instance Profile field you found earlier in this section.

    3. In the summary area, find the Role ARN and Instance Profile ARNs fields.

    4. Check if the last part of those two fields have matching names after the final slash. For example:

      Does instance profile name and role arn name match
  4. If you determined in the previous step that the role name (the text after the last slash in the role ARN) and the instance profile name (the text after the last slash in the instance profile ARN) do not match, edit your instance profile registration to specify your IAM role ARN.

    1. To edit your instance profiles, look below the Instance profile field and click the Configure button.

    2. Click your instance profile’s name.

    3. Click Edit.

    4. In the optional Role ARN field, paste the role ARN for the role associated with your instance profile. This is the key step that allows your instance profile to work with Databricks SQL Serverless even if the role name does not match the instance profile name.

    5. Click Save.

  5. In the AWS console, confirm or edit the trust relationship.

    1. In the AWS console IAM service’s Roles tab, click the instance profile role you want to modify.

    2. Click the Trust relationships tab.

    3. View the existing trust policy. If the policy already includes the JSON block below, then this step was completed earlier, and you can ignore the following instructions.

    4. Click Edit trust policy.

    5. In the existing Statement array, append the following JSON block to the end of the existing trust policy. Confirm that you don’t overwrite the existing policy.

      {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::790110701330:role/serverless-customer-resource-role"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
            "sts:ExternalId": [
              "databricks-serverless-<YOUR-WORKSPACE-ID1>",
              "databricks-serverless-<YOUR-WORKSPACE-ID2>"
            ]
          }
        }
      }
      

      The only thing you must change in the statement is the workspace ID. Replace the YOUR_WORKSPACE-ID with one or more Databricks workspace IDs for the workspaces that will use this role. To get your workspace ID while you are using your workspace, check the URL. For example, in https://<databricks-instance>/?o=6280049833385130, the number after o= is the workspace ID.

      Do not edit the principal of the policy. The Principal.AWS field must continue to have the value arn:aws:iam::790110701330:role/serverless-customer-resource-role. This references a serverless compute role managed by Databricks.

    6. Click Review policy.

    7. Click Save changes.

Important

If your instance profile changes later, repeat these steps to verify that the trust relationship for the instance profile’s role contains the required extra statement.

Troubleshooting

If your trust relationship is misconfigured, clusters fail with a message: “Request to create a cluster failed with an exception INVALID_PARAMETER_VALUE: IAM role <role-id> does not have the required trust relationship.”

If you get this error, it could be that the workspace IDs were incorrect or that the trust policy was not updated correctly on the correct role.

Carefully perform the steps in Confirm or set up an AWS instance profile to use with your serverless SQL warehouses to update the trust relationship.

Configuring Glue metastore for serverless SQL warehouses

If you must specify an AWS Glue metastore or add additional data source configurations, update the Data Access Configuration field in the admin settings page. See Enable data access configuration.

Important

Serverless SQL warehouses support the default Databricks metastore and AWS Glue as a metastore, but do not support external Hive metastores.