Manage cluster policies

Preview

This feature is in Public Preview.

Bias-free communication

Databricks supports a diverse and inclusionary environment. This article contains references to the word blacklist. Databricks recognizes this as an exclusionary word. It is used in this article for consistency because it is currently the word that appears in the software. When the software is updated to remove the word, this article will be updated to be in alignment.

A cluster policy limits the ability to configure clusters based on a set of rules. The policy rules limit the attributes or attribute values available for cluster creation. Cluster policies have ACLs that limit their use to specific users and groups.

Cluster policies let you:

  • Limit users to creating clusters with prescribed settings.
  • Simplify the user interface and enable more users to create their own clusters (by fixing and hiding some values).
  • Control cost by limiting the maximum cost per cluster (by setting limits on attributes whose values contribute to the hourly price).

Cluster policy permissions limit which policies a user can select in the Policy drop-down when the user creates a cluster:

  • A user who has cluster create permission can select the Free form policy and create fully configurable clusters.
  • A user who has both cluster create permission and access to cluster policies can select the Free form policy and policies they have access to.
  • A user who has access only to cluster policies can select the policies they have access to.

Note

If no policies have been created in the workspace, the Policy drop-down does not display.

Only admin users can create, edit, and delete policies. Admin users also have access to all policies.

This article focuses on managing policies using the UI. You can also use the Cluster Policies APIs to manage policies.
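
For reference, a policy created through the API uses the same JSON definition described in this article, passed as a string in the request body. The following is a minimal sketch of such a request body; see the Cluster Policies API reference for the exact endpoints and fields:

{
  "name": "Example Policy",
  "definition": "{\"spark_version\": {\"type\": \"fixed\", \"value\": \"6.2\"}}"
}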

Requirements

Cluster policies require the Premium plan (or, for customers who subscribed to Databricks before March 3, 2020, the Operational Security package).

Enforcement rules

You can express the following types of constraints in policy rules:

  • Fixed value with disabled control element
  • Fixed value with control hidden in the UI (value is visible in the JSON view)
  • Attribute value limited to a set of values (either whitelist or blacklist)
  • Attribute value matching a given regex
  • Numeric attribute limited to a certain range
  • Default value used by the UI with control enabled

Managed cluster attributes

Cluster policies support all cluster attributes controlled with the Clusters API. The specific types of restrictions supported may vary per field, based on the field type and its relation to the cluster creation form UI elements.

In addition, cluster policies support the following synthetic attributes:

  • A “max DBU-hour” metric, which is the maximum DBUs a cluster can use on an hourly basis. This metric is a direct way to control cost at the individual cluster level.
  • A limit on the source that creates the cluster: the Jobs service (job clusters), or the Clusters UI and Clusters REST API (all-purpose clusters).
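
For example, a policy sketch that caps hourly DBU usage and restricts the policy to job clusters could combine these two synthetic attributes as follows (both attributes are described under Cluster policy virtual attribute paths; the limit of 10 is illustrative):

{
  "dbus_per_hour": {
    "type": "range",
    "maxValue": 10
  },
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  }
}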

Unmanaged cluster attributes

The following cluster attributes cannot be restricted in a cluster policy:

  • Libraries, which are handled by the Libraries API. A workaround is to use a custom container or an init script.
  • The number of clusters created per user (either in total or simultaneously). The scope of a policy is a single cluster, so there is no knowledge of the clusters a user has created.
  • Cluster permissions (ACLs), which are handled by a separate API.

Define a cluster policy

You define a cluster policy in a JSON policy definition, which you add when you create the cluster policy.
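
For example, a minimal definition that pins auto termination to 30 minutes and leaves every other attribute unrestricted might look like the following sketch (the autotermination_minutes attribute is described under Cluster policy attribute paths):

{
  "autotermination_minutes": {
    "type": "fixed",
    "value": 30,
    "hidden": true
  }
}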

Create a cluster policy

You create a cluster policy using the cluster policies UI or the Cluster Policies APIs. To create a cluster policy using the UI:

  1. Click the clusters icon in the sidebar.

  2. Click the Cluster Policies tab.

    Create policy
  3. Click the Create Policy button.

    Create policy
  4. Name the policy. Policy names are case insensitive.

  5. In the Definition tab, paste a policy definition.

  6. Click Create.

Manage cluster policy permissions

By definition, admins have access to all policies. You can manage cluster policy permissions using the cluster policies UI or the Cluster Policy Permissions API.
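
For example, granting a (hypothetical) group named data-engineering the ability to use a policy through the API would involve an access control entry similar to the following sketch; the access_control_list shape and the CAN_USE permission level are assumptions based on the standard Databricks permissions format, so check the Cluster Policy Permissions API reference for the exact request:

{
  "access_control_list": [
    {
      "group_name": "data-engineering",
      "permission_level": "CAN_USE"
    }
  ]
}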

Add a cluster policy permission

To add a cluster policy permission using the UI:

  1. Click the clusters icon in the sidebar.

  2. Click the Cluster Policies tab.

  3. Click the Permissions tab.

  4. In the Name column, select a principal.

    Policy permission principal
  5. In the Permission column, select a permission:

    Policy permission
  6. Click Add.

Delete a cluster policy permission

To delete a cluster policy permission using the UI:

  1. Click the clusters icon in the sidebar.
  2. Click the Cluster Policies tab.
  3. Click the Permissions tab.
  4. Click the delete icon in the permission row.

Edit a cluster policy using the UI

You edit a cluster policy using the cluster policies UI or the Cluster Policies APIs. To edit a cluster policy using the UI:

  1. Click the clusters icon in the sidebar.

  2. Click the Cluster Policies tab.

    Create policy
  3. Click a policy name.

  4. Click Edit.

  5. In the Definition tab, edit the policy definition.

  6. Click Update.

Delete a cluster policy using the UI

You delete a cluster policy using the cluster policies UI or the Cluster Policies APIs. To delete a cluster policy using the UI:

  1. Click the clusters icon in the sidebar.

  2. Click the Cluster Policies tab.

    Create policy
  3. Click a policy name.

  4. Click Delete.

  5. Click Delete to confirm.

Cluster policy definitions

A cluster policy definition is a JSON document that consists of a collection of policy definitions.

Policy definitions

A policy definition is a map between a path string defining an attribute and a limit type. There can only be one limitation per attribute. A path is specific to the type of resource and should reflect the resource creation API attribute name. If the resource creation uses nested attributes, the path should concatenate the nested attribute names with dots. Any attribute not specified by the policy is unlimited.

interface Policy {
  [path: string]: PolicyElement
}
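
For example, the following sketch shows a two-entry policy where the second path uses dot concatenation to reach the nested Spark configuration attribute spark.executor.memory (the 8g value is purely illustrative):

{
  "autotermination_minutes": {
    "type": "fixed",
    "value": 30
  },
  "spark_conf.spark.executor.memory": {
    "type": "fixed",
    "value": "8g"
  }
}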

Policy elements

A policy element specifies one of the supported limit types on a given attribute and optionally a default value. You can specify a default value even if there is no limit on the attribute in the policy.

type PolicyElement = FixedPolicy | ForbiddenPolicy | (LimitingPolicyBase & LimitingPolicy);
type LimitingPolicy = WhitelistPolicy | BlacklistPolicy | RegexPolicy | RangePolicy | UnlimitedPolicy;

This section describes the policy types:

Fixed policy

Limits the value to the specified value. For attribute values other than numeric and boolean, the value must be representable by or convertible to a string. Optionally, the attribute can be hidden in the UI by setting the hidden flag to true. A fixed policy cannot specify a default value.

interface FixedPolicy {
    type: "fixed";
    value: string | number | boolean;
    hidden?: boolean;
}
Example
{
  "spark_version": { "type": "fixed", "value": "6.2", "hidden": true }
}

Forbidden policy

For an optional attribute, prevent use of the attribute.

interface ForbiddenPolicy {
    type: "forbidden";
}
Example
{
  "instance_pool_id": { "type": "forbidden" }
}

Limiting policies: common fields

In a limiting policy you can specify two additional fields:

  • defaultValue - the value that populates the cluster creation form in the UI.
  • isOptional - a limiting policy on an attribute makes it required. To make the attribute optional, set the isOptional field to true.
interface LimitingPolicyBase {
    defaultValue?: string | number | boolean;
    isOptional?: boolean;
}
Example
{
  "instance_pool_id": { "type": "unlimited", "isOptional": true, "defaultValue": "id1" }
}

This example policy specifies the default value id1 for the Pool field, but makes the field optional.

Whitelist policy

A list of allowed values.

interface WhitelistPolicy {
  type: "whitelist";
  values: (string | number | boolean)[];
}
Example
{
  "spark_version":  { "type": "whitelist", "values": [ "6.2", "6.3" ] }
}

Blacklist policy

The list of disallowed values. Because the values must be exact matches, this policy may not work as expected when the attribute is lenient in how the value is represented (for example, allowing leading and trailing whitespace).

interface BlacklistPolicy {
  type: "blacklist";
  values: (string | number | boolean)[];
}
Example
{
  "spark_version":  { "type": "blacklist", "values": [ "4.0" ] }
}

Regex policy

Limits the value to the ones matching the regex. For safety, when matching, the regex is always anchored to the beginning and end of the string value.

interface RegexPolicy {
  type: "regex";
  pattern: string;
}
Example
{
  "spark_version":  { "type": "regex", "value": "5\\.[3456].*" }
}

Range policy

Limits the value to the range specified by the minValue and maxValue attributes. The value must be a decimal number. The numeric limits must be representable as a double floating point value. To indicate the lack of a specific limit, you can omit either minValue or maxValue.

interface RangePolicy {
  type: "range";
  minValue?: number;
  maxValue?: number;
}
Example
{
  "num_workers":  { "type": "range", "maxValue": 10 }
}

Unlimited policy

Does not define value limits. You can use this policy type to make attributes required or to set the default value in the UI.

interface UnlimitedPolicy {
  type: "unlimited";
}
Example

To require adding the COST_BUCKET tag:

{
  "custom_tags.COST_BUCKET":  { "type": "unlimited" }
}

To set a default value for a Spark configuration variable, but also allow omitting (removing) it:

{
  "spark_conf.spark.my.conf":  { "type": "unlimited", "isOptional": true, "defaultValue": "my_value" }
}

Cluster policy attribute paths

The following table lists the supported cluster policy attribute paths.

Attribute path Type Description
cluster_name string The cluster name.
spark_conf.* optional string Control specific configuration values by appending the configuration key name. For example, spark_conf.spark.executor.memory.
instance_pool_id string When hidden, removes pool selection from the UI.
num_workers optional number When hidden, removes the worker number specification from the UI.
autoscale.min_workers optional number When hidden, removes the minimum worker number field from the UI.
autoscale.max_workers optional number When hidden, removes the maximum worker number field from the UI.
autotermination_minutes number Value of 0 represents no auto termination. When hidden, removes auto termination checkbox and value input from the UI.
node_type_id string When hidden, removes the worker node type selection from the UI.
driver_node_type_id optional string When hidden, removes the driver node type selection from the UI.
enable_elastic_disk boolean When hidden, removes the Enable autoscaling local storage checkbox from the UI.
custom_tags.* string Control specific tag values by appending the tag name. For example, custom_tags.<mytag>.
spark_version string Spark image version name (as specified via API).
docker_image.url string Controls Databricks Container Services image URL. When hidden, removes the Databricks Container Services section from the UI.
docker_image.basic_auth.username string Username for the Databricks Container Services image basic authentication.
docker_image.basic_auth.password string Password for the Databricks Container Services image basic authentication.
aws_attributes.availability string Controls AWS availability (SPOT, ON_DEMAND, or SPOT_WITH_FALLBACK).
aws_attributes.zone_id string Controls the AWS zone ID.
aws_attributes.first_on_demand number Controls the number of nodes placed on on-demand instances.
aws_attributes.instance_profile_arn string Controls the AWS instance profile.
aws_attributes.ebs_volume_type string Type of AWS EBS volumes.
aws_attributes.ebs_volume_count number Number of AWS EBS volumes.
aws_attributes.ebs_volume_size number Size (in GiB) of AWS EBS volumes.
aws_attributes.spot_bid_price_percent number Controls the max price for AWS spot instances.
cluster_log_conf.type S3, DBFS, or NONE Type of log destination.
cluster_log_conf.path string Destination URL of the log files.
cluster_log_conf.region string Region for the S3 location.
single_user_name string User name for credential passthrough single user access.
ssh_public_keys.* string * refers to the index of the public key in the attribute array, see Array attributes.
init_scripts.*.s3.destination, init_scripts.*.dbfs.destination, init_scripts.*.file.destination, init_scripts.*.s3.region string * refers to the index of the init script in the attribute array, see Array attributes.

Cluster policy virtual attribute paths

Attribute path Type Description
dbus_per_hour number Calculated attribute representing (maximum, in case of autoscaling clusters) DBU cost of the cluster including the driver node. For use with range limitation.
cluster_type string Represents the type of cluster that can be created:

  • all-purpose for Databricks all-purpose clusters
  • job for job clusters created by the job scheduler

Whitelist or blacklist certain types of clusters to be created from the policy. If the all-purpose value is not allowed, the policy is not shown in the all-purpose cluster creation form. If the job value is not allowed, the policy is not shown in the job new cluster form.
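
For example, a sketch that restricts a policy to all-purpose clusters (and therefore hides it from the job new cluster form) could whitelist the cluster_type value:

{
  "cluster_type": {
    "type": "whitelist",
    "values": [ "all-purpose" ]
  }
}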

Array attributes

You can specify policies for array attributes in two ways:

  • Generic limitations for all array elements. These limitations use the * wildcard symbol in the policy path.
  • Specific limitations for an array element at a specific index. These limitations use a number in the path.

For example, for the array attribute ssh_public_keys, the generic path is ssh_public_keys.* and the specific paths have the form ssh_public_keys.<n>, where <n> is an integer index in the array (starting with 0). You can combine generic and specific limitations, in which case the generic limitation applies to each array element that does not have a specific limitation. In each case only one policy limitation will apply.

Typical use cases for the array policies are:

  • Require inclusion of specific entries. For example:

    {
      "ssh_public_keys.0": {
        "type": "fixed",
        "value": "<required-key-1>"
      },
      "ssh_public_keys.1": {
        "type": "fixed",
        "value": "<required-key-2>"
      }
    }
    

    You cannot require specific keys without specifying the order.

  • Require a fixed value of the entire list. For example:

    {
      "ssh_public_keys.0": {
        "type": "fixed",
        "value": "<required-key-1>"
      },
      "ssh_public_keys.*": {
        "type": "forbidden"
      }
    }

  • Disallow the use altogether.

    {
      "ssh_public_keys.*": {
        "type": "forbidden"
      }
    }

  • Allow any number of entries but only following a specific restriction. For example:

    {
      "ssh_public_keys.*": {
        "type": "regex",
        "pattern": ".*<required-content>.*"
      }
    }
    

In the case of init_scripts paths, the array contains structures whose elements may all need to be handled, depending on the use case. For example, to require a specific set of init scripts, you can use the following pattern:

{
  "init_scripts.0.s3.destination": {
    "type": "fixed",
    "value": "s3://<s3-path>"
  },
  "init_scripts.0.s3.region": {
    "type": "fixed",
    "value": "<s3-region>"
  },
  "init_scripts.1.dbfs.destination": {
    "type": "fixed",
    "value": "dbfs://<dbfs-path>"
  },
  "init_scripts.*.s3.destination": {
    "type": "forbidden"
  },
  "init_scripts.*.dbfs.destination": {
    "type": "forbidden"
  },
  "init_scripts.*.file.destination": {
    "type": "forbidden"
  }
}

Cluster policy examples

General cluster policy

A general-purpose cluster policy meant to guide users and restrict some functionality, while requiring tags, restricting the maximum number of instances, and enforcing a timeout.

{
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "serverless",
    "hidden": true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_version": {
    "type": "regex",
    "pattern": "6\\.[0-9]+\\.x-scala.*"
  },
  "node_type_id": {
    "type": "whitelist",
    "values": [
      "i3.xlarge",
      "i3.2xlarge",
      "i3.4xlarge"
    ],
    "defaultValue": "i3.2xlarge"
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "i3.2xlarge",
    "hidden": true
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "autoscale.max_workers": {
    "type": "range",
    "maxValue": 25,
    "defaultValue": 5
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": "true",
    "hidden": true
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 30,
    "hidden": true
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

Simple medium-sized policy

Allows users to create a medium-sized cluster with minimal configuration. The only field required at creation time is the cluster name; the rest is fixed and hidden.

{
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "autoscale.max_workers": {
    "type": "fixed",
    "value": 10,
    "hidden": true
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 60,
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  },
  "spark_version": {
    "type": "fixed",
    "value": "7.x-scala2.11",
    "hidden": true
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": false,
    "hidden": true
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

Job-only policy

Allows users to create job clusters and run jobs using the cluster. Users cannot create an all-purpose cluster using this policy.

{
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "dbus_per_hour": {
    "type": "range",
    "maxValue": 100
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": "true"
  },
  "num_workers": {
    "type": "range",
    "minValue": 1
  },
  "node_type_id": {
    "type": "regex",
    "pattern": "[rmci][3-5][rnad]*.[0-8]{0,1}xlarge"
  },
  "driver_node_type_id": {
    "type": "regex",
    "pattern": "[rmci][3-5][rnad]*.[0-8]{0,1}xlarge"
  },
  "spark_version": {
    "type": "regex",
    "pattern": "6\\.[0-9]+\\.x-scala.*"
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

Single Node policy

Allows users to create a Single Node cluster with no worker nodes and Spark enabled in local mode. You can find example Single Node policies at Single Node cluster policy.

High Concurrency passthrough policy

Allows users to create clusters in High Concurrency mode that have passthrough enabled by default. This simplifies setup for the admin, since users would otherwise need to set the appropriate Spark parameters manually.

{
  "spark_conf.spark.databricks.passthrough.enabled": {
    "type": "fixed",
    "value": "true"
  },
  "spark_conf.spark.databricks.repl.allowedLanguages": {
    "type": "fixed",
    "value": "python,sql"
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "fixed",
    "value": "serverless"
  },
  "spark_conf.spark.databricks.pyspark.enableProcessIsolation": {
    "type": "fixed",
    "value": "true"
  },
  "custom_tags.ResourceClass": {
    "type": "fixed",
    "value": "Serverless"
  }
}

External metastore policy

Allows users to create a cluster with an admin-defined metastore already attached. This is useful to allow users to create their own clusters without requiring additional configuration.

{
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionURL": {
      "type": "fixed",
      "value": "jdbc:sqlserver://<jdbc-url>"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionDriverName": {
      "type": "fixed",
      "value": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  },
  "spark_conf.spark.databricks.delta.preview.enabled": {
      "type": "fixed",
      "value": "true"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionUserName": {
      "type": "fixed",
      "value": "<metastore-user>"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionPassword": {
      "type": "fixed",
      "value": "<metastore-password>"
  }
}