Compute policy reference

This article is a reference for compute policy definitions. The articles includes a reference of the available policy attributes and limitation types. There are also sample policies you can reference for common use cases.

What are policy definitions?

Policy definitions are individual policy rules expressed in JSON. A definition can add a rule to any of the attributes controlled with the Clusters API. For example, these definitions set a default autotermination time, forbid users from using pools, and enforce the use of Photon:

{
   "autotermination_minutes" : {
    "type" : "unlimited",
    "defaultValue" : 4320,
    "isOptional" : true
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "runtime_engine": {
    "type": "fixed",
    "value": "PHOTON",
    "hidden": true
  }
}

There can only be one limitation per attribute. An attribute’s path reflects the API attribute name. For nested attributes, the path concatenates the nested attribute names using dots. Attributes that aren’t defined in a policy definition won’t be limited.

Supported attributes

Policies support all attributes controlled with the Clusters API. The type of restrictions you can place on attributes may vary per setting based on their type and relation to the UI elements. You cannot use policies to define compute permissions.

You can also use policies to set the max DBUs per hour and cluster type. See Virtual attribute paths.

The following table lists the supported policy attribute paths:

Attribute path

Type

Description

autoscale.max_workers

optional number

When hidden, removes the maximum worker number field from the UI.

autoscale.min_workers

optional number

When hidden, removes the minimum worker number field from the UI.

autotermination_minutes

number

A value of 0 represents no auto termination. When hidden, removes the auto termination checkbox and value input from the UI.

aws_attributes.availability

string

Controls AWS availiability (SPOT, ON_DEMAND, or SPOT_WITH_FALLBACK)

aws_attributes.ebs_volume_count

number

The number of AWS EBS volumes.

aws_attributes.ebs_volume_size

number

The size (in GiB) of AWS EBS volumes.

aws_attributes.ebs_volume_type

string

The type of AWS EBS volumes.

aws_attributes.first_on_demand

number

Controls the number of nodes to put on on-demand instances.

aws_attributes.instance_profile_arn

string

Controls the AWS instance profile.

aws_attributes.spot_bid_price_percent

number

Controls the maximum price for AWS spot instances.

aws_attributes.zone_id

string

Controls the AWS zone ID.

cluster_log_conf.path

string

The destination URL of the log files.

cluster_log_conf.region

string

The Region for the S3 location.

cluster_log_conf.type

S3, DBFS, or NONE

The type of log destination.

cluster_name

string

The cluster name.

custom_tags.*

string

Controls specific tag values by appending the tag name, for example: custom_tags.<mytag>.

data_security_mode

string

Sets the access mode of the cluster. Unity Catalog requires SINGLE_USER or USER_ISOLATION (shared access mode in the UI). A value of NONE means no security features are enabled.

docker_image.basic_auth.password

string

The password for the Databricks Container Services image basic authentication.

docker_image.basic_auth.username

string

The user name for the Databricks Container Services image basic authentication.

docker_image.url

string

Controls the Databricks Container Services image URL. When hidden, removes the Databricks Container Services section from the UI.

driver_node_type_id

optional string

When hidden, removes the driver node type selection from the UI.

enable_elastic_disk

boolean

When hidden, removes the Enable autoscaling local storage checkbox from the UI.

enable_local_disk_encryption

boolean

Set to true to enable, or false to disable, encrypting disks that are locally attached to the cluster (as specified through the API).

init_scripts.*.workspace.destination init_scripts.*.volumes.destination init_scripts.*.s3.destination init_scripts.*.file.destination init_scripts.*.s3.region

string

* refers to the index of the init script in the attribute array. See Writing policies for array attributes.

instance_pool_id

string

Controls the pool used by worker nodes if driver_instance_pool_id is also defined, or for all cluster nodes otherwise. If you use pools for worker nodes, you must also use pools for the driver node. When hidden, removes pool selection from the UI.

driver_instance_pool_id

string

If specified, configures a different pool for the driver node than for worker nodes. If not specified, inherits instance_pool_id. If you use pools for worker nodes, you must also use pools for the driver node. When hidden, removes driver pool selection from the UI.

node_type_id

string

When hidden, removes the worker node type selection from the UI.

num_workers

optional number

When hidden, removes the worker number specification from the UI.

runtime_engine

string

Determines whether the cluster uses Photon or not. Possible values are PHOTON or STANDARD.

single_user_name

string

Controls which users or groups can be assigned to the compute resource.

spark_conf.*

optional string

Control specific configuration values by appending the configuration key name. For example, spark_conf.spark.executor.memory.

spark_env_vars.*

optional string

Control specific Spark environment variable values by appending the environment variable, for example: spark_env_vars.<environment variable name>.

spark_version

string

The Spark image version name as specified through the API (the Databricks Runtime). You can also use special policy values that dynamically select the Databricks Runtime. See Special policy values for Databricks Runtime selection.

ssh_public_keys.*

string

* refers to the index of the public key in the attribute array. See Writing policies for array attributes.

workload_type.clients.jobs

boolean

Defines whether the compute resource can be used for jobs. See Prevent compute from being used with jobs.

workload_type.clients.notebooks

boolean

Defines whether the compute resource can be used with notebooks. See Prevent compute from being used with jobs.

Virtual attribute paths

This table includes two additional synthetic attributes supported by policies:

Attribute path

Type

Description

dbus_per_hour

number

Calculated attribute representing the maximum DBUs a resource can use on an hourly basis including the driver node. This metric is a direct way to control cost at the individual compute level. Use with range limitation.

cluster_type

string

Represents the type of cluster that can be created:

  • all-purpose for Databricks all-purpose compute

  • job for job compute created by the job scheduler

  • dlt for compute created for Delta Live Tables pipelines

Allow or block specified types of compute to be created from the policy. If the all-purpose value is not allowed, the policy is not shown in the all-purpose create compute UI. If the job value is not allowed, the policy is not shown in the create job compute UI.

Special policy values for Databricks Runtime selection

The spark_version attribute supports special values that dynamically map to a Databricks Runtime version based on the current set of supported Databricks Runtime versions.

The following values can be used in the spark_version attribute:

  • auto:latest: Maps to the latest GA Databricks Runtime version.

  • auto:latest-ml: Maps to the latest Databricks Runtime ML version.

  • auto:latest-lts: Maps to the latest long-term support (LTS) Databricks Runtime version.

  • auto:latest-lts-ml: Maps to the latest LTS Databricks Runtime ML version.

  • auto:prev-major: Maps to the second-latest GA Databricks Runtime version. For example, if auto:latest is 14.2, then auto:prev-major is 13.3.

  • auto:prev-major-ml: Maps to the second-latest GA Databricks Runtime ML version. For example, if auto:latest is 14.2, then auto:prev-major is 13.3.

  • auto:prev-lts: Maps to the second-latest LTS Databricks Runtime version. For example, if auto:latest-lts is 13.3, then auto:prev-lts is 12.2.

  • auto:prev-lts-ml: Maps to the second-latest LTS Databricks Runtime ML version. For example, if auto:latest-lts is 13.3, then auto:prev-lts is 12.2.

Note

Using these values does not make the compute auto-update when a new runtime version is released. A user must explicitly edit the compute for the Databricks Runtime version to change.

Supported policy types

This section includes a reference for each of the available policy types. There are two categories of policy types: fixed policies and limiting policies.

Fixed policies prevent user configuration on an attribute. The two types of fixed policies are:

Limiting policies limit a user’s options for configuring an attribute. Limiting policies also allow you to set default values and make attributes optional. See Additional limiting policy fields.

Your options for limiting policies are:

Fixed policy

Fixed policies limit the attribute to the specified value. For attribute values other than numeric and boolean, the value must be represented by or convertible to a string.

With fixed policies, you can also hide the attribute from the UI by setting the hidden field to true.

interface FixedPolicy {
    type: "fixed";
    value: string | number | boolean;
    hidden?: boolean;
}

This example policy fixes the Databricks Runtime version and hides the field from the user’s UI:

{
  "spark_version": { "type": "fixed", "value": "auto:latest-lts", "hidden": true }
}

Forbidden policy

A forbidden policy prevents users from configuring an attribute. Forbidden policies are only compatible with optional attributes.

interface ForbiddenPolicy {
    type: "forbidden";
}

This policy forbids attaching pools to the compute for worker nodes. Pools are also forbidden for the driver node, because driver_instance_pool_id inherits the policy.

{
  "instance_pool_id": { "type": "forbidden" }
}

Allowlist policy

An allowlist policy specifies a list of values the user can choose between when configuring an attribute.

interface AllowlistPolicy {
  type: "allowlist";
  values: (string | number | boolean)[];
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This allowlist example allows the user to select between two Databricks Runtime versions:

{
  "spark_version":  { "type": "allowlist", "values": [ "13.3.x-scala2.12", "12.2.x-scala2.12" ] }
}

Blocklist policy

The blocklist policy lists disallowed values. Since the values must be exact matches, this policy might not work as expected when the attribute is lenient in how the value is represented (for example, allowing leading and trailing spaces).

interface BlocklistPolicy {
  type: "blocklist";
  values: (string | number | boolean)[];
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example blocks the user from selecting 7.3.x-scala2.12 as the Databricks Runtime.

{
  "spark_version":  { "type": "blocklist", "values": [ "7.3.x-scala2.12" ] }
}

Regex policy

A regex policy limits the available values to ones that match the regex. For safety, make sure your regex is anchored to the beginning and end of the string value.

interface RegexPolicy {
  type: "regex";
  pattern: string;
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example limits the Databricks Runtime versions a user can select from:

{
  "spark_version":  { "type": "regex", "pattern": "13\\.[3456].*" }
}

Range policy

A range policy limits the value to a specified range using the minValue and maxValue fields. The value must be a decimal number. The numeric limits must be representable as a double floating point value. To indicate lack of a specific limit, you can omit either minValue or maxValue.

interface RangePolicy {
  type: "range";
  minValue?: number;
  maxValue?: number;
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example limits the maximum amount of workers to 10:

{
  "num_workers":  { "type": "range", "maxValue": 10 }
}

Unlimited policy

The unlimited policy is used to make attributes required or to set the default value in the UI.

interface UnlimitedPolicy {
  type: "unlimited";
  defaultValue?: string | number | boolean;
  isOptional?: boolean;
}

This example adds the COST_BUCKET tag to the compute:

{
  "custom_tags.COST_BUCKET":  { "type": "unlimited" }
}

To set a default value for a Spark configuration variable, but also allow omitting (removing) it:

{
  "spark_conf.spark.my.conf":  { "type": "unlimited", "isOptional": true, "defaultValue": "my_value" }
}

Additional limiting policy fields

For limiting policy types you can specify two additional fields:

  • defaultValue - The value that automatically populates in the create compute UI.

  • isOptional - A limiting policy on an attribute automatically makes it required. To make the attribute optional, set the isOptional field to true.

Note

Default values don’t automatically get applied to compute created with the Clusters API. To apply default values using the API, add the parameter apply_policy_default_values to the compute definition and set it to true.

This example policy specifies the default value id1 for the pool for worker nodes, but makes it optional. When creating the compute, you can select a different pool or choose not to use one. If driver_instance_pool_id isn’t defined in the policy or when creating the compute, the same pool is used for worker nodes and the driver node.

{
  "instance_pool_id": { "type": "unlimited", "isOptional": true, "defaultValue": "id1" }
}

Writing policies for array attributes

You can specify policies for array attributes in two ways:

  • Generic limitations for all array elements. These limitations use the * wildcard symbol in the policy path.

  • Specific limitations for an array element at a specific index. These limitation use a number in the path.

For example, for the array attribute init_scripts, the generic paths start with init_scripts.* and the specific paths with init_scripts.<n>, where <n> is an integer index in the array (starting with 0). You can combine generic and specific limitations, in which case the generic limitation applies to each array element that does not have a specific limitation. In each case only one policy limitation will apply.

The following sections show examples of common examples that use array attributes.

Require inclusion-specific entries

You cannot require specific values without specifying the order. For example:

{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "<required-script-1>"
  },
  "init_scripts.1.volumes.destination": {
    "type": "fixed",
    "value": "<required-script-2>"
  }
}

Require a fixed value of the entire list

{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "<required-script-1>"
  },
  "init_scripts.*.volumes.destination": {
    "type": "forbidden"
  }
}

Disallow the use altogether

{
   "init_scripts.*.volumes.destination": {
    "type": "forbidden"
  }
}

Allow entries that follow specific restriction

{
    "init_scripts.*.volumes.destination": {
    "type": "regex",
    "pattern": ".*<required-content>.*"
  }
}

Fix a specific set of init scripts

In case of init_scripts paths, the array can contain one of multiple structures for which all possible variants may need to be handled depending on the use case. For example, to require a specific set of init scripts, and disallow any variant of the other version, you can use the following pattern:

{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "<volume-paths>"
  },
  "init_scripts.1.volumes.destination": {
    "type": "fixed",
    "value": "<volume-paths>"
  },
  "init_scripts.*.workspace.destination": {
    "type": "forbidden"
  },
  "init_scripts.*.s3.destination": {
    "type": "forbidden"
  },
  "init_scripts.*.file.destination": {
    "type": "forbidden"
  }
}

Policy examples

This section includes policy examples you can use as references for creating your own policies. You can also use the Databricks-provided policy families as templates for common policy use cases.

General compute policy

A general purpose compute policy meant to guide users and restrict some functionality, while requiring tags, restricting the maximum number of instances, and enforcing timeout.

{
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-ml"
  },
  "node_type_id": {
    "type": "allowlist",
    "values": [
      "i3.xlarge",
      "i3.2xlarge",
      "i3.4xlarge"
    ],
    "defaultValue": "i3.2xlarge"
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "i3.2xlarge",
    "hidden": true
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "autoscale.max_workers": {
    "type": "range",
    "maxValue": 25,
    "defaultValue": 5
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": true,
    "hidden": true
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 30,
    "hidden": true
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

Define limits on Delta Live Tables pipeline compute

Note

When using policies to configure Delta Live Tables compute, Databricks recommends applying a single policy to both the default and maintenance compute.

To configure a policy for a pipeline compute, create a policy with the cluster_type field set to dlt. The following example creates a minimal policy for a Delta Live Tables compute:

{
  "cluster_type": {
    "type": "fixed",
    "value": "dlt"
  },
  "num_workers": {
    "type": "unlimited",
    "defaultValue": 3,
    "isOptional": true
  },
  "node_type_id": {
    "type": "unlimited",
    "isOptional": true
  },
  "spark_version": {
    "type": "unlimited",
    "hidden": true
  }
}

Simple medium-sized policy

Allows users to create a medium-sized compute with minimal configuration. The only required field at creation time is compute name; the rest is fixed and hidden.

{
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "autoscale.min_workers": {
    "type": "fixed",
    "value": 1,
    "hidden": true
  },
  "autoscale.max_workers": {
    "type": "fixed",
    "value": 10,
    "hidden": true
  },
  "autotermination_minutes": {
    "type": "fixed",
    "value": 60,
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  },
  "spark_version": {
    "type": "fixed",
    "value": "auto:latest-ml",
    "hidden": true
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": false,
    "hidden": true
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

Job-only policy

Allows users to create job compute to run jobs. Users cannot create all-purpose compute using this policy.

{
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "dbus_per_hour": {
    "type": "range",
    "maxValue": 100
  },
  "instance_pool_id": {
    "type": "forbidden",
    "hidden": true
  },
  "num_workers": {
    "type": "range",
    "minValue": 1
  },
  "node_type_id": {
    "type": "regex",
    "pattern": "[rmci][3-5][rnad]*.[0-8]{0,1}xlarge"
  },
  "driver_node_type_id": {
    "type": "regex",
    "pattern": "[rmci][3-5][rnad]*.[0-8]{0,1}xlarge"
  },
  "spark_version": {
    "type": "unlimited",
    "defaultValue": "auto:latest-lts"
  },
  "custom_tags.team": {
    "type": "fixed",
    "value": "product"
  }
}

External metastore policy

Allows users to create compute with an admin-defined metastore already attached. This is useful to allow users to create their own compute without requiring additional configuration.

{
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionURL": {
      "type": "fixed",
      "value": "jdbc:sqlserver://<jdbc-url>"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionDriverName": {
      "type": "fixed",
      "value": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
  },
  "spark_conf.spark.databricks.delta.preview.enabled": {
      "type": "fixed",
      "value": "true"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionUserName": {
      "type": "fixed",
      "value": "<metastore-user>"
  },
  "spark_conf.spark.hadoop.javax.jdo.option.ConnectionPassword": {
      "type": "fixed",
      "value": "<metastore-password>"
  }
}

Prevent compute from being used with jobs

This policy prevents users from using the compute to run jobs. Users will only be able to use the compute with notebooks.

{
  "workload_type.clients.notebooks": {
    "type": "fixed",
    "value": true
  },
  "workload_type.clients.jobs": {
    "type": "fixed",
    "value": false
  }
}

Remove autoscaling policy

This policy disables autoscaling and allows the user to set the number of workers within a given range.

{
  "num_workers": {
  "type": "range",
  "maxValue": 25,
  "minValue": 1,
  "defaultValue": 5
  }
}

Custom tag enforcement

To add a compute tag rule to a policy, use the custom_tags.<tag-name> attribute.

For example, any user using this policy needs to fill in a COST_CENTER tag with 9999, 9921, or 9531 for the compute to launch:

   {"custom_tags.COST_CENTER": {"type":"allowlist", "values":["9999", "9921", "9531" ]}}