Instance Pools API

Preview

This feature is in Public Preview.

The Instance Pools API allows you to create, edit, delete, and list instance pools.

An instance pool reduces cluster start and auto-scaling times by maintaining a set of idle, ready-to-use cloud instances. When a cluster attached to a pool needs an instance, it first attempts to allocate one of the pool’s idle instances. If the pool has no idle instances, it expands by allocating a new instance from the instance provider to accommodate the cluster’s request. When a cluster releases an instance, the instance returns to the pool and is free for another cluster to use. Only clusters attached to a pool can use that pool’s idle instances.

Databricks does not charge DBUs while instances are idle in the pool. Instance provider billing does apply; see pricing.

Important

To use instance pools, you must add new instance profile and tag permissions to the IAM role or keys used to create your account. In particular, you must add the permissions ec2:AssociateIamInstanceProfile, ec2:DescribeIamInstanceProfileAssociations, ec2:DisassociateIamInstanceProfile, ec2:ReplaceIamInstanceProfileAssociation, and ec2:DeleteTags. For the complete list of permissions and instructions on how to update your existing IAM role or keys, see AWS Account.


Create

Endpoint HTTP Method
2.0/instance-pools/create POST

Create an instance pool. Use the returned instance_pool_id to query the status of the instance pool, which includes the number of instances currently allocated by the pool. If you provide the min_idle_instances parameter, instances are provisioned in the background and are ready to use once the idle_count in the InstancePoolStats equals the requested minimum.

Note

Databricks may not be able to acquire some of the requested idle instances due to instance provider limitations (account limits, spot price, and so on) or transient network issues. Clusters can still attach to the instance pool, but may not start as quickly.

An example request:

{
  "instance_pool_name": "my-pool",
  "node_type_id": "i3.xlarge",
  "min_idle_instances": 10,
  "aws_attributes": {
    "availability": "SPOT"
  }
}

And response:

{
  "instance_pool_id": "0101-120000-brick1-pool-ABCD1234"
}
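
The readiness condition described above (the pool is usable once idle_count reaches the requested minimum) can be sketched as a small helper. This is an illustrative client-side check, not part of the API itself; `stats` is the InstancePoolStats object returned by the Get call.

```python
def pool_is_ready(stats: dict, min_idle_instances: int) -> bool:
    """Return True once the pool's idle_count meets the requested minimum.

    `stats` is the InstancePoolStats object from 2.0/instance-pools/get.
    Instances still counted in pending_idle_count are not yet usable.
    """
    return stats.get("idle_count", 0) >= min_idle_instances


# While instances are still provisioning, the pool is not yet ready:
print(pool_is_ready({"idle_count": 4, "pending_idle_count": 6}, 10))   # False
print(pool_is_ready({"idle_count": 10, "pending_idle_count": 0}, 10))  # True
```

A caller would poll the Get endpoint and evaluate this predicate on each response until it returns True.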

Request Structure

Field Name Type Description
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
aws_attributes InstancePoolAwsAttributes Attributes related to instance pools running on Amazon Web Services. If not specified at creation time, a set of default values is used.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List Node Types API call.
custom_tags An array of ClusterTag

Additional tags for instance pool resources. Databricks tags all pool resources (e.g. AWS instances and EBS volumes) with these tags in addition to default_tags.

Databricks allows at most 43 custom tags.

idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If you specify 0, excess idle instances are removed as soon as possible.
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, the instances in the pool dynamically acquire additional disk space when they are running low on disk space.
disk_spec DiskSpec Defines the amount of initial remote storage attached to each instance in the pool.
preloaded_spark_versions An array of STRING A list of Spark image versions the pool installs on each instance. Pool clusters that use a preloaded Spark version start faster, as they do not have to wait for the Spark image to download. You can retrieve a list of available Spark versions by using the Spark Versions API call.
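
Several of the constraints in the table above (name length, the 43-tag limit, the 0–10000 minute range) can be checked client-side before submitting a request. The sketch below is an illustrative pre-flight validator mirroring the documented constraints; it is not part of the API.

```python
def validate_create_request(req: dict) -> list:
    """Sanity-check a 2.0/instance-pools/create payload against the
    documented constraints. Returns a list of error strings (empty if OK)."""
    errors = []
    name = req.get("instance_pool_name", "")
    if not name or len(name) >= 100:
        errors.append("instance_pool_name must be non-empty and under 100 characters")
    if len(req.get("custom_tags", [])) > 43:
        errors.append("at most 43 custom tags are allowed")
    minutes = req.get("idle_instance_autotermination_minutes")
    if minutes is not None and not 0 <= minutes <= 10000:
        errors.append("idle_instance_autotermination_minutes must be between 0 and 10000")
    return errors


print(validate_create_request({
    "instance_pool_name": "my-pool",
    "node_type_id": "i3.xlarge",
    "min_idle_instances": 10,
}))  # []
```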

Response Structure

Field Name Type Description
instance_pool_id STRING The ID of the created instance pool.

Edit

Endpoint HTTP Method
2.0/instance-pools/edit POST

Edit an instance pool. This modifies the configuration of an existing instance pool.

Note

  • You can edit only the following fields: instance_pool_name, min_idle_instances, max_capacity, and idle_instance_autotermination_minutes.
  • You must supply an instance_pool_name.
  • You must supply a node_type_id and it must match the original node_type_id.

An example request:

{
  "instance_pool_id": "0101-120000-brick1-pool-ABCD1234",
  "instance_pool_name": "my-edited-pool",
  "node_type_id": "i3.xlarge",
  "min_idle_instances": 5,
  "max_capacity": 200,
  "idle_instance_autotermination_minutes": 30
}
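
Because only four fields are editable and node_type_id must be resent unchanged, it can help to construct the edit payload from the existing pool configuration plus the desired changes. This helper is a sketch of that pattern, not part of the API:

```python
# Only these fields may change via 2.0/instance-pools/edit.
EDITABLE_FIELDS = {"instance_pool_name", "min_idle_instances",
                   "max_capacity", "idle_instance_autotermination_minutes"}


def build_edit_request(pool: dict, changes: dict) -> dict:
    """Build an edit payload from an existing pool config plus desired changes,
    rejecting fields the endpoint does not allow you to modify."""
    illegal = set(changes) - EDITABLE_FIELDS
    if illegal:
        raise ValueError("cannot edit fields: %s" % sorted(illegal))
    req = {
        "instance_pool_id": pool["instance_pool_id"],
        "instance_pool_name": pool["instance_pool_name"],
        "node_type_id": pool["node_type_id"],  # must match the original
    }
    req.update(changes)
    return req


payload = build_edit_request(
    {"instance_pool_id": "0101-120000-brick1-pool-ABCD1234",
     "instance_pool_name": "my-pool",
     "node_type_id": "i3.xlarge"},
    {"min_idle_instances": 5, "max_capacity": 200},
)
print(payload["node_type_id"])  # i3.xlarge
```

Attempting to change node_type_id this way raises immediately, matching the server-side rule that it must equal the original value.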

Request Structure

Field Name Type Description
instance_pool_id STRING The ID of the instance pool to edit. This field is required.
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List Node Types API call.
idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.

Delete

Endpoint HTTP Method
2.0/instance-pools/delete POST

Delete an instance pool. This permanently deletes the instance pool. The idle instances in the pool are terminated asynchronously. New clusters cannot attach to the pool. Running clusters attached to the pool continue to run but cannot autoscale up. Terminated clusters attached to the pool will fail to start until they are edited to no longer use the pool.

An example request:

{
  "instance_pool_id": "0101-120000-brick1-pool-ABCD1234"
}

Request Structure

Field Name Type Description
instance_pool_id STRING The ID of the instance pool to delete.

Get

Endpoint HTTP Method
2.0/instance-pools/get GET

Retrieve the information for an instance pool given its identifier.

An example request:

/instance-pools/get?instance_pool_id=0101-120000-brick1-pool-ABCD1234

An example response:

{
  "instance_pool_name": "my-pool",
  "min_idle_instances": 10,
  "node_type_id": "i3.xlarge",
  "idle_instance_autotermination_minutes": 60,
  "instance_pool_id": "0101-120000-brick1-pool-ABCD1234",
  "default_tags": [
    { "key": "DatabricksInstancePoolCreatorId", "value": "1234" },
    { "key": "DatabricksInstancePoolId", "value": "0101-120000-brick1-pool-ABCD1234" }
  ],
  "aws_attributes": {
    "availability": "SPOT",
    "spot_bid_price_percent": 100,
    "zone_id": "us-west-2a"
  },
  "stats": {
    "used_count": 10,
    "idle_count": 5,
    "pending_used_count": 5,
    "pending_idle_count": 5
  }
}

Request Structure

Field Name Type Description
instance_pool_id STRING The instance pool about which to retrieve information.

Response Structure

Field Name Type Description
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
aws_attributes InstancePoolAwsAttributes Attributes related to instance pools running on Amazon Web Services. If not specified at creation time, a set of default values is used.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List Node Types API call.
custom_tags An array of ClusterTag

Additional tags for instance pool resources. Databricks tags all pool resources (e.g. AWS instances and EBS volumes) with these tags in addition to default_tags.

Databricks allows at most 43 custom tags.

idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, the instances in the pool dynamically acquire additional disk space when they are running low on disk space.
disk_spec DiskSpec Defines the amount of initial remote storage attached to each instance in the pool.
preloaded_spark_versions An array of STRING A list of Spark image versions the pool installs on each instance. Pool clusters that use a preloaded Spark version start faster, as they do not have to wait for the Spark image to download. You can retrieve a list of available Spark versions by using the Spark Versions API call.
instance_pool_id STRING The canonical unique identifier for the instance pool.
default_tags An array of ClusterTag

Tags that are added by Databricks regardless of any custom_tags, including:

  • Vendor: Databricks
  • DatabricksInstancePoolCreatorId: <create_user_id>
  • DatabricksInstancePoolId: <instance_pool_id>
state InstancePoolState Current state of the instance pool.
stats InstancePoolStats Statistics about the usage of the instance pool.

List

Endpoint HTTP Method
2.0/instance-pools/list GET

List information for all instance pools.

An example response:

{
  "instance_pools": [
    {
      "instance_pool_name": "my-pool",
      "min_idle_instances": 10,
      "node_type_id": "i3.xlarge",
      "idle_instance_autotermination_minutes": 60,
      "instance_pool_id": "0101-120000-brick1-pool-ABCD1234",
      "default_tags": [
        { "key": "DatabricksInstancePoolCreatorId", "value": "1234" },
        { "key": "DatabricksInstancePoolId", "value": "0101-120000-brick1-pool-ABCD1234" }
      ],
      "aws_attributes": {
        "availability": "SPOT",
        "spot_bid_price_percent": 100,
        "zone_id": "us-west-2a"
      },
      "stats": {
        "used_count": 10,
        "idle_count": 5,
        "pending_used_count": 5,
        "pending_idle_count": 5
      }
    }
  ]
}

Response Structure

Field Name Type Description
instance_pools An array of InstancePoolAndStats A list of instance pools with their statistics included.

Data Structures

InstancePoolState

The state of an instance pool. The current allowable state transitions are as follows:

  • ACTIVE -> DELETED
Name Description
ACTIVE Indicates an instance pool is active. Clusters can attach to it.
DELETED Indicates the instance pool has been deleted and is no longer accessible.
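
The transition table above is small enough to encode directly. This illustrative check captures that ACTIVE -> DELETED is the only allowed transition and that DELETED is terminal:

```python
# The only documented state transition; a deleted pool cannot become active again.
ALLOWED_TRANSITIONS = {("ACTIVE", "DELETED")}


def can_transition(current: str, target: str) -> bool:
    """Return True if an instance pool may move from `current` to `target`."""
    return (current, target) in ALLOWED_TRANSITIONS


print(can_transition("ACTIVE", "DELETED"))  # True
print(can_transition("DELETED", "ACTIVE"))  # False
```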

InstancePoolStats

Statistics about the usage of the instance pool.

Field Name Type Description
used_count INT32 Number of active instances that are in use by a cluster.
idle_count INT32 Number of active instances that are not in use by a cluster.
pending_used_count INT32 Number of pending instances that are assigned to a cluster.
pending_idle_count INT32 Number of pending instances that are not assigned to a cluster.
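
The four counters partition the pool's instances along two axes (active vs. pending, in use vs. idle), so aggregate figures follow by simple addition. A sketch, under the assumption that the four counts are disjoint:

```python
def pool_usage(stats: dict) -> dict:
    """Derive aggregate counts from an InstancePoolStats object.

    `total` sums all four disjoint buckets (active/pending x used/idle);
    `active` counts only instances that have finished provisioning.
    """
    total = (stats["used_count"] + stats["idle_count"]
             + stats["pending_used_count"] + stats["pending_idle_count"])
    return {"total": total, "active": stats["used_count"] + stats["idle_count"]}


# The stats from the Get example above:
stats = {"used_count": 10, "idle_count": 5,
         "pending_used_count": 5, "pending_idle_count": 5}
print(pool_usage(stats))  # {'total': 25, 'active': 15}
```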

DiskSpec

Describes the initial set of disks to attach to each instance. For example, if there are 3 instances and each instance is configured to start with 2 disks, 100 GiB each, then Databricks creates a total of 6 disks, 100 GiB each, for these instances.

Field Name Type Description
disk_type DiskType The type of disks to attach.
disk_count INT32

The number of disks to attach to each instance:

  • This feature is only enabled for supported node types.
  • Users can choose up to the limit of the disks supported by the node type.
  • For node types with no local disk, at least one disk needs to be specified.

disk_size INT32

The size of each disk (in GiB) to attach. Values must fall into the supported range for a particular instance type:

  • General Purpose SSD: 100 - 4096 GiB
  • Throughput Optimized HDD: 500 - 4096 GiB
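
The worked example in the DiskSpec description (3 instances x 2 disks x 100 GiB) is just a product of three numbers. An illustrative helper:

```python
def total_attached_storage_gib(num_instances: int, disk_count: int,
                               disk_size_gib: int) -> int:
    """Total remote storage (GiB) a DiskSpec provisions across a pool's
    instances: every instance gets disk_count disks of disk_size_gib each."""
    return num_instances * disk_count * disk_size_gib


# The example above: 3 instances, 2 disks per instance, 100 GiB per disk
# -> 6 disks, 600 GiB in total.
print(total_attached_storage_gib(3, 2, 100))  # 600
```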

DiskType

Describes the type of disk.

Field Name Type Description
ebs_volume_type EbsVolumeType The EBS volume type to use.

InstancePoolAndStats

Field Name Type Description
instance_pool_name STRING The name of the instance pool. This is required for create and edit operations. It must be unique, non-empty, and less than 100 characters.
min_idle_instances INT32 The minimum number of idle instances maintained by the pool. This is in addition to any instances in use by active clusters.
max_capacity INT32 The maximum number of instances the pool can contain, including both idle instances and ones in use by clusters. Once the maximum capacity is reached, you cannot create new clusters from the pool and existing clusters cannot autoscale up until some instances are made idle in the pool via cluster termination or down-scaling.
aws_attributes InstancePoolAwsAttributes Attributes related to instance pools running on Amazon Web Services. If not specified at creation time, a set of default values is used.
node_type_id STRING The node type for the instances in the pool. All clusters attached to the pool inherit this node type and the pool’s idle instances are allocated based on this type. You can retrieve a list of available node types by using the List Node Types API call.
custom_tags An array of ClusterTag

Additional tags for instance pool resources. Databricks tags all pool resources (e.g. AWS instances and EBS volumes) with these tags in addition to default_tags.

Databricks allows at most 43 custom tags.

idle_instance_autotermination_minutes INT32 The number of minutes that idle instances in excess of the min_idle_instances are maintained by the pool before being terminated. If not specified, excess idle instances are terminated automatically after a default timeout period. If specified, the time must be between 0 and 10000 minutes. If 0 is supplied, excess idle instances are removed as soon as possible.
enable_elastic_disk BOOL Autoscaling Local Storage: when enabled, the instances in the pool dynamically acquire additional disk space when they are running low on disk space.
disk_spec DiskSpec Defines the amount of initial remote storage attached to each instance in the pool.
preloaded_spark_versions An array of STRING A list of Spark image versions the pool installs on each instance. Pool clusters that use a preloaded Spark version start faster, as they do not have to wait for the Spark image to download. You can retrieve a list of available Spark versions by using the Spark Versions API call.
instance_pool_id STRING The canonical unique identifier for the instance pool.
default_tags An array of ClusterTag

Tags that are added by Databricks regardless of any custom_tags, including:

  • Vendor: Databricks
  • DatabricksInstancePoolCreatorId: <create_user_id>
  • DatabricksInstancePoolId: <instance_pool_id>
state InstancePoolState Current state of the instance pool.
stats InstancePoolStats Statistics about the usage of the instance pool.

InstancePoolAwsAttributes

Attributes set during instance pool creation which are related to Amazon Web Services.

Field Name Type Description
availability AwsAvailability Availability type used for all instances in the pool. Only ON_DEMAND and SPOT are supported.
zone_id STRING Identifier for the availability zone/datacenter in which the instance pool resides. This string is of a form like “us-west-2a”. The provided availability zone must be in the same region as the Databricks deployment. For example, “us-west-2a” is not a valid zone ID if the Databricks deployment resides in the “us-east-1” region. This is an optional field. If not specified, a default zone is used. You can find the list of available zones as well as the default value by using the List Zones API.
spot_bid_price_percent INT32 The bid price for AWS spot instances, as a percentage of the corresponding instance type’s on-demand price. For example, if this field is set to 50, and the instance pool needs a new i3.xlarge spot instance, then the bid price is half of the price of on-demand i3.xlarge instances. Similarly, if this field is set to 200, the bid price is twice the price of on-demand i3.xlarge instances. If not specified, the default value is 100. When spot instances are requested for this instance pool, only spot instances whose bid price percentage matches this field are considered. For safety, this field cannot be greater than 10000.
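
The spot_bid_price_percent arithmetic can be made concrete with a small sketch. The $0.312/hour figure below is a hypothetical on-demand price used purely for illustration, not a quoted AWS rate:

```python
def spot_bid_price(on_demand_price: float,
                   spot_bid_price_percent: int = 100) -> float:
    """Effective spot bid derived from the on-demand price, per the
    documented semantics: 50 -> half price, 200 -> double, default 100."""
    if not 0 < spot_bid_price_percent <= 10000:
        raise ValueError("spot_bid_price_percent must be in (0, 10000]")
    return on_demand_price * spot_bid_price_percent / 100


# Assuming a hypothetical i3.xlarge on-demand price of $0.312/hour:
print(spot_bid_price(0.312, 50))   # ~0.156 (half the on-demand price)
print(spot_bid_price(0.312, 200))  # ~0.624 (twice the on-demand price)
```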

EbsVolumeType

All EBS volume types that Databricks supports. See https://aws.amazon.com/ebs/details/ for details.

Name Description
GENERAL_PURPOSE_SSD Provision extra storage using AWS gp2 EBS volumes.
THROUGHPUT_OPTIMIZED_HDD Provision extra storage using AWS st1 volumes.