Clusters API 2.0

Important

This article’s content has been retired and might not be updated. See Clusters in the Databricks REST API Reference.

The Clusters API allows you to create, start, edit, list, terminate, and delete clusters. The maximum allowed size of a request to the Clusters API is 10MB.

Cluster lifecycle methods require a cluster ID, which is returned from Create. To obtain a list of clusters, invoke List.

Databricks maps cluster node instance types to compute units known as DBUs. See the instance type pricing page for a list of the supported instance types and their corresponding DBUs. For instance provider information, see AWS instance type specifications and pricing.

Databricks always provides one year’s deprecation notice before ceasing support for an instance type.

Warning

You should never hard code secrets or store them in plain text. Use the Secrets API 2.0 to manage secrets in the Databricks CLI. Use the Secrets utility (dbutils.secrets) to reference secrets in notebooks and jobs.
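For example, rather than hard coding a credential into a cluster spec, you can reference a secret in an environment variable. A minimal sketch, assuming a secret scope named my-scope and a key named my-key already exist (both names are placeholders):

{
  "spark_env_vars": {
    "MY_SERVICE_TOKEN": "{{secrets/my-scope/my-key}}"
  }
}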

Important

To access Databricks REST APIs, you must authenticate.
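The examples in this article pass credentials with curl’s --netrc option. A minimal sketch of the corresponding .netrc entry, using a personal access token as the password (the workspace host and token are placeholders):

machine dbc-a1b2345c-d6e7.cloud.databricks.com
login token
password <personal-access-token>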

Create

Endpoint: 2.0/clusters/create

HTTP Method: POST

Create a new Apache Spark cluster. This method acquires new instances from the cloud provider if necessary. This method is asynchronous; the returned cluster_id can be used to poll the cluster state. When this method returns, the cluster is in a PENDING state. The cluster is usable once it enters a RUNNING state. See ClusterState.

Note

Databricks might not be able to acquire some of the requested nodes, due to cloud provider limitations or transient network issues. If Databricks acquires at least 85% of the requested on-demand nodes, cluster creation will succeed. Otherwise the cluster will terminate with an informative error message.

Examples

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/create \
--data @create-cluster.json

create-cluster.json:

{
  "cluster_name": "my-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "spark_conf": {
    "spark.speculation": true
  },
  "aws_attributes": {
    "availability": "SPOT",
    "zone_id": "us-west-2a"
  },
  "num_workers": 25
}
{ "cluster_id": "1234-567890-cited123" }

Here is an example for an autoscaling cluster. This cluster will start with two nodes, the minimum.

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/create \
--data @create-cluster.json

create-cluster.json:

{
  "cluster_name": "autoscaling-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "autoscale" : {
    "min_workers": 2,
    "max_workers": 50
  }
}
{ "cluster_id": "1234-567890-batch123" }

This example creates a Single Node cluster. To create a Single Node cluster:

  • Set spark_conf and custom_tags to the exact values in the example.

  • Set num_workers to 0.

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/create \
--data @create-cluster.json

create-cluster.json:

{
  "cluster_name": "single-node-cluster",
  "spark_version": "7.6.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 0,
  "spark_conf": {
    "spark.databricks.cluster.profile": "singleNode",
    "spark.master": "local[*, 4]"
  },
  "custom_tags": {
    "ResourceClass": "SingleNode"
  }
}
{ "cluster_id": "1234-567890-ruins123" }

This example creates a cluster and mounts an Amazon EFS file system.

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/create \
--data @create-cluster.json

create-cluster.json:

{
  "cluster_name": "efs-cluster",
  "spark_version": "7.6.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "instance_type": "i3.xlarge",
  "cluster_mount_infos":[
    {
      "network_filesystem_info":{
        "server_address":"hostname.efs.us-east-1.amazonaws.com",
        "mount_options": "rsize=1048576,wsize=1048576,hard,timeo=600"
      },
      "remote_mount_dir_path": "/",
      "local_mount_dir_path": "/mnt/efs-mount"
    }
  ],
  "aws_attributes":{
    "availability": "SPOT",
    "zone_id": "us-east-2"
  },
  "num_workers": 25
}
{ "cluster_id": "1234-567890-pouch123" }

To create a cluster using a policy and the policy’s default values, set policy_id to the policy ID and apply_policy_default_values to true:

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/create \
--data @create-cluster.json

create-cluster.json:

{
    "num_workers": null,
    "autoscale": {
        "min_workers": 2,
        "max_workers": 8
    },
    "cluster_name": "my-cluster",
    "spark_version": "7.3.x-scala2.12",
    "spark_conf": {},
    "aws_attributes": {
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
        "zone_id": "us-west-2a",
        "instance_profile_arn": null,
        "spot_bid_price_percent": 100,
        "ebs_volume_count": 0
    },
    "node_type_id": "i3.xlarge",
    "ssh_public_keys": [],
    "custom_tags": {},
    "spark_env_vars": {
        "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
    },
    "autotermination_minutes": 120,
    "init_scripts": [],
    "policy_id": "C65B864F02000008",
    "apply_policy_default_values": true
}
{ "cluster_id": "1234-567890-buyer123" }

To create a job or submit a run with a new cluster using a policy, define the cluster’s properties in new_cluster and include the policy_id:

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/jobs/create \
--data @create-job.json

create-job.json:

{
  "run_name": "my spark task",
  "new_cluster": {
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "r3.xlarge",
    "aws_attributes": {
      "availability": "ON_DEMAND"
    },
    "num_workers": 10,
    "policy_id": "ABCD000000000000"
  },
  "libraries": [
    {
      "jar": "dbfs:/my-jar.jar"
    },
    {
      "maven": {
        "coordinates": "org.jsoup:jsoup:1.7.2"
      }
    }
  ],
  "spark_jar_task": {
    "main_class_name": "com.databricks.ComputeModels"
  }
}
{ "job_id": 244 }

Request structure of the cluster definition

Field Name

Type

Description

num_workers OR autoscale

INT32 OR AutoScale

If num_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will be updated immediately to reflect the target size of 10 workers, whereas the workers listed in executors will gradually increase from 5 to 10 as the new nodes are provisioned.

If autoscale, parameters needed in order to automatically scale clusters up and down based on load.

cluster_name

STRING

Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

spark_version

STRING

The runtime version of the cluster. You can retrieve a list of available runtime versions by using the Runtime versions API call. This field is required.

spark_conf

SparkConfPair

An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively. Example Spark confs: {"spark.speculation": true, "spark.streaming.ui.retainedBatches": 5} or {"spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails"}

aws_attributes

AwsAttributes

Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values is used.

node_type_id

STRING

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the List node types API call. This field is required.

driver_node_type_id

STRING

The node type of the Spark driver. This field is optional; if unset, the driver node type will be set as the same value as node_type_id defined above.

ssh_public_keys

An array of STRING

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. Up to 10 keys can be specified.

custom_tags

ClusterTag

An object containing a set of tags for cluster resources. Databricks tags all cluster resources (such as AWS instances and EBS volumes) with these tags in addition to default_tags.

Note:

  • Tags are not supported on legacy node types such as compute-optimized and memory-optimized.

  • Databricks allows at most 45 custom tags.

  • If the cluster is created on an instance pool, the cluster tags are not copied to the cluster resources. To tag resources for an instance pool, see the custom_tags field in the Instance Pools API 2.0.

cluster_log_conf

ClusterLogConf

The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is <destination>/<cluster-ID>/driver, while the destination of executor logs is <destination>/<cluster-ID>/executor.

init_scripts

An array of InitScriptInfo

The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

docker_image

DockerImage

Docker image for a custom container.

spark_env_vars

SparkEnvPair

An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (that is, export X='Y') while launching the driver and workers. In order to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the following example. This ensures that all default Databricks managed environmental variables are included. Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

autotermination_minutes

INT32

Automatically terminates the cluster after it is inactive for the specified time in minutes. If not specified, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination.

enable_elastic_disk

BOOL

Autoscaling Local Storage: When enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to Autoscaling local storage for details.

driver_instance_pool_id

STRING

The optional ID of the instance pool to use for the driver node. You must also specify instance_pool_id. For details, see Instance Pools API 2.0.

instance_pool_id

STRING

The optional ID of the instance pool to use for cluster nodes. If driver_instance_pool_id is present, instance_pool_id is used for worker nodes only. Otherwise, it is used for both the driver and worker nodes. For details, see Instance Pools API 2.0.

idempotency_token

STRING

An optional token that can be used to guarantee the idempotency of cluster creation requests. If the idempotency token is assigned to a cluster that is not in the TERMINATED state, the request does not create a new cluster but instead returns the ID of the existing cluster. Otherwise, a new cluster is created. The idempotency token is cleared when the cluster is terminated.

If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one cluster will be launched with that idempotency token (see the retry sketch following this request structure).

This token should have at most 64 characters.

apply_policy_default_values

BOOL

Whether to use policy default values for missing cluster attributes.

enable_local_disk_encryption

BOOL

Whether encryption of disks locally attached to the cluster is enabled.

runtime_engine

STRING

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include:

  • PHOTON: Use the Photon runtime engine type.

  • STANDARD: Use the standard runtime engine type.

This field is optional.

cluster_mount_infos

An array of MountInfo

An object containing optional specifications for a network file system mount.
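As a sketch of the idempotency_token behavior described above, the following retries cluster creation with a fixed token until the request succeeds; the token value and retry interval are illustrative only:

# Retry cluster creation with a fixed idempotency token. At most one cluster
# is launched for this token, even if earlier attempts failed or timed out.
until curl --netrc -sf -X POST \
  https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/create \
  --data '{
    "cluster_name": "my-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 25,
    "idempotency_token": "create-my-cluster-001"
  }'; do
  sleep 10
done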

Response structure

Field Name

Type

Description

cluster_id

STRING

Canonical identifier for the cluster.

Edit

Endpoint: 2.0/clusters/edit

HTTP Method: POST

Edit the configuration of a cluster to match the provided attributes and size.

You can edit a cluster if it is in a RUNNING or TERMINATED state. If you edit a cluster while it is in a RUNNING state, it will be restarted so that the new attributes can take effect. If you edit a cluster while it is in a TERMINATED state, it will remain TERMINATED. The next time it is started using the clusters/start API, the new attributes will take effect. An attempt to edit a cluster in any other state will be rejected with an INVALID_STATE error code.

Clusters created by the Databricks Jobs service cannot be edited.

Example

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/edit \
--data @edit-cluster.json

edit-cluster.json:

{
  "cluster_id": "1202-211320-brick1",
  "num_workers": 10,
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.2xlarge"
}
{}

Request structure

Field Name

Type

Description

num_workers OR autoscale

INT32 OR AutoScale

If num_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in executors will gradually increase from 5 to 10 as the new nodes are provisioned.

If autoscale, parameters needed in order to automatically scale clusters up and down based on load.

cluster_id

STRING

Canonical identifier for the cluster. This field is required.

cluster_name

STRING

Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

spark_version

STRING

The runtime version of the cluster. You can retrieve a list of available runtime versions by using the Runtime versions API call. This field is required.

spark_conf

SparkConfPair

An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

Example Spark confs: {"spark.speculation": true, "spark.streaming.ui.retainedBatches": 5} or {"spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails"}

aws_attributes

AwsAttributes

Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used.

node_type_id

STRING

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the List node types API call. This field is required.

driver_node_type_id

STRING

The node type of the Spark driver. This field is optional. If you don’t specify a value, the driver node type will be set to the same value as node_type_id defined above.

ssh_public_keys

An array of STRING

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. Up to 10 keys can be specified.

custom_tags

ClusterTag

An object containing a set of tags for cluster resources. Databricks tags all cluster resources (such as AWS instances and EBS volumes) with these tags in addition to default_tags.

Note:

  • Tags are not supported on legacy node types such as compute-optimized and memory-optimized.

  • Databricks allows at most 45 custom tags.

  • If the cluster is created on an instance pool, the cluster tags are not copied to the cluster resources. To tag resources for an instance pool, see the custom_tags field in the Instance Pools API 2.0.

cluster_log_conf

ClusterLogConf

The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is <destination>/<cluster-ID>/driver, while the destination of executor logs is <destination>/<cluster-ID>/executor.

init_scripts

An array of InitScriptInfo

The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

docker_image

DockerImage

Docker image for a custom container.

spark_env_vars

SparkEnvPair

An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (that is, export X='Y') while launching the driver and workers.

In order to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the following example. This ensures that all default Databricks managed environmental variables are included.

Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

autotermination_minutes

INT32

Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination.

enable_elastic_disk

BOOL

Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to Autoscaling local storage for details.

instance_pool_id

STRING

The optional ID of the instance pool to which the cluster belongs. Refer to Create a pool for details.

apply_policy_default_values

BOOL

Whether to use policy default values for missing cluster attributes.

enable_local_disk_encryption

BOOL

Whether encryption of disks locally attached to the cluster is enabled.

runtime_engine

STRING

The type of runtime engine to use. If not specified, the runtime engine type is inferred based on the spark_version value. Allowed values include:

  • PHOTON: Use the Photon runtime engine type.

  • STANDARD: Use the standard runtime engine type.

This field is optional.

cluster_mount_infos

An array of MountInfo

An object containing optional specifications for a network file system mount.

Change owner

Endpoint: 2.0/clusters/change-owner

HTTP Method: POST

Changes a cluster’s owner. The new owner must be an admin.

Example

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/change-owner \
--data '{ "cluster_id": "1234-567890-reef123", "owner_username": "someone@example.com" }'
{}

Request structure

Field Name

Type

Description

cluster_id

STRING

The cluster whose owner you want to change. This field is required.

owner_username

STRING

The username of the cluster’s new owner. This field is required.

Response structure

If the request succeeds, an empty response is returned.

Response errors

Error

Description

INVALID_PARAMETER_VALUE

The cluster ID is not valid.

PERMISSION_DENIED

The new owner is not an administrator.

RESOURCE_DOES_NOT_EXIST

The username is not valid.

Other errors return BAD_REQUEST.

Start

Endpoint: 2.0/clusters/start

HTTP Method: POST

Start a terminated cluster given its ID. This is similar to createCluster, except:

  • The terminated cluster ID and attributes are preserved.

  • The cluster starts with the last specified cluster size. If the terminated cluster is an autoscaling cluster, the cluster starts with the minimum number of nodes.

  • If the cluster is in the RESTARTING state, a 400 error is returned.

  • You cannot start a cluster launched to run a job.

Example

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/start \
--data '{ "cluster_id": "1234-567890-reef123" }'
{}

Request structure

Field Name

Type

Description

cluster_id

STRING

The cluster to be started. This field is required.

Restart

Endpoint: 2.0/clusters/restart

HTTP Method: POST

Restart a cluster given its ID. The cluster must be in the RUNNING state.

Example

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/restart \
--data '{ "cluster_id": "1234-567890-reef123" }'
{}

Request structure

Field Name

Type

Description

cluster_id

STRING

The cluster to be restarted. This field is required.

Resize

Endpoint: 2.0/clusters/resize

HTTP Method: POST

Resize a cluster to have a desired number of workers. The cluster must be in the RUNNING state.

Example

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/resize \
--data '{ "cluster_id": "1234-567890-reef123", "num_workers": 30 }'
{}

Request structure

Field Name

Type

Description

num_workers OR autoscale

INT32 OR AutoScale

If num_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in executors will gradually increase from 5 to 10 as the new nodes are provisioned.

If autoscale, parameters needed in order to automatically scale clusters up and down based on load.

cluster_id

STRING

The cluster to be resized. This field is required.

Delete (terminate)

Endpoint: 2.0/clusters/delete

HTTP Method: POST

Terminate a cluster given its ID. The cluster is removed asynchronously. Once the termination has completed, the cluster will be in the TERMINATED state. If the cluster is already in a TERMINATING or TERMINATED state, nothing will happen.

Unless a cluster is pinned, 30 days after the cluster is terminated, it is permanently deleted.

Example

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/delete \
--data '{ "cluster_id": "1234-567890-frays123" }'
{}

Request structure

Field Name

Type

Description

cluster_id

STRING

The cluster to be terminated. This field is required.

Permanent delete

Endpoint: 2.0/clusters/permanent-delete

HTTP Method: POST

Permanently delete a cluster. If the cluster is running, it is terminated and its resources are asynchronously removed. If the cluster is terminated, then it is immediately removed.

You cannot perform any action on a permanently deleted cluster, including retrieving its permissions. A permanently deleted cluster is also no longer returned in the cluster list.

Example

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/permanent-delete \
--data '{ "cluster_id": "1234-567890-frays123" }'
{}

Request structure

Field Name

Type

Description

cluster_id

STRING

The cluster to be permanently deleted. This field is required.

Get

Endpoint: 2.0/clusters/get

HTTP Method: GET

Retrieve the information for a cluster given its identifier. Clusters can be described while they are running or up to 30 days after they are terminated.

Example

curl --netrc -X GET \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/get \
--data '{ "cluster_id": "1234-567890-reef123" }' \
| jq .
{
  "cluster_id": "1234-567890-reef123",
  "spark_context_id": 4020997813441462000,
  "cluster_name": "my-cluster",
  "spark_version": "8.2.x-scala2.12",
  "aws_attributes": {
    "zone_id": "us-west-2c",
    "first_on_demand": 1,
    "availability": "SPOT_WITH_FALLBACK",
    "spot_bid_price_percent": 100,
    "ebs_volume_count": 0
  },
  "node_type_id": "i3.xlarge",
  "driver_node_type_id": "i3.xlarge",
  "autotermination_minutes": 120,
  "enable_elastic_disk": false,
  "disk_spec": {
    "disk_count": 0
  },
  "cluster_source": "UI",
  "enable_local_disk_encryption": false,
  "instance_source": {
    "node_type_id": "i3.xlarge"
  },
  "driver_instance_source": {
    "node_type_id": "i3.xlarge"
  },
  "state": "TERMINATED",
  "state_message": "Inactive cluster terminated (inactive for 120 minutes).",
  "start_time": 1618263108824,
  "terminated_time": 1619746525713,
  "last_state_loss_time": 1619739324740,
  "num_workers": 30,
  "default_tags": {
    "Vendor": "Databricks",
    "Creator": "someone@example.com",
    "ClusterName": "my-cluster",
    "ClusterId": "1234-567890-reef123"
  },
  "creator_user_name": "someone@example.com",
  "termination_reason": {
    "code": "INACTIVITY",
    "parameters": {
      "inactivity_duration_min": "120"
    },
    "type": "SUCCESS"
  },
  "init_scripts_safe_mode": false
}

Request structure

Field Name

Type

Description

cluster_id

STRING

The cluster about which to retrieve information. This field is required.

Response structure

Field Name

Type

Description

num_workers OR autoscale

INT32 OR AutoScale

If num_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in executors will gradually increase from 5 to 10 as the new nodes are provisioned.

If autoscale, parameters needed in order to automatically scale clusters up and down based on load.

cluster_id

STRING

Canonical identifier for the cluster. This ID is retained during cluster restarts and resizes, while each new cluster has a globally unique ID.

creator_user_name

STRING

Creator user name. The field won’t be included in the response if the user has already been deleted.

driver

SparkNode

Node on which the Spark driver resides. The driver node contains the Spark master and the Databricks application that manages the per-notebook Spark REPLs.

executors

An array of SparkNode

Nodes on which the Spark executors reside.

spark_context_id

INT64

A canonical SparkContext identifier. This value does change when the Spark driver restarts. The pair (cluster_id, spark_context_id) is a globally unique identifier over all Spark contexts.

jdbc_port

INT32

Port on which the Spark JDBC server is listening in the driver node. No service will be listening on this port in executor nodes.

cluster_name

STRING

Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

spark_version

STRING

The runtime version of the cluster. You can retrieve a list of available runtime versions by using the Runtime versions API call.

spark_conf

SparkConfPair

An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

Example Spark confs: {"spark.speculation": true, "spark.streaming.ui.retainedBatches": 5} or {"spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails"}

aws_attributes

AwsAttributes

Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used.

node_type_id

STRING

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the List node types API call. This field is required.

driver_node_type_id

STRING

The node type of the Spark driver. This field is optional. If you don’t specify a value, the driver node type will be set to the same value as node_type_id defined above.

ssh_public_keys

An array of STRING

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. Up to 10 keys can be specified.

custom_tags

ClusterTag

An object containing a set of tags for cluster resources. Databricks tags all cluster resources with these tags in addition to default_tags.

Note:

  • Tags are not supported on legacy node types such as compute-optimized and memory-optimized.

  • Databricks allows at most 45 custom tags.

  • If the cluster is created on an instance pool, the cluster tags are not copied to the cluster resources. To tag resources for an instance pool, see the custom_tags field in the Instance Pools API 2.0.

cluster_log_conf

ClusterLogConf

The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is <destination>/<cluster-ID>/driver, while the destination of executor logs is <destination>/<cluster-ID>/executor.

init_scripts

An array of InitScriptInfo

The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

docker_image

DockerImage

Docker image for a custom container.

spark_env_vars

SparkEnvPair

An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (that is, export X='Y') while launching the driver and workers.

In order to specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the following example. This ensures that all default Databricks managed environmental variables are included.

Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

autotermination_minutes

INT32

Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination.

enable_elastic_disk

BOOL

Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to Autoscaling local storage for details.

instance_pool_id

STRING

The optional ID of the instance pool to which the cluster belongs. Refer to Create a pool for details.

cluster_source

ClusterSource

Determines whether the cluster was created by a user through the UI, by the Databricks Jobs scheduler, through an API request, or by the Delta Live Tables runtime. Example values include API, UI, or PIPELINE.

state

ClusterState

State of the cluster.

state_message

STRING

A message associated with the most recent state transition (for example, the reason why the cluster entered the TERMINATED state).

start_time

INT64

Time (in epoch milliseconds) when the cluster creation request was received (when the cluster entered the PENDING state).

terminated_time

INT64

Time (in epoch milliseconds) when the cluster was terminated, if applicable.

last_state_loss_time

INT64

Time when the cluster driver last lost its state (due to a restart or driver failure).

last_activity_time

INT64

Time (in epoch milliseconds) when the cluster was last active. A cluster is active if there is at least one command that has not finished on the cluster. This field is available after the cluster has reached the RUNNING state. Updates to this field are made as best-effort attempts. Certain versions of Spark do not support reporting of cluster activity. Refer to Automatic termination for details.

cluster_memory_mb

INT64

Total amount of cluster memory, in megabytes.

cluster_cores

FLOAT

Number of CPU cores available for this cluster. This can be fractional since certain node types are configured to share cores between Spark nodes on the same instance.

default_tags

ClusterTag

An object containing a set of tags that are added by Databricks regardless of any custom_tags, including:

  • Vendor: Databricks

  • Creator: <username-of-creator>

  • ClusterName: <name-of-cluster>

  • ClusterId: <id-of-cluster>

  • Name: <Databricks internal use>

    On job clusters:

  • RunName: <name-of-job>

  • JobId: <id-of-job>

    On resources used by Databricks SQL:

  • SqlWarehouseId: <id-of-warehouse>

cluster_log_status

LogSyncStatus

Cluster log delivery status.

termination_reason

TerminationReason

Information about why the cluster was terminated. This field appears only when the cluster is in the TERMINATING or TERMINATED state.

Pin

Note

You must be a Databricks administrator to invoke this API.

Endpoint: 2.0/clusters/pin

HTTP Method: POST

Ensure that an all-purpose cluster configuration is retained even after a cluster has been terminated for more than 30 days. Pinning ensures that the cluster is always returned by the List API. Pinning a cluster that is already pinned has no effect.

Example

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/pin \
--data '{ "cluster_id": "1234-567890-reef123" }'
{}

Request structure

Field Name

Type

Description

cluster_id

STRING

The cluster to pin. This field is required.

Unpin

Note

You must be a Databricks administrator to invoke this API.

Endpoint: 2.0/clusters/unpin

HTTP Method: POST

Allows the cluster to eventually be removed from the list returned by the List API. Unpinning a cluster that is not pinned has no effect.

Example

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/unpin \
--data '{ "cluster_id": "1234-567890-reef123" }'
{}

Request structure

Field Name

Type

Description

cluster_id

STRING

The cluster to unpin. This field is required.

List

Endpoint: 2.0/clusters/list

HTTP Method: GET

Return information about all pinned clusters, active clusters, up to 200 of the most recently terminated all-purpose clusters in the past 30 days, and up to 30 of the most recently terminated job clusters in the past 30 days. For example, if there is 1 pinned cluster, 4 active clusters, 45 terminated all-purpose clusters in the past 30 days, and 50 terminated job clusters in the past 30 days, then this API returns the 1 pinned cluster, 4 active clusters, all 45 terminated all-purpose clusters, and the 30 most recently terminated job clusters.

Example

curl --netrc -X GET \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/list \
| jq .
{
  "clusters": [
    {
      "cluster_id": "1234-567890-reef123",
      "spark_context_id": 4020997813441462000,
      "cluster_name": "my-cluster",
      "spark_version": "8.2.x-scala2.12",
      "aws_attributes": {
        "zone_id": "us-west-2c",
        "first_on_demand": 1,
        "availability": "SPOT_WITH_FALLBACK",
        "spot_bid_price_percent": 100,
        "ebs_volume_count": 0
      },
      "node_type_id": "i3.xlarge",
      "driver_node_type_id": "i3.xlarge",
      "autotermination_minutes": 120,
      "enable_elastic_disk": false,
      "disk_spec": {
        "disk_count": 0
      },
      "cluster_source": "UI",
      "enable_local_disk_encryption": false,
      "instance_source": {
        "node_type_id": "i3.xlarge"
      },
      "driver_instance_source": {
        "node_type_id": "i3.xlarge"
      },
      "state": "TERMINATED",
      "state_message": "Inactive cluster terminated (inactive for 120 minutes).",
      "start_time": 1618263108824,
      "terminated_time": 1619746525713,
      "last_state_loss_time": 1619739324740,
      "num_workers": 30,
      "default_tags": {
        "Vendor": "Databricks",
        "Creator": "someone@example.com",
        "ClusterName": "my-cluster",
        "ClusterId": "1234-567890-reef123"
      },
      "creator_user_name": "someone@example.com",
      "termination_reason": {
        "code": "INACTIVITY",
        "parameters": {
          "inactivity_duration_min": "120"
        },
        "type": "SUCCESS"
      },
      "init_scripts_safe_mode": false
    },
    {
      "..."
    }
  ]
}
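As an example of working with the response, the following sketch uses jq to extract the IDs of clusters currently in the RUNNING state (assumes the same --netrc authentication):

# Print only the IDs of clusters whose state is RUNNING.
curl --netrc -s -X GET \
  https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/list \
  | jq -r '.clusters[] | select(.state == "RUNNING") | .cluster_id'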

Response structure

Field Name

Type

Description

clusters

An array of ClusterInfo

A list of clusters.

List node types

Endpoint: 2.0/clusters/list-node-types

HTTP Method: GET

Return a list of supported Spark node types. These node types can be used to launch a cluster.

Example

curl --netrc -X GET \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/list-node-types \
| jq .
{
  "node_types": [
    {
      "node_type_id": "r4.xlarge",
      "memory_mb": 31232,
      "num_cores": 4,
      "description": "r4.xlarge",
      "instance_type_id": "r4.xlarge",
      "is_deprecated": false,
      "category": "Memory Optimized",
      "support_ebs_volumes": true,
      "support_cluster_tags": true,
      "num_gpus": 0,
      "node_instance_type": {
        "instance_type_id": "r4.xlarge",
        "local_disks": 0,
        "local_disk_size_gb": 0,
        "instance_family": "EC2 r4 Family vCPUs",
        "swap_size": "10g"
      },
      "is_hidden": false,
      "support_port_forwarding": true,
      "display_order": 0,
      "is_io_cache_enabled": false
    },
    {
      "..."
    }
  ]
}

Response structure

Field Name

Type

Description

node_types

An array of NodeType

The list of available Spark node types.

Runtime versions

Endpoint: 2.0/clusters/spark-versions

HTTP Method: GET

Return the list of available runtime versions. These versions can be used to launch a cluster.

Example

curl --netrc -X GET \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/spark-versions \
| jq .
{
  "versions": [
    {
      "key": "8.2.x-scala2.12",
      "name": "8.2 (includes Apache Spark 3.1.1, Scala 2.12)"
    },
    {
      "..."
    }
  ]
}

Response structure

Field Name

Type

Description

versions

An array of SparkVersion

All the available runtime versions.

List zones

Endpoint: 2.0/clusters/list-zones

HTTP Method: GET

Return a list of availability zones in which clusters can be created (for example, us-west-2a). These zones can be used to launch a cluster.

Example

curl --netrc -X GET \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/list-zones \
| jq .
{
  "zones": [
    "us-west-2c",
    "us-west-2a",
    "us-west-2b"
  ],
  "default_zone": "us-west-2c"
}

Response structure

Field Name

Type

Description

zones

An array of STRING

The list of available zones (such as ['us-west-2c', 'us-east-2']).

default_zone

STRING

The availability zone used if no zone_id is provided in the cluster creation request.

Events

Endpoint: 2.0/clusters/events

HTTP Method: POST

Retrieve a list of events about the activity of a cluster. You can retrieve events from active clusters (running, pending, or reconfiguring) and terminated clusters within 30 days of their last termination. This API is paginated. If there are more events to read, the response includes all the parameters necessary to request the next page of events.

Example:

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/events \
--data @list-events.json \
| jq .

list-events.json:

{
  "cluster_id": "1234-567890-reef123",
  "start_time": 1617238800000,
  "end_time": 1619485200000,
  "order": "DESC",
  "offset": 5,
  "limit": 5,
  "event_types": [ "RUNNING" ]
}
{
  "events": [
    {
      "cluster_id": "1234-567890-reef123",
      "timestamp": 1619471498409,
      "type": "RUNNING",
      "details": {
        "current_num_workers": 2,
        "target_num_workers": 2
      }
    },
    {
      "..."
    }
  ],
  "next_page": {
    "cluster_id": "1234-567890-reef123",
    "start_time": 1617238800000,
    "end_time": 1619485200000,
    "order": "DESC",
    "offset": 10,
    "limit": 5
  },
  "total_count": 25
}

Example request to retrieve the next page of events:

curl --netrc -X POST \
https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/events \
--data @list-events.json \
| jq .

list-events.json:

{
  "cluster_id": "1234-567890-reef123",
  "start_time": 1617238800000,
  "end_time": 1619485200000,
  "order": "DESC",
  "offset": 10,
  "limit": 5,
  "event_types": [ "RUNNING" ]
}
{
  "events": [
    {
      "cluster_id": "1234-567890-reef123",
      "timestamp": 1618330776302,
      "type": "RUNNING",
      "details": {
        "current_num_workers": 2,
        "target_num_workers": 2
      }
    },
    {
      "..."
    }
  ],
  "next_page": {
    "cluster_id": "1234-567890-reef123",
    "start_time": 1617238800000,
    "end_time": 1619485200000,
    "order": "DESC",
    "offset": 15,
    "limit": 5
  },
  "total_count": 25
}
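To walk through every page, you can repeatedly post the next_page object from each response until it is omitted. A minimal sketch using jq (file names are illustrative; re-add any event_types filter you need, since next_page carries only the paging parameters in these examples):

# Follow next_page until the response no longer includes it.
cp list-events.json page.json
while : ; do
  curl --netrc -s -X POST \
    https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/events \
    --data @page.json > response.json
  jq .events response.json                               # process this page of events
  jq -e .next_page response.json > page.json || break    # stop when next_page is absent
done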

Request structure

Retrieve events pertaining to a specific cluster.

Field Name

Type

Description

cluster_id

STRING

The ID of the cluster to retrieve events about. This field is required.

start_time

INT64

The start time in epoch milliseconds. If empty, returns events starting from the beginning of time.

end_time

INT64

The end time in epoch milliseconds. If empty, returns events up to the current time.

order

ListOrder

The order to list events in; either ASC or DESC. Defaults to DESC.

event_types

An array of ClusterEventType

An optional set of event types to filter on. If empty, all event types are returned.

offset

INT64

The offset in the result set. Defaults to 0 (no offset). When an offset is specified and the results are requested in descending order, the end_time field is required.

limit

INT64

The maximum number of events to include in a page of events. Defaults to 50, and maximum allowed value is 500.

Response structure

Field Name

Type

Description

events

An array of ClusterEvent

The list of matching events.

next_page

Request structure

The parameters required to retrieve the next page of events. Omitted if there are no more events to read.

total_count

INT64

The total number of events filtered by the start_time, end_time, and event_types.

Data structures

AutoScale

Range defining the min and max number of cluster workers.

Field Name

Type

Description

min_workers

INT32

The minimum number of workers to which the cluster can scale down when underutilized. It is also the initial number of workers the cluster will have after creation.

max_workers

INT32

The maximum number of workers to which the cluster can scale up when overloaded. max_workers must be strictly greater than min_workers.

ClusterInfo

Metadata about a cluster.

Field Name

Type

Description

num_workers OR autoscale

INT32 OR AutoScale

If num_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

Note: When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field will immediately be updated to reflect the target size of 10 workers, whereas the workers listed in executors will gradually increase from 5 to 10 as the new nodes are provisioned.

If autoscale, parameters needed in order to automatically scale clusters up and down based on load.

cluster_id

STRING

Canonical identifier for the cluster. This ID is retained during cluster restarts and resizes, while each new cluster has a globally unique ID.

creator_user_name

STRING

Creator user name. The field won’t be included in the response if the user has already been deleted.

driver

SparkNode

Node on which the Spark driver resides. The driver node contains the Spark master and the Databricks application that manages the per-notebook Spark REPLs.

executors

An array of SparkNode

Nodes on which the Spark executors reside.

spark_context_id

INT64

A canonical SparkContext identifier. This value does change when the Spark driver restarts. The pair (cluster_id, spark_context_id) is a globally unique identifier over all Spark contexts.

jdbc_port

INT32

Port on which the Spark JDBC server is listening in the driver node. No service will be listening on this port in executor nodes.

cluster_name

STRING

Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

spark_version

STRING

The runtime version of the cluster. You can retrieve a list of available runtime versions by using the Runtime versions API call.

spark_conf

SparkConfPair

An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

Example Spark confs: {"spark.speculation": true, "spark.streaming.ui.retainedBatches": 5} or {"spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails"}

aws_attributes

AwsAttributes

Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used.

node_type_id

STRING

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the List node types API call.

driver_node_type_id

STRING

The node type of the Spark driver. This field is optional. If you don’t specify a value, the driver node type will be set to the same value as node_type_id defined above.

ssh_public_keys

An array of STRING

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. Up to 10 keys can be specified.

custom_tags

An array of ClusterTag

An object containing a set of tags. Databricks tags all cluster resources with these tags in addition to default_tags.

Note:

  • Tags are not supported on legacy node types such as compute-optimized and memory-optimized.

  • Databricks allows at most 45 custom tags.

  • If the cluster is created on an instance pool, the cluster tags are not copied to the cluster resources. To tag resources for an instance pool, see the custom_tags field in the Instance Pools API 2.0.

cluster_log_conf

ClusterLogConf

The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the conf is given, the logs will be delivered to the destination every 5 mins. The destination of driver logs is <destination>/<cluster-ID>/driver, while the destination of executor logs is <destination>/<cluster-ID>/executor.

init_scripts

An array of InitScriptInfo

The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

docker_image

DockerImage

Docker image for a custom container.

spark_env_vars

SparkEnvPair

An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (that is, export X='Y') while launching the driver and workers.

To specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the following example. This ensures that all default Databricks managed environmental variables are included.

Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

autotermination_minutes

INT32

Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination.

enable_elastic_disk

BOOL

Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly - refer to Autoscaling local storage for details.

instance_pool_id

STRING

The optional ID of the instance pool to which the cluster belongs. Refer to Create a pool for details.

cluster_source

ClusterSource

Determines whether the cluster was created by a user through the UI, by the Databricks Jobs scheduler, through an API request, or by the Delta Live Tables runtime. Example values include API, UI, or PIPELINE.

state

ClusterState

State of the cluster.

state_message

STRING

A message associated with the most recent state transition (for example, the reason why the cluster entered a TERMINATED state). This field is unstructured, and its exact format is subject to change.

start_time

INT64

Time (in epoch milliseconds) when the cluster creation request was received (when the cluster entered a PENDING state).

terminated_time

INT64

Time (in epoch milliseconds) when the cluster was terminated, if applicable.

last_state_loss_time

INT64

Time when the cluster driver last lost its state (due to a restart or driver failure).

last_activity_time

INT64

Time (in epoch milliseconds) when the cluster was last active. A cluster is active if there is at least one command that has not finished on the cluster. This field is available after the cluster has reached a RUNNING state. Updates to this field are made as best-effort attempts. Certain versions of Spark do not support reporting of cluster activity. Refer to Automatic termination for details.

cluster_memory_mb

INT64

Total amount of cluster memory, in megabytes.

cluster_cores

FLOAT

Number of CPU cores available for this cluster. This can be fractional since certain node types are configured to share cores between Spark nodes on the same instance.

default_tags

ClusterTag

An object containing a set of tags that are added by Databricks regardless of any custom_tags, including:

  • Vendor: Databricks

  • Creator: <username-of-creator>

  • ClusterName: <name-of-cluster>

  • ClusterId: <id-of-cluster>

  • Name: <Databricks internal use>

    On job clusters:

  • RunName: <name-of-job>

  • JobId: <id-of-job>

    On resources used by Databricks SQL:

  • SqlWarehouseId: <id-of-warehouse>

cluster_log_status

LogSyncStatus

Cluster log delivery status.

termination_reason

TerminationReason

Information about why the cluster was terminated. This field only appears when the cluster is in a TERMINATING or TERMINATED state.

ClusterEvent

Cluster event information.

Field Name

Type

Description

cluster_id

STRING

Canonical identifier for the cluster. This field is required.

timestamp

INT64

The timestamp when the event occurred, stored as the number of milliseconds since the unix epoch. Assigned by the Timeline service.

type

ClusterEventType

The event type. This field is required.

details

EventDetails

The event details. This field is required.

ClusterEventType

Type of a cluster event.

Event Type

Description

CREATING

Indicates that the cluster is being created.

DID_NOT_EXPAND_DISK

Indicates that a disk is low on space, but adding disks would put it over the max capacity.

EXPANDED_DISK

Indicates that a disk was low on space and the disks were expanded.

FAILED_TO_EXPAND_DISK

Indicates that a disk was low on space and disk space could not be expanded.

INIT_SCRIPTS_STARTING

Indicates that the cluster scoped init script has started.

INIT_SCRIPTS_FINISHED

Indicates that the cluster scoped init script has finished.

STARTING

Indicates that the cluster is being started.

RESTARTING

Indicates that the cluster is being restarted.

TERMINATING

Indicates that the cluster is being terminated.

EDITED

Indicates that the cluster has been edited.

RUNNING

Indicates the cluster has finished being created. Includes the number of nodes in the cluster and a failure reason if some nodes could not be acquired.

RESIZING

Indicates a change in the target size of the cluster (upsize or downsize).

UPSIZE_COMPLETED

Indicates that nodes finished being added to the cluster. Includes the number of nodes in the cluster and a failure reason if some nodes could not be acquired.

NODES_LOST

Indicates that some nodes were lost from the cluster.

DRIVER_HEALTHY

Indicates that the driver is healthy and the cluster is ready for use.

DRIVER_UNAVAILABLE

Indicates that the driver is unavailable.

SPARK_EXCEPTION

Indicates that a Spark exception was thrown from the driver.

DRIVER_NOT_RESPONDING

Indicates that the driver is up but is not responsive, likely due to GC.

DBFS_DOWN

Indicates that the driver is up but DBFS is down.

METASTORE_DOWN

Indicates that the driver is up but the metastore is down.

NODE_BLACKLISTED

Indicates that a node is not allowed by Spark.

PINNED

Indicates that the cluster was pinned.

UNPINNED

Indicates that the cluster was unpinned.

EventDetails

Details about a cluster event.

Field Name

Type

Description

current_num_workers

INT32

The number of nodes in the cluster.

target_num_workers

INT32

The targeted number of nodes in the cluster.

previous_attributes

AwsAttributes

The cluster attributes before a cluster was edited.

attributes

AwsAttributes

  • For created clusters, the attributes of the cluster.

  • For edited clusters, the new attributes of the cluster.

previous_cluster_size

ClusterSize

The size of the cluster before an edit or resize.

cluster_size

ClusterSize

The cluster size that was set in the cluster creation or edit.

cause

ResizeCause

The cause of a change in target size.

reason

TerminationReason

A termination reason:

  • On a TERMINATED event, the reason for the termination.

  • On a RESIZE_COMPLETE event, indicates the reason that we failed to acquire some nodes.

user

STRING

The user that caused the event to occur. (Empty if it was done by Databricks.)

AwsAttributes

Attributes set during cluster creation related to Amazon Web Services.

Field Name

Type

Description

first_on_demand

INT32

The first first_on_demand nodes of the cluster will be placed on on-demand instances. If this value is greater than 0, the cluster driver node will be placed on an on-demand instance. If this value is greater than or equal to the current cluster size, all nodes will be placed on on-demand instances. If this value is less than the current cluster size, first_on_demand nodes will be placed on on-demand instances and the remainder will be placed on instances with the configured availability type. This value does not affect cluster size and cannot be mutated over the lifetime of a cluster.

availability

AwsAvailability

Availability type used for all subsequent nodes past the first_on_demand ones. Note: If first_on_demand is zero, this availability type will be used for the entire cluster.

zone_id

STRING

Identifier for the availability zone (AZ) in which the cluster resides. By default, the setting has a value of auto, otherwise known as Auto-AZ. With Auto-AZ, Databricks selects the AZ based on available IPs in the workspace subnets and retries in other availability zones if AWS returns insufficient capacity errors.

If you want, you can also specify an availability zone to use. This benefits accounts that have reserved instances in a specific AZ. Specify the AZ as a string (for example, "us-west-2a"). The provided availability zone must be in the same region as the Databricks deployment. For example, “us-west-2a” is not a valid zone ID if the Databricks deployment resides in the “us-east-1” region.

The list of available zones as well as the default value can be found by using the List zones API.

instance_profile_arn

STRING

Nodes for this cluster will only be placed on AWS instances with this instance profile. If omitted, nodes will be placed on instances without an instance profile. The instance profile must have previously been added to the Databricks environment by an account administrator.

This feature may only be available to certain customer plans.

spot_bid_price_percent

INT32

The max price for AWS spot instances, as a percentage of the corresponding instance type’s on-demand price. For example, if this field is set to 50, and the cluster needs a new i3.xlarge spot instance, then the max price is half of the price of on-demand i3.xlarge instances. Similarly, if this field is set to 200, the max price is twice the price of on-demand i3.xlarge instances. If not specified, the default value is 100. When spot instances are requested for this cluster, only spot instances whose max price percentage matches this field will be considered. For safety, we enforce this field to be no more than 10000.

ebs_volume_type

EbsVolumeType

The type of EBS volumes that will be launched with this cluster.

ebs_volume_count

INT32

The number of volumes launched for each instance. You can choose up to 10 volumes. This feature is only enabled for supported node types. Legacy node types cannot specify custom EBS volumes. For node types with no instance store, at least one EBS volume needs to be specified; otherwise, cluster creation will fail.

These EBS volumes will be mounted at /ebs0, /ebs1, and so on. Instance store volumes will be mounted at /local_disk0, /local_disk1, and so on.

If EBS volumes are attached, Databricks will configure Spark to use only the EBS volumes for scratch storage because heterogeneously sized scratch devices can lead to inefficient disk utilization. If no EBS volumes are attached, Databricks will configure Spark to use instance store volumes.

If EBS volumes are specified, then the Spark configuration spark.local.dir will be overridden.

ebs_volume_size

INT32

The size of each EBS volume (in GiB) launched for each instance. For general purpose SSD, this value must be within the range 100 - 4096. For throughput optimized HDD, this value must be within the range 500 - 4096. Custom EBS volumes cannot be specified for the legacy node types (memory-optimized and compute-optimized).

ebs_volume_iops

INT32

The number of IOPS per EBS gp3 volume.

This value must be between 3000 and 16000.

The value of IOPS and throughput is calculated based on AWS documentation to match the maximum performance of a gp2 volume with the same volume size.

For more information, see the EBS volume limit calculator.

ebs_volume_throughput

INT32

The throughput per EBS gp3 volume, in MiB per second.

This value must be between 125 and 1000.

If neither ebs_volume_iops nor ebs_volume_throughput is specified, the values are inferred from the disk size:

  • Disk size greater than 1000 GiB: IOPS is 3 times the disk size, up to 16000; throughput is 250 MiB per second.

  • Disk size between 170 and 1000 GiB: IOPS is 3000; throughput is 250 MiB per second.

  • Disk size below 170 GiB: IOPS is 3000; throughput is 125 MiB per second.
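As a sketch, several of the aws_attributes fields above could be combined in a create-cluster.json payload like this (the instance profile ARN is a placeholder; adjust the values to your account):

"aws_attributes": {
  "first_on_demand": 1,
  "availability": "SPOT_WITH_FALLBACK",
  "zone_id": "auto",
  "spot_bid_price_percent": 100,
  "instance_profile_arn": "arn:aws:iam::123456789012:instance-profile/my-profile",
  "ebs_volume_type": "GENERAL_PURPOSE_SSD",
  "ebs_volume_count": 1,
  "ebs_volume_size": 100
}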

ClusterAttributes

Common set of attributes set during cluster creation. These attributes cannot be changed over the lifetime of a cluster.

Field Name

Type

Description

cluster_name

STRING

Cluster name requested by the user. This doesn’t have to be unique. If not specified at creation, the cluster name will be an empty string.

spark_version

STRING

The runtime version of the cluster, for example “5.0.x-scala2.11”. You can retrieve a list of available runtime versions by using the Runtime versions API call.

spark_conf

SparkConfPair

An object containing a set of optional, user-specified Spark configuration key-value pairs. You can also pass in a string of extra JVM options to the driver and the executors via spark.driver.extraJavaOptions and spark.executor.extraJavaOptions respectively.

Example Spark confs: {"spark.speculation": true, "spark.streaming.ui.retainedBatches": 5} or {"spark.driver.extraJavaOptions": "-verbose:gc -XX:+PrintGCDetails"}

aws_attributes

AwsAttributes

Attributes related to clusters running on Amazon Web Services. If not specified at cluster creation, a set of default values will be used.

node_type_id

STRING

This field encodes, through a single value, the resources available to each of the Spark nodes in this cluster. For example, the Spark nodes can be provisioned and optimized for memory or compute intensive workloads. A list of available node types can be retrieved by using the List node types API call.

driver_node_type_id

STRING

The node type of the Spark driver. This field is optional. If you don’t specify a value, the driver node type will be set to the same value as node_type_id defined above.

ssh_public_keys

An array of STRING

SSH public key contents that will be added to each Spark node in this cluster. The corresponding private keys can be used to log in with the user name ubuntu on port 2200. Up to 10 keys can be specified.

custom_tags

ClusterTag

An object containing a set of tags for cluster resources. Databricks tags all cluster resources with these tags in addition to default_tags.

Note:

  • Tags are not supported on legacy node types such as compute-optimized and memory-optimized.

  • Databricks allows at most 45 custom tags.

  • If the cluster is created on an instance pool, the cluster tags are not copied to the cluster resources. To tag resources for an instance pool, see the custom_tags field in the Instance Pools API 2.0.

cluster_log_conf

ClusterLogConf

The configuration for delivering Spark logs to a long-term storage destination. Only one destination can be specified for one cluster. If the configuration is given, the logs will be delivered to the destination every 5 minutes. The destination of driver logs is <destination>/<cluster-ID>/driver, while the destination of executor logs is <destination>/<cluster-ID>/executor.

init_scripts

An array of InitScriptInfo

The configuration for storing init scripts. Any number of destinations can be specified. The scripts are executed sequentially in the order provided. If cluster_log_conf is specified, init script logs are sent to <destination>/<cluster-ID>/init_scripts.

docker_image

DockerImage

Docker image for a custom container.

spark_env_vars

SparkEnvPair

An object containing a set of optional, user-specified environment variable key-value pairs. Key-value pairs of the form (X,Y) are exported as is (that is, export X='Y') while launching the driver and workers.

To specify an additional set of SPARK_DAEMON_JAVA_OPTS, we recommend appending them to $SPARK_DAEMON_JAVA_OPTS as shown in the following example. This ensures that all default Databricks-managed environment variables are included.

Example Spark environment variables: {"SPARK_WORKER_MEMORY": "28000m", "SPARK_LOCAL_DIRS": "/local_disk0"} or {"SPARK_DAEMON_JAVA_OPTS": "$SPARK_DAEMON_JAVA_OPTS -Dspark.shuffle.service.enabled=true"}

autotermination_minutes

INT32

Automatically terminates the cluster after it is inactive for this time in minutes. If not set, this cluster will not be automatically terminated. If specified, the threshold must be between 10 and 10000 minutes. You can also set this value to 0 to explicitly disable automatic termination.

enable_elastic_disk

BOOL

Autoscaling Local Storage: when enabled, this cluster will dynamically acquire additional disk space when its Spark workers are running low on disk space. This feature requires specific AWS permissions to function correctly. Refer to Autoscaling local storage for details.

instance_pool_id

STRING

The optional ID of the instance pool to which the cluster belongs. Refer to Create a pool for details.

cluster_source

ClusterSource

Determines whether the cluster was created by a user through the UI, by the Databricks Jobs scheduler, through an API request, or by the Delta Live Tables runtime. Example values include API, UI, or PIPELINE.

policy_id

STRING

A cluster policy ID.

cluster_mount_infos

An array of MountInfo

An object containing optional specifications for a network file system mount.
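For example, a create request exercising several of these attributes might look like the following sketch (the policy ID and tag values are placeholders; a cluster size is included because a create request also needs one):

{
  "cluster_name": "attributes-example",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "autotermination_minutes": 60,
  "policy_id": "ABCD000000000000",
  "custom_tags": {
    "team": "data-eng"
  }
}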

ClusterSize

Cluster size specification.

Field Name

Type

Description

num_workers OR autoscale

INT32 OR AutoScale

If num_workers, number of worker nodes that this cluster should have. A cluster has one Spark driver and num_workers executors for a total of num_workers + 1 Spark nodes.

When reading the properties of a cluster, this field reflects the desired number of workers rather than the actual number of workers. For instance, if a cluster is resized from 5 to 10 workers, this field is updated to reflect the target size of 10 workers, whereas the workers listed in executors gradually increase from 5 to 10 as the new nodes are provisioned.

If autoscale, parameters needed in order to automatically scale clusters up and down based on load.

ListOrder

Generic ordering enum for list-based queries.

Order

Description

DESC

Descending order.

ASC

Ascending order.

ResizeCause

Reason why a cluster was resized.

Cause

Description

AUTOSCALE

Automatically resized based on load.

USER_REQUEST

User requested a new size.

AUTORECOVERY

Autorecovery monitor resized the cluster after it lost a node.

ClusterLogConf

Path to cluster log.

Field Name

Type

Description

dbfs OR s3

DbfsStorageInfo

S3StorageInfo

DBFS location of cluster log. Destination must be provided. For example, { "dbfs" : { "destination" : "dbfs:/home/cluster_log" } }

S3 location of cluster log. destination and either region or warehouse must be provided. For example, { "s3": { "destination" : "s3://cluster_log_bucket/prefix", "region" : "us-west-2" } }
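As a sketch, an S3 log destination could be attached to a create or edit request as follows (the bucket and prefix are placeholders; the cluster must use an instance profile with write access to the destination):

"cluster_log_conf": {
  "s3": {
    "destination": "s3://cluster-log-bucket/prefix",
    "region": "us-west-2"
  }
}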

InitScriptInfo

Path to an init script. For instructions on using init scripts with Databricks Container Services, see Use an init script.

Note

The file storage type (field name: file) is only available for clusters set up using Databricks Container Services. See FileStorageInfo.

Field Name

Type

Description

workspace OR dbfs (deprecated) OR s3

WorkspaceStorageInfo

DbfsStorageInfo (deprecated)

S3StorageInfo

Workspace location of init script. Destination must be provided. For example, { "workspace" : { "destination" : "/Users/someone@domain.com/init_script.sh" } }

(Deprecated) DBFS location of init script. Destination must be provided. For example, { "dbfs" : { "destination" : "dbfs:/home/init_script" } }

S3 location of init script. Destination and either region or warehouse must be provided. For example, { "s3": { "destination" : "s3://init_script_bucket/prefix", "region" : "us-west-2" } }
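For illustration, an init_scripts array that runs a workspace-file script followed by an S3-hosted script might look like this (the paths and bucket name are placeholders); the scripts run in the order listed:

"init_scripts": [
  { "workspace": { "destination": "/Users/someone@domain.com/init_script.sh" } },
  { "s3": { "destination": "s3://init-script-bucket/prefix/cleanup.sh", "region": "us-west-2" } }
]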

ClusterTag

Cluster tag definition.

Type

Description

STRING

The key of the tag. The key length must be between 1 and 127 UTF-8 characters, inclusive. For a list of all restrictions, see AWS Tag Restrictions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-restrictions

STRING

The value of the tag. The value length must be less than or equal to 255 UTF-8 characters. For a list of all restrictions, see AWS Tag Restrictions: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html#tag-restrictions

DbfsStorageInfo

DBFS storage information.

Field Name

Type

Description

destination

STRING

DBFS destination. Example: dbfs:/my/path

FileStorageInfo

File storage information.

Note

This location type is only available for clusters set up using Databricks Container Services.

Field Name

Type

Description

destination

STRING

File destination. Example: file:/my/file.sh

WorkspaceStorageInfo

Workspace storage information.

Field Name

Type

Description

destination

STRING

File destination. Example: /Users/someone@domain.com/init_script.sh

DockerImage

Docker image connection information.

Field

Type

Description

url

string

URL for the Docker image.

basic_auth

DockerBasicAuth

Basic authentication information for Docker repository.

DockerBasicAuth

Docker repository basic authentication information.

Field

Description

username

User name for the Docker repository.

password

Password for the Docker repository.
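A docker_image block in a create request might look like the following sketch (the image URL and credentials are placeholders). Avoid hard coding the password; reference a secret instead where possible:

"docker_image": {
  "url": "<account-id>.dkr.ecr.us-west-2.amazonaws.com/my-custom-image:latest",
  "basic_auth": {
    "username": "<username>",
    "password": "<password>"
  }
}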

LogSyncStatus

Log delivery status.

Field Name

Type

Description

last_attempted

INT64

The timestamp of last attempt. If the last attempt fails, last_exception contains the exception in the last attempt.

last_exception

STRING

The exception thrown in the last attempt. It is null (omitted in the response) if there was no exception in the last attempt.

NodeType

Description of a Spark node type including both the dimensions of the node and the instance type on which it will be hosted.

Field Name

Type

Description

node_type_id

STRING

Unique identifier for this node type. This field is required.

memory_mb

INT32

Memory (in MB) available for this node type. This field is required.

num_cores

FLOAT

Number of CPU cores available for this node type. This can be fractional if the number of cores on a machine instance is not divisible by the number of Spark nodes on that machine. This field is required.

description

STRING

A string description associated with this node type. This field is required.

instance_type_id

STRING

An identifier for the type of hardware that this node runs on. This field is required.

is_deprecated

BOOL

Whether the node type is deprecated. Non-deprecated node types offer greater performance.

node_info

ClusterCloudProviderNodeInfo

Node type info reported by the cloud provider.

ClusterCloudProviderNodeInfo

Information about an instance supplied by a cloud provider.

Field Name

Type

Description

status

ClusterCloudProviderNodeStatus

Status as reported by the cloud provider.

available_core_quota

INT32

Available CPU core quota.

total_core_quota

INT32

Total CPU core quota.

ClusterCloudProviderNodeStatus

Status of an instance supplied by a cloud provider.

Status

Description

NotEnabledOnSubscription

Node type not available for subscription.

NotAvailableInRegion

Node type not available in region.

ParameterPair

Parameter that provides additional information about why a cluster was terminated.

Type

Description

TerminationParameter

Type of termination information.

STRING

The termination information.

SparkConfPair

Spark configuration key-value pairs.

Type

Description

STRING

A configuration property name.

STRING

The configuration property value.

SparkEnvPair

Spark environment variable key-value pairs.

Important

When specifying environment variables in a job cluster, the fields in this data structure accept only Latin characters (ASCII character set). Using non-ASCII characters will return an error. Examples of invalid, non-ASCII characters are Chinese, Japanese kanjis, and emojis.

Type

Description

STRING

An environment variable name.

STRING

The environment variable value.

SparkNode

Spark driver or executor configuration.

Field Name

Type

Description

private_ip

STRING

Private IP address (typically a 10.x.x.x address) of the Spark node. This is different from the private IP address of the host instance.

public_dns

STRING

Public DNS address of this node. This address can be used to access the Spark JDBC server on the driver node. To communicate with the JDBC server, traffic must be manually authorized by adding security group rules to the “worker-unmanaged” security group via the AWS console.

node_id

STRING

Globally unique identifier for this node.

instance_id

STRING

Globally unique identifier for the host instance from the cloud provider.

start_timestamp

INT64

The timestamp (in milliseconds) when the Spark node was launched.

node_aws_attributes

SparkNodeAwsAttributes

Attributes specific to AWS for a Spark node.

host_private_ip

STRING

The private IP address of the host instance.

SparkVersion

Databricks Runtime version of the cluster.

Field Name

Type

Description

key

STRING

Databricks Runtime version key, for example 7.3.x-scala2.12. The value that should be provided as the spark_version when creating a new cluster. The exact runtime version may change over time for a “wildcard” version (that is, 7.3.x-scala2.12 is a “wildcard” version) with minor bug fixes.

name

STRING

A descriptive name for the runtime version, for example “Databricks Runtime 7.3 LTS”.

TerminationReason

Reason why a cluster was terminated.

Field Name

Type

Description

code

TerminationCode

Status code indicating why a cluster was terminated.

type

TerminationType

Reason indicating why a cluster was terminated.

parameters

ParameterPair

Object containing a set of parameters that provide information about why a cluster was terminated.
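For illustration only, a termination reason returned in a cluster's metadata might resemble the following (the username is a placeholder):

"termination_reason": {
  "code": "USER_REQUEST",
  "type": "SUCCESS",
  "parameters": {
    "username": "someone@example.com"
  }
}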

PoolClusterTerminationCode

Status code indicating why the cluster was terminated due to a pool failure.

Code

Description

INSTANCE_POOL_MAX_CAPACITY_FAILURE

The pool max capacity has been reached.

INSTANCE_POOL_NOT_FOUND_FAILURE

The pool specified by the cluster is no longer active or doesn’t exist.

ClusterSource

Service that created the cluster.

Service

Description

UI

Cluster created through the UI.

JOB

Cluster created by the Databricks job scheduler.

API

Cluster created through an API call.

ClusterState

State of a cluster. The allowable state transitions are as follows:

  • PENDING -> RUNNING

  • PENDING -> TERMINATING

  • RUNNING -> RESIZING

  • RUNNING -> RESTARTING

  • RUNNING -> TERMINATING

  • RESTARTING -> RUNNING

  • RESTARTING -> TERMINATING

  • RESIZING -> RUNNING

  • RESIZING -> TERMINATING

  • TERMINATING -> TERMINATED

State

Description

PENDING

Indicates that a cluster is in the process of being created.

RUNNING

Indicates that a cluster has been started and is ready for use.

RESTARTING

Indicates that a cluster is in the process of restarting.

RESIZING

Indicates that a cluster is in the process of adding or removing nodes.

TERMINATING

Indicates that a cluster is in the process of being destroyed.

TERMINATED

Indicates that a cluster has been successfully destroyed.

ERROR

This state is no longer used. It was used to indicate a cluster that failed to be created. TERMINATING and TERMINATED are used instead.

UNKNOWN

Indicates that a cluster is in an unknown state. A cluster should never be in this state.
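Because cluster creation is asynchronous, a common pattern is to poll the cluster state until it reaches RUNNING. A minimal shell sketch, assuming the Get endpoint (2.0/clusters/get), a placeholder cluster ID, and that jq is installed (a real script should also stop if the state becomes TERMINATED):

# Poll every 30 seconds until the cluster reports RUNNING (sketch only).
cluster_id="1234-567890-example123"
state=""
until [ "$state" = "RUNNING" ]; do
  sleep 30
  state=$(curl --netrc -s \
    "https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/get?cluster_id=$cluster_id" \
    | jq -r .state)
  echo "Cluster state: $state"
done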

TerminationCode

Status code indicating why the cluster was terminated.

Code

Description

USER_REQUEST

A user terminated the cluster directly. Parameters should include a username field that indicates the specific user who terminated the cluster.

JOB_FINISHED

The cluster was launched by a job and terminated when the job completed.

INACTIVITY

The cluster was terminated since it was idle.

CLOUD_PROVIDER_SHUTDOWN

The instance that hosted the Spark driver was terminated by the cloud provider. In AWS, for example, AWS may retire instances and directly shut them down. Parameters should include an aws_instance_state_reason field indicating the AWS-provided reason why the instance was terminated.

COMMUNICATION_LOST

Databricks lost connection to services on the driver instance. For example, this can happen when problems arise in cloud networking infrastructure, or when the instance itself becomes unhealthy.

CLOUD_PROVIDER_LAUNCH_FAILURE

Databricks experienced a cloud provider failure when requesting instances to launch clusters. For example, AWS limits the number of running instances and EBS volumes. If you ask Databricks to launch a cluster that requires instances or EBS volumes that exceed your AWS limit, the cluster will fail with this status code. Parameters should include one of aws_api_error_code, aws_instance_state_reason, or aws_spot_request_status to indicate the AWS-provided reason why Databricks could not request the required instances for the cluster.

SPARK_STARTUP_FAILURE

The cluster failed to initialize. Possible reasons may include failure to create the environment for Spark or issues launching the Spark master and worker processes.

INVALID_ARGUMENT

Cannot launch the cluster because the user specified an invalid argument. For example, the user might specify an invalid runtime version for the cluster.

UNEXPECTED_LAUNCH_FAILURE

While launching this cluster, Databricks failed to complete critical setup steps, terminating the cluster.

INTERNAL_ERROR

Databricks encountered an unexpected error that forced the running cluster to be terminated. Contact Databricks support for additional details.

SPARK_ERROR

The Spark driver failed to start. Possible reasons may include incompatible libraries and initialization scripts that corrupted the Spark container.

METASTORE_COMPONENT_UNHEALTHY

The cluster failed to start because the external metastore could not be reached. Refer to Troubleshooting.

DBFS_COMPONENT_UNHEALTHY

The cluster failed to start because Databricks File System (DBFS) could not be reached.

DRIVER_UNREACHABLE

Databricks was not able to access the Spark driver, because it was not reachable.

DRIVER_UNRESPONSIVE

Databricks was not able to access the Spark driver, because it was unresponsive.

INSTANCE_UNREACHABLE

Databricks was not able to access instances in order to start the cluster. This can be a transient networking issue. If the problem persists, this usually indicates a networking environment misconfiguration.

CONTAINER_LAUNCH_FAILURE

Databricks was unable to launch containers on worker nodes for the cluster. Have your admin check your network configuration.

INSTANCE_POOL_CLUSTER_FAILURE

A failure specific to a pool-backed cluster. Refer to Create a pool for details.

REQUEST_REJECTED

Databricks cannot handle the request at this moment. Try again later and contact Databricks if the problem persists.

INIT_SCRIPT_FAILURE

Databricks cannot load and run a cluster-scoped init script on one of the cluster’s nodes, or the init script terminates with a non-zero exit code. Refer to Init script logs.

TRIAL_EXPIRED

The Databricks trial subscription expired.

TerminationType

Reason why the cluster was terminated.

Type

Description

SUCCESS

Termination succeeded.

CLIENT_ERROR

Non-retriable. Client must fix parameters before reattempting the cluster creation.

SERVICE_FAULT

Databricks service issue. Client can retry.

CLOUD_FAILURE

Cloud provider infrastructure issue. Client can retry after the underlying issue is resolved.

TerminationParameter

Key that provides additional information about why a cluster was terminated.

Key

Description

username

The username of the user who terminated the cluster.

aws_api_error_code

The AWS provided error code describing why cluster nodes could not be provisioned. For example, InstanceLimitExceeded indicates that the limit of EC2 instances for a specific instance type has been exceeded. For reference, see: https://docs.aws.amazon.com/AWSEC2/latest/APIReference/query-api-troubleshooting.html.

aws_instance_state_reason

The AWS provided state reason describing why the driver node was terminated. For example, Client.VolumeLimitExceeded indicates that the limit of EBS volumes or total EBS volume storage has been exceeded. For reference, see https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_StateReason.html.

aws_spot_request_status

Describes why a spot request could not be fulfilled. For example, price-too-low indicates that the max price was lower than the current spot price. For reference, see: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-bid-status.html#spot-instance-bid-status-understand.

aws_spot_request_fault_code

Provides additional details when a spot request fails. For example InsufficientFreeAddressesInSubnet indicates the subnet does not have free IP addresses to accommodate the new instance. For reference, see https://docs.aws.amazon.com/cli/latest/reference/ec2/describe-spot-instance-requests.html.

aws_impaired_status_details

The AWS provided status check which failed and induced a node loss. This status may correspond to a failed instance or system check. For reference, see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-system-instance-status-check.html.

aws_instance_status_event

The AWS provided scheduled event (for example reboot) which induced a node loss. For reference, see https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-instances-status-check_sched.html.

aws_error_message

Human-readable context of various failures from AWS. This field is unstructured, and its exact format is subject to change.

databricks_error_message

Additional context that may explain the reason for cluster termination. This field is unstructured, and its exact format is subject to change.

inactivity_duration_min

An idle cluster was shut down after being inactive for this duration.

instance_id

The ID of the instance that was hosting the Spark driver.

instance_pool_id

The ID of the instance pool the cluster is using.

instance_pool_error_code

The error code for cluster failures specific to a pool.

S3StorageInfo

S3 storage information.

Field Name

Type

Description

destination

STRING

S3 destination. For example: s3://my-bucket/some-prefix. You must configure the cluster with an instance profile and the instance profile must have write access to the destination. You cannot use AWS keys.

region

STRING

S3 region. For example: us-west-2. Either region or warehouse must be set. If both are set, warehouse is used.

warehouse

STRING

S3 warehouse. For example: https://s3-us-west-2.amazonaws.com. Either region or warehouse must be set. If both are set, warehouse is used.

enable_encryption

BOOL

(Optional) Enable server-side encryption; false by default.

encryption_type

STRING

(Optional) The encryption type; it can be sse-s3 or sse-kms. It is used only when encryption is enabled, and the default type is sse-s3.

kms_key

STRING

(Optional) KMS key used if encryption is enabled and encryption type is set to sse-kms.

canned_acl

STRING

(Optional) Set a canned access control list. For example: bucket-owner-full-control. If canned_acl is set, the cluster instance profile must have s3:PutObjectAcl permission on the destination bucket and prefix. The full list of possible canned ACLs can be found at https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl. By default only the object owner gets full control. If you are using a cross-account role for writing data, you may want to set bucket-owner-full-control so that the bucket owner can read the logs.
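Putting these options together, an encrypted S3 log destination with a canned ACL might be specified as follows (the bucket, prefix, and KMS key ARN are placeholders):

"cluster_log_conf": {
  "s3": {
    "destination": "s3://cluster-log-bucket/prefix",
    "region": "us-west-2",
    "enable_encryption": true,
    "encryption_type": "sse-kms",
    "kms_key": "arn:aws:kms:us-west-2:123456789012:key/00000000-0000-0000-0000-000000000000",
    "canned_acl": "bucket-owner-full-control"
  }
}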

SparkNodeAwsAttributes

Attributes specific to AWS for a Spark node.

Field Name

Type

Description

is_spot

BOOL

Whether this node is on an Amazon spot instance.

AwsAvailability

The set of AWS availability types supported when setting up nodes for a cluster.

Type

Description

SPOT

Use spot instances.

ON_DEMAND

Use on-demand instances.

SPOT_WITH_FALLBACK

Preferably use spot instances, but fall back to on-demand instances if spot instances cannot be acquired (for example, if AWS spot prices are too high).

EbsVolumeType

Databricks supports gp2 and gp3 EBS volume types. Follow the instructions at Manage SSD storage to select gp2 or gp3 for your workspace.

Type

Description

GENERAL_PURPOSE_SSD

Provision extra storage using AWS EBS volumes.

THROUGHPUT_OPTIMIZED_HDD

Provision extra storage using AWS st1 volumes.

MountInfo

Configuration to mount a network file system.

Field Name

Type

Description

network_filesystem_info

NetworkFileSystemInfo

Object defining parameters for the network file system.

remote_mount_dir_path

STRING

The location of a directory in the network file system to mount.

local_mount_dir_path

STRING

The mount point in the Spark container.

NetworkFileSystemInfo

Network file system parameters.

Field Name

Type

Description

server_address

STRING

DNS name of the network file system server.

mount_options

STRING

A comma-separated list of options to pass to the mount command. This field is optional.
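As a sketch, these two structures combine in the cluster_mount_infos field of a create request like this (the server address, mount options, and paths are placeholders; the server address shown follows the Amazon EFS naming pattern):

"cluster_mount_infos": [
  {
    "network_filesystem_info": {
      "server_address": "fs-0123456789abcdef0.efs.us-west-2.amazonaws.com",
      "mount_options": "nfsvers=4.1,rsize=1048576,wsize=1048576"
    },
    "remote_mount_dir_path": "/",
    "local_mount_dir_path": "/mnt/efs"
  }
]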