Terminate a Cluster

To save cluster resources, you can terminate a cluster. A terminated cluster cannot run notebooks or jobs, but its configuration is stored so that it can be reused at a later time.

Databricks retains the configuration information for up to 70 interactive clusters terminated in the last 30 days and up to 30 job clusters recently terminated by the job scheduler.

To keep an interactive cluster configuration even after it has been terminated for more than 30 days, an administrator can pin a cluster to the cluster list.

You can manually terminate a cluster or configure the cluster to automatically terminate after a specified period of inactivity.

Manual termination

You can manually terminate a cluster from the:

  • Cluster list

    ../../_images/terminate-list.png
  • Cluster detail page

    ../../_images/terminate-details.png
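You can also terminate a cluster programmatically through the Clusters REST API. The following Python sketch is illustrative only; the workspace URL, personal access token, and cluster ID are placeholders for your own values:

    import requests

    # Placeholders -- replace with your workspace URL, a personal access
    # token, and the ID of the cluster to terminate.
    DOMAIN = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"
    CLUSTER_ID = "<cluster-id>"

    # POST /api/2.0/clusters/delete terminates the cluster but keeps its
    # configuration, so the cluster can be restarted later.
    response = requests.post(
        f"{DOMAIN}/api/2.0/clusters/delete",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"cluster_id": CLUSTER_ID},
    )
    response.raise_for_status()
    print("Termination requested for cluster", CLUSTER_ID)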

Automatic termination

You can also set auto termination for a cluster. During cluster creation, you can specify an inactivity period in minutes after which you want the cluster to terminate. If the difference between the current time and the last command run on the cluster is more than the inactivity period specified, Databricks automatically terminates that cluster.

A cluster is considered inactive when all commands on the cluster, including Spark jobs, Structured Streaming, and JDBC calls, have finished executing. This does not include commands run by SSH-ing into the cluster and running bash commands.

Warning

Clusters do not report activity resulting from the use of DStreams. This means that a cluster with auto termination enabled may be terminated while it is running DStreams. Turn off auto termination for clusters running DStreams or consider using Structured Streaming.
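One way to turn off auto termination for an existing cluster is to set its inactivity period to 0, either in the UI or through the Clusters API. The sketch below assumes the clusters/edit endpoint, which expects the cluster's full configuration; the cluster name, Spark version, node type, and worker count shown are illustrative placeholders:

    import requests

    DOMAIN = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    # clusters/edit replaces the cluster configuration, so re-supply the
    # existing settings along with the change. Values are placeholders.
    resp = requests.post(
        f"{DOMAIN}/api/2.0/clusters/edit",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "cluster_id": "<cluster-id>",
            "cluster_name": "dstreams-cluster",   # placeholder name
            "spark_version": "<spark-version>",
            "node_type_id": "<node-type-id>",
            "num_workers": 2,
            "autotermination_minutes": 0,         # 0 disables auto termination
        },
    )
    resp.raise_for_status()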

Configuration

You configure automatic termination in the Auto Termination field on the cluster create dialog:

../../_images/auto-terminate.png

The default value of the Auto Termination setting depends on whether you choose to create a Standard or High Concurrency cluster:

  • Standard clusters are configured to terminate automatically after 120 minutes.
  • High Concurrency clusters are configured not to terminate automatically.

You can opt out of auto termination by clearing the Auto Termination checkbox or by specifying an inactivity period of 0.
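The same setting is exposed as autotermination_minutes when you create clusters through the Clusters API. The sketch below is illustrative (the cluster name, Spark version, and node type are placeholders); it creates a cluster that terminates after 60 minutes of inactivity, and setting the field to 0 would opt out of auto termination:

    import requests

    DOMAIN = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    # Create a cluster that auto-terminates after 60 minutes of inactivity.
    # "autotermination_minutes": 0 would disable auto termination instead.
    resp = requests.post(
        f"{DOMAIN}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "cluster_name": "auto-terminating-cluster",  # placeholder name
            "spark_version": "<spark-version>",
            "node_type_id": "<node-type-id>",
            "num_workers": 2,
            "autotermination_minutes": 60,
        },
    )
    resp.raise_for_status()
    print("Created cluster", resp.json()["cluster_id"])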

Note

Auto termination is best supported in the latest Spark versions. Older Spark versions have known limitations that can result in inaccurate reporting of cluster activity. For example, clusters running JDBC, R, or streaming commands may report a stale activity time, which leads to premature cluster termination. We strongly recommend upgrading to the most recent Spark version to benefit from bug fixes and improvements to auto termination.

Termination reason

Databricks records information whenever a cluster is terminated.

../../_images/termination-reason.png

This page lists common termination reasons and describes potential steps for remediation.
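The termination reason is also available programmatically: the Clusters API's clusters/get endpoint returns the cluster state along with a termination_reason object for terminated clusters. A minimal Python sketch (the workspace URL, token, and cluster ID are placeholders):

    import requests

    DOMAIN = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"

    # Fetch the cluster's state; for a terminated cluster the response
    # includes a termination_reason describing why it was terminated.
    resp = requests.get(
        f"{DOMAIN}/api/2.0/clusters/get",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"cluster_id": "<cluster-id>"},
    )
    resp.raise_for_status()
    info = resp.json()
    print("State:", info.get("state"))
    print("Termination reason:", info.get("termination_reason"))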

Cloud provider limit

Databricks launches a cluster by requesting resources on behalf of your cloud account. Sometimes, these requests fail because they would exceed your cloud account’s resource limits. In AWS, common error codes include:

InstanceLimitExceeded

AWS limits the number of running instances for each node type. Possible solutions include:

  • Request a cluster with fewer nodes.
  • Request a cluster with a different node type.
  • Ask AWS support to increase instance limits.
Client.VolumeLimitExceeded

The cluster creation request exceeded the EBS volume limit. AWS has two types of volume limits: a limit on the total number of EBS volumes, and a limit on the total storage size of EBS volumes. Potential remediation steps:

  • Request a cluster with fewer nodes.
  • Check which of the two limits was exceeded. (AWS Trusted Advisor shows service limits for free.) If the request exceeded the total number of EBS volumes, try reducing the requested number of volumes per node; if it exceeded the total EBS storage size, try reducing the requested storage size or the number of EBS volumes (see the sketch after this list).
  • Ask AWS support to increase EBS volume limits.
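If you create clusters through the Clusters API, the number and size of EBS volumes requested per node are controlled by the aws_attributes block. The fragment below is a sketch of the relevant part of a cluster specification; the surrounding fields and values are illustrative placeholders:

    # Fragment of a clusters/create request body: request fewer and smaller
    # EBS volumes per node to stay under the account's EBS limits.
    # All values are illustrative placeholders.
    cluster_spec = {
        "cluster_name": "smaller-ebs-footprint",   # placeholder name
        "spark_version": "<spark-version>",
        "node_type_id": "<node-type-id>",
        "num_workers": 2,
        "aws_attributes": {
            "ebs_volume_type": "GENERAL_PURPOSE_SSD",
            "ebs_volume_count": 1,     # fewer volumes per node
            "ebs_volume_size": 100,    # smaller volumes, in GB
        },
    }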
RequestLimitExceeded
AWS limits the rate of API requests made for an AWS account. Please wait a while before retrying the request.
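A common way to handle rate limiting is to retry the failed request with exponential backoff. The helper below is a generic sketch; retry_with_backoff and the callable it wraps are hypothetical names, not part of the Databricks or AWS APIs:

    import random
    import time

    def retry_with_backoff(request_fn, max_attempts=5, base_delay=1.0):
        """Retry a rate-limited call, waiting longer after each failure."""
        for attempt in range(max_attempts):
            try:
                return request_fn()
            except Exception:
                if attempt == max_attempts - 1:
                    raise
                # Wait 1s, 2s, 4s, ... plus jitter before the next attempt.
                time.sleep(base_delay * (2 ** attempt) + random.random())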

Cloud provider shutdown

The Spark driver is a single point of failure because it holds all cluster state. If the instance hosting the driver node is shut down, Databricks must terminate the cluster. In AWS, common error codes include:

Client.UserInitiatedShutdown
Instance was terminated by a direct request to AWS which did not originate from Databricks. Please contact your AWS administrator for more details.
Server.InsufficientInstanceCapacity
AWS could not satisfy the instance request. Please wait a while and retry the request. Contact AWS support if the problem persists.
Server.SpotInstanceTermination
Instance was terminated by AWS because the current spot price has exceeded the maximum bid made for this instance. To avoid this issue, use an on-demand instance for the driver, choose a different availability zone, or specify a higher spot bid price.
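When clusters are created through the Clusters API, these choices map to the aws_attributes block. The fragment below is a sketch with illustrative values: first_on_demand keeps the first node (the driver) on an on-demand instance, zone_id selects the availability zone, and spot_bid_price_percent raises the bid as a percentage of the on-demand price:

    # Fragment of a clusters/create request body that protects the driver
    # from spot terminations. All values are illustrative placeholders.
    cluster_spec = {
        "cluster_name": "spot-resilient-cluster",  # placeholder name
        "spark_version": "<spark-version>",
        "node_type_id": "<node-type-id>",
        "num_workers": 4,
        "aws_attributes": {
            "first_on_demand": 1,              # driver on an on-demand instance
            "availability": "SPOT_WITH_FALLBACK",
            "zone_id": "us-west-2b",           # try a different availability zone
            "spot_bid_price_percent": 120,     # bid above the default
        },
    }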

For other shutdown-related error codes, refer to AWS docs.

Cloud provider launch failures

In AWS, common error codes include:

UnauthorizedOperation

Databricks was not authorized to launch the requested instances. Possible reasons include:

  • Your AWS administrator invalidated Databricks’ AWS access key or IAM role used to launch instances.
  • You are trying to launch a cluster using an IAM role that Databricks does not have permission to use. Please contact your AWS administrator who set up the IAM role. For more information, see Secure Access to S3 Buckets Using IAM Roles.
Unsupported with message “EBS-optimized instances are not supported for your requested configuration”
This error indicates that the selected instance type is not available in the selected availability zone (AZ); despite the message, it has nothing to do with EBS optimization. To remediate, choose a different instance type or AZ.
AuthFailure.ServiceLinkedRoleCreationNotPermitted - The provided credentials do not have permission to create the service-linked role for EC2 Spot Instances
This error indicates that the Databricks administrator needs to update the credentials used to launch instances in your account. Instructions and the updated policy can be found in AWS Account.

See Error Codes for a complete list of AWS error codes.

Communication lost

This means that Databricks was able to launch the cluster, but lost the connection to the instance hosting the Spark driver.

This might be caused by an incorrect networking configuration (for example, changing security group settings for Databricks workers), or it might be a transient AWS networking issue.