SSH Access to Clusters

SSH allows you to log in to Spark clusters remotely for advanced troubleshooting and for installing custom software.

This topic describes how to configure your AWS account to enable ingress access to your cluster with your public key, and how to open an SSH connection to cluster nodes.

Configure security group

You must update the Databricks security group in your AWS account to give ingress access to the IP address from which you will initiate the SSH connection. You can allow a single IP address or a CIDR range that covers your entire office network.

  1. In your AWS console, find the Databricks security group. It will have a label similar to <databricks-instance>-worker-unmanaged. (Example: dbc-fb3asdddd3-worker-unmanaged)

  2. Edit the security group and add an inbound TCP rule that allows port 2200 to worker machines. The source can be a single IP address or a range. If you prefer the AWS CLI, see the example after this list.

  3. Make sure that your computer and office network allow you to send TCP traffic on port 2200.
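
If you prefer to script this step with the AWS CLI rather than using the console, an inbound rule similar to the following opens port 2200. The security group ID and source CIDR shown here are placeholders; substitute your own values.

    aws ec2 authorize-security-group-ingress \
      --group-id <databricks-worker-security-group-id> \
      --protocol tcp \
      --port 2200 \
      --cidr 203.0.113.10/32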

Generate SSH key pair

Create an SSH key pair by running this command in a terminal session:

ssh-keygen -t rsa -b 4096 -C "email@example.com"

When prompted, provide the path to the file in which to save the private key. The public key is saved to the same path with the extension .pub.
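
For example, to write the key pair to a specific file instead of answering the prompt (the path here is only an example), pass the -f option:

    ssh-keygen -t rsa -b 4096 -C "email@example.com" -f ~/.ssh/databricks-cluster

This creates the private key ~/.ssh/databricks-cluster and the public key ~/.ssh/databricks-cluster.pub.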

Configure a new cluster with your public key

  1. Copy the entire contents of the public key file. (A command for printing it is shown after this list.)

  2. In the cluster configuration page, click the Advanced Options toggle.

  3. At the bottom of the page, click the SSH tab.

  4. Paste the key you copied into the SSH Public Key field.

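To copy the key in step 1, you can print the public key file in a terminal. The path below is an example; use the path where you saved your key.

    cat ~/.ssh/databricks-cluster.pub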

Configure an existing cluster with your public key

If you have an existing cluster and did not provide the public key during cluster creation, you can inject the public key by running this code from any notebook attached to the cluster:

val publicKey = " put your public key here "

// Appends the key to the authorized_keys file of the ubuntu user on the machine where it runs.
def addAuthorizedPublicKey(key: String): Unit = {
  val fw = new java.io.FileWriter("/home/ubuntu/.ssh/authorized_keys", /* append */ true)
  fw.write("\n" + key)
  fw.close()
}

// Run the function as Spark tasks spread across the executors so the key is appended on the worker nodes ...
val numExecutors = sc.getExecutorMemoryStatus.keys.size
sc.parallelize(0 until numExecutors, numExecutors).foreach { i =>
  addAuthorizedPublicKey(publicKey)
}
// ... and append it on the driver as well.
addAuthorizedPublicKey(publicKey)

SSH into the Spark driver

  1. In the cluster configuration page, click the Advanced Options toggle.

  2. Click the SSH tab. Note the driver hostname.

  3. Run the following command, replacing the hostname and private key file path.

    ssh ubuntu@<hostname> -p 2200 -i <private-key-file-path>
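
If you connect to the driver often, an entry in your ~/.ssh/config file saves retyping the options. The host alias and file paths below are examples; the hostname is the one shown on the SSH tab.

    Host databricks-driver
        HostName <hostname>
        Port 2200
        User ubuntu
        IdentityFile <private-key-file-path>

With this entry in place, running ssh databricks-driver opens the same connection.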
    

SSH into Spark workers

You SSH into workers the same way that you SSH into the driver.

  1. In the cluster configuration page, click the Spark Cluster UI - Master tab.

  2. In the Workers table, click the worker that you want to SSH into. Copy the Hostname field and use it in the command shown in the next step.

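  3. Run the same command that you used for the driver, substituting the worker hostname you copied:

    ssh ubuntu@<worker-hostname> -p 2200 -i <private-key-file-path>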