Encrypt traffic between cluster worker nodes

Preview

This feature is in Public Preview.

Note

This feature requires the Enterprise plan. Contact your Databricks account representative for more information.

User queries and transformations are typically sent to your clusters over an encrypted channel. By default, however, the data exchanged between worker nodes in a cluster is not encrypted. If your environment requires that data be encrypted at all times, whether at rest or in transit, you can create an init script that configures your clusters to encrypt traffic between worker nodes, using AES 128-bit encryption over a TLS 1.2 connection.

Note

Although AES enables cryptographic routines to take advantage of hardware acceleration, there’s a performance penalty compared to unencrypted traffic. This penalty can result in queries taking longer on an encrypted cluster, depending on the amount of data shuffled between nodes.

Enabling encryption of traffic between worker nodes requires setting Spark configuration parameters through an init script. You can use a cluster-scoped init script for a single cluster or a global init script if you want all clusters in your workspace to use worker-to-worker encryption.
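For the cluster-scoped case, the init script is referenced from the cluster specification. The sketch below is illustrative only: the cluster name, Spark version, node type, and DBFS path are placeholders, and the commented curl line assumes a workspace URL and a configured API token.

```shell
# Write an illustrative cluster spec that points at the init script in DBFS.
# All names, versions, and paths here are placeholders for your workspace.
cat > /tmp/create-cluster.json <<'EOF'
{
  "cluster_name": "encrypted-cluster",
  "spark_version": "7.3.x-scala2.12",
  "node_type_id": "i3.xlarge",
  "num_workers": 2,
  "init_scripts": [
    { "dbfs": { "destination": "dbfs:/databricks/init-scripts/set-encryption.sh" } }
  ]
}
EOF

# Submit the spec with the Clusters API (workspace URL is a placeholder):
# curl -n -X POST "https://<workspace-url>/api/2.0/clusters/create" -d @/tmp/create-cluster.json
```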

As a one-time setup step, copy the keystore file to a directory in DBFS, then create the init script that applies the encryption settings.

The init script must perform the following tasks:

  1. Get the JKS keystore file and password.
  2. Set the Spark executor configuration.
  3. Set the Spark driver configuration.

Note

The JKS keystore file used for enabling SSL/HTTPS is dynamically generated for each workspace. The JKS keystore file’s password is hardcoded and not intended to protect the confidentiality of the keystore.
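In other words, the shared secret that authenticates the nodes is not stored in the keystore; the script derives it as the SHA-256 digest of the keystore file itself. A minimal sketch of that derivation, using a throwaway file in place of the real keystore:

```shell
# Stand-in for the real keystore file (contents are arbitrary for this demo).
tmp_keystore=$(mktemp)
printf 'example keystore bytes' > "$tmp_keystore"

# Same derivation the init script performs: take the SHA-256 digest of the
# file, keeping only the hex digest field.
sasl_secret=$(sha256sum "$tmp_keystore" | cut -d' ' -f1)
echo "$sasl_secret"   # a 64-character hex string

rm -f "$tmp_keystore"
```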

The following is an example init script that implements these three tasks to generate the cluster encryption configuration.

#!/bin/bash

# Path to the keystore copied to DBFS (placeholder directory).
keystore_file="/dbfs/<keystore_directory>/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"
# Derive the shared secret used for SASL authentication from the keystore file.
sasl_secret=$(sha256sum "$keystore_file" | cut -d' ' -f1)

if [[ $DB_IS_DRIVER = "TRUE" ]]; then
  driver_conf=${DB_HOME}/driver/conf/spark-branch.conf
  echo "Configuring driver conf at $driver_conf"
  if [ ! -e "$driver_conf" ] ; then
    touch "$driver_conf"
  fi

  # Re-append the first line of the file, which opens the config block;
  # the entries below are written inside it and closed by the trailing "}".
  head -n 1 "${DB_HOME}/driver/conf/spark-branch.conf" >> "$driver_conf"

  echo "  // Authenticate">> $driver_conf
  echo "  \\"spark.authenticate\\" = true" >> $driver_conf
  echo "  \\"spark.authenticate.secret\\" = \\"$sasl_secret\\"" >> $driver_conf

  echo "  // Configure AES encryption">> $driver_conf
  echo "  \\"spark.network.crypto.enabled\\" = true" >> $driver_conf
  echo "  \\"spark.network.crypto.saslFallback\\" = false" >> $driver_conf

  echo "  // Configure SSL">> $driver_conf
  echo "  \\"spark.ssl.enabled\\" = true" >> $driver_conf
  echo "  \\"spark.ssl.keyPassword\\" = \\"$keystore_password\\"" >> $driver_conf
  echo "  \\"spark.ssl.keyStore\\" = \\"$keystore_file\\"" >> $driver_conf
  echo "  \\"spark.ssl.keyStorePassword\\" = \\"$keystore_password\\"" >> $driver_conf
  echo "  \\"spark.ssl.protocol\\" = \\"TLSv1.2\\"" >> $driver_conf
  echo "  \\"spark.ssl.standalone.enabled\\" = true" >> $driver_conf
  echo "  \\"spark.ssl.ui.enabled\\" = true" >> $driver_conf
  echo " }"  >> $driver_conf
  echo "Successfully configured driver conf at $driver_conf"
fi

spark_defaults_conf="$DB_HOME/spark/conf/spark-defaults.conf"
echo "Configuring spark defaults conf at $spark_defaults_conf"
if [ ! -e "$spark_defaults_conf" ] ; then
  touch "$spark_defaults_conf"
fi
echo "spark.authenticate true" >> $spark_defaults_conf
echo "spark.authenticate.secret $sasl_secret" >> $spark_defaults_conf

echo "spark.network.crypto.enabled true" >> $spark_defaults_conf
echo "spark.network.crypto.saslFallback false" >> $spark_defaults_conf

echo "spark.ssl.enabled true" >> $spark_defaults_conf
echo "spark.ssl.keyPassword $keystore_password" >> $spark_defaults_conf
echo "spark.ssl.keyStore $keystore_file" >> $spark_defaults_conf
echo "spark.ssl.keyStorePassword $keystore_password" >> $spark_defaults_conf
echo "spark.ssl.protocol TLSv1.2" >> $spark_defaults_conf
echo "spark.ssl.standalone.enabled true" >> $spark_defaults_conf
echo "spark.ssl.ui.enabled true" >> $spark_defaults_conf
echo "Successfully configured spark defaults conf at $spark_default_conf"

Once the initialization of the driver and worker nodes is complete, all traffic between these nodes is encrypted using the keystore file.
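If you want to spot-check a node after startup, one approach is to grep the generated spark-defaults.conf (for example from a %sh notebook cell, reading "$DB_HOME/spark/conf/spark-defaults.conf"). The snippet below demonstrates the check against a small sample file standing in for the real one:

```shell
# Sample stand-in for $DB_HOME/spark/conf/spark-defaults.conf on a real node.
conf=$(mktemp)
printf 'spark.authenticate true\nspark.ssl.enabled true\n' > "$conf"

# Count the security settings that are present and enabled.
enabled_count=$(grep -c -E '^spark\.(authenticate|ssl\.enabled) true$' "$conf")
echo "$enabled_count"   # 2 when both settings are present

rm -f "$conf"
```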

The following notebook copies the keystore file and generates the init script in DBFS. You can use the init script to create new clusters with encryption enabled.

Install an encryption init script notebook
