Encrypt traffic between cluster worker nodes

Preview

This feature is in Public Preview.

Note

This feature requires the Enterprise plan. Contact your Databricks account representative for more information.

In a typical data processing workflow in Databricks, a user query or transformation is sent to your clusters over an encrypted channel. The data exchanged between cluster worker nodes, however, is not encrypted by default. If your environment requires that data be encrypted at all times, whether at rest or in transit, you can create an init script that configures your clusters to encrypt traffic between worker nodes, using AES 128-bit encryption over a TLS 1.2 connection.

Note

Although AES enables cryptographic routines to take advantage of hardware acceleration, there is nonetheless a performance penalty compared to unencrypted traffic. Depending on the amount of shuffle data, throughput between nodes can be decreased, resulting in queries taking longer on an encrypted cluster.

To enable encryption for traffic between worker nodes, create an init script that sets the required Spark configuration. Use a cluster-scoped init script to enable encryption on individual clusters, or a global init script if you want all clusters in your workspace to use worker-to-worker encryption.
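As one way to attach the finished script to a specific cluster, you can reference it in the `init_scripts` field of the Clusters API. The sketch below is illustrative, not a complete call: the workspace URL, cluster ID, and script path are placeholders, and the Clusters `edit` endpoint also requires the rest of the cluster spec (`spark_version`, `node_type_id`, and so on), which is elided here.

```shell
# Hypothetical sketch: attach the init script to an existing cluster via
# the Databricks Clusters API. <workspace-url>, <CLUSTER_ID>, and the
# script path are placeholders; the remaining required cluster spec
# fields are omitted for brevity.
curl -X POST "https://<workspace-url>/api/2.0/clusters/edit" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{
    "cluster_id": "<CLUSTER_ID>",
    "init_scripts": [
      { "workspace": { "destination": "/Users/me@example.com/encrypt-worker-traffic.sh" } }
    ]
  }'
```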

Get keystore file and password

The JKS keystore file used for enabling SSL/HTTPS is dynamically generated for each workspace. The password of the JKS keystore file is hardcoded and not intended to protect the confidentiality of the keystore. Do not assume that the keystore file itself is protected.

#!/bin/bash

keystore_file="$DB_HOME/keys/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"

# Use the SHA256 of the JKS keystore file as a SASL authentication secret string
sasl_secret=$(sha256sum "$keystore_file" | cut -d' ' -f1)

spark_defaults_conf="$DB_HOME/spark/conf/spark-defaults.conf"
driver_conf="$DB_HOME/driver/conf/config.conf"

if [ ! -e "$spark_defaults_conf" ] ; then
    touch "$spark_defaults_conf"
fi
if [ ! -e "$driver_conf" ] ; then
    touch "$driver_conf"
fi
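The SASL secret derived above is simply the hex SHA-256 digest of the keystore file: `sha256sum` prints `<digest>  <filename>`, and `cut` keeps the first field. A quick illustration of the same pattern on a known input:

```shell
# Derive a hex digest the same way the init script does.
# /tmp/sample_input is a throwaway file used only for illustration.
printf 'test' > /tmp/sample_input
digest=$(sha256sum /tmp/sample_input | cut -d' ' -f1)
echo "$digest"
# → 9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08
```

The result is always a 64-character hex string, which makes it a convenient shared secret that every node can recompute locally from the same keystore file.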

Set the executor configuration

# Authenticate
echo "spark.authenticate true" >> $spark_defaults_conf
echo "spark.authenticate.secret $sasl_secret" >> $spark_defaults_conf

# Configure AES encryption
echo "spark.network.crypto.enabled true" >> $spark_defaults_conf
echo "spark.network.crypto.saslFallback false" >> $spark_defaults_conf

# Configure SSL
echo "spark.ssl.enabled true" >> $spark_defaults_conf
echo "spark.ssl.keyPassword $keystore_password" >> $spark_defaults_conf
echo "spark.ssl.keyStore $keystore_file" >> $spark_defaults_conf
echo "spark.ssl.keyStorePassword $keystore_password" >> $spark_defaults_conf
echo "spark.ssl.protocol TLSv1.2" >> $spark_defaults_conf
echo "spark.ssl.standalone.enabled true" >> $spark_defaults_conf
echo "spark.ssl.ui.enabled true" >> $spark_defaults_conf
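When debugging the init script, it can help to confirm that the appended settings actually landed in the file. A minimal sketch of the same append-then-verify pattern, using a temporary file in place of the real `$DB_HOME/spark/conf/spark-defaults.conf` path:

```shell
# Sketch: append settings and confirm they are present, as you might do
# when testing the init script. /tmp/spark-defaults.conf stands in for
# the real config path.
conf=/tmp/spark-defaults.conf
: > "$conf"   # start from an empty file
echo "spark.network.crypto.enabled true" >> "$conf"
echo "spark.ssl.enabled true" >> "$conf"
grep -c '^spark\.' "$conf"   # count of appended settings lines
```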

Set the driver configuration

# Copy all but the last line of the existing driver config
# (GNU head -n -1), then append the new settings below.
head -n -1 ${DB_HOME}/driver/conf/spark-branch.conf > $driver_conf

# Authenticate
echo "spark.authenticate true" >> $driver_conf
echo "spark.authenticate.secret $sasl_secret" >> $driver_conf

# Configure AES encryption
echo "spark.network.crypto.enabled true" >> $driver_conf
echo "spark.network.crypto.saslFallback false" >> $driver_conf

# Configure SSL
echo "spark.ssl.enabled true" >> $driver_conf
echo "spark.ssl.keyPassword $keystore_password" >> $driver_conf
echo "spark.ssl.keyStore $keystore_file" >> $driver_conf
echo "spark.ssl.keyStorePassword $keystore_password" >> $driver_conf
echo "spark.ssl.protocol TLSv1.2" >> $driver_conf
echo "spark.ssl.standalone.enabled true" >> $driver_conf
echo "spark.ssl.ui.enabled true" >> $driver_conf

mv $driver_conf ${DB_HOME}/driver/conf/spark-branch.conf
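The driver configuration is spliced into `spark-branch.conf` by copying every line except the last into a scratch file, appending the new settings, and moving the result back into place. A quick sketch of the copy step (GNU coreutils `head -n -1` prints all but the last line; the file names here are placeholders):

```shell
# GNU head -n -1 prints all input lines except the last one.
printf 'line1\nline2\nline3\n' > /tmp/spark-branch.conf
head -n -1 /tmp/spark-branch.conf > /tmp/driver.conf
cat /tmp/driver.conf   # line1 and line2; line3 is dropped
```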

Once the initialization of the driver and worker nodes is complete, all traffic between these nodes will be encrypted.