Encrypt traffic between cluster worker nodes

Preview

This feature is in Public Preview.

Note

This feature requires the Enterprise plan. Contact your Databricks account representative for more information.

User queries and transformations are typically sent to your clusters over an encrypted channel. By default, however, the data exchanged between worker nodes in a cluster is not encrypted. If your environment requires that data be encrypted at all times, whether at rest or in transit, you can create an init script that configures your clusters to encrypt traffic between worker nodes, using AES 128-bit encryption over a TLS 1.2 connection.

Note

Although AES enables cryptographic routines to take advantage of hardware acceleration, there’s a performance penalty compared to unencrypted traffic. This penalty can result in queries taking longer on an encrypted cluster, depending on the amount of data shuffled between nodes.

Enabling encryption of traffic between worker nodes requires setting Spark configuration parameters through an init script. You can use a cluster-scoped init script for a single cluster or a global init script if you want all clusters in your workspace to use worker-to-worker encryption. The init script must perform the following tasks:

  1. Get the JKS keystore file and password.
  2. Set the Spark executor configuration.
  3. Set the Spark driver configuration.

Note

The JKS keystore file used for enabling SSL/HTTPS is dynamically generated for each workspace. The JKS keystore file’s password is hardcoded and not intended to protect the confidentiality of the keystore.

The following is an example init script that implements these three tasks to generate the cluster encryption configuration.

#!/bin/bash

keystore_file="$DB_HOME/keys/jetty_ssl_driver_keystore.jks"
keystore_password="gb1gQqZ9ZIHS"

# Use the SHA256 of the JKS keystore file as a SASL authentication secret string
sasl_secret=$(sha256sum $keystore_file | cut -d' ' -f1)

spark_defaults_conf="$DB_HOME/spark/conf/spark-defaults.conf"
driver_conf="$DB_HOME/driver/conf/config.conf"

if [ ! -e $spark_defaults_conf ] ; then
    touch $spark_defaults_conf
fi
if [ ! -e $driver_conf ] ; then
    touch $driver_conf
fi

# Authenticate
echo "spark.authenticate true" >> $spark_defaults_conf
echo "spark.authenticate.secret $sasl_secret" >> $spark_defaults_conf

# Configure AES encryption
echo "spark.network.crypto.enabled true" >> $spark_defaults_conf
echo "spark.network.crypto.saslFallback false" >> $spark_defaults_conf

# Configure SSL
echo "spark.ssl.enabled true" >> $spark_defaults_conf
echo "spark.ssl.keyPassword $keystore_password" >> $spark_defaults_conf
echo "spark.ssl.keyStore $keystore_file" >> $spark_defaults_conf
echo "spark.ssl.keyStorePassword $keystore_password" >> $spark_defaults_conf
echo "spark.ssl.protocol TLSv1.2" >> $spark_defaults_conf
echo "spark.ssl.standalone.enabled true" >> $spark_defaults_conf
echo "spark.ssl.ui.enabled true" >> $spark_defaults_conf

head -n -1 ${DB_HOME}/driver/conf/spark-branch.conf > $driver_conf
echo " // Authenticate">> $driver_conf
echo " \"spark.authenticate\" = true" >> $driver_conf
echo " \"spark.authenticate.secret\" = \"$sasl_secret\"" >> $driver_conf
echo " // Configure AES encryption">> $driver_conf
echo " \"spark.network.crypto.enabled\" = true" >> $driver_conf
echo " \"spark.network.crypto.saslFallback\" = false" >> $driver_conf
echo " // Configure SSL">> $driver_conf
echo " \"spark.ssl.enabled\" = true" >> $driver_conf
echo " \"spark.ssl.keyPassword\" = \"$keystore_password\"" >> $driver_conf
echo " \"spark.ssl.keyStore\" = \"$keystore_file\"" >> $driver_conf
echo " \"spark.ssl.keyStorePassword\" = \"$keystore_password\"" >> $driver_conf
echo " \"spark.ssl.protocol\" = \"TLSv1.2\"" >> $driver_conf
echo " \"spark.ssl.standalone.enabled\" = true" >> $driver_conf
echo " \"spark.ssl.ui.enabled\" = true" >> $driver_conf
echo " }" >> $driver_conf
mv $driver_conf ${DB_HOME}/driver/conf/spark-branch.conf

Once the initialization of the driver and worker nodes is complete, all traffic between these nodes will be encrypted.

This following notebook provides an example of generating the init script in DBFS. You can then use this init script to create new encryption-enabled clusters.

Install an encryption init script notebook

Open notebook in new tab