Spark Configuration

Spark configuration properties

To fine-tune Spark jobs, you can provide custom Spark configuration properties in a cluster configuration.

  1. On the cluster configuration page, click the Advanced Options toggle.

  2. Click the Spark tab and enter the configuration properties, one per line (see the example that follows the screenshot).

    ../../_images/spark-config-aws.png
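
For example, to enable dynamic partition overwrite for a single cluster, you could enter a property such as the following in the Spark config text box, one key-value pair per line with the key and value separated by a space. This is a sketch; the property shown is the same one used in the init script example later in this section:

spark.sql.sources.partitionOverwriteMode DYNAMIC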

When you configure a cluster using the Clusters API, set Spark properties in the spark_conf field in the Create cluster request or Edit cluster request.
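
For example, a Create cluster or Edit cluster request body could include a spark_conf object such as the following sketch. The property names and values are illustrative, and the rest of the request body is omitted:

"spark_conf": {
  "spark.sql.sources.partitionOverwriteMode": "DYNAMIC",
  "spark.speculation": "true"
}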

To set Spark properties for all clusters, create a global init script:

%scala
// Write a global init script to DBFS; it runs when each cluster starts.
dbutils.fs.put("dbfs:/databricks/init/set_spark_params.sh", """
#!/bin/bash
# Init scripts already run as root, so sudo is not needed for the redirect.
echo "spark.sql.sources.partitionOverwriteMode DYNAMIC" >> /databricks/spark/conf/spark-defaults.conf
""", true)

Environment variables

You can set environment variables that scripts running on a cluster can access. Set environment variables in the spark_env_vars field in the Create cluster request or Edit cluster request.

../../_images/environment-variables.png
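
For example, the spark_env_vars field is a map of variable names to string values. This is a sketch; the variable name and value are illustrative:

"spark_env_vars": {
  "MY_ENVIRONMENT_VARIABLE": "some-value"
}

Scripts running on the cluster can then read the variable, for example with os.environ["MY_ENVIRONMENT_VARIABLE"] in a Python script.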

Note

The environment variables you set in this field are not available in Cluster Node Initialization Scripts. Init scripts support only a limited set of predefined environment variables.