Set Spark configuration properties on Databricks

You can set Spark configuration properties (Spark confs) to customize settings in your compute environment.

Databricks generally recommends against configuring most Spark properties. Especially when migrating from open-source Apache Spark or upgrading Databricks Runtime versions, legacy Spark configurations can override new default behaviors that optimize workloads.

For many behaviors controlled by Spark properties, Databricks also provides options to either enable behavior at a table level or to configure custom behavior as part of a write operation. For example, schema evolution was previously controlled by a Spark property, but now has coverage in SQL, Python, and Scala. See Schema evolution syntax for merge.

Configure Spark properties for notebooks and jobs

You can set Spark properties for notebooks and jobs. The scope of the configuration depends on how you set it.

| Properties configured | Applies to |
| --- | --- |
| Using compute configuration | All notebooks and jobs run with the compute resource. |
| Within a notebook | Only the `SparkSession` for the current notebook. |

For instructions on configuring Spark properties at the compute level, see Spark configuration.

To set a Spark property in a notebook, use the following syntax:

```sql
SET spark.sql.ansi.enabled = true
```

Configure Spark properties in Databricks SQL

Databricks SQL allows admins to configure Spark properties for data access in the workspace settings menu. See Data access configurations.

Other than data access configurations, Databricks SQL only allows a handful of Spark confs, which have been aliased to shorter names for simplicity. See Configuration parameters.

For most supported SQL configurations, you can override the global behavior in your current session. The following example turns off ANSI mode:

```sql
SET ANSI_MODE = false
```

Configure Spark properties for Lakeflow Spark Declarative Pipelines

Lakeflow Spark Declarative Pipelines allows you to configure Spark properties for a pipeline, for one compute resource configured for a pipeline, or for individual flows, materialized views, or streaming tables.

You can set pipeline and compute Spark properties using the UI or JSON. See Configure Pipelines.
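As an illustrative sketch of the JSON route, pipeline-level properties are key-value pairs under the `configuration` object of the pipeline settings (the pipeline name and property value here are hypothetical):

```json
{
  "name": "example-pipeline",
  "configuration": {
    "spark.sql.shuffle.partitions": "200"
  }
}
```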

Use the spark_conf option in Lakeflow Spark Declarative Pipelines decorator functions to configure Spark properties for flows, views, or tables. See Lakeflow Spark Declarative Pipelines Python language reference.
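A hedged sketch of the decorator form, assuming the `dlt` Python module (resolvable only inside a pipeline runtime, so this does not run as a standalone script); the table name, source table, and property value are illustrative:

```python
import dlt  # available only in a Lakeflow Spark Declarative Pipelines runtime

# spark_conf scopes the property to the flow that materializes this table.
@dlt.table(
    name="region_counts",  # illustrative table name
    spark_conf={"spark.sql.shuffle.partitions": "64"},
)
def region_counts():
    return spark.read.table("sales_raw").groupBy("region").count()
```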

Configure Spark properties for serverless notebooks and jobs

Serverless compute does not support setting most Spark properties for notebooks or jobs. The following are the properties you can configure:

| Property | Default | Description |
| --- | --- | --- |
| `spark.databricks.execution.timeout` | `9000` (only applicable for notebooks) | The execution timeout, in seconds, for Spark Connect queries. The default value is only applicable for notebook queries. For jobs running on serverless compute (and jobs running on classic standard compute), there is no timeout unless this property is set. |
| `spark.sql.legacy.timeParserPolicy` | `CORRECTED` | The time parser policy. |
| `spark.sql.session.timeZone` | `Etc/UTC` | The ID of the session-local timezone, in the format of either region-based zone IDs or zone offsets. |
| `spark.sql.shuffle.partitions` | `auto` | The default number of partitions to use when shuffling data for joins or aggregations. |
| `spark.sql.ansi.enabled` | `true` | When true, Spark SQL uses an ANSI-compliant dialect instead of being Hive compliant. |
| `spark.sql.files.maxPartitionBytes` | `134217728` (128 MB) | The maximum number of bytes to pack into a single partition when reading files. |

Unsupported Spark properties

The following Spark configuration properties are not supported in Databricks. Unsupported Spark properties are either ignored by Databricks or may cause conflicts and failures when used simultaneously with Databricks features. If you are migrating workloads to Databricks, replace unsupported properties with the recommended alternatives.

| Unsupported Spark properties | Applies to | Databricks alternative |
| --- | --- | --- |
| `spark.dynamicAllocation.enabled`, `spark.dynamicAllocation.initialExecutors`, `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`, `spark.dynamicAllocation.executorIdleTimeout` | Classic compute | Configure Databricks autoscaling instead, which manages executor lifecycle at the platform level. See Enable autoscaling. |
| `spark.master`, `spark.driver.host`, `spark.driver.port` | Serverless compute and Lakeflow Spark Declarative Pipelines | The Databricks serverless infrastructure manages these internal connection properties automatically. They cannot be set by users. Setting them on serverless compute or in Lakeflow Spark Declarative Pipelines results in an error. |
| `spark.jars` | Serverless compute and Lakeflow Spark Declarative Pipelines | Databricks does not support attaching JARs to serverless compute or Lakeflow Spark Declarative Pipelines using Spark configurations, but you can run serverless JAR tasks. See Configure environment for job tasks. |
| `spark.databricks.runtimeoptions.*` | Classic compute | Use the `runtime_options` attribute in the cluster configuration instead. Runtime options cannot be set as Spark configuration on any cluster type; attempting to do so results in an error. |
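As a sketch of the autoscaling alternative to the `spark.dynamicAllocation.*` properties, worker bounds are set on the compute resource itself rather than through Spark configuration (values are illustrative; field names follow the Databricks Clusters API):

```json
{
  "autoscale": {
    "min_workers": 2,
    "max_workers": 8
  }
}
```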

Get the current setting for a Spark configuration

Use the following syntax to review the current setting of a Spark configuration:

```python
spark.conf.get("configuration_name")
```