What are init scripts?

An init script (initialization script) is a shell script that runs during startup of each cluster node before the Apache Spark driver or executor JVM starts.

Some examples of tasks performed by init scripts include:

  • Set system properties and environment variables used by the JVM.

  • Modify Spark configuration parameters.

  • Modify the JVM system classpath in special cases.

  • Install packages and libraries not included in Databricks Runtime. To install Python packages, use the Databricks pip binary located at /databricks/python/bin/pip to ensure that Python packages install into the Databricks Python virtual environment rather than the system Python environment. For example, /databricks/python/bin/pip install <package-name>.
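As an illustration of the last task, the following is a minimal sketch of an init script that installs one Python package with the Databricks pip binary. The package name `example-package`, the `PACKAGE` and `PIP_BIN` variables, and the `install_package` helper are hypothetical names chosen for this example. The override of `PIP_BIN` and the skip branch are assumptions added only so the script can be dry-run outside a cluster; they are not part of any Databricks template.

```shell
#!/bin/bash
# Sketch of an init script (illustrative only, not an official template).
set -euo pipefail

# Databricks pip binary named in the text above. The environment override is
# an assumption added so the sketch can be tried outside a cluster.
PIP_BIN="${PIP_BIN:-/databricks/python/bin/pip}"

# Hypothetical package name; replace it with the package you need.
PACKAGE="${PACKAGE:-example-package}"

install_package() {
  # Installs into the Databricks Python virtual environment rather than
  # the system Python environment.
  "$PIP_BIN" install "$1"
}

# Attempt the install only where the Databricks pip binary exists.
if [[ -x "$PIP_BIN" ]]; then
  install_package "$PACKAGE"
else
  echo "Skipping install: $PIP_BIN not found" >&2
fi
```

On a cluster node the script runs `/databricks/python/bin/pip install example-package`. In a production script you might prefer to fail instead of skipping when the binary is missing, so that a misconfigured node does not start silently without the package.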

Databricks recommends managing all init scripts as cluster-scoped init scripts. Store init scripts in Unity Catalog volumes when you use compute with shared or assigned access mode. Store them in workspace files when you use compute with no-isolation shared access mode.
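The storage recommendation above can be summarized as a small lookup. This is only a sketch: the function name `recommended_init_script_storage` and the mode labels `shared`, `assigned`, and `no-isolation-shared` are hypothetical identifiers chosen for this example, not values accepted by any Databricks API or CLI.

```shell
#!/bin/bash
# Maps an access mode label (hypothetical) to the storage location that the
# text above recommends for init scripts.
recommended_init_script_storage() {
  case "$1" in
    shared|assigned)
      echo "Unity Catalog volume"
      ;;
    no-isolation-shared)
      echo "workspace files"
      ;;
    *)
      # Access modes not covered by the recommendation above.
      echo "no recommendation for access mode: $1" >&2
      return 1
      ;;
  esac
}
```

For example, `recommended_init_script_storage assigned` prints `Unity Catalog volume`, and an unrecognized label returns a non-zero status.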

Init scripts are not supported on all cluster configurations, and not all files can be referenced from init scripts. See Compute compatibility with libraries and init scripts and What files can I reference in an init script?

What types of init scripts does Databricks support?

Databricks supports two kinds of init scripts: cluster-scoped and global. Databricks recommends using only cluster-scoped init scripts. You can get behavior similar to global init scripts by associating cluster-scoped init scripts with cluster policies.

  • Cluster-scoped: run on every cluster configured with the script. This is the recommended way to run an init script. See Use cluster-scoped init scripts.

  • Global: run on all clusters in the workspace configured with Single User access mode or no-isolation shared access mode. They do not run on clusters with shared access mode. Global init scripts can help you enforce consistent cluster configurations across your workspace. Use them carefully because they can have unanticipated effects, such as library conflicts. Only workspace admin users can create global init scripts. See Use global init scripts.

Whenever you change any type of init script, you must restart all clusters affected by the script.

Legacy init scripts

You might encounter legacy init scripts in your workspace. These init scripts are deprecated. You should migrate away from these legacy init scripts as soon as possible. For more information, see the following articles:

Init script execution order

The order of execution of init scripts is:

  1. Global

  2. Cluster-scoped


Legacy global and legacy cluster-named init scripts run before other init scripts. These init scripts might be present in workspaces created before February 21, 2023.