What are init scripts?

An init script (initialization script) is a shell script that runs during startup of each cluster node before the Apache Spark driver or executor JVM starts. This article provides recommendations for init scripts and configuration information if you must use them.
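As a minimal sketch of what such a script can look like: the example below records a setup marker on the node and shows (commented out) an OS package installation. The directory, marker file, and package name are illustrative, not part of any Databricks default.

```shell
#!/bin/bash
# Minimal sketch of an init script; runs as root on each cluster node
# before the Spark driver/executor JVM starts. Paths are illustrative.
set -euo pipefail

# Record that node setup ran, and prepare a directory for later use.
mkdir -p /tmp/custom-setup
echo "init ran at $(date -u +%Y-%m-%dT%H:%M:%SZ)" > /tmp/custom-setup/marker

# Example OS-level dependency install (commented out to keep the sketch portable):
# apt-get update -y && apt-get install -y libsnappy-dev
```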

Recommendations for init scripts

Databricks recommends using built-in platform features instead of init scripts whenever possible. Widespread use of init scripts can slow migration to new Databricks Runtime versions and prevent adoption of some Databricks optimizations.

Important

If you need to migrate from init scripts on DBFS, see Migrate init scripts from DBFS.

Built-in Databricks features address many of the common use cases for init scripts.

If you must use init scripts:

  • Manage init scripts using compute policies or cluster-scoped init scripts rather than global init scripts. See init script types.

  • Manage library installation for production and interactive environments using compute policies. Don’t install libraries using init scripts.

  • Use shared access mode for all workloads. Only use the single user access mode if required functionality is not supported by shared access mode.

  • Use new Databricks Runtime versions and Unity Catalog for all workloads.
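As a sketch of the first recommendation above, a compute policy can pin a cluster-scoped init script to a fixed location, so users cannot attach arbitrary scripts. The policy fragment below uses Databricks cluster policy attribute-path syntax; the volume path is hypothetical.

```shell
# Hypothetical compute-policy fragment that fixes the first cluster-scoped
# init script to a Unity Catalog volume path (the path is illustrative).
cat > /tmp/init-script-policy.json <<'EOF'
{
  "init_scripts.0.volumes.destination": {
    "type": "fixed",
    "value": "/Volumes/main/default/scripts/setup.sh"
  }
}
EOF
```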

The following recommendations are organized by Databricks Runtime version and Unity Catalog enablement:

  • Databricks Runtime 13.3 LTS and above with Unity Catalog: Store init scripts in Unity Catalog volumes.

  • Databricks Runtime 11.3 LTS and above without Unity Catalog: Store init scripts as workspace files (file size limit is 500 MB).

  • Databricks Runtime 10.4 LTS and below: Store init scripts in cloud object storage.

What types of init scripts does Databricks support?

Databricks supports two kinds of init scripts: cluster-scoped and global. Cluster-scoped init scripts are recommended.

  • Cluster-scoped: run on every cluster configured with the script. This is the recommended way to run an init script. See Use cluster-scoped init scripts.

  • Global: run on all clusters in the workspace configured with single user access mode or no-isolation shared access mode. These init scripts can cause unexpected issues, such as library conflicts. Only workspace admin users can create global init scripts. See Use global init scripts.

Whenever you change any type of init script, you must restart all clusters affected by the script.

Global init scripts run before cluster-scoped init scripts.

Important

Legacy global and legacy cluster-named init scripts run before other init scripts. These init scripts are end-of-life, but might be present in workspaces created before February 21, 2023. See Cluster-named init scripts (legacy) and Global init scripts (legacy).

Where can init scripts be installed?

You can store init scripts in workspace files, Unity Catalog volumes, and cloud object storage; however, init scripts are not supported on all cluster configurations, and not all files can be referenced from init scripts.

The following table indicates the support for init scripts based on the source location and the cluster access mode. The Databricks Runtime version listed is the minimum version required to use the combination. For information about cluster access modes, see Access modes.

Note

Shared access mode requires an admin to add init scripts to an allowlist. See Allowlist libraries and init scripts on shared compute.

| Source location | Shared access mode | Single user access mode | No-isolation shared access mode |
| --- | --- | --- | --- |
| Workspace files | Not supported | All supported Databricks Runtime versions | All supported Databricks Runtime versions |
| Volumes | 13.3 LTS | 13.3 LTS | Not supported |
| Cloud storage | 13.3 LTS | All supported Databricks Runtime versions | All supported Databricks Runtime versions |
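The three source locations correspond to different keys in a cluster's `init_scripts` configuration. The fragment below sketches one entry of each kind; all destination paths are illustrative, and a real cluster would typically use only the entries it needs.

```shell
# Hypothetical init_scripts entries for the three supported source
# locations (all destination paths are illustrative).
cat > /tmp/init-script-sources.json <<'EOF'
{
  "init_scripts": [
    { "workspace": { "destination": "/Users/user@example.com/setup.sh" } },
    { "volumes":   { "destination": "/Volumes/main/default/scripts/setup.sh" } },
    { "s3":        { "destination": "s3://example-bucket/scripts/setup.sh" } }
  ]
}
EOF
```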

Migrate init scripts from DBFS

Users who need to migrate init scripts from DBFS can use the following guides. Make sure you’ve identified the correct target for your configuration. See Recommendations for init scripts.