Use cluster-scoped init scripts
Cluster-scoped init scripts are init scripts defined in a cluster configuration. Cluster-scoped init scripts apply to both clusters you create and those created to run jobs.
You can configure cluster-scoped init scripts using the UI, the Databricks CLI, or the Clusters API. This section focuses on performing these tasks using the UI. For the other methods, see the Databricks CLI and the Clusters API.
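For reference, a cluster-scoped init script appears in the Clusters API as an entry in the init_scripts array of the cluster spec. The following is a minimal sketch of a clusters/create request; the workspace URL, cluster settings, and volume path are placeholder assumptions, not values prescribed by this article:

```bash
# Sketch: create a cluster with a cluster-scoped init script via the Clusters API.
# <workspace-url>, the cluster settings, and the volume path are placeholders.
curl -X POST "https://<workspace-url>/api/2.1/clusters/create" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "cluster_name": "init-script-demo",
    "spark_version": "14.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 1,
    "init_scripts": [
      { "volumes": { "destination": "/Volumes/<catalog>/<schema>/<volume>/init.sh" } }
    ]
  }'
```

Workspace-file and S3 sources follow the same shape, with a workspace or s3 block in place of volumes.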
You can add any number of scripts, and the scripts are executed sequentially in the order provided.
If a cluster-scoped init script returns a non-zero exit code, the cluster launch fails. You can troubleshoot cluster-scoped init scripts by configuring cluster log delivery and examining the init script log. See Init script logging.
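Because any non-zero exit code aborts cluster launch, it helps to write init scripts so failures are explicit and visible in the log. A minimal defensive sketch (the package name is a placeholder):

```bash
#!/bin/bash
# Sketch of a defensive init script. Any failing command makes the script
# exit non-zero, which fails the cluster launch and surfaces in the init
# script log when cluster log delivery is configured.
set -euo pipefail

echo "init: starting on $(hostname)"

# Placeholder work: install an OS package. Replace with your own setup.
apt-get update -y
apt-get install -y <package-name>

echo "init: finished successfully"
```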
Configure a cluster-scoped init script using the UI
This section contains instructions for configuring a cluster to run an init script using the Databricks UI.
Databricks recommends managing all init scripts as cluster-scoped init scripts. Store init scripts in Unity Catalog volumes for compute with shared or assigned access mode, and in workspace files for compute with no-isolation shared access mode.
For shared access mode, you must add init scripts to the allowlist. See Allowlist libraries and init scripts on shared compute.
Warning
Cluster-scoped init scripts on DBFS are deprecated. The DBFS option in the UI exists to support legacy workloads and is not recommended. All init scripts stored in DBFS should be migrated to workspace files or Unity Catalog volumes. For migration instructions, see Cluster-named init script migration notebook in the Databricks Knowledge Base.
To use the UI to configure a cluster to run an init script, complete the following steps:
On the cluster configuration page, click the Advanced Options toggle.
At the bottom of the page, click the Init Scripts tab.
In the Source drop-down, select the Workspace, Volume, or S3 source type.
Specify a path to the init script, such as one of the following examples:
For an init script stored in your home directory with workspace files:
/Users/<user-name>/<script-name>.sh
For an init script stored with Unity Catalog volumes:
/Volumes/<catalog>/<schema>/<volume>/<path-to-script>/<script-name>.sh
For an init script stored with object storage:
s3://bucket-name/path/to/init-script
Click Add.
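If the script is not yet at the path you configured, one way to upload it is with the Databricks CLI. This is a sketch, assuming a Unity Catalog volume destination and a CLI version that supports volume paths under the dbfs:/Volumes scheme:

```bash
# Sketch: copy a local init script into a Unity Catalog volume.
# The catalog, schema, and volume names are placeholders.
databricks fs cp ./init.sh dbfs:/Volumes/<catalog>/<schema>/<volume>/init.sh
```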
The identity used to access the init script depends on the cluster access mode:
In assigned access mode, the identity of the assigned principal (a user or service principal) is used.
In shared access mode, the identity of the cluster owner is used.
Note
No-isolation shared access mode does not support volumes, but it uses the same identity assignment as shared access mode.
To remove a script from the cluster configuration, click the trash icon at the right of the script. When you confirm the deletion, you are prompted to restart the cluster. Optionally, you can delete the script file from the location where you uploaded it.
Note
If you configure an init script using the S3 source type, you must configure access credentials.
Databricks recommends using instance profiles to manage access to init scripts stored in S3. Complete this setup using the following linked documentation:
Create an IAM role with read and list permissions on your desired buckets (see the example policy sketch after these steps). See Configure S3 access with instance profiles.
Launch a cluster with the instance profile. See Launch a compute resource with an instance profile.
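For reference, a minimal IAM policy granting the read and list permissions described above might look like the following; the bucket name is a placeholder:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:GetObjectVersion"],
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::<bucket-name>"
    }
  ]
}
```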
Configure S3 region
You must specify the S3 region for the bucket containing the init script if the bucket is in a different region than your workspace. Select auto only if your bucket and workspace share a region.
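In a cluster spec, this corresponds to the region field on the S3 init script entry. A sketch with placeholder bucket and region values:

```json
{
  "init_scripts": [
    {
      "s3": {
        "destination": "s3://<bucket-name>/path/to/init-script",
        "region": "us-west-2"
      }
    }
  ]
}
```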
Troubleshoot cluster-scoped init scripts
The script must exist at the configured location. If the script doesn’t exist, attempts to start the cluster or scale up the executors result in failure.
An init script cannot be larger than 64 KB. If a script exceeds that size, the cluster fails to launch and a failure message is written to the cluster log.
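Before attaching a script, you can sanity-check the size limit from a shell. A quick sketch, assuming the script is available locally as init.sh:

```bash
# Sketch: confirm the script exists and is under the 64 KB limit.
SCRIPT=./init.sh
test -f "$SCRIPT" || { echo "missing: $SCRIPT"; exit 1; }
SIZE=$(wc -c < "$SCRIPT")
if [ "$SIZE" -le 65536 ]; then
  echo "OK: $SIZE bytes"
else
  echo "too large: $SIZE bytes (limit is 65536)"
fi
```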