Allowlist libraries and init scripts on shared compute

In Databricks Runtime 13.3 LTS and above, you can add libraries and init scripts to the allowlist in Unity Catalog. This lets users use these artifacts on compute configured with shared access mode.

You can allowlist a directory or file path before that directory or file exists. See Upload files to a Unity Catalog volume.

Note

You must be a metastore admin or have the MANAGE ALLOWLIST privilege to modify the allowlist. See MANAGE ALLOWLIST.

Important

Libraries used as JDBC drivers or custom Spark data sources on Unity Catalog-enabled shared compute require ANY FILE permissions.

Some installed libraries store data for all users in a single shared temporary directory. Such libraries might compromise user isolation.

How to add items to the allowlist

You can add items to the allowlist with Catalog Explorer or the REST API.
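For the REST API route, the following Python is a minimal sketch that builds, but does not send, a request to set the init script allowlist. It assumes the Unity Catalog artifact allowlists endpoint PUT /api/2.1/unity-catalog/artifact-allowlists/{artifact_type}, an artifact_matchers request body, the PREFIX_MATCH match type, and artifact type names such as INIT_SCRIPT, LIBRARY_JAR, and LIBRARY_MAVEN. Confirm all of these against the REST API reference before relying on them. The workspace URL, token, and volume path are placeholders.

```python
import json
import urllib.request

# Placeholders: substitute your own workspace URL and access token.
WORKSPACE_URL = "https://example.cloud.databricks.com"  # hypothetical
TOKEN = "dapi-REPLACE-ME"  # hypothetical


def build_allowlist_request(artifact_type, paths, workspace_url=WORKSPACE_URL, token=TOKEN):
    """Build (but do not send) a request that sets the allowlist for one artifact type.

    The endpoint path and body shape are assumptions based on the Unity Catalog
    artifact allowlists API; verify them against the REST API reference.
    """
    body = {
        "artifact_matchers": [
            {"artifact": path, "match_type": "PREFIX_MATCH"} for path in paths
        ]
    }
    url = f"{workspace_url}/api/2.1/unity-catalog/artifact-allowlists/{artifact_type}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


# Hypothetical volume directory; the trailing slash limits matching to its contents.
req = build_allowlist_request("INIT_SCRIPT", ["/Volumes/main/default/scripts/"])
print(req.get_method(), req.full_url)
# To actually send the request: urllib.request.urlopen(req)
```

An update of this kind is assumed to replace the entire allowlist for that artifact type, so retrieve the current entries first and include them in paths.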

To open the dialog for adding items to the allowlist in Catalog Explorer, do the following:

  1. In your Databricks workspace, click Catalog.

  2. Click the gear icon to open the metastore details and permissions UI.

  3. Select Allowed JARs/Init Scripts.

  4. Click Add.

Important

This option is displayed only to users with sufficient privileges. If you cannot access the allowlist UI, contact your metastore admin for help allowlisting libraries and init scripts.

Add an init script to the allowlist

Complete the following steps in the allowlist dialog to add an init script to the allowlist:

  1. For Type, select Init Script.

  2. For Source Type, select Volume or the object storage protocol.

  3. Specify the source path to add to the allowlist. For details, see How are permissions on paths enforced in the allowlist?

Add a JAR to the allowlist

Complete the following steps in the allowlist dialog to add a JAR to the allowlist:

  1. For Type, select JAR.

  2. For Source Type, select Volume or the object storage protocol.

  3. Specify the source path to add to the allowlist. For details, see How are permissions on paths enforced in the allowlist?

Add Maven coordinates to the allowlist

Complete the following steps in the allowlist dialog to add Maven coordinates to the allowlist:

  1. For Type, select Maven.

  2. For Source Type, select Coordinates.

  3. Enter coordinates in the following format: groupId:artifactId:version.

    • To include all versions of a library, allowlist an entry in the format groupId:artifactId.

    • To include all artifacts in a group, allowlist an entry in the format groupId.
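The three coordinate formats above can be illustrated with a small Python sketch. The helper maven_entry_matches is hypothetical, and it assumes matching compares colon-separated segments from the left; the service's actual matching rules may differ. The coordinates are made-up examples.

```python
def maven_entry_matches(entry: str, coordinate: str) -> bool:
    """Return True if an allowlist entry covers a groupId:artifactId:version coordinate.

    Hypothetical helper: compares colon-separated segments from the left, so
    "groupId" covers every artifact in the group and "groupId:artifactId"
    covers every version of that artifact.
    """
    entry_parts = entry.split(":")
    coord_parts = coordinate.split(":")
    if len(entry_parts) > len(coord_parts):
        return False
    return coord_parts[: len(entry_parts)] == entry_parts


coord = "com.example:parser:1.2.0"  # hypothetical coordinate
print(maven_entry_matches("com.example:parser:1.2.0", coord))  # True: exact version
print(maven_entry_matches("com.example:parser", coord))        # True: all versions
print(maven_entry_matches("com.example", coord))               # True: whole group
print(maven_entry_matches("com.example:parser:1.3.0", coord))  # False: different version
```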

How are permissions on paths enforced in the allowlist?

You can use the allowlist to grant access to JARs or init scripts stored in Unity Catalog volumes and object storage. If you add a path for a directory rather than a file, allowlist permissions propagate to contained files and directories.

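Prefix matching is used for all artifacts stored in Unity Catalog volumes or object storage. To prevent prefix matching at a given directory level, include a trailing slash (/). For example, /Volumes/prod-libraries/ matches only paths inside that directory, whereas /Volumes/prod-libraries would also match a sibling path such as /Volumes/prod-libraries-old.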
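A minimal Python sketch of the trailing-slash behavior, assuming matching is a plain string prefix comparison. The helper name is_allowlisted and the file names are hypothetical.

```python
from typing import Iterable


def is_allowlisted(path: str, allowlist: Iterable[str]) -> bool:
    """Plain string prefix matching against each allowlist entry (hypothetical helper)."""
    return any(path.startswith(entry) for entry in allowlist)


# Without a trailing slash, the entry also matches sibling directories
# whose names start with the same characters.
print(is_allowlisted("/Volumes/prod-libraries-old/lib.jar", ["/Volumes/prod-libraries"]))   # True
# With a trailing slash, only paths inside /Volumes/prod-libraries/ match.
print(is_allowlisted("/Volumes/prod-libraries-old/lib.jar", ["/Volumes/prod-libraries/"]))  # False
print(is_allowlisted("/Volumes/prod-libraries/lib.jar", ["/Volumes/prod-libraries/"]))      # True
```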

You can define permissions at the following levels:

  1. The base path for the volume or storage container.

  2. A directory nested at any depth from the base path.

  3. A single file.

Adding a path to the allowlist only means that the path can be used for either init scripts or JAR installation. Databricks still checks for permissions to access data in the specified location.

The principal must have the READ VOLUME permission on the specified volume. See READ VOLUME.
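The two independent checks described above can be sketched in Python. Everything here is a hypothetical model for illustration: the Principal class stands in for a principal and the volumes it holds READ VOLUME on, and prefix matching is assumed for the allowlist check. The sketch relies only on the documented /Volumes/<catalog>/<schema>/<volume>/ path layout.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class Principal:
    """Hypothetical model of a principal and the volumes it holds READ VOLUME on."""
    name: str
    read_volumes: Set[str] = field(default_factory=set)


def volume_of(path: str) -> str:
    # Volume paths have the form /Volumes/<catalog>/<schema>/<volume>/...
    parts = path.split("/")
    return ".".join(parts[2:5])


def can_use_artifact(path: str, allowlist: Iterable[str], principal: Principal) -> bool:
    """The allowlist check and the data-access check are independent; both must pass."""
    allowlisted = any(path.startswith(entry) for entry in allowlist)
    can_read = volume_of(path) in principal.read_volumes
    return allowlisted and can_read


allowlist = ["/Volumes/main/libs/jars/"]
alice = Principal("alice@example.com", {"main.libs.jars"})
bob = Principal("bob@example.com")  # allowlisted path, but no READ VOLUME grant

print(can_use_artifact("/Volumes/main/libs/jars/util.jar", allowlist, alice))  # True
print(can_use_artifact("/Volumes/main/libs/jars/util.jar", allowlist, bob))    # False
```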

In single user access mode, the identity of the assigned principal (a user or service principal) is used.

In shared access mode:

  • Libraries use the identity of the library installer.

  • Init scripts use the identity of the cluster owner.

Note

No-isolation shared access mode does not support volumes, but uses the same identity assignment as shared access mode.
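The identity rules above amount to a small lookup, sketched here in Python. The function identity_for and the access-mode strings "single_user", "shared", and "no_isolation_shared" are hypothetical labels for illustration, not Databricks API values.

```python
def identity_for(access_mode: str, artifact_kind: str, path: str, *,
                 assigned_principal: str = "", library_installer: str = "",
                 cluster_owner: str = "") -> str:
    """Return the identity used to read an allowlisted artifact (hypothetical helper)."""
    if access_mode == "single_user":
        # Single user access mode: the assigned user or service principal.
        return assigned_principal
    if access_mode in ("shared", "no_isolation_shared"):
        if access_mode == "no_isolation_shared" and path.startswith("/Volumes/"):
            raise ValueError("No-isolation shared access mode does not support volumes")
        if artifact_kind == "library":
            return library_installer  # libraries: identity of the library installer
        if artifact_kind == "init_script":
            return cluster_owner  # init scripts: identity of the cluster owner
        raise ValueError(f"Unknown artifact kind: {artifact_kind}")
    raise ValueError(f"Unknown access mode: {access_mode}")


print(identity_for("shared", "init_script", "/Volumes/main/ops/scripts/setup.sh",
                   library_installer="ana@example.com",
                   cluster_owner="ops-admin@example.com"))  # ops-admin@example.com
```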

Databricks recommends configuring all object storage privileges related to init scripts and libraries with read-only permissions. Users with write permissions on these locations can potentially modify code in library files or init scripts.

Databricks recommends using instance profiles to manage access to JARs or init scripts stored in S3. To set this up, complete the following steps:

  1. Create an IAM role with read and list permissions on your desired buckets. See Tutorial: Configure S3 access with an instance profile.

  2. Launch a cluster with the instance profile. See Instance profiles.
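As a sketch of the read and list permissions from step 1, the following Python prints an IAM policy document. It grants s3:ListBucket on the bucket and s3:GetObject on its objects, and deliberately omits write actions such as s3:PutObject, in line with the read-only recommendation above. The bucket name is a placeholder. The linked tutorial remains the authoritative setup, including how the role is attached to an instance profile.

```python
import json

BUCKET = "my-artifacts-bucket"  # hypothetical bucket name


def read_only_policy(bucket: str) -> dict:
    """IAM policy document granting list and read, but not write, access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


print(json.dumps(read_only_policy(BUCKET), indent=2))
```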

Note

Allowlist permissions for JARs and init scripts are managed separately. If you use the same location to store both types of objects, you must add the location to the allowlist for each.
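A minimal Python sketch of why a shared location must be added once per artifact type. It models each allowlist as an independent list of path prefixes. The dictionary keys mirror the artifact type names assumed for the REST API and, like the paths, are hypothetical here.

```python
from typing import Dict, List

# Hypothetical in-memory model: one independent allowlist per artifact type.
allowlists: Dict[str, List[str]] = {
    "INIT_SCRIPT": ["/Volumes/main/shared/artifacts/"],
    "LIBRARY_JAR": [],
}


def allowed(artifact_type: str, path: str) -> bool:
    """Check a path against the allowlist for one artifact type only."""
    return any(path.startswith(p) for p in allowlists.get(artifact_type, []))


jar_path = "/Volumes/main/shared/artifacts/tool.jar"
print(allowed("INIT_SCRIPT", "/Volumes/main/shared/artifacts/setup.sh"))  # True
print(allowed("LIBRARY_JAR", jar_path))  # False: location not yet on the JAR allowlist

# Adding the same location to the JAR allowlist is a separate step.
allowlists["LIBRARY_JAR"].append("/Volumes/main/shared/artifacts/")
print(allowed("LIBRARY_JAR", jar_path))  # True
```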