This article walks you through the steps required to install libraries from cloud object storage on Databricks.
This article refers to cloud object storage as a general concept, and assumes that you are directly interacting with data stored in object storage using URIs. Databricks recommends using Unity Catalog volumes to configure access to files in cloud object storage. See Create volumes.
You can store custom JAR and Python wheel (.whl) libraries in cloud object storage instead of in the DBFS root.
Libraries uploaded using the library UI are stored in the DBFS root. All workspace users can modify data and files stored in the DBFS root. You can avoid this by uploading libraries to workspace files or Unity Catalog volumes, using libraries in cloud object storage, or using library package repositories.
You can load libraries to object storage the same way you load other files. You must have proper permissions in your cloud provider to create new object storage containers or load files into cloud object storage.
Databricks recommends configuring all privileges related to library installation with read-only permissions.
Databricks allows you to assign security permissions to individual clusters that govern access to data in cloud object storage. These policies can be expanded to add read-only access to cloud object storage that contains libraries.
In Databricks Runtime 13.2 and below, you cannot load JAR libraries when using clusters with shared access modes. In Databricks Runtime 13.3 and above, you must add JAR libraries to the Unity Catalog allowlist. See Allowlist libraries and init scripts on shared compute.
Databricks recommends using instance profiles to manage access to libraries stored in S3. Use the linked documentation to complete this setup:
Create an IAM role with read and list permissions on your desired buckets. See Configure S3 access with instance profiles.
Launch a cluster with the instance profile. See Launch a compute resource with an instance profile.
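The read and list permissions described above could be expressed as an IAM policy document along the following lines. This is a minimal sketch, not the policy from the linked documentation: the helper `build_read_list_policy` and the bucket name `my-library-bucket` are hypothetical, and your setup may need additional statements.

```python
import json


def build_read_list_policy(bucket: str) -> dict:
    """Build a read-only IAM policy document for one S3 bucket.

    Assumption: the bucket only needs s3:ListBucket (on the bucket itself)
    and s3:GetObject (on the objects inside it) for library installation.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects under the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# "my-library-bucket" is a placeholder name used only for illustration.
print(json.dumps(build_read_list_policy("my-library-bucket"), indent=2))
```

The policy grants no write or delete actions, which matches the recommendation to keep library-related privileges read-only.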
To install a library stored in cloud object storage to a cluster, complete the following steps:
Select a cluster from the list in the clusters UI.
Select the Libraries tab.
Select the File path/S3 option.
Provide the full URI path to the library object (for example,
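The UI steps above may also be scripted. The sketch below only builds a request body in the shape the Databricks Libraries API (`POST /api/2.0/libraries/install`) is understood to accept; it does not send anything. The helper name, cluster ID, and library URI are hypothetical placeholders, so check the payload against the current API reference before relying on it.

```python
import json


def build_install_payload(cluster_id: str, library_uri: str) -> dict:
    """Build a library-install request body for a single JAR or wheel.

    Assumption: the library type is inferred from the file extension,
    which is a convenience of this sketch rather than an API rule.
    """
    if library_uri.endswith(".jar"):
        kind = "jar"
    elif library_uri.endswith(".whl"):
        kind = "whl"
    else:
        raise ValueError(f"Unsupported library type: {library_uri}")
    return {"cluster_id": cluster_id, "libraries": [{kind: library_uri}]}


# Placeholder cluster ID and URI, shown only to illustrate the shape.
payload = build_install_payload(
    "0000-000000-example",
    "s3://my-library-bucket/libs/my_pkg-0.1.0-py3-none-any.whl",
)
print(json.dumps(payload, indent=2))
```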
You can use %pip to install custom Python wheels stored in object storage scoped to a notebook-isolated SparkSession. To use this method, you must either store libraries in publicly readable object storage or use a pre-signed URL.
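As a rough illustration of the notebook-scoped approach, the helper below checks that a wheel URL looks usable and returns the `%pip` line you would paste into a notebook cell. The helper name and the example URL are hypothetical, and quoting the URL is an assumption made here so that query strings in pre-signed URLs are passed through unchanged.

```python
from urllib.parse import urlparse


def pip_install_line(wheel_url: str) -> str:
    """Return a %pip command for a publicly readable or pre-signed wheel URL."""
    parsed = urlparse(wheel_url)
    # This sketch only accepts HTTPS URLs, as used for public or pre-signed access.
    if parsed.scheme != "https":
        raise ValueError("Expected an https:// URL")
    # The path (ignoring any query string) should point at a wheel file.
    if not parsed.path.endswith(".whl"):
        raise ValueError("Expected a URL whose path ends in .whl")
    return f'%pip install "{wheel_url}"'


# example.com is a placeholder host used only for illustration.
print(pip_install_line("https://example.com/libs/my_pkg-0.1.0-py3-none-any.whl"))
```

Running the returned line in its own notebook cell installs the wheel for that notebook only.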
You cannot install JAR libraries from a notebook. You must install JAR libraries at the cluster level.