Install libraries from object storage
This article walks you through the steps required to install libraries from cloud object storage on Databricks.
Note
This article refers to cloud object storage as a general concept, and assumes that you are directly interacting with data stored in object storage using URIs. Databricks recommends using Unity Catalog volumes to configure access to files in cloud object storage. See What are Unity Catalog volumes?.
You can store custom JAR and Python wheel (.whl) libraries in cloud object storage instead of storing them in the DBFS root. See Cluster-scoped libraries for full library compatibility details.
Important
Libraries can be installed from DBFS when using Databricks Runtime 14.3 LTS and below. However, any workspace user can modify library files stored in DBFS. To improve the security of libraries in a Databricks workspace, storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. See Storing libraries in DBFS root is deprecated and disabled by default.
Instead, Databricks recommends uploading all libraries, including Python libraries, JAR files, and Spark connectors, to workspace files or Unity Catalog volumes, or using library package repositories. If your workload does not support these patterns, you can also use libraries stored in cloud object storage.
Load libraries to object storage
You can load libraries to object storage the same way you load other files. You must have proper permissions in your cloud provider to create new object storage containers or load files into cloud object storage.
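As an illustration of the paragraph above, here is a minimal sketch of uploading a wheel to S3 from Python. It assumes the boto3 package is installed and that AWS credentials with write access are already configured. The function names, bucket name, and key are hypothetical placeholders, not part of any Databricks API.

```python
def library_uri(bucket: str, key: str) -> str:
    """Return the s3:// URI for an object in the given bucket."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def upload_library(local_path: str, bucket: str, key: str) -> str:
    """Upload a local wheel or JAR to S3 and return its s3:// URI.

    Assumption: boto3 is installed and credentials come from the
    environment. The import is lazy so library_uri works without boto3.
    """
    import boto3

    normalized_key = key.lstrip("/")
    s3 = boto3.client("s3")
    # upload_file takes the local filename, the bucket, and the object key.
    s3.upload_file(local_path, bucket, normalized_key)
    return library_uri(bucket, normalized_key)
```

Calling upload_library("dist/library.whl", "bucket-name", "path/to/library.whl") would return the URI that you later provide when installing the library.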
Grant read-only permissions to object storage
Databricks recommends configuring all privileges related to library installation with read-only permissions.
Databricks allows you to assign security permissions to individual clusters that govern access to data in cloud object storage. You can extend these permissions to grant read-only access to the cloud object storage location that contains libraries.
Note
In Databricks Runtime 12.2 LTS and below, you cannot load JAR libraries on clusters that use shared access mode. In Databricks Runtime 13.3 LTS and above, you must add JAR libraries to the Unity Catalog allowlist. See Allowlist libraries and init scripts on shared compute.
Databricks recommends using instance profiles to manage access to libraries stored in S3. To complete this setup, follow the linked documentation for each step:
Create an IAM role with read and list permissions on your desired buckets. See Tutorial: Configure S3 access with an instance profile.
Launch a cluster with the instance profile. See Instance profiles.
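The read and list permissions mentioned above might look like the following IAM policy document. This is a sketch, not the policy from the linked tutorial: the function name and bucket name are hypothetical, and it assumes that s3:ListBucket on the bucket plus s3:GetObject on its objects is sufficient for read-only library access in your environment.

```python
import json


def read_only_library_policy(bucket: str) -> dict:
    """Build an IAM policy document granting list and read access to one bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


# Print the policy as JSON so it can be pasted into the IAM console.
print(json.dumps(read_only_library_policy("bucket-name"), indent=2))
```

The policy deliberately omits write actions such as s3:PutObject, in line with the read-only recommendation.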
Install libraries to clusters
To install a library stored in cloud object storage to a cluster, complete the following steps:
Select a cluster from the list in the clusters UI.
Select the Libraries tab.
Select the File path/S3 option.
Provide the full URI path to the library object (for example, s3://bucket-name/path/to/library.whl).
Click Install.
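This article describes the UI workflow only. If you prefer to script the same installation, one option is the Databricks Libraries REST API. The sketch below assumes the /api/2.0/libraries/install endpoint and a personal access token. The function names, workspace host, token, and cluster ID are hypothetical placeholders, and you should check the request shape against the API reference for your workspace.

```python
import json
import urllib.request


def build_install_payload(cluster_id: str, library_uri: str) -> dict:
    """Build the request body for installing one wheel or JAR on a cluster."""
    if library_uri.endswith(".whl"):
        kind = "whl"
    elif library_uri.endswith(".jar"):
        kind = "jar"
    else:
        raise ValueError(f"Expected a .whl or .jar URI, got: {library_uri}")
    return {"cluster_id": cluster_id, "libraries": [{kind: library_uri}]}


def install_library(host: str, token: str, cluster_id: str, library_uri: str) -> int:
    """POST the install request and return the HTTP status code.

    Assumption: host is the workspace URL and token is a valid
    personal access token.
    """
    payload = build_install_payload(cluster_id, library_uri)
    request = urllib.request.Request(
        f"{host.rstrip('/')}/api/2.0/libraries/install",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping the payload builder separate from the network call makes the request body easy to inspect before anything is sent.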
Install libraries to notebooks
You can use %pip to install custom Python wheel files stored in object storage; the libraries are scoped to a notebook-isolated SparkSession. To use this method, you must either store libraries in publicly readable object storage or use a pre-signed URL.
See Notebook-scoped Python libraries.
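For the pre-signed URL option, a minimal sketch of generating such a URL for a wheel in S3 follows. It assumes boto3 is installed and that the caller's AWS credentials can read the object. The function names and the one-hour expiry are illustrative choices, not Databricks requirements.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/key URI into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")


def presign_library(uri: str, expires_in: int = 3600) -> str:
    """Return a time-limited HTTPS URL for the wheel at the given S3 URI.

    Assumption: boto3 is installed and credentials come from the
    environment. The import is lazy so split_s3_uri works without boto3.
    """
    import boto3

    bucket, key = split_s3_uri(uri)
    s3 = boto3.client("s3")
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,  # lifetime of the URL in seconds
    )
```

You would then pass the returned URL to %pip install in a notebook cell. The URL stops working after it expires, so generate it shortly before running the cell.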
Note
JAR libraries cannot be installed in the notebook. You must install JAR libraries at the cluster level.