You can use libraries to make third-party code, or code that you developed and built on your local machine, available on your clusters. Libraries can be written in Python, Java, or Scala. Libraries appear in your workspace, where users can attach them to clusters. To see your libraries, navigate to the workspace folder where you uploaded them. In addition to the UI, you can create and manage libraries with the Library API. See the REST API documentation for details.
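To attach libraries to a cluster from code instead of the UI, one option is the Library API mentioned above. The sketch below only builds a request body and does not send it. It assumes the Libraries API 2.0 install endpoint (POST /api/2.0/libraries/install), which takes a cluster ID and a list of library entries such as {"jar": ...} or {"egg": ...}. The cluster ID and DBFS path are hypothetical placeholders, so check the REST API reference for your workspace before relying on this shape.

```python
import json

# Assumption: Libraries API 2.0 install endpoint; verify against your workspace docs.
INSTALL_ENDPOINT = "/api/2.0/libraries/install"


def build_install_request(cluster_id, jars=(), eggs=()):
    """Return a JSON body asking a cluster to install the given JARs and eggs."""
    # Each uploaded file becomes one library entry keyed by its type.
    libraries = [{"jar": path} for path in jars] + [{"egg": path} for path in eggs]
    if not libraries:
        raise ValueError("at least one library is required")
    return json.dumps({"cluster_id": cluster_id, "libraries": libraries})


body = build_install_request(
    "1234-567890-abcde123",                    # hypothetical cluster ID
    jars=["dbfs:/FileStore/jars/my-lib.jar"],  # hypothetical DBFS path
)
print("POST", INSTALL_ENDPOINT, body)
```

Sending the body (for example with an HTTP client and a personal access token) is left out so the sketch runs anywhere.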
You can import Python, Java, and Scala libraries to run on Spark clusters, or point to external packages in Maven or PyPI.
To import a library:
- Right-click on the folder in which you want to store the library
- Select “Create”
- Select “Library”
Now you can either upload a library or import it from Maven, PyPI, or Spark Packages.
To upload a Java or Scala JAR:
- Select “Upload Java/Scala JAR” as the Source.
- Enter a name in the “Library Name” field.
- Drag your JAR file into the “JAR File” box.
- Click “Create Library”.
You can also upload a Python egg:
- Enter a Library Name.
- Drop the egg file, and optionally the documentation egg, into the upload box.
- Click “Create Library”.
- The library now appears in the workspace folder.
Databricks saves libraries that you upload in the FileStore.
After you upload a JAR and attach it to a cluster, you must detach and reattach your notebook before you can use the library.
Libraries from Maven, PyPI, or Spark Packages
- Select Maven Coordinate or Spark Packages as the source.
- Enter the Maven coordinate of the library you want to install.
- Maven coordinates take the form groupId:artifactId:version.
- If you don’t know the exact coordinate, use “Search Spark Packages and Maven Central” to look it up.
- Click “Create Library”. Databricks resolves the dependencies and installs the library, which usually takes a couple of minutes.
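A malformed coordinate is a common reason a Maven install fails to resolve, so it can help to validate the string before submitting it. The sketch below splits a groupId:artifactId:version coordinate and wraps it in a {"maven": {"coordinates": ...}} entry; that entry shape is an assumption based on the Libraries API 2.0, and org.jsoup:jsoup:1.7.2 is only an example coordinate. The sketch accepts only the three-part form and rejects packaging or classifier variants.

```python
from typing import NamedTuple


class MavenCoordinate(NamedTuple):
    group_id: str
    artifact_id: str
    version: str


def parse_coordinate(text: str) -> MavenCoordinate:
    """Split a groupId:artifactId:version string, rejecting malformed input."""
    parts = text.strip().split(":")
    # Exactly three non-empty parts are required.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected groupId:artifactId:version, got {text!r}")
    return MavenCoordinate(*parts)


def maven_library_spec(text: str) -> dict:
    """Build a library entry for a validated coordinate (assumed API shape)."""
    coord = parse_coordinate(text)
    return {"maven": {"coordinates": ":".join(coord)}}


print(maven_library_spec("org.jsoup:jsoup:1.7.2"))
# {'maven': {'coordinates': 'org.jsoup:jsoup:1.7.2'}}
```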
Libraries from PyPI
- Enter the PyPI package name.
- Click “Install Library”.
- The PyPI library now appears in the workspace folder.
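When installing PyPI packages programmatically, pinning an exact version keeps cluster environments reproducible. The sketch below builds a {"pypi": {"package": ..., "repo": ...}} entry, a shape assumed from the Libraries API 2.0; the optional repo field (an alternative package index URL) and the simplejson example are illustrative, not taken from this page.

```python
def pypi_library_spec(package, version=None, repo=None):
    """Build a PyPI library entry (assumed Libraries API 2.0 shape).

    package: bare PyPI project name; version: optional exact pin;
    repo: optional alternative index URL.
    """
    # Keep the name and the pin separate so the pin is always "==".
    if not package or any(ch in package for ch in "=<> "):
        raise ValueError("pass the bare project name; use `version` to pin")
    spec = {"package": f"{package}=={version}" if version else package}
    if repo:
        spec["repo"] = repo
    return {"pypi": spec}


print(pypi_library_spec("simplejson", version="3.8.0"))
# {'pypi': {'package': 'simplejson==3.8.0'}}
```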
After upload, libraries are immutable: you can overwrite or delete a library, but not edit it.
To delete a library, navigate to its location in the workspace, click the dropdown menu button to its right, and select “Delete”.
You must restart a cluster to fully remove a library from it.
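Because removal only takes effect after a restart, an automated cleanup needs two calls rather than one. The sketch below assumes the Libraries API 2.0 uninstall endpoint (POST /api/2.0/libraries/uninstall) and the Clusters API restart endpoint (POST /api/2.0/clusters/restart); it only builds the requests in order and does not send them. The cluster ID and JAR path in the usage lines are hypothetical.

```python
import json


def removal_requests(cluster_id, library):
    """Return (endpoint, JSON body) pairs that fully remove one library.

    Uninstalling only marks the library for removal; the restart applies it.
    """
    return [
        ("/api/2.0/libraries/uninstall",
         json.dumps({"cluster_id": cluster_id, "libraries": [library]})),
        ("/api/2.0/clusters/restart",
         json.dumps({"cluster_id": cluster_id})),
    ]


# Hypothetical cluster ID and library path, for illustration only.
for endpoint, body in removal_requests("1234-567890-abcde123",
                                       {"jar": "dbfs:/FileStore/jars/my-lib.jar"}):
    print("POST", endpoint, body)
```

Restarting interrupts anything running on the cluster, so schedule the second call accordingly.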
Some libraries are harder to install because they require lower-level configuration. You can install these with a custom UNIX script, run either through Init Scripts or through SSH Access to Clusters. See Init Scripts for more details.
You can also set up a custom Maven repository URL and exclude specific dependencies:
- If your coordinate is in a Maven repository other than Maven Central, enter that repository's URL in the Repository URL box.
- In the Excludes box, enter the groupId and artifactId of each dependency you want to exclude.
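The Repository URL and Excludes options have programmatic counterparts. The sketch below assumes the Libraries API 2.0 Maven entry, which accepts optional "repo" and "exclusions" fields, with each exclusion written as groupId:artifactId (no version). The repository URL and the excluded artifact in the usage lines are hypothetical.

```python
def maven_spec_with_options(coordinates, repo=None, exclusions=()):
    """Build a Maven library entry with an optional repository and exclusions.

    Assumes each exclusion is groupId:artifactId, without a version.
    """
    for item in exclusions:
        parts = item.split(":")
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"exclusion must be groupId:artifactId, got {item!r}")
    spec = {"coordinates": coordinates}
    # Only include the optional fields when they are actually set.
    if repo:
        spec["repo"] = repo
    if exclusions:
        spec["exclusions"] = list(exclusions)
    return {"maven": spec}


print(maven_spec_with_options(
    "org.jsoup:jsoup:1.7.2",
    repo="https://repo.example.com/maven2",  # hypothetical repository URL
    exclusions=["org.slf4j:slf4j-api"],      # hypothetical exclusion
))
```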