Libraries

Libraries let you upload third-party code, or code that you developed and built on your local machine, to your workspace. Libraries can be written in Python, Java, or Scala. Once uploaded, a library is visible in the workspace so that users can attach it to clusters. To see your libraries, navigate to the workspace folder to which you uploaded them. In addition to the UI, you can create and manage libraries with the Library API; see the REST API documentation for details.

Creating Libraries

You can import Python, Java, and Scala libraries to run on Spark Clusters, or point to external packages in Maven or PyPI.

To import a library:

  • Right-click on the folder in which you want to store the library
  • Select “Create”
  • Select “Library”

Now you can either upload a library or import it from Maven, PyPI, or Spark Packages.

Uploading Libraries

To upload a Java or Scala jar:

  • Select “Upload Java/Scala JAR” as the Source
  • Enter a name for the “Library Name”
  • Drag your JAR into the “JAR File” window
  • Click “Create Library”

You can also upload a Python egg:

  • Enter a Library Name.
  • Drop the egg and, optionally, the documentation egg.
  • Click “Create Library”.
  • The library should now appear in the Workspace folder.

Databricks saves the libraries that you upload in the FileStore.

Note

After you upload the JAR and attach it to a cluster, you must reattach your notebook to the cluster in order to use the library.

Libraries from Maven or Spark Packages

  • Select Maven Coordinate or Spark Packages as the source.
  • Enter the Maven coordinate of the library you would like to install.
  • Maven coordinates take the form groupId:artifactId:version, e.g., com.databricks:spark-avro_2.10:1.0.0
  • Optionally, you can “Search Spark Packages and Maven Central” if you don’t know the exact coordinate.
  • Click Create Library. The dependencies will resolve and the library will install in a couple of minutes.
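
Before pasting a coordinate into the form, it can help to check that it has the three-part shape described above. The following is a minimal sketch in Python; `MavenCoordinate` and `parse_maven_coordinate` are hypothetical helper names, not part of any Databricks API, and the sketch handles only the groupId:artifactId:version form.

```python
from typing import NamedTuple


class MavenCoordinate(NamedTuple):
    """The three parts of a groupId:artifactId:version coordinate."""
    group_id: str
    artifact_id: str
    version: str


def parse_maven_coordinate(coordinate: str) -> MavenCoordinate:
    """Split a 'groupId:artifactId:version' string into its three parts."""
    parts = [part.strip() for part in coordinate.split(":")]
    # Reject anything that is not exactly three non-empty fields.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected groupId:artifactId:version, got {coordinate!r}"
        )
    return MavenCoordinate(*parts)


coord = parse_maven_coordinate("com.databricks:spark-avro_2.10:1.0.0")
print(coord.group_id)     # com.databricks
print(coord.artifact_id)  # spark-avro_2.10
print(coord.version)      # 1.0.0
```

A malformed value such as log4j:log4j raises a ValueError instead of being submitted and failing during dependency resolution.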

Libraries from PyPI

  • Enter the PyPI Name
  • Click “Install Library”
  • Now, you should see the PyPI library in the Workspace folder
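
Once the library is attached to a cluster, one way to confirm that it is visible from a Python notebook is to ask the interpreter for its installed version. This is a sketch that assumes Python 3.8 or later on the cluster (for `importlib.metadata`); `installed_version` is a hypothetical helper, and `simplejson` is only an example package name.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a PyPI distribution, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "simplejson" is only an example; substitute the PyPI Name you entered.
version = installed_version("simplejson")
if version is None:
    print("simplejson is not visible to this Python environment")
else:
    print(f"simplejson {version} is installed")
```

If the check reports that the package is missing, verify that the library is attached to the cluster the notebook is running on.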

Editing Libraries

After upload, libraries are immutable. They can only be overwritten or deleted.

Deleting Libraries

To delete a library, navigate to its location in the Workspace, click the menu dropdown button on the right side, and select Delete.

Note

You must restart a cluster to fully delete a library.

Advanced Options

Some libraries are more difficult to install because they require lower-level configuration. You can write a custom UNIX script to install these via Init Scripts or via SSH Access to Clusters. See Init Scripts for more details.

You can also set up a custom Maven repository URL and exclude certain dependencies:

  • Enter the Repository URL if your coordinate is in a different Maven repository, e.g., https://oss.sonatype.org/content/repositories
  • In the Excludes box, provide the groupId and the artifactId of the dependencies that you wish to exclude, e.g., log4j:log4j
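
If you keep library definitions in a script or configuration file, the three Maven fields above can be collected into one record. The sketch below is illustrative only: `maven_library_spec` is a hypothetical helper, and the key names (`coordinates`, `repo`, `exclusions`) are assumptions chosen for this example, so check the REST API reference for the actual schema before sending anything to the Library API.

```python
import json
from typing import Dict, List, Optional


def maven_library_spec(
    coordinates: str,
    repo: Optional[str] = None,
    exclusions: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Collect the Maven form fields into one dictionary (hypothetical schema)."""
    if len(coordinates.split(":")) != 3:
        raise ValueError("coordinates must be groupId:artifactId:version")
    for exclusion in exclusions or []:
        # Each exclusion names a dependency as groupId:artifactId, no version.
        if len(exclusion.split(":")) != 2:
            raise ValueError(
                f"exclusion must be groupId:artifactId, got {exclusion!r}"
            )
    spec: Dict[str, object] = {"coordinates": coordinates}
    if repo:
        spec["repo"] = repo  # only set when the default repository is not used
    if exclusions:
        spec["exclusions"] = list(exclusions)
    return spec


spec = maven_library_spec(
    "com.databricks:spark-avro_2.10:1.0.0",
    repo="https://oss.sonatype.org/content/repositories",
    exclusions=["log4j:log4j"],
)
print(json.dumps(spec, indent=2))
```

Leaving `repo` and `exclusions` unset yields a record with only the coordinate, which corresponds to filling in the basic form without the advanced options.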