Deploy Scala JARs on Unity Catalog clusters
This article describes how to compile and deploy Scala jobs as JAR files on a Unity Catalog-enabled cluster in standard access mode. It provides details to ensure that:
- Your Java Development Kit (JDK) version matches the JDK version on your Databricks cluster.
- Your version of Scala matches the Scala version on your Databricks cluster.
- Databricks Connect is added as a dependency and matches the version running on your Databricks cluster.
- The local project you are compiling is packaged as a single JAR and includes all dependencies. Alternatively, you can install dependencies as cluster libraries.
- All dependencies on OSS Spark, such as spark-core or hadoop-core, are removed.
- All JARs used are added to the allowlist.
Unity Catalog clusters in standard access mode implement the new Spark Connect architecture, which separates the client and server components. This separation allows you to efficiently share clusters while fully enforcing Unity Catalog governance, including measures such as row filters and column masks. However, Unity Catalog clusters in standard access mode have some limitations, such as lack of support for the Spark Context and RDD APIs. Limitations are listed in Compute access mode limitations for Unity Catalog.
Step 1: Ensure the Scala and JDK versions match
Before building your JARs, ensure that the versions of the Java Development Kit (JDK) and Scala you use to compile your code match the versions included in the Databricks Runtime version running on your cluster. For information about compatible versions, see the version support matrix.
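As a concrete sketch for an sbt project, the following build.sbt fragment pins the compiler versions. The Scala version and JDK release shown are placeholder examples only; use the values from the version support matrix that correspond to your cluster's Databricks Runtime, and invoke sbt with a matching JDK (for example, by setting JAVA_HOME).

// Pin the Scala version to the one bundled with your Databricks Runtime (2.12.x here is only an example).
scalaVersion := "2.12.18"

// Have javac emit bytecode for the JDK version running on the cluster (17 here is only an example).
javacOptions ++= Seq("--release", "17")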
Step 2: Add Databricks Connect as a dependency
Databricks Connect must be used to build Scala JARs instead of OSS Spark. The Spark version running on the Databricks Runtime is more recent than what is currently available on OSS Spark, and includes performance and stability improvements.
In your Scala project’s build file (build.sbt for sbt or pom.xml for Maven), add the following reference to Databricks Connect. Also, remove any dependency on OSS Spark.
Maven:

<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>databricks-connect</artifactId>
  <version>16.2.0</version>
</dependency>

sbt:

libraryDependencies += "com.databricks" % "databricks-connect" % "16.2.+"
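With the dependency in place, your application code compiles against the Databricks Connect API instead of OSS Spark. The following is a minimal sketch of such an entry point; the object name is arbitrary and the table shown is one of the Databricks sample datasets, so substitute your own logic.

import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.SparkSession

object ExampleApp {
  def main(args: Array[String]): Unit = {
    // Obtain the Spark session through Databricks Connect; when the JAR runs as a
    // job task on the cluster, this picks up the session provided by the cluster.
    val spark: SparkSession = DatabricksSession.builder().getOrCreate()

    // Example query against a sample table; replace with your application logic.
    val df = spark.read.table("samples.nyctaxi.trips")
    df.limit(5).show()
  }
}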
Step 3: Package as a single JAR and deploy
Databricks recommends packaging your application and all dependencies into a single JAR file, also known as an über or fat JAR. For sbt, use sbt-assembly, and for Maven, use maven-shade-plugin. See the official Maven Shade Plugin and sbt-assembly documentation for details.
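As an illustration for sbt, a typical sbt-assembly setup looks something like the following; the plugin version and merge strategy are examples and may need adjusting for your project. Depending on your setup, you may also want to mark the databricks-connect dependency as provided so it is not bundled into the assembled JAR, since it is already available on the cluster.

// project/plugins.sbt -- the plugin version shown is only an example.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt -- run the assembly task to produce the single fat JAR.
assembly / assemblyMergeStrategy := {
  // Drop duplicated metadata files pulled in by dependencies; keep the first copy of anything else.
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}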
Alternatively, you can install dependencies as compute-scoped libraries. See Compute-scoped libraries for more information.
Deploy your JAR file using a JAR task. See JAR task for jobs.
Step 4: Ensure your JAR is allowlisted
For security reasons, standard access mode requires an administrator to add Maven coordinates and paths for JAR libraries to an allowlist. See Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode).