Deploy Scala JARs on Unity Catalog clusters
This article describes how to compile and deploy Scala jobs as JAR files on a Unity Catalog-enabled cluster in standard access mode. It provides details to ensure that:
- Your Java Development Kit (JDK) version matches the JDK version on your Databricks cluster.
- Your version of Scala matches the Scala version on your Databricks cluster.
- Databricks Connect is added as a dependency and matches the version running on your Databricks cluster.
- The local project you are compiling is packaged as a single JAR and includes all dependencies. Alternatively, you can install dependencies as cluster libraries.
- All dependencies on OSS Spark, such as spark-core or hadoop-core, are removed.
- All JARs used are added to the allowlist.
Unity Catalog clusters in standard access mode implement the new Spark Connect architecture, which separates the client and server components. This separation allows you to efficiently share clusters while fully enforcing Unity Catalog governance, including measures such as row filters and column masks. However, Unity Catalog clusters in standard access mode have some limitations, such as lack of support for the Spark Context and RDD APIs. Limitations are listed in Compute access mode limitations for Unity Catalog.
Step 1: Ensure the Scala and JDK versions match
Before building your JARs, ensure that the versions of the Java Development Kit (JDK) and Scala you use to compile your code match the versions included in the Databricks Runtime version running on your cluster. For information about compatible versions, see the version support matrix.
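As a concrete sketch for an sbt project, the following build.sbt fragment pins the compiler versions. The Scala version and JDK release shown are placeholder examples only; use the values from the version support matrix that correspond to your cluster's Databricks Runtime, and invoke sbt with a matching JDK (for example, by setting JAVA_HOME).

// Pin the Scala version to the one bundled with your Databricks Runtime (2.12.x here is only an example).
scalaVersion := "2.12.18"

// Have javac emit bytecode for the JDK version running on the cluster (17 here is only an example).
javacOptions ++= Seq("--release", "17")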
Step 2: Add Databricks Connect as a dependency
Databricks Connect must be used to build Scala JARs instead of OSS Spark. The Spark version running on the Databricks Runtime is more recent than what is currently available on OSS Spark, and includes performance and stability improvements.
In your Scala project’s build file (build.sbt for sbt or pom.xml for Maven), add the following reference to Databricks Connect. Also, remove any dependency on OSS Spark.
Maven:

<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>databricks-connect</artifactId>
  <version>16.2.0</version>
</dependency>

sbt:

libraryDependencies += "com.databricks" % "databricks-connect" % "16.2.+"
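With the dependency in place, your application code compiles against the Databricks Connect API instead of OSS Spark. The following is a minimal sketch of such an entry point; the object name is arbitrary and the table shown is one of the Databricks sample datasets, so substitute your own logic.

import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.SparkSession

object ExampleApp {
  def main(args: Array[String]): Unit = {
    // Obtain the Spark session through Databricks Connect; when the JAR runs as a
    // job task on the cluster, this picks up the session provided by the cluster.
    val spark: SparkSession = DatabricksSession.builder().getOrCreate()

    // Example query against a sample table; replace with your application logic.
    val df = spark.read.table("samples.nyctaxi.trips")
    df.limit(5).show()
  }
}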
Step 3: Package as a single JAR and deploy
Databricks recommends packaging your application and all dependencies into a single JAR file, also known as an über or fat JAR. For sbt, use sbt-assembly, and for Maven, use maven-shade-plugin. See the official Maven Shade Plugin and sbt-assembly documentation for details.
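As an illustration for sbt, a typical sbt-assembly setup looks something like the following; the plugin version and merge strategy are examples and may need adjusting for your project. Depending on your setup, you may also want to mark the databricks-connect dependency as provided so it is not bundled into the assembled JAR, since it is already available on the cluster.

// project/plugins.sbt -- the plugin version shown is only an example.
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.1.5")

// build.sbt -- run the assembly task to produce the single fat JAR.
assembly / assemblyMergeStrategy := {
  // Drop duplicated metadata files pulled in by dependencies; keep the first copy of anything else.
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}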
Alternatively, you can install dependencies as compute-scoped libraries. See Compute-scoped libraries for more information.
Deploy your JAR file using a JAR task. See JAR task for jobs.
Step 4: Ensure your JAR is allowlisted
For security reasons, standard access mode requires an administrator to add Maven coordinates and paths for JAR libraries to an allowlist. See Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode).