Create a Databricks compatible JAR

This page describes how to create a JAR with Scala or Java code that is compatible with your Databricks workspace.

At a high level, your JAR must meet the following requirements for compatibility:

  • Your Java Development Kit (JDK) version matches the JDK version on your Databricks cluster or serverless compute.
  • For Scala, your version of Scala matches the Scala version on your Databricks cluster or serverless compute.
  • Databricks Connect is added as a dependency and matches the version running on your Databricks cluster or serverless compute, or your Spark dependencies are compatible with your Databricks environment.
  • The local project you are compiling is packaged as a single JAR and includes all dependencies that are not provided by your compute. Alternatively, you can add the dependencies to your environment or cluster.
  • The code in your JAR file correctly interacts with the Spark session or context.
  • For standard compute, all JARs used are added to the allowlist.
tip

To create a Java or Scala project that is fully configured to deploy and run a JAR on serverless compute, you can use Databricks Asset Bundles. For an example bundle configuration that uploads a JAR, see Bundle that uploads a JAR file to Unity Catalog. To create a Scala project using Databricks Asset Bundles, see Build a Scala JAR using Databricks Asset Bundles.

Beta

Using serverless compute for JAR tasks is in Beta.

Databricks Connect and Databricks Runtime versioning

When creating a JAR to run in Databricks, it is helpful to understand how you call Spark APIs and what version of the APIs you are calling.

Databricks Connect

Databricks Connect implements the Spark Connect architecture, which separates client and server components. This separation allows you to efficiently share clusters while fully enforcing Unity Catalog governance with measures such as row filters and column masks. However, Unity Catalog clusters in standard access mode have some limitations, for example, lack of support for APIs such as Spark Context and RDDs. Limitations are listed in Standard compute requirements and limitations.

Databricks Connect gives you access to all Spark functionality, including Spark Connect, and is included with standard and serverless compute. For these compute types, Databricks Connect is required because it provides all necessary Spark APIs.

Databricks Runtime

The Databricks Runtime runs on compute managed by Databricks. It is based on Spark but includes performance improvements and other enhancements for ease of use.

On serverless or standard compute, Databricks Connect provides APIs that call into the Databricks Runtime running on the compute. On dedicated compute, you compile against the Spark APIs, which are backed by the Databricks Runtime on the compute.

Find the correct versions for your compute

To compile a compatible JAR file, you must know the version of Databricks Connect and Databricks Runtime that your compute is running.

The following list describes how to find the correct versions for each compute type:

  • Serverless: Uses Databricks Connect. You must use serverless environment version 4 or above. To find the current Databricks Connect version for a serverless environment, see Serverless environment versions. Then find the JDK and Scala versions for that Databricks Connect version in the version support matrix.

  • Compute in standard mode: Uses Databricks Runtime and provides Databricks Connect to call APIs. To find the Databricks Runtime version, in the workspace, click Compute in the sidebar and select your compute. The Databricks Runtime version is displayed in the configuration details. The major and minor versions of Databricks Connect match the major and minor versions of the Databricks Runtime.

  • Compute in dedicated mode: Uses Databricks Runtime and allows you to compile against Spark APIs directly. To find the Databricks Runtime version, in the workspace, click Compute in the sidebar and select your compute. The Databricks Runtime version is displayed in the configuration details.

JDK and Scala versions

When you build a JAR, the Java Development Kit (JDK) and Scala versions that you use to compile your code must match the versions running on your compute.

For serverless or standard compute, use the Databricks Connect version to find the compatible JDK and Scala versions. See the version support matrix.

If you are using dedicated compute, you must match the JDK and Scala versions of the Databricks Runtime on the compute. The System environment section of the Databricks Runtime release notes for each version lists the correct Java and Scala versions. For example, for Databricks Runtime 17.3 LTS, see Databricks Runtime 17.3 LTS.
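For example, in an sbt build you can pin the Scala version and target the matching JDK release. The versions below are illustrative; use the versions listed in the version support matrix or the release notes for your compute:

Scala
scalaVersion := "2.13.16"

// Compile any Java sources against the JDK release used by your compute (for example, JDK 17).
javacOptions ++= Seq("--release", "17")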

note

Using a JDK or Scala version that doesn't match your compute's JDK or Scala versions may cause unexpected behavior or prevent your code from running.

Dependencies

You must set up your dependencies correctly in your build file.

Databricks Connect or Apache Spark

For standard or serverless compute, Databricks recommends adding a dependency on Databricks Connect instead of Spark to build JARs. Databricks Runtime is not identical to Spark, and includes performance and stability improvements. Databricks Connect provides the Spark APIs that are available in Databricks. To include Databricks Connect, add a dependency:

In the Maven pom.xml file:

<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>databricks-connect_2.13</artifactId>
  <version>17.0.2</version>
</dependency>
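
If you build with sbt, the equivalent dependency line in build.sbt is as follows (the version shown matches the Maven example above; use the version for your compute, as in the Scala walkthrough later on this page):

libraryDependencies += "com.databricks" %% "databricks-connect" % "17.0.2"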
note

The Databricks Connect version must match the version included in the Databricks Runtime of your cluster.

Databricks recommends depending on Databricks Connect. If you do not want to use Databricks Connect, compile against spark-sql-api. Add this specific Spark library to your dependencies, but do not include the library in your JAR. In the build file, configure the scope for the dependency as provided:

In the Maven pom.xml file:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql-api</artifactId>
  <version>4.0.1</version>
  <scope>provided</scope>
</dependency>
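
If you build with sbt, the equivalent provided-scope dependency in build.sbt is:

libraryDependencies += "org.apache.spark" %% "spark-sql-api" % "4.0.1" % Provided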

Spark dependencies

For standard compute and serverless, do not include any other Spark dependencies in your project. Using Databricks Connect provides all of the necessary Spark session APIs.

Classic compute and Databricks Runtime provided libraries

If you are running on classic compute (in either dedicated or standard mode), the Databricks Runtime includes many common libraries. Find the libraries and versions that are included in the System Environment section of the Databricks Runtime release notes for your Databricks Runtime version. For example, the Databricks Runtime 17.3 LTS System Environment section lists the versions of each library available in the Databricks Runtime.

To compile against one of these libraries, add it as a dependency with the provided option. For example, in Databricks Runtime 17.3 LTS, the protobuf-java library is provided, and you can compile against it with the following configuration:

In the Maven pom.xml:

<dependency>
  <groupId>com.google.protobuf</groupId>
  <artifactId>protobuf-java</artifactId>
  <version>3.25.5</version>
  <scope>provided</scope>
</dependency>
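
If you build with sbt, the equivalent line in build.sbt is the following (note the single % because protobuf-java is a Java library with no Scala version suffix):

libraryDependencies += "com.google.protobuf" % "protobuf-java" % "3.25.5" % Provided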

Serverless and non-provided libraries

Serverless provides a reduced set of dependencies by default, to reduce issues caused by conflicting libraries.

For libraries that aren't available on serverless compute or in the Databricks Runtime, you can include them yourself in your JAR. For example, to include circe-core in your build.sbt file, add the following line:

libraryDependencies += "io.circe" %% "circe-core" % "0.14.10"

To add libraries to the serverless environment, see Configure environment for non-notebook job tasks.

Package as a single JAR

Databricks recommends packaging your application and all dependencies into a single JAR file, also known as an über or fat JAR. For sbt, use sbt-assembly, and for Maven, use maven-shade-plugin. See the official Maven Shade Plugin and sbt-assembly documentation for details.
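If sbt assembly reports duplicate or conflicting files across your dependencies, you can add a merge strategy to build.sbt. The following is a minimal sketch, and the strategy choices are assumptions; adjust them for your dependencies:

Scala
// Minimal sketch of a merge strategy for duplicate files in the fat JAR.
// Discarding META-INF metadata and taking the first copy of other duplicates
// is a common starting point, but verify it is safe for your dependencies.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", _*) => MergeStrategy.discard
  case _                        => MergeStrategy.first
}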

Alternatively, you can install dependencies as cluster-scoped libraries. See compute-scoped libraries for more information. If you install libraries on your cluster, mark those dependencies as provided in your build file so the libraries are not packaged into your JAR. For serverless, add them to the serverless environment that you use. See Configure environment for non-notebook job tasks.

note

For Scala JARs installed as libraries on Unity Catalog standard clusters, classes in the JAR libraries must be in a named package, such as com.databricks.MyClass, or errors will occur when importing the library.

Using the Spark session in your code

When you are running a JAR within a job, you must use the Spark session that is provided by Databricks for the job. The following code shows how to access the session from your code:

SparkSession spark = SparkSession.builder().getOrCreate();
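
In Scala with Databricks Connect (serverless or standard compute), the equivalent is DatabricksSession, as shown in the Scala walkthrough later on this page:

Scala
import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.SparkSession

// Reuses the Spark session that Databricks provides for the job.
val spark: SparkSession = DatabricksSession.builder().getOrCreate()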

Ensure your JAR is allowlisted (standard compute)

For security reasons, standard access mode requires an administrator to add Maven coordinates and paths for JAR libraries to an allowlist. See Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode).

Recommendation: Use try-finally blocks for job cleanup

If you want code that reliably runs at the end of your job, for example, to clean up temporary files created during the job, use a try-finally block. Do not use a shutdown hook, because shutdown hooks do not run reliably in jobs.

Consider a JAR that consists of two parts:

  • jobBody() which contains the main part of the job.
  • jobCleanup(), which must run after jobBody(), whether that function succeeded or threw an exception.

For example, jobBody() creates tables and jobCleanup() drops those tables.

The safe way to ensure that the clean-up method is called is to put a try-finally block in the code:

Scala
try {
  jobBody()
} finally {
  jobCleanup()
}
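
As a sketch of the table example above, jobBody() and jobCleanup() might look like the following. The catalog, schema, and table names are placeholders, and spark is the job's Spark session:

Scala
// Hypothetical example: jobBody() creates a working table and jobCleanup() drops it,
// even if jobBody() throws an exception.
def jobBody(): Unit =
  spark.sql("CREATE TABLE IF NOT EXISTS main.default.tmp_job_results AS SELECT * FROM range(10)")

def jobCleanup(): Unit =
  spark.sql("DROP TABLE IF EXISTS main.default.tmp_job_results")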

You should not try to clean up using sys.addShutdownHook(jobCleanup) or the following code:

Scala
// Do NOT clean up with a shutdown hook like this. This will fail.
val cleanupThread = new Thread { override def run = jobCleanup() }
Runtime.getRuntime.addShutdownHook(cleanupThread)

Because of the way the lifetime of Spark containers is managed in Databricks, the shutdown hooks are not run reliably.

Reading job parameters

Parameters are passed to your JAR job as a JSON string array. To access these parameters, inspect the String array passed into your main function.
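
For example, a main method that echoes each parameter it receives (a minimal sketch; how you parse the values is up to your application):

Scala
object PrintJobParameters {
  def main(args: Array[String]): Unit = {
    // Each element of args is one job parameter, in the order configured on the job.
    args.zipWithIndex.foreach { case (value, index) =>
      println(s"Parameter $index: $value")
    }
  }
}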

For more details on parameters, see Parameterize jobs.

Configure serverless networking

If your job accesses private resources (databases, APIs, storage), configure networking with a Network Connectivity Configuration (NCC). See Serverless network security.

Build a JAR

The following steps take you through creating and compiling a simple JAR file using Scala or Java to work in Databricks.

Requirements

Your local development environment must have the following:

  • Java Development Kit (JDK) 17.
  • sbt (for Scala JARs).
  • Databricks CLI version 0.218.0 or above. To check your installed version of the Databricks CLI, run the command databricks -v. To install the Databricks CLI, see Install or update the Databricks CLI.
  • Databricks CLI authentication is configured with a DEFAULT profile. To configure authentication, see Configure access to your workspace.

Create a Scala JAR

  1. Run the following command to create a new Scala project:

    > sbt new scala/scala-seed.g8

    When prompted, enter a project name, for example, my-spark-app.

  2. Replace the contents of your build.sbt file with the following. Choose the Scala and Databricks Connect versions that are needed for your compute. See Dependencies.

    Scala
    scalaVersion := "2.13.16"
    libraryDependencies += "com.databricks" %% "databricks-connect" % "17.0.1"
    // other dependencies go here...

    // Forking is required so the JVM options below are applied; otherwise the code runs with the sbt process's options
    fork := true
    javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"
  3. Edit or create a project/assembly.sbt file, and add this line:

    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.3.1")
  4. Create your main class in src/main/scala/com/examples/SparkJar.scala:

    Scala
    package com.examples

    import com.databricks.connect.DatabricksSession
    import org.apache.spark.sql.SparkSession

    object SparkJar {
      def main(args: Array[String]): Unit = {
        val spark: SparkSession = DatabricksSession.builder().getOrCreate()

        // Prints the arguments to the class, which
        // are job parameters when run as a job:
        println(args.mkString(", "))

        // Shows using spark:
        println(spark.version)
        println(spark.range(10).limit(3).collect().mkString(" "))
      }
    }
  5. To build your JAR file, run the following command:

    Bash
    > sbt assembly

Create a Java JAR

  1. Create a folder for your JAR.

  2. In the folder, create a file named PrintArgs.java with the following contents:

    Java
    import java.util.Arrays;

    public class PrintArgs {
      public static void main(String[] args) {
        System.out.println(Arrays.toString(args));
      }
    }
  3. Compile the PrintArgs.java file, which creates the file PrintArgs.class:

    Bash
    javac PrintArgs.java
  4. (Optional) Run the compiled program:

    Bash
    java PrintArgs Hello World!

    # [Hello, World!]
  5. In the folder, create a pom.xml file, and add the following code to enable Maven shade.

    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <version>3.6.0</version>
          <executions>
            <execution>
              <phase>package</phase>
              <goals><goal>shade</goal></goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  6. In the JAR folder, create a folder named META-INF.

  7. In the META-INF folder, create a file named MANIFEST.MF with the following contents. Be sure to add a newline at the end of this file:

    Main-Class: PrintArgs
  8. From your JAR folder, create a JAR named PrintArgs.jar:

    Bash
    jar cvfm PrintArgs.jar META-INF/MANIFEST.MF *.class
  9. (Optional) To test it, run the JAR:

    Bash
    java -jar PrintArgs.jar Hello World!

    # [Hello, World!]
    note

    If you get the error no main manifest attribute, in PrintArgs.jar, be sure to add a newline to the end of the MANIFEST.MF file, and then try creating and running the JAR again.

Next steps