Create and run Scala and Java JARs on serverless compute
Serverless Scala and Java jobs are in Beta. You can use JAR tasks to deploy your JAR. If this feature is not already enabled, see Manage Databricks previews.
A Java archive (JAR) packages Java or Scala code into a single file. This article shows you how to create a JAR with Spark code and deploy it as a Lakeflow Job on serverless compute.
For automated deployment and continuous integration workflows, use Databricks Asset Bundles to create a project from a template with pre-configured build and deployment settings. See Build a Scala JAR using Databricks Asset Bundles and Bundle that uploads a JAR file to Unity Catalog. This article describes the manual approach, which is useful for one-off deployments or for learning how JARs work with serverless compute.
Requirements
Your local development environment must have the following:
- sbt 1.11.7 or higher (for Scala JARs)
- Maven 3.9.0 or higher (for Java JARs)
- JDK, Scala, and Databricks Connect versions that match your serverless environment (this example uses JDK 17, Scala 2.13.16, and Databricks Connect 17.0.1)
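Before you start, you can confirm that your local tool versions match the serverless environment. This is a quick sanity check, assuming the tools are on your PATH:

```bash
# Print installed versions to compare against the serverless environment requirements
java -version      # JDK version (for example, 17.x)
sbt --version      # sbt version (Scala projects)
scala -version     # Scala version (Scala projects)
mvn -v             # Maven version (Java projects)
```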
Step 1. Build a JAR
Scala

- Run the following command to create a new Scala project:

  ```bash
  sbt new scala/scala-seed.g8
  ```

  When prompted, enter a project name, for example, `my-spark-app`.
- Replace the contents of your `build.sbt` file with the following:

  ```scala
  scalaVersion := "2.13.16"

  libraryDependencies += "com.databricks" %% "databricks-connect" % "17.0.1"

  // other dependencies go here...

  // To run with new JVM options, a fork is required; otherwise, it uses the same options as the sbt process.
  fork := true
  javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"
  ```
- Edit or create a `project/assembly.sbt` file, and add this line:

  ```scala
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.3.1")
  ```
- Create your main class in `src/main/scala/com/examples/SparkJar.scala`:

  ```scala
  package com.examples

  import org.apache.spark.sql.SparkSession

  object SparkJar {
    def main(args: Array[String]): Unit = {
      val spark = SparkSession.builder().getOrCreate()

      // Prints the arguments to the class, which
      // are job parameters when run as a job:
      println(args.mkString(", "))

      // Shows using spark:
      println(spark.version)
      println(spark.range(10).limit(3).collect().mkString(" "))
    }
  }
  ```
- To build your JAR file, run the following command:

  ```bash
  sbt assembly
  ```
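The sbt-assembly plugin typically writes the fat JAR to `target/scala-2.13/` with a name like `<project>-assembly-<version>.jar`. The exact path and file name depend on your project name and version in `build.sbt`, so the globs below are assumptions; adjust them to your build:

```bash
# Locate the assembly JAR (file name varies with your project name and version)
ls target/scala-2.13/*assembly*.jar

# Optionally confirm that the main class was packaged
jar tf target/scala-2.13/*assembly*.jar | grep SparkJar
```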
Java

- Run the following commands to create a new Maven project structure:

  ```bash
  # Create all directories at once
  mkdir -p my-spark-app/src/main/java/com/examples
  cd my-spark-app
  ```
- Create a `pom.xml` file in the project root with the following contents:

  ```xml
  <?xml version="1.0" encoding="UTF-8"?>
  <project xmlns="http://maven.apache.org/POM/4.0.0"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                               http://maven.apache.org/xsd/maven-4.0.0.xsd">
      <modelVersion>4.0.0</modelVersion>

      <groupId>com.examples</groupId>
      <artifactId>my-spark-app</artifactId>
      <version>1.0-SNAPSHOT</version>

      <properties>
          <maven.compiler.source>17</maven.compiler.source>
          <maven.compiler.target>17</maven.compiler.target>
          <scala.binary.version>2.13</scala.binary.version>
          <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
      </properties>

      <dependencies>
          <!-- Databricks Connect for Spark -->
          <dependency>
              <groupId>com.databricks</groupId>
              <artifactId>databricks-connect_${scala.binary.version}</artifactId>
              <version>17.0.1</version>
          </dependency>
      </dependencies>

      <build>
          <plugins>
              <!-- Maven Shade Plugin - Creates fat JAR with all dependencies -->
              <plugin>
                  <groupId>org.apache.maven.plugins</groupId>
                  <artifactId>maven-shade-plugin</artifactId>
                  <version>3.6.1</version>
                  <executions>
                      <execution>
                          <phase>package</phase>
                          <goals>
                              <goal>shade</goal>
                          </goals>
                          <configuration>
                              <transformers>
                                  <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                      <mainClass>com.examples.SparkJar</mainClass>
                                  </transformer>
                              </transformers>
                          </configuration>
                      </execution>
                  </executions>
              </plugin>
          </plugins>
      </build>
  </project>
  ```
- Create your main class in `src/main/java/com/examples/SparkJar.java`:

  ```java
  package com.examples;

  import org.apache.spark.sql.SparkSession;

  import java.util.stream.Collectors;

  public class SparkJar {
      public static void main(String[] args) {
          SparkSession spark = SparkSession.builder().getOrCreate();

          // Prints the arguments to the class, which
          // are job parameters when run as a job:
          System.out.println(String.join(", ", args));

          // Shows using spark:
          System.out.println(spark.version());
          System.out.println(
              spark.range(10).limit(3).collectAsList().stream()
                  .map(Object::toString)
                  .collect(Collectors.joining(" "))
          );
      }
  }
  ```
- To build your JAR file, run the following command:

  ```bash
  mvn clean package
  ```

  The compiled JAR is located in the `target/` directory as `my-spark-app-1.0-SNAPSHOT.jar`.
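Optionally, verify that the shaded JAR contains your compiled class and that the Maven Shade plugin set the Main-Class manifest entry. A minimal check, assuming `unzip` is available locally:

```bash
# Confirm the compiled class is inside the shaded JAR
jar tf target/my-spark-app-1.0-SNAPSHOT.jar | grep SparkJar

# Inspect the manifest written by the Maven Shade plugin (expects Main-Class: com.examples.SparkJar)
unzip -p target/my-spark-app-1.0-SNAPSHOT.jar META-INF/MANIFEST.MF
```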
Step 2. Create a job to run the JAR
- In your workspace, click Jobs & Pipelines in the sidebar.

- Click Create, then Job.

  The Tasks tab displays with the empty task pane.

  Note: If the Lakeflow Jobs UI is ON, click the JAR tile to configure the first task. If the JAR tile is not available, click Add another task type and search for JAR.

- Optionally, replace the name of the job, which defaults to New Job <date-time>, with your job name.

- In Task name, enter a name for the task, for example JAR_example.

- If necessary, select JAR from the Type drop-down menu.

- For Main class, enter the package and class of your JAR. If you followed the example above, enter com.examples.SparkJar.

- For Compute, select Serverless.

- Configure the serverless environment:

  - Choose an environment, then click Edit to configure it.
  - Select 4-scala-preview for the Environment version.
  - Add your JAR file by dragging and dropping it into the file selector, or browse to select it from a Unity Catalog volume or workspace location (one way to copy the JAR into a volume from your terminal is sketched after these steps).
- For Parameters, for this example, enter ["Hello", "World!"].

- Click Create task.
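If you want to stage the JAR in a Unity Catalog volume so you can browse to it in the task configuration, one option is to copy it with the Databricks CLI. This is a sketch only: it assumes the CLI is installed and authenticated, and the volume path below is a hypothetical placeholder.

```bash
# Copy the local JAR into a Unity Catalog volume
# (replace the path segments with your own catalog, schema, and volume)
databricks fs cp target/my-spark-app-1.0-SNAPSHOT.jar \
  dbfs:/Volumes/main/default/jars/my-spark-app-1.0-SNAPSHOT.jar
```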
Step 3. Run the job and view the job run details
Click Run now to run the job. To view details for the run, click View run in the Triggered run pop-up, or click the link in the Start time column for the run in the job runs view.
When the run completes, the output displays in the Output panel, including the arguments passed to the task.
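If you prefer the command line, you can also trigger a run with the Databricks CLI. A minimal sketch, assuming the CLI is configured; the job ID is a placeholder, and argument syntax can vary between CLI versions (see `databricks jobs run-now --help`):

```bash
# Trigger a run of the job; replace 123456789 with the job ID shown in the job's URL or details panel
databricks jobs run-now 123456789
```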
Next steps
- To learn more about JAR tasks, see JAR task for jobs.
- To learn more about creating a compatible JAR, see Create a Databricks compatible JAR.
- To learn more about creating and running jobs, see Lakeflow Jobs.