Use a JAR in a Databricks job
The Java archive or JAR file format is based on the popular ZIP file format and is used for aggregating many Java or Scala files into one. Using the JAR task, you can ensure fast and reliable installation of Java or Scala code in your Databricks jobs. This article provides an example of creating a JAR and a job that runs the application packaged in the JAR. In this example, you will:
1. Create the JAR project defining an example application.
2. Bundle the example files into a JAR.
3. Create a job to run the JAR.
4. Run the job and view the results.
Before you begin
You need the following to complete this example:

- For Java JARs, the Java Development Kit (JDK).
- For Scala JARs, the JDK and sbt.
Step 1: Create a local directory for the example
Create a local directory to hold the example code and generated artifacts, for example, `databricks_jar_test`.
Step 2: Create the JAR
Complete the following instructions to use Java or Scala to create the JAR.
Create a Java JAR
From the `databricks_jar_test` folder, create a file named `PrintArgs.java` with the following contents:

```java
import java.util.Arrays;

public class PrintArgs {
    public static void main(String[] args) {
        System.out.println(Arrays.toString(args));
    }
}
```

Compile the `PrintArgs.java` file, which creates the file `PrintArgs.class`:

```shell
javac PrintArgs.java
```

(Optional) Run the compiled program:

```shell
java PrintArgs Hello World!
# [Hello, World!]
```

In the same folder as the `PrintArgs.java` and `PrintArgs.class` files, create a folder named `META-INF`.

In the `META-INF` folder, create a file named `MANIFEST.MF` with the following contents. Be sure to add a newline at the end of this file:

```
Main-Class: PrintArgs
```

From the root of the `databricks_jar_test` folder, create a JAR named `PrintArgs.jar`:

```shell
jar cvfm PrintArgs.jar META-INF/MANIFEST.MF *.class
```

(Optional) From the root of the `databricks_jar_test` folder, run the JAR:

```shell
java -jar PrintArgs.jar Hello World!
# [Hello, World!]
```

Note

If you get the error `no main manifest attribute, in PrintArgs.jar`, be sure to add a newline to the end of the `MANIFEST.MF` file, and then try creating and running the JAR again.
Create a Scala JAR
From the `databricks_jar_test` folder, create a file named `build.sbt` with the following contents:

```scala
ThisBuild / scalaVersion := "2.12.14"
ThisBuild / organization := "com.example"

lazy val PrintArgs = (project in file("."))
  .settings(
    name := "PrintArgs"
  )
```

From the `databricks_jar_test` folder, create the folder structure `src/main/scala/example`.

In the `example` folder, create a file named `PrintArgs.scala` with the following contents:

```scala
package example

object PrintArgs {
  def main(args: Array[String]): Unit = {
    println(args.mkString(", "))
  }
}
```

Compile the program:

```shell
sbt compile
```

(Optional) Run the compiled program:

```shell
sbt "run Hello World\!"
# Hello, World!
```

In the `databricks_jar_test/project` folder, create a file named `assembly.sbt` with the following contents:

```scala
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.0.0")
```

From the root of the `databricks_jar_test` folder, create a JAR named `PrintArgs-assembly-0.1.0-SNAPSHOT.jar` in the `target/scala-2.12` folder:

```shell
sbt assembly
```

(Optional) From the root of the `databricks_jar_test` folder, run the JAR:

```shell
java -jar target/scala-2.12/PrintArgs-assembly-0.1.0-SNAPSHOT.jar Hello World!
# Hello, World!
```
Step 3: Create a Databricks job to run the JAR
Go to your Databricks landing page and do one of the following:

- In the sidebar, click Workflows, then click the button to create a job.
- In the sidebar, click New and select Job from the menu.
In the task dialog box that appears on the Tasks tab, replace "Add a name for your job…" with your job name, for example `JAR example`.

For Task name, enter a name for the task, for example `java_jar_task` for Java or `scala_jar_task` for Scala.

For Type, select JAR.

For Main class, for this example, enter `PrintArgs` for Java or `example.PrintArgs` for Scala.

For Dependent libraries, click + Add.

In the Add dependent library dialog, with Upload and JAR selected, drag your JAR (for this example, `PrintArgs.jar` for Java or `PrintArgs-assembly-0.1.0-SNAPSHOT.jar` for Scala) into the dialog's Drop JAR here area.

Click Add.

For Parameters, for this example, enter `["Hello", "World!"]`.

Click Add.
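The UI steps above can also be expressed as a call to the Databricks Jobs API (`POST /api/2.1/jobs/create`). The payload below is a minimal sketch, not a definitive configuration: the `jar` path, `spark_version`, `node_type_id`, and `num_workers` values are placeholders you would replace with values valid for your workspace.

```json
{
  "name": "JAR example",
  "tasks": [
    {
      "task_key": "java_jar_task",
      "spark_jar_task": {
        "main_class_name": "PrintArgs",
        "parameters": ["Hello", "World!"]
      },
      "libraries": [
        { "jar": "dbfs:/FileStore/jars/PrintArgs.jar" }
      ],
      "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "i3.xlarge",
        "num_workers": 1
      }
    }
  ]
}
```

For a Scala JAR, you would swap in `example.PrintArgs` as the main class and point the library at the assembly JAR instead.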
Step 4: Run the job and view the job run details
Click Run now to run the job. To view details for the run, click View run in the Triggered run pop-up, or click the link in the Start time column for the run in the job runs view.
When the run completes, the output displays in the Output panel, including the arguments passed to the task.
Next steps
To learn more about creating and running Databricks jobs, see Create, run, and manage Databricks Jobs.