Tutorial: Run Scala code on serverless compute
Databricks Connect for Scala on serverless compute is in Beta.
This tutorial provides an overview of how to get started with Databricks Connect for Scala using serverless compute. It walks through building a Scala JAR file that is compatible with Unity Catalog-enabled compute (either classic compute in standard access mode or serverless compute).
To create a Scala project that is fully configured to deploy and run a JAR on serverless compute, you can use Databricks Asset Bundles. See Build a Scala JAR using Databricks Asset Bundles.
Requirements
Your local development environment must meet the requirements for Databricks Connect for Scala. See Databricks Connect usage requirements, which include the following:
- Java Development Kit (JDK)
- sbt
- Databricks CLI, configured for serverless compute:
databricks auth login --configure-serverless --host <workspace-url>
Step 1: Create a Scala project
First, create a Scala project using the scala-seed template. When prompted, enter a project name, for example, my-spark-app.
sbt new scala/scala-seed.g8
Step 2: Update the Scala and JDK versions
Before building your JAR, ensure the version of the Java Development Kit (JDK) and Scala that you use to compile your code are supported for serverless compute. For details on this requirement, see JDK and Scala versions.
For compatible versions, see the version support matrix.
In your project's build.sbt file, set the Scala and JDK versions. The following configuration is for Scala 2.13 and JDK 17, which is compatible with compute in dedicated or standard access mode running Databricks Runtime 17, and with serverless environment version 4.
scalaVersion := "2.13.16"
javacOptions ++= Seq("-source", "17", "-target", "17")
scalacOptions ++= Seq("-release", "17")
Step 3: Add Databricks Connect as a dependency
Add Databricks Connect as a dependency to build Scala JARs. For more information, see Dependencies.
In your Scala project's build.sbt file, add the following reference to Databricks Connect:
scalaVersion := "2.13.16"
libraryDependencies += "com.databricks" %% "databricks-connect" % "17.0.+"
// Fork a new JVM for `run` so the javaOptions below take effect; otherwise the code runs with the sbt process's JVM options.
fork := true
javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"
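Putting the preceding settings together, a complete build.sbt for this tutorial might look like the following sketch. The project name and version are placeholders; adjust them for your project:
// Sketch of a complete build.sbt; name and version are placeholders.
name := "my-spark-app"
version := "0.1.0"
scalaVersion := "2.13.16"

javacOptions ++= Seq("-source", "17", "-target", "17")
scalacOptions ++= Seq("-release", "17")

libraryDependencies += "com.databricks" %% "databricks-connect" % "17.0.+"

// Fork a new JVM for `run` so the javaOptions below take effect.
fork := true
javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"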
Step 4: Add other dependencies
Databricks recommends packaging your application and all dependent libraries into a single JAR file, also known as an über or fat JAR. Alternatively, you can install dependent libraries as compute-scoped libraries or in your serverless environment. For more information, see Package as a single JAR.
Remove any dependency on Spark. Spark APIs are provided by Databricks Connect. For more information, see Spark dependencies.
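For example, if your application depends on a third-party library, declare it in build.sbt so that sbt-assembly bundles it into the JAR. The library and version below are purely illustrative:
// Illustrative dependency only; replace with the libraries your application actually needs.
libraryDependencies += "com.lihaoyi" %% "upickle" % "3.3.1"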
Step 5: Add Spark code
Create your main class, for example in src/main/scala/com/examples/SparkJar.scala. For details about using the Spark session in your Scala code, see Using the Spark session in your code.
package com.examples

import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.SparkSession

object SparkJar {
  def main(args: Array[String]): Unit = {
    // Create a Spark session backed by Databricks Connect. Session validation is disabled,
    // and this JAR's compiled classes are registered so they are available on the remote compute.
    val spark: SparkSession = DatabricksSession.builder()
      .validateSession(false)
      .addCompiledArtifacts(SparkJar.getClass.getProtectionDomain.getCodeSource.getLocation.toURI)
      .getOrCreate()

    println(spark.version)
    println(spark.range(10).limit(3).collect().mkString(" "))
  }
}
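To exercise more of the API, you could extend main with a table query. The following sketch assumes the samples catalog that Databricks provides is available in your workspace:
// Sketch: add inside main(), after getOrCreate(). Assumes the samples catalog exists in your workspace.
val trips = spark.read.table("samples.nyctaxi.trips")
trips.select("trip_distance", "fare_amount").limit(5).show()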
Step 6: Run and build your code
Next, run your code:
sbt run
Now add the sbt-assembly plugin. Create a project/assembly.sbt file with the following line:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.3.1")
Then build the project:
sbt assembly
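Optionally, you can pin the assembled JAR's file name in build.sbt so deployment paths stay stable. The setting comes from the sbt-assembly plugin; the file name below is a placeholder:
// Optional: give the über JAR a fixed file name (placeholder value).
assembly / assemblyJarName := "my-spark-app-assembly.jar"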
Step 7: Deploy your JAR
Now deploy your JAR file using a JAR task in the UI or with Databricks Asset Bundles.
The JAR you created is also supported on standard compute. However, for standard compute, an administrator must add the Maven coordinates and paths for JAR libraries to an allowlist. See Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode).
Databricks recommends allowlisting an entire volume rather than individual JAR files.