Tutorial: Run Scala code on serverless compute

Beta

Databricks Connect for Scala on serverless compute is in Beta.

This tutorial provides an overview of how to get started with Databricks Connect for Scala using serverless compute. It walks through building a Scala JAR file that is compatible with Unity Catalog-enabled compute, either serverless compute or classic compute in standard access mode.

tip

To create a Scala project that is fully configured to deploy and run a JAR on serverless compute, you can use Databricks Asset Bundles. See Build a Scala JAR using Databricks Asset Bundles.

Requirements

Your local development environment must meet the requirements for Databricks Connect for Scala. See Databricks Connect usage requirements, which include the following:

  • Java Development Kit (JDK)

  • sbt

  • Databricks CLI, configured for serverless compute:

    databricks auth login --configure-serverless --host <workspace-url>

Step 1: Create a Scala project

First, create a Scala project from the sbt seed template. When prompted, enter a project name, for example, my-spark-app.

Bash
sbt new scala/scala-seed.g8

Step 2: Update the Scala and JDK versions

Before building your JAR, ensure that the versions of the Java Development Kit (JDK) and Scala that you use to compile your code are supported for serverless compute. For details on this requirement, see JDK and Scala versions.

For compatible versions, see the version support matrix.

The following configuration is for Scala 2.13 and JDK 17, which is compatible with compute in dedicated or standard access mode on Databricks Runtime 17 and with serverless environment version 4.

scalaVersion := "2.13.16"

javacOptions ++= Seq("-source", "17", "-target", "17")
scalacOptions ++= Seq("-release", "17")
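
Optionally, you can make the build fail fast if sbt is launched with a different JDK. The following build.sbt snippet is a minimal sketch that assumes the JDK 17 configuration above:

// Sketch: stop the build early if sbt is not running on the expected JDK (assumes JDK 17).
initialize := {
  val _ = initialize.value
  val expected = "17"
  val actual = sys.props("java.specification.version")
  require(actual == expected, s"This build requires JDK $expected, but sbt is running on JDK $actual.")
}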

Step 3: Add Databricks Connect as a dependency

Add Databricks Connect as a dependency to build Scala JARs. For more information, see Dependencies.

In your Scala project's build.sbt build file, add the following reference to Databricks Connect.

scalaVersion := "2.13.16"
libraryDependencies += "com.databricks" %% "databricks-connect" % "17.0.+"

// Fork a new JVM so the javaOptions below take effect; otherwise the code runs with the sbt process's own JVM options.
fork := true
javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"

Step 4: Add other dependencies

Databricks recommends packaging your application and all dependent libraries into a single JAR file, also known as an über or fat JAR. Alternatively, you can install dependent libraries as compute-scoped libraries or in your serverless environment. For more information, see Package as a single JAR.

important

Remove any dependency on Spark. Spark APIs are provided by Databricks Connect. For more information, see Spark dependencies.
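
For example, the following build.sbt additions are a minimal sketch of bundling a third-party library into the JAR while keeping Spark out of the dependency graph. The library coordinates and the excluded artifact below are illustrative only; substitute your own dependencies.

// Illustrative only: a third-party library that sbt-assembly will bundle into the JAR.
libraryDependencies += "com.typesafe" % "config" % "1.4.3"

// Illustrative only: if a dependency pulls in Spark transitively, exclude it,
// because the Spark APIs are already provided by Databricks Connect.
libraryDependencies += ("com.example" %% "some-library" % "1.0.0")
  .exclude("org.apache.spark", "spark-sql_2.13")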

Step 5: Add Spark code

Create your main class in src/main/scala/com/examples/DatabricksExample.scala. For details about using the Spark session in your Scala code, see Using the Spark session in your code.

Scala
package com.examples

import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.SparkSession

object SparkJar {
  def main(args: Array[String]): Unit = {
    val spark: SparkSession = DatabricksSession.builder()
      // Skip client-side session validation and upload this JAR's compiled classes
      // so they are available to the remote Spark session.
      .validateSession(false)
      .addCompiledArtifacts(SparkJar.getClass.getProtectionDomain.getCodeSource.getLocation.toURI)
      .getOrCreate()

    println(spark.version)
    println(spark.range(10).limit(3).collect().mkString(" "))
  }
}
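
After the session is created, you can run any Spark DataFrame or SQL code against serverless compute. The following snippet is a minimal sketch of code that could be added inside main after getOrCreate(); the column name is illustrative.

Scala
// Sketch: additional DataFrame work inside main(), after the session is created.
import org.apache.spark.sql.functions.col

val df = spark.range(100).withColumn("squared", col("id") * col("id"))
df.filter(col("squared") > 50).show(5)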

Step 6: Run and build your code

Next, run your code:

Bash
sbt run

To package your code and its dependencies into a single JAR, add the sbt-assembly plugin by creating a project/assembly.sbt file with the following line:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "2.3.1")

Then build the project:

Bash
sbt assembly
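
If sbt assembly fails because multiple dependencies provide files at the same path (commonly under META-INF), you can define a merge strategy in build.sbt. The following rules are a minimal sketch; adjust them to your dependencies.

// Sketch: resolve duplicate-file conflicts when building the fat JAR.
assembly / assemblyMergeStrategy := {
  case PathList("META-INF", "MANIFEST.MF")  => MergeStrategy.discard
  case PathList("META-INF", "services", _*) => MergeStrategy.concat
  case _                                    => MergeStrategy.first
}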

Step 7: Deploy your JAR

Now deploy your JAR file using a JAR task from the UI or by using Databricks Asset Bundles.

note

The JAR you created is also supported on standard compute. However, for standard compute an administrator must add Maven coordinates and paths for JAR libraries to an allowlist. See Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode).

Databricks recommends adding a whole volume instead of individual JARs to the allowlist.