Build a Scala JAR using Databricks Asset Bundles

This article describes how to build, deploy, and run a Scala JAR with Databricks Asset Bundles. For information about bundles, see What are Databricks Asset Bundles?.

For an example configuration that builds a Java JAR and uploads it to Unity Catalog, see Bundle that uploads a JAR file to Unity Catalog.

Requirements

This tutorial requires that your Databricks workspace meets the following requirements:

In addition, your local development environment must have the following installed:

Step 1: Create the bundle

First, create the bundle using the bundle init command and the Scala project bundle template.

The Scala JAR bundle template creates a bundle that builds a JAR, uploads it to the specified volume, and defines a job with a Spark JAR task that runs on serverless compute. The Scala code in the template project defines a UDF that applies a simple transformation to a sample DataFrame and outputs the results. The source for the template is in the bundle-examples repository.
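
For illustration, the following is a minimal sketch of the kind of code the template's Main.scala contains. It assumes a Databricks Connect session obtained from your local Databricks configuration and uses a hypothetical UDF; the generated project's class names, sample data, and transformation differ.

Scala
package com.examples

import com.databricks.connect.DatabricksSession
import org.apache.spark.sql.functions.{col, udf}

object Main {
  def main(args: Array[String]): Unit = {
    // Databricks Connect reads your local Databricks authentication configuration.
    val spark = DatabricksSession.builder().getOrCreate()

    // A small sample DataFrame with an "id" column.
    val df = spark.range(1, 4)

    // A hypothetical UDF that applies a simple transformation to each row.
    val label = udf((id: Long) => s"row-$id")

    // Apply the UDF and output the results.
    df.withColumn("label", label(col("id"))).show()
  }
}

When you run code like this locally, Databricks Connect sends the work to Databricks compute; the same code is packaged into the JAR that the job's task runs on serverless compute.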

  1. Run the following command in a terminal window on your local development machine. The command prompts you for the values of some required fields.

    Bash
    databricks bundle init --template-dir contrib/templates/scala-job https://github.com/databricks/bundle-examples
  2. For the project name, enter my_scala_project. This determines the name of the root directory for this bundle. This root directory is created within your current working directory.

  3. For the volumes destination path, provide the Unity Catalog volumes path in Databricks where you want the bundle directory that contains the JAR and other artifacts to be created, for example /Volumes/my-catalog/my-schema/bundle-volumes.

    note

    The template project configures serverless compute, but if you change it to use classic compute, your admin may need to allowlist the Volumes JAR path you specify. See Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode).

Step 2: Configure VM options

  1. In IntelliJ IDEA, import the directory that contains build.sbt (the bundle's root directory).

  2. In IntelliJ IDEA, set the SDK to Java 17: go to File > Project Structure > SDKs.

  3. Open src/main/scala/com/examples/Main.scala.

  4. Navigate to the run configuration for Main and add VM options: select Edit Main, then Add VM options.

  5. Add the following to your VM options:

    --add-opens=java.base/java.nio=ALL-UNNAMED
tip

Alternatively, or if you are using Visual Studio Code, add the following to your sbt build file:

Scala
fork := true
javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"

Then run your application from the terminal:

Bash
sbt run

Step 3: Explore the bundle

To view the files that the template generated, switch to the root directory of your newly created bundle and open this directory in your IDE. The template uses sbt to compile and package Scala files and works with Databricks Connect for local development. For detailed information, see the generated project README.md.

Files of particular interest include the following:

  • databricks.yml: This file specifies the bundle's programmatic name, includes a reference to the job definition, and specifies settings about the target workspace.
  • resources/my_scala_project.job.yml: This file specifies the job's JAR task and cluster settings.
  • src/: This directory includes the source files for the Scala project.
  • build.sbt: This file contains important build and dependent library settings (see the example sketch after this list).
  • README.md: This file contains these getting started steps, as well as local build instructions and settings.
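
The following is an illustrative sketch of the kind of settings a build.sbt for this project contains. The exact Scala version, dependencies (including Databricks Connect), and packaging configuration are defined by the generated file, so treat the values below as placeholders.

Scala
// Illustrative sketch only; the generated build.sbt pins its own versions and
// declares the Databricks Connect and packaging settings it needs.
name := "my_scala_project"
scalaVersion := "2.13.12" // example version; use the one in the generated file

// Fork a separate JVM for local runs and pass the flag required on Java 17
// (see the tip in Step 2).
fork := true
javaOptions += "--add-opens=java.base/java.nio=ALL-UNNAMED"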

Step 4: Validate the project's bundle configuration file

Next, check whether the bundle configuration is valid using the bundle validate command.

  1. From the root directory, run the Databricks CLI bundle validate command. Among other checks, this verifies that the volume specified in the configuration file exists in the workspace.

    Bash
    databricks bundle validate
  2. If a summary of the bundle configuration is returned, then the validation succeeded. If any errors are returned, fix the errors, then repeat this step.

If you make any changes to your bundle after this step, repeat this step to check whether your bundle configuration is still valid.

Step 5: Deploy the local project to the remote workspace

Now deploy the bundle to your remote Databricks workspace using the bundle deploy command. This step builds the JAR file and uploads it to the specified volume.

  1. Run the Databricks CLI bundle deploy command:

    Bash
    databricks bundle deploy -t dev
  2. To check whether the locally built JAR file was deployed:

    1. In your Databricks workspace's sidebar, click Catalog Explorer.
    2. Navigate to the volume destination path you specified when you initialized the bundle. The JAR file should be located in the following folder inside that path: /my_scala_project/dev/<user-name>/.internal/.
  3. To check whether the job was created:

    1. In your Databricks workspace's sidebar, click Jobs & Pipelines.
    2. Optionally, select the Jobs and Owned by me filters.
    3. Click [dev <your-username>] my_scala_project.
    4. Click the Tasks tab.

    There should be one task: main_task.

If you make any changes to your bundle after this step, repeat the validation and deployment steps.

Step 6: Run the deployed project

Finally, run the Databricks job using the bundle run command.

  1. From the root directory, run the Databricks CLI bundle run command, specifying the name of the job in the definition file my_scala_project.job.yml:

    Bash
    databricks bundle run -t dev my_scala_project
  2. Copy the value of Run URL that appears in your terminal and paste this value into your web browser to open your Databricks workspace.

  3. In your Databricks workspace, after the task completes successfully and shows a green title bar, click the main_task task to see the results.