Build a Scala JAR using Databricks Asset Bundles

This article describes how to build, deploy, and run a Scala JAR with Databricks Asset Bundles. For information about bundles, see What are Databricks Asset Bundles?.

For example configuration that builds a Java JAR and uploads it to Unity Catalog, see Bundle that uploads a JAR file to Unity Catalog.

Requirements

  • Databricks CLI version 0.218.0 or above, with authentication configured. To check your installed version of the Databricks CLI, run the command databricks -v. To install the Databricks CLI, see Install or update the Databricks CLI. To configure authentication, see Configure access to your workspace.
  • A Unity Catalog volume in Databricks where you want to store the build artifacts, and permission to upload the JAR to the volume path you specify. See Create and manage volumes.

Step 1: Create the bundle

First, create the bundle using the bundle init command and the Scala project bundle template. The Scala JAR bundle template creates a bundle that builds a JAR, uploads it to the specified volume, and defines a job with a JAR task that runs on a specified cluster. The Scala code in the template project defines a UDF that applies a simple transformation to a sample DataFrame and outputs the results. The source for the template is in the bundle-examples repository.
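
The following is a minimal sketch of that pattern, for orientation only. It is not the template's actual source; the object, UDF, and column names here are hypothetical, and the generated code may differ.

    Scala
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, udf}

    object Main {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().getOrCreate()
        import spark.implicits._

        // Sample DataFrame with a single string column.
        val df = Seq("alice", "bob", "carol").toDF("name")

        // A simple UDF that transforms each value.
        val toUpper = udf((s: String) => s.toUpperCase)

        // Apply the UDF and output the results.
        df.withColumn("name_upper", toUpper(col("name"))).show()
      }
    }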

  1. Run the following command in a terminal window on your local development machine. It prompts you for the values of some required fields.

    Bash
    databricks bundle init --template-dir contrib/templates/scala-job https://github.com/databricks/bundle-examples
  2. For the project name, enter my_scala_project. This determines the name of the bundle's root directory, which is created within your current working directory.

  3. For the volumes destination path, provide the Unity Catalog volume path in Databricks where the bundle directory that contains the JAR and other artifacts will be created, for example /Volumes/my-catalog/my-schema/bundle-volumes.

    note

    Depending on your workspace permissions, your admin may need to allowlist the Volumes JAR path you specify. See Allowlist libraries and init scripts on compute with standard access mode (formerly shared access mode).

Step 2: Explore the bundle

To view the files that the template generated, switch to the root directory of your newly created bundle and open this directory with your preferred IDE. Files of particular interest include the following:

  • databricks.yml: This file specifies the bundle's programmatic name, includes a reference to the job definition, and specifies settings about the target workspace.
  • resources/my_scala_project.job.yml: This file specifies the job's JAR task and cluster settings.
  • src/: This directory includes the source files for the Scala project.
  • build.sbt: This file contains important build and library dependency settings; a sketch of a typical configuration follows this list.
  • README.md: This file contains these getting-started steps, as well as local build instructions and settings.
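
For reference, a typical build.sbt for a Spark JAR looks roughly like the following. This is a sketch, not the template's actual file; the versions shown are assumptions and must match the Scala and Spark versions of your target Databricks Runtime.

    Scala
    // Hypothetical settings; align versions with your target Databricks Runtime.
    name := "my_scala_project"
    version := "0.1.0"
    scalaVersion := "2.12.18"

    // Spark is provided by the Databricks cluster at runtime, so it is marked
    // "provided" and is not packaged into the JAR.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.0" % "provided"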

Step 3: Validate the project's bundle configuration file

Next, check whether the bundle configuration is valid using the bundle validate command.

  1. From the root directory, run the Databricks CLI bundle validate command. Among other checks, this verifies that the volume specified in the configuration file exists in the workspace.

    Bash
    databricks bundle validate
  2. If a summary of the bundle configuration is returned, then the validation succeeded. If any errors are returned, fix the errors, then repeat this step.

If you make any changes to your bundle after this step, repeat this step to check whether your bundle configuration is still valid.

Step 4: Deploy the local project to the remote workspace

Now deploy the bundle to your remote Databricks workspace using the bundle deploy command. This step builds the JAR file and uploads it to the specified volume.

  1. Run the Databricks CLI bundle deploy command:

    Bash
    databricks bundle deploy -t dev
  2. To check whether the locally built JAR file was deployed:

    1. In your Databricks workspace's sidebar, click Catalog Explorer.
    2. Navigate to the volume destination path you specified when you initialized the bundle. The JAR file should be located in the following folder inside that path: /my_scala_project/dev/<user-name>/.internal/.
  3. To check whether the job was created:

    1. In your Databricks workspace's sidebar, click Workflows.
    2. On the Jobs tab, click [dev <your-username>] my_scala_project.
    3. Click the Tasks tab.

    There should be one task: main_task.

If you make any changes to your bundle after this step, repeat the validation and deployment steps.

Step 5: Run the deployed project

Finally, run the Databricks job using the bundle run command.

  1. From the root directory, run the Databricks CLI bundle run command, specifying the name of the job in the definition file my_scala_project.job.yml:

    Bash
    databricks bundle run -t dev my_scala_project
  2. Copy the Run URL value that appears in your terminal and paste it into your web browser to open your Databricks workspace.

  3. In your Databricks workspace, after the task completes successfully and shows a green title bar, click the main_task task to see the results.