Run JAR workloads in Databricks clean rooms

Preview

This feature is in Public Preview.

This page describes how to run custom JAR workloads in a clean room. With JAR support, you can bring compiled Java or Scala code into a clean room and run it over the data that collaborators share, without any collaborator gaining direct access to another collaborator's data or code.

You can use a JAR in a clean room in two ways:

  • JAR analyses: Run a standalone JAR workload as a clean room analysis, much like a notebook. You specify a main class as the entry point, and all collaborators approve the analysis before it runs.
  • JAR user-defined functions (UDFs): Register a custom Java or Scala function from a JAR and call it from a clean room notebook using CREATE TEMPORARY FUNCTION.

The following table compares the two approaches:

| Capability | JAR analysis | JAR UDF |
| --- | --- | --- |
| What it runs | A standalone JAR workload, registered as its own analysis | A function from a JAR, called from a clean room notebook |
| Entry point | A main class that you specify | A handler method that you reference in CREATE TEMPORARY FUNCTION |
| Preview to enable | Clean Room JAR Task | Clean Room Jar UDF |
| Serverless environment | 4-scala-preview | Environment version 4 |
| How it runs | Approved like a notebook, then run on its own | Runs as part of the notebook that calls it |

Before you begin

  • You must create a new clean room after all collaborators enable the preview. The preview does not apply to clean rooms that were created before it was enabled, and disabling the preview later does not affect clean rooms that already use the feature.

  • A newly created clean room can take up to 24 hours before it can run JAR workloads.

  • The JAR must be compiled with Java 17 or higher, or Scala 2.13 or higher.

  • Serverless compute is required. JAR analyses run on the 4-scala-preview serverless environment. JAR UDFs require serverless environment version 4.

Build and test your JAR

  • For JAR analyses, see JAR task for jobs.

  • For JAR UDFs, see User-defined scalar functions - Scala.

When the JAR works as expected, upload it to a Unity Catalog volume. You can create a volume that holds only the JARs you want to share. The entire volume is shared with the clean room, so include only the files you intend to share.

Run a JAR analysis

To run a JAR analysis in a clean room, do the following:

  1. Upload your JAR to a Unity Catalog volume.

  2. Add the volume to the clean room as a data asset. See Step 3. Add data assets and notebooks to the clean room.

  3. In your Databricks workspace, click Catalog.

  4. Click the Clean Rooms > button.

  5. Select the clean room from the list.

  6. Click Add Analysis.

  7. On the Add Analysis page, select JAR Analysis.

  8. Enter the details for the JAR analysis, including the main class to use as the entry point and the path to the JAR in the shared volume.

  9. Click Add JAR to add the analysis to the clean room.

  10. Make sure all collaborators have approved the JAR analysis.

    JAR analyses are approved the same way as clean room notebooks: every collaborator must review the analysis and add an approving review. You can also configure auto-approval rules. See Approve a notebook in a clean room and Auto-approval rules.

  11. Select the JAR analysis and click Run.

  12. Specify the parameters to pass to the main method.

    Clean Rooms automatically injects additional context parameters in --<key> <value> format, similar to the parameters that are passed to clean room notebooks. See Notebook parameters.
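
The parameter handling in step 12 can be sketched as follows. This is a minimal illustration, assuming the parameters reach the main method as alternating `--<key>` and `<value>` tokens in `args`. The class name `CleanRoomAnalysisMain` and the `parseArgs` helper are hypothetical and not part of any Databricks API, and the names of the injected context keys are not listed on this page, so the sketch makes no assumption about them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry point for a JAR analysis; the class name is illustrative only.
public class CleanRoomAnalysisMain {

    // Collects "--<key> <value>" pairs into an insertion-ordered map.
    // A key with no following value (or followed by another "--" key) maps to "".
    // Tokens that are neither a key nor consumed as a value are ignored.
    static Map<String, String> parseArgs(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String token = args[i];
            if (token.startsWith("--") && token.length() > 2) {
                String key = token.substring(2);
                boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("--");
                params.put(key, hasValue ? args[++i] : "");
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parseArgs(args);
        // Print the keys only, so parameter values do not end up in the run output.
        System.out.println("Received parameters: " + params.keySet());
    }
}
```

Because this parser treats any token that starts with `--` as a key, a value that itself begins with `--` would be misread. A real workload may prefer an established argument-parsing library.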

Use a JAR UDF in a notebook

To use a JAR UDF in a clean room notebook, do the following:

  1. Upload your JAR to a Unity Catalog volume.

  2. Add the volume to the clean room as a data asset. See Step 3. Add data assets and notebooks to the clean room.

  3. In a clean room notebook, use CREATE TEMPORARY FUNCTION to register a function that is backed by the shared JAR, then call it like any other function.

    Use LANGUAGE JAVA for a Java JAR or LANGUAGE SCALA for a Scala JAR. ENVIRONMENT_VERSION must be 4. In the DEPENDENCIES path, use the clean room catalog alias that was assigned when the clean room was created, not the original catalog name.

    The following example registers an email-normalization function from a JAR:

    SQL
    CREATE TEMPORARY FUNCTION normalize_email(email STRING)
    RETURNS STRING
    LANGUAGE JAVA
    ENVIRONMENT (
    DEPENDENCIES = '["/Volumes/<catalog-alias>/<schema>/<volume>/<path>/<file-name>.jar"]',
    ENVIRONMENT_VERSION = 4
    )
    HANDLER "com.mycompany.utils.EmailNormalizer.normalize"

    Then call the function in a query:

    SQL
    SELECT email_address, normalize_email(email_address) AS normalized_email
    FROM my_table

  4. Add the notebook to the clean room, have collaborators approve it, and run it. See Run notebooks in clean rooms.
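
For reference, the class behind a handler such as `com.mycompany.utils.EmailNormalizer.normalize` might look like the sketch below. This page does not show the JAR's source, so everything here is an assumption: the normalization rule (trim and lowercase), the null handling, and the choice of a public static method. See User-defined scalar functions - Scala for the handler contract that the platform actually requires.

```java
import java.util.Locale;

// In a real build this class would be declared in package com.mycompany.utils
// so that the HANDLER path com.mycompany.utils.EmailNormalizer.normalize resolves.
// The package statement is omitted here to keep the sketch self-contained.
public class EmailNormalizer {

    // Assumed normalization rule: strip surrounding whitespace and lowercase.
    // Returns null for null input so SQL NULLs pass through unchanged.
    public static String normalize(String email) {
        if (email == null) {
            return null;
        }
        return email.trim().toLowerCase(Locale.ROOT);
    }
}
```

Using `Locale.ROOT` keeps the lowercasing independent of the default locale of the JVM that runs the function.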

Limitations