Run JAR workloads in Databricks clean rooms
This feature is in Public Preview.
This page describes how to run custom JAR workloads in a clean room. With JAR support, you can bring compiled Java or Scala code into a clean room and run it over the data that collaborators share, without any collaborator gaining direct access to another collaborator's data or code.
You can use a JAR in a clean room in two ways:
- JAR analyses: Run a standalone JAR workload as a clean room analysis, much like a notebook. You specify a main class as the entry point, and all collaborators approve the analysis before it runs.
- JAR user-defined functions (UDFs): Register a custom Java or Scala function from a JAR and call it from a clean room notebook using CREATE TEMPORARY FUNCTION.
The following table compares the two approaches:
| Capability | JAR analysis | JAR UDF |
|---|---|---|
| What it runs | A standalone JAR workload, registered as its own analysis | A function from a JAR, called from a clean room notebook |
| Entry point | A main class that you specify | A handler method that you reference in CREATE TEMPORARY FUNCTION |
| Preview to enable | Clean Room JAR Task | Clean Room Jar UDF |
| Serverless environment | 4-scala-preview | Environment version 4 |
| How it runs | Approved like a notebook, then run on its own | Runs as part of the notebook that calls it |
Before you begin
- You must create a new clean room after all collaborators enable the preview. The preview does not apply to clean rooms that were created before it was enabled, and disabling the preview later does not affect clean rooms that already use the feature.
- A newly created clean room can take up to 24 hours before it can run JAR workloads.
- The JAR must be compiled with Java 17 or higher, or Scala 2.13 or higher.
- Serverless compute is required. JAR analyses run on the 4-scala-preview serverless environment. JAR UDFs require serverless environment version 4.
Build and test your JAR
Build and test your JAR before you add it to a clean room:
- For JAR analyses, see JAR task for jobs.
- For JAR UDFs, see User-defined scalar functions - Scala.
When the JAR works as expected, upload it to a Unity Catalog volume. You can create a volume that holds only the JARs you want to share. The entire volume is shared with the clean room, so include only the files you intend to share.
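Before you upload, it can help to confirm which Java release your compiled classes target. The following is a minimal sketch, not part of the Clean Rooms tooling: it reads the header of a single .class file extracted from the JAR and reports its class-file major version (61 corresponds to Java 17). The class name ClassVersionCheck is hypothetical, and treating the "Java 17 or higher" requirement as a class-file target check is an assumption.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: reports the class-file major version of one compiled class.
public class ClassVersionCheck {

    // Class-file major version emitted when targeting Java 17.
    public static final int JAVA_17_MAJOR = 61;

    // Reads the class-file header: 4-byte magic, 2-byte minor, 2-byte major.
    public static int majorVersion(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a Java class file");
        }
        in.readUnsignedShort();        // minor version, not needed here
        return in.readUnsignedShort(); // major version
    }

    // Usage: java ClassVersionCheck path/to/SomeClass.class
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Path.of(args[0]))) {
            int major = majorVersion(in);
            String verdict = major >= JAVA_17_MAJOR ? "targets Java 17 or higher" : "targets an older release";
            System.out.println("major version " + major + ": " + verdict);
        }
    }
}
```

This only inspects one class at a time. To check a whole JAR, you would extract it first or iterate over its entries.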
Run a JAR analysis
To run a JAR analysis in a clean room, do the following:
- Upload your JAR to a Unity Catalog volume.
- Add the volume to the clean room as a data asset. See Step 3. Add data assets and notebooks to the clean room.
- In your Databricks workspace, click Catalog.
- Click the Clean Rooms > button.
- Select the clean room from the list.
- Click Add Analysis.
- On the Add Analysis page, select JAR Analysis.
- Enter the details for the JAR analysis, including the main class to use as the entry point and the path to the JAR in the shared volume.
- Click Add JAR to add the analysis to the clean room.
- Make sure all collaborators have approved the JAR analysis.
  JAR analyses are approved the same way as clean room notebooks: every collaborator must review the analysis and add an approving review. You can also configure auto-approval rules. See Approve a notebook in a clean room and Auto-approval rules.
- Select the JAR analysis and click Run.
- Specify the parameters to pass to the main method.
  Clean Rooms automatically injects additional context parameters in --<key> <value> format, similar to the parameters that are passed to clean room notebooks. See Notebook parameters.
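Because parameters reach the main method as --<key> <value> pairs, the entry point needs to turn the argument array into key-value pairs. The following is a minimal sketch of one way to do that. The class name CleanRoomAnalysisMain and the parseArgs helper are hypothetical, and the sketch makes no assumption about which context keys Clean Rooms injects.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical entry point for a JAR analysis; the class name is illustrative.
public class CleanRoomAnalysisMain {

    // Collects "--<key> <value>" pairs into a map, preserving argument order.
    // Tokens that do not start with "--", and a trailing "--<key>" without a
    // value, are ignored in this sketch.
    public static Map<String, String> parseArgs(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("--") && arg.length() > 2 && i + 1 < args.length) {
                params.put(arg.substring(2), args[i + 1]);
                i++; // skip the value that was just consumed
            }
        }
        return params;
    }

    public static void main(String[] args) {
        Map<String, String> params = parseArgs(args);
        // Print every parameter so that both user-supplied and injected
        // values are visible in the run output.
        params.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Logging all received parameters on a first run is a simple way to see which context values arrive alongside the ones you specify.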
Use a JAR UDF in a notebook
To use a JAR UDF in a clean room notebook, do the following:
- Upload your JAR to a Unity Catalog volume.
- Add the volume to the clean room as a data asset. See Step 3. Add data assets and notebooks to the clean room.
- In a clean room notebook, use CREATE TEMPORARY FUNCTION to register a function that is backed by the shared JAR, then call it like any other function.

  Use LANGUAGE JAVA for a Java JAR or LANGUAGE SCALA for a Scala JAR. ENVIRONMENT_VERSION must be 4. In the DEPENDENCIES path, use the clean room catalog alias that was assigned when the clean room was created, not the original catalog name.

  The following example registers an email-normalization function from a JAR:

  ```sql
  CREATE TEMPORARY FUNCTION normalize_email(email STRING)
  RETURNS STRING
  LANGUAGE JAVA
  ENVIRONMENT (
    DEPENDENCIES = '["/Volumes/<catalog-alias>/<schema>/<volume>/<path>/<file-name>.jar"]',
    ENVIRONMENT_VERSION = 4
  )
  HANDLER "com.mycompany.utils.EmailNormalizer.normalize"
  ```

  Then call the function in a query:

  ```sql
  SELECT email_address, normalize_email(email_address) AS normalized_email
  FROM my_table
  ```
- Add the notebook to the clean room, have collaborators approve it, and run it. See Run notebooks in clean rooms.
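For reference, the Java side of the handler named in the example might look like the following sketch. The normalization rule (trim whitespace and lowercase) is an assumption chosen for illustration, as is the use of a public static method. In a real JAR, the class would be declared in the com.mycompany.utils package to match the HANDLER value.

```java
import java.util.Locale;

// Illustrative handler class. In the JAR, it would be declared in the
// com.mycompany.utils package so that it matches the HANDLER value.
public class EmailNormalizer {

    // Assumed normalization rule: trim surrounding whitespace and lowercase.
    // Returns null for null input so that SQL NULLs pass through unchanged.
    public static String normalize(String email) {
        if (email == null) {
            return null;
        }
        return email.trim().toLowerCase(Locale.ROOT);
    }
}
```

Using Locale.ROOT keeps the lowercasing independent of the default locale of the environment that runs the function.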
Limitations
- JAR analysis runs do not emit events in the clean_room_events system table. See Clean room events system table reference.