How does Databricks enforce user isolation?
This page explains how Databricks uses Lakeguard to enforce user isolation in shared compute environments and fine-grained access control in dedicated compute.
What is Lakeguard?
Lakeguard is a background service on Databricks that enforces code isolation and data filtering so multiple users can share the same compute resource securely and cost‑effectively, even when fine‑grained access controls are in place.
How does Lakeguard work?
In shared compute environments such as standard classic compute, serverless compute, and SQL warehouses, Lakeguard isolates user code from the Spark engine and from other users. This design enables many users to share the same compute resources while keeping strict boundaries between users, the Spark driver, and executors.
Classic Spark architecture
In the traditional Spark architecture, user applications share a single JVM with the Spark driver and have privileged access to the underlying machine.
Lakeguard architecture
Lakeguard isolates all user code using secure containers. This allows multiple workloads to run on the same compute resource while maintaining strict isolation between users.
Spark client isolation
Lakeguard isolates client applications from the Spark driver and from each other using two key components:
- Spark Connect: Lakeguard uses Spark Connect (introduced in Apache Spark 3.4) to decouple client applications from the driver. Client applications and drivers no longer share the same JVM or classpath, which prevents unauthorized data access. This design also prevents users from reading over-fetched data when queries include row- or column-level filters. See the sketch after this list.
- Container sandboxing: Each client application runs in its own isolated container, creating a secure boundary that prevents user code from accessing other users' data or the underlying machine.
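To illustrate the client/driver split, the following is a minimal sketch of a Spark Connect client in Python. The endpoint URI, token, and table name are placeholders, not values from this page.

```python
from pyspark.sql import SparkSession

# Connect to a remote Spark Connect endpoint instead of starting a local driver.
# The client process never shares a JVM or classpath with the driver; it only
# exchanges query plans and Arrow-encoded results over the Spark Connect protocol.
# "sc://<workspace-host>:443/;token=<access-token>" is a placeholder connection string.
spark = SparkSession.builder.remote(
    "sc://<workspace-host>:443/;token=<access-token>"
).getOrCreate()

# Queries are resolved and executed on the server side, so any row- or
# column-level filters defined on the table are applied before results reach
# the client -- the client never sees over-fetched data.
df = spark.table("main.sales.transactions").where("region = 'EMEA'")  # placeholder table
df.limit(10).show()
```

On Databricks, the databricks-connect package builds on the same Spark Connect protocol and resolves the endpoint from your workspace configuration; the isolation property is the same: user code runs in the client process or container, not in the driver JVM.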
UDF isolation
By default, Spark executors do not isolate UDFs. That lack of isolation can allow UDFs to write files or access the underlying machine.
Lakeguard isolates user-defined code, including UDFs, on Spark executors by:
- Sandboxing the execution environment on Spark executors.
- Isolating egress network traffic from UDFs to prevent unauthorized external access.
- Replicating the client environment into the UDF sandbox so users can access required libraries.
This isolation applies to UDFs on standard compute and to Python UDFs on serverless compute and SQL warehouses.
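As a concrete example of the user-defined code this applies to, the sketch below registers an ordinary Python UDF. The function name, the `unidecode` library, and the table and column names are illustrative assumptions, not part of this page; the behavior described in the comments follows the sandboxing and egress points above.

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def normalize_name(name: str) -> str:
    # The function body runs inside the sandboxed UDF environment on each executor.
    # Libraries installed in the client environment (unidecode is assumed here)
    # are replicated into the sandbox, so the import succeeds...
    from unidecode import unidecode
    return unidecode(name).strip().lower() if name else None

# ...while access to the host machine and unauthorized egress traffic from the
# UDF are restricted by the sandbox, per the isolation points above.
# `spark` is the session from a notebook or the Spark Connect sketch earlier.
df = spark.table("main.crm.customers")  # placeholder table
df.select(normalize_name("full_name")).show(5)
```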