How does Databricks enforce user isolation?
This page explains how Databricks uses Lakeguard to enforce user isolation in shared compute environments and fine-grained access control in dedicated compute.
What is Lakeguard?
Lakeguard is a background service on Databricks that enforces code isolation and data filtering so multiple users can share the same compute resource securely and cost‑effectively, even when fine‑grained access controls are in place.
How does Lakeguard work?
In shared compute environments such as standard classic compute, serverless compute, and SQL warehouses, Lakeguard isolates user code from the Spark engine and from other users. This design enables many users to share the same compute resources while keeping strict boundaries between users, the Spark driver, and executors.
Classic Spark architecture
In the traditional Spark architecture, user applications share a single JVM with the Spark driver and have privileged access to the underlying machine.
Lakeguard architecture
Lakeguard isolates all user code using secure containers. This allows multiple workloads to run on the same compute resource while maintaining strict isolation between users.
Spark client isolation
Lakeguard isolates client applications from the Spark driver and from each other using two key components:
- Spark Connect: Lakeguard uses Spark Connect (introduced in Apache Spark 3.4) to decouple client applications from the driver. Client applications and drivers no longer share the same JVM or classpath, which prevents unauthorized data access. This design also prevents users from reading over-fetched data when queries include row- or column-level filters. See the sketch after this list.
- Container sandboxing: Each client application runs in its own isolated container, creating a secure boundary that prevents user code from accessing other users' data or the underlying machine.
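To illustrate the client/driver split, the following is a minimal sketch of a Spark Connect client in Python. The endpoint URI, token, and table name are placeholders, not values from this page.

```python
from pyspark.sql import SparkSession

# Connect to a remote Spark Connect endpoint instead of starting a local driver.
# The client process never shares a JVM or classpath with the driver; it only
# exchanges query plans and Arrow-encoded results over the Spark Connect protocol.
# "sc://<workspace-host>:443/;token=<access-token>" is a placeholder connection string.
spark = SparkSession.builder.remote(
    "sc://<workspace-host>:443/;token=<access-token>"
).getOrCreate()

# Queries are resolved and executed on the server side, so any row- or
# column-level filters defined on the table are applied before results reach
# the client -- the client never sees over-fetched data.
df = spark.table("main.sales.transactions").where("region = 'EMEA'")  # placeholder table
df.limit(10).show()
```

On Databricks, the databricks-connect package builds on the same Spark Connect protocol and resolves the endpoint from your workspace configuration; the isolation property is the same: user code runs in the client process or container, not in the driver JVM.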
UDF isolation
By default, Spark executors do not isolate UDFs. That lack of isolation can allow UDFs to write files or access the underlying machine.
Lakeguard isolates user-defined code, including UDFs, on Spark executors by:
- Sandboxing the execution environment on Spark executors.
- Isolating egress network traffic from UDFs to prevent unauthorized external access.
- Replicating the client environment into the UDF sandbox so users can access required libraries.
This isolation applies to UDFs on standard compute and to Python UDFs on serverless compute and SQL warehouses.
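As a concrete example of the user-defined code this applies to, the sketch below registers an ordinary Python UDF. The function name, the `unidecode` library, and the table and column names are illustrative assumptions, not part of this page; the behavior described in the comments follows the sandboxing and egress points above.

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(returnType=StringType())
def normalize_name(name: str) -> str:
    # The function body runs inside the sandboxed UDF environment on each executor.
    # Libraries installed in the client environment (unidecode is assumed here)
    # are replicated into the sandbox, so the import succeeds...
    from unidecode import unidecode
    return unidecode(name).strip().lower() if name else None

# ...while access to the host machine and unauthorized egress traffic from the
# UDF are restricted by the sandbox, per the isolation points above.
# `spark` is the session from a notebook or the Spark Connect sketch earlier.
df = spark.table("main.crm.customers")  # placeholder table
df.select(normalize_name("full_name")).show(5)
```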