Lakehouse Replay
This feature is in Beta. Workspace admins can enable or disable it from the workspace Previews page. See Manage Databricks previews.
Lakehouse Replay improves the quality and stability of future Databricks Runtime releases by automatically replaying a small subset of read-only workloads from your workspace against upcoming runtime versions before they reach production. When a workload succeeds in production but fails on the upcoming runtime version, Databricks identifies and fixes the regression before that version ships. This makes runtime upgrades safer, with no setup, configuration, or maintenance required.
Lakehouse Replay samples workloads only from serverless compute, but fixing the regressions it catches benefits every Databricks Runtime release, on both classic and serverless compute. Running on serverless lets Databricks perform this testing on managed compute, so the replay work is not billed to you.
How Lakehouse Replay works
Lakehouse Replay uses shadow execution to test upcoming runtime versions:
- Your workloads run on production as usual.
- Lakehouse Replay selects a small subset of safe, read-only workloads for testing.
- Lakehouse Replay reruns the Spark plans from the selected workloads on Databricks-managed shadow compute running an upcoming runtime version.
- If a workload succeeds in production but fails on the shadow compute, Databricks investigates and resolves the regression before releasing the runtime version.
Shadow compute runs entirely within your Databricks workspace and has no impact on your production workloads or jobs.
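The comparison at the heart of the steps above can be sketched as follows. This is an illustrative model only, not Databricks' implementation: the `Status`, `RunOutcome`, and `is_candidate_regression` names are hypothetical, and the sketch assumes that only execution status (never query results) is compared.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass(frozen=True)
class RunOutcome:
    """Hypothetical record of one execution: status only, no query results."""
    workload_id: str
    runtime_version: str
    status: Status


def is_candidate_regression(production: RunOutcome, shadow: RunOutcome) -> bool:
    """Flag a workload only when it succeeded in production but failed on shadow compute."""
    if production.workload_id != shadow.workload_id:
        raise ValueError("outcomes must describe the same workload")
    return production.status is Status.SUCCEEDED and shadow.status is Status.FAILED


# Example: the same workload succeeds on the current runtime and fails on the upcoming one.
prod = RunOutcome("q-001", "current", Status.SUCCEEDED)
shadow = RunOutcome("q-001", "upcoming", Status.FAILED)
flagged = is_candidate_regression(prod, shadow)
```

In this model, a workload that fails in both environments is not flagged, because the failure cannot be attributed to the upcoming runtime version.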
Supported workloads
Lakehouse Replay only replays workloads that meet strict safety requirements:
- Read-only SQL and DataFrame workloads from serverless SQL warehouses and serverless notebooks and jobs.
- Workloads that read only Unity Catalog Delta tables.
For DataFrame workloads, Lakehouse Replay replays only the Spark plan submitted to the production cluster. Preceding Python cells are not executed.
The following workloads are excluded:
- Write operations
- User-defined functions (UDFs)
- AI functions such as ai_query
- Federated workloads that access external databases
- Workloads with attribute-based or role-based access control (ABAC/RBAC)
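One way to read the requirements and exclusions above is as a single eligibility predicate. The sketch below is a hedged illustration: `WorkloadProfile` and all of its fields are hypothetical names invented for this example, and the actual selection logic inside Lakehouse Replay is not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    """Hypothetical description of a workload; every field name is an assumption."""
    is_serverless: bool               # serverless SQL warehouse, notebook, or job
    is_read_only: bool                # performs no write operations
    reads_only_uc_delta_tables: bool  # reads only Unity Catalog Delta tables
    uses_udfs: bool = False
    uses_ai_functions: bool = False   # for example, ai_query
    is_federated: bool = False        # accesses external databases
    has_abac_or_rbac: bool = False


def is_replay_eligible(w: WorkloadProfile) -> bool:
    """Return True only if all requirements hold and no exclusion applies."""
    meets_requirements = (
        w.is_serverless and w.is_read_only and w.reads_only_uc_delta_tables
    )
    excluded = (
        w.uses_udfs or w.uses_ai_functions or w.is_federated or w.has_abac_or_rbac
    )
    return meets_requirements and not excluded


# A plain read-only serverless query over Unity Catalog Delta tables qualifies.
simple_select = WorkloadProfile(True, True, True)
# The same query becomes ineligible once it calls a UDF.
with_udf = WorkloadProfile(True, True, True, uses_udfs=True)
```

Eligibility is necessary but not sufficient: per the text, Lakehouse Replay selects only a small subset of the workloads that qualify.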
Data security and privacy
Lakehouse Replay does not change your existing data security and privacy posture:
- No data export: Lakehouse Replay compares only execution status and runtime metrics to detect discrepancies. It does not read, export, or store query results.
- Run with the same permissions: Replayed workloads run with the same user identity as the original production query and respect Unity Catalog permissions at the time of replay.
- Isolated execution: The Databricks shadow compute used for replay is isolated from your production compute and cannot access external APIs, databases, or other workspaces.
Billing
Lakehouse Replay uses Databricks-managed serverless compute to run the replay, and you are not billed for the related compute costs. Replayed workloads may incur minimal object storage API costs, because a replayed workload reads data using the same permissions and storage path as the original workload.
Audit logs
Lakehouse Replay activity is recorded in the audit log system table under the lakehouseReplay service. See Lakehouse Replay events.
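As a starting point for reviewing this activity, the following sketch builds a SQL query against the audit log system table. It assumes the table is available as `system.access.audit` with the `service_name`, `action_name`, `event_time`, `event_date`, and `user_identity.email` columns; verify these names against the audit log system table reference for your workspace. The `build_replay_audit_query` helper is hypothetical.

```python
def build_replay_audit_query(days: int = 7) -> str:
    """Build a SQL query listing Lakehouse Replay audit events from the last `days` days.

    The table and column names are assumptions; check them against your workspace.
    """
    if days < 1:
        raise ValueError("days must be a positive integer")
    return (
        "SELECT event_time, action_name, user_identity.email AS user_email\n"
        "FROM system.access.audit\n"
        "WHERE service_name = 'lakehouseReplay'\n"
        f"  AND event_date >= date_sub(current_date(), {int(days)})\n"
        "ORDER BY event_time DESC"
    )


query = build_replay_audit_query(days=3)
```

In a Databricks notebook, where a `spark` session is predefined, the resulting string could be passed to `spark.sql(query)` and displayed, provided you have access to the system table.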
Frequently asked questions
Do I need to do anything to use Lakehouse Replay?
No. If enabled in your workspace, Lakehouse Replay runs automatically with no setup, configuration, or maintenance.
Does Lakehouse Replay affect my production workloads?
No. Shadow compute runs separately from your production compute and does not affect running workloads, job schedules, or query performance.
How do I know if my workloads are being replayed?
Replayed workloads do not appear in your job run history or query history. Lakehouse Replay activity is available in the audit log system table.
How frequently are workloads replayed?
Sampling is probabilistic, and its frequency depends on workspace traffic and workload type. Most sampled workloads are replayed within one hour of the original execution.
What if a replayed workload fails?
If a workload fails on the shadow compute but succeeds on production, Databricks investigates. If Databricks confirms the failure as a regression, Databricks resolves the issue before releasing the runtime version. Databricks does not notify you of individual failures unless it needs additional context.
What types of regressions does Lakehouse Replay detect?
Lakehouse Replay detects execution failures: workloads that succeed in production but fail on the upcoming runtime version.