Predictive optimization for Delta Lake

Preview

This feature is in Public Preview.

Predictive optimization removes the need to manually manage maintenance operations for Delta tables on Databricks.

With predictive optimization enabled, Databricks automatically identifies tables that would benefit from maintenance operations and runs them for you. Maintenance operations run only as needed, eliminating both unnecessary maintenance runs and the burden of tracking and troubleshooting table performance.

What operations does predictive optimization run?

Predictive optimization runs the following operations automatically for enabled Delta tables:

  • OPTIMIZE: Improves query performance by optimizing file sizes. See Compact data files with optimize on Delta Lake.

  • VACUUM: Reduces storage costs by deleting data files no longer referenced by the table. See Remove unused data files with vacuum.
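
These are the same commands you would otherwise schedule and run manually. A minimal sketch of the manual equivalents, assuming a hypothetical Unity Catalog managed table named main.sales.orders:

-- Manual equivalents of the operations that predictive optimization automates
-- (the table name is hypothetical)
OPTIMIZE main.sales.orders;
VACUUM main.sales.orders;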

Note

OPTIMIZE does not run ZORDER when executed with predictive optimization.

Warning

The retention window for the VACUUM command is determined by the delta.deletedFileRetentionDuration table property, which defaults to 7 days. This means VACUUM removes data files that are no longer referenced by any table version from the last 7 days. If you want to retain data for longer (for example, to support time travel over a longer window), set this table property appropriately before you enable predictive optimization, as in the following example:

ALTER TABLE table_name SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = '30 days');
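
To confirm the current retention window before enabling predictive optimization, you can inspect the table property. A minimal sketch, where table_name is a placeholder for your table:

-- Returns the configured value if the property has been set on the table
SHOW TBLPROPERTIES table_name ('delta.deletedFileRetentionDuration');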

Where does predictive optimization run?

Predictive optimization identifies tables that would benefit from OPTIMIZE and VACUUM operations and queues them to run using jobs compute. Your account is billed for compute associated with these workloads using a SKU specific to Databricks Managed Services. See pricing for Databricks managed services. Databricks provides system tables for observability into predictive optimization operations, costs, and impact. See Use system tables to track predictive optimization.

Note

Predictive optimization does not run OPTIMIZE commands on tables that use liquid clustering.

Prerequisites for predictive optimization

You must fulfill the following requirements to enable predictive optimization:

  • Your Databricks workspace must be on the Premium plan or above in a region that supports predictive optimization. See Databricks clouds and regions.

  • You must use SQL warehouses or Databricks Runtime 12.2 LTS or above when you enable predictive optimization.

  • Only Unity Catalog managed tables are supported.

Enable predictive optimization

You must enable predictive optimization at the account level.

You must have the following privileges to enable or disable predictive optimization at the specified level:

  • Account: Account admin

  • Catalog: Catalog owner

  • Schema: Schema owner

Note

When you enable predictive optimization for the first time, Databricks automatically creates a service principal in your Databricks account. Databricks uses this service principal to perform the requested maintenance operations. See Manage service principals.

Enable predictive optimization for your account

An account admin must complete the following steps to enable predictive optimization for all metastores in an account:

  1. Access the accounts console.

  2. Navigate to Settings, then Feature enablement.

  3. Select Enabled next to Predictive optimization.

Note

Metastores in regions that don’t support predictive optimization aren’t enabled.

Enable or disable predictive optimization for a catalog or schema

Predictive optimization uses an inheritance model: when enabled for a catalog, its schemas inherit the property, and tables within an enabled schema inherit predictive optimization in turn. To override this inheritance behavior, you can explicitly disable predictive optimization for a catalog or schema.

Note

You can disable predictive optimization at the catalog or schema level before enabling it at the account level. If predictive optimization is later enabled for the account, it remains disabled for tables in those objects.

Use the following syntax to enable or disable predictive optimization:

ALTER CATALOG [catalog_name] {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
ALTER {SCHEMA | DATABASE} schema_name {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
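
For example, the following statements enable predictive optimization for a catalog and then override the inherited setting for one of its schemas. The catalog and schema names are hypothetical:

-- Enable for the catalog; its schemas and managed tables inherit the setting
ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION;

-- Explicitly disable for one schema, overriding the inherited setting
ALTER SCHEMA main.raw_data DISABLE PREDICTIVE OPTIMIZATION;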

Check whether predictive optimization is enabled

The Predictive Optimization field is a Unity Catalog property that indicates whether predictive optimization is enabled. If predictive optimization is inherited from a parent object, the field value indicates this.

Use the following syntax to see if predictive optimization is enabled:

DESCRIBE {CATALOG | SCHEMA | TABLE} EXTENDED name
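
For example, to check a schema (the name is hypothetical), run the following. The output includes the Predictive Optimization field, which shows whether the feature is enabled and whether the value is inherited from a parent object:

-- Check predictive optimization status for a schema
DESCRIBE SCHEMA EXTENDED main.sales;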

Use system tables to track predictive optimization

Databricks provides a system table to track the history of predictive optimization operations. See Predictive optimization system table reference.
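
The following query is a sketch of how you might review recent operations. It assumes the system table is named system.storage.predictive_optimization_operations_history and exposes columns such as table_name, operation_type, start_time, end_time, and operation_status; check the system table reference for the exact schema:

-- Recent predictive optimization operations, most recent first
-- (table and column names assume the schema described in the system table reference)
SELECT table_name, operation_type, start_time, end_time, operation_status
FROM system.storage.predictive_optimization_operations_history
ORDER BY start_time DESC
LIMIT 100;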

Limitations

Predictive optimization is not available in all regions. See Databricks clouds and regions.

Predictive optimization does not run OPTIMIZE commands on tables that use liquid clustering or Z-order.

Predictive optimization does not perform maintenance operations on the following tables: