Predictive optimization for Delta Lake
Preview
This feature is in Public Preview.
Predictive optimization removes the need to manually manage maintenance operations for Delta tables on Databricks.
With predictive optimization enabled, Databricks automatically identifies tables that would benefit from maintenance operations and runs them for the user. Maintenance operations run only when needed, which eliminates unnecessary maintenance runs and the burden of tracking and troubleshooting performance.
What operations does predictive optimization run?
Predictive optimization runs the following operations automatically for enabled Delta tables:
Operation | Description |
---|---|
OPTIMIZE | Improves query performance by optimizing file sizes. See Compact data files with optimize on Delta Lake. |
VACUUM | Reduces storage costs by deleting data files no longer referenced by the table. See Remove unused data files with vacuum. |
Note
OPTIMIZE does not run ZORDER when executed with predictive optimization.
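For comparison, the manual equivalents of these two operations can be sketched as follows. The table name main.sales.orders is a hypothetical placeholder, and the DRY RUN step is an optional preview added for illustration; it is not something the source says predictive optimization does.

```sql
-- Hypothetical table name; substitute your own Unity Catalog managed table.

-- Compact small data files into larger ones (no ZORDER clause).
OPTIMIZE main.sales.orders;

-- Preview which unreferenced files VACUUM would delete, without deleting them.
VACUUM main.sales.orders DRY RUN;

-- Delete unreferenced data files older than the table's retention window.
VACUUM main.sales.orders;
```

With predictive optimization enabled, you would not normally need to schedule these statements yourself.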
Warning
The retention window for the VACUUM command is determined by the delta.deletedFileRetentionDuration table property, which defaults to 7 days. This means VACUUM removes data files that are no longer referenced by a Delta table version in the last 7 days. If you’d like to retain data for longer (such as to support time travel for longer durations), you must set this table property appropriately before you enable predictive optimization, as in the following example:
ALTER TABLE table_name SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = '30 days');
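Before enabling predictive optimization, it can help to confirm what retention a table currently has. The following is a minimal sketch using a hypothetical table name. If the property was never set explicitly, the result may simply report that the table does not have this property, in which case the 7-day default applies.

```sql
-- Hypothetical table name; substitute your own.
-- Shows the explicitly set value of the retention property, if any.
SHOW TBLPROPERTIES main.sales.orders ('delta.deletedFileRetentionDuration');
```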
Where does predictive optimization run?
Predictive optimization identifies tables that would benefit from OPTIMIZE and VACUUM operations and queues them to run using jobs compute. Your account is billed for compute associated with these workloads using a SKU specific to Databricks Managed Services. See pricing for Databricks managed services. Databricks provides system tables for observability into predictive optimization operations, costs, and impact. See Use system tables to track predictive optimization.
Prerequisites for predictive optimization
You must fulfill the following requirements to enable predictive optimization:
Your Databricks workspace must be on the Premium plan or above in a region that supports predictive optimization. See Databricks clouds and regions.
You must use SQL warehouses or Databricks Runtime 12.2 LTS or above when you enable predictive optimization.
Only Unity Catalog managed tables are supported.
Serverless compute must be enabled in your account. See Enable serverless compute in your account.
Enable predictive optimization
You must have the following privileges to enable or disable predictive optimization at the specified level:
Unity Catalog object | Privilege |
---|---|
Account | Account admin |
Catalog | Catalog owner |
Schema | Schema owner |
Note
When you enable predictive optimization for the first time, Databricks automatically creates a service principal in your Databricks account. Databricks uses this service principal to perform the requested maintenance operations. See Manage service principals.
Enable predictive optimization for your account
You must enable predictive optimization at the account level. You can then enable or disable predictive optimization at the catalog and schema levels.
An account admin must complete the following steps to enable predictive optimization for all metastores in an account:
Access the accounts console.
Navigate to Settings, then Feature enablement.
Select Enabled next to Predictive optimization.
Note
Metastores in regions that don’t support predictive optimization aren’t enabled.
Enable or disable predictive optimization for a catalog or schema
Predictive optimization uses an inheritance model. When enabled for a catalog, schemas inherit the property. Tables within an enabled schema inherit predictive optimization. To override this inheritance behavior, you can explicitly disable predictive optimization for a catalog or schema.
Use the following syntax to enable or disable predictive optimization:
ALTER CATALOG [catalog_name] {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
ALTER {SCHEMA | DATABASE} schema_name {ENABLE | DISABLE} PREDICTIVE OPTIMIZATION;
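As an illustration of the inheritance model, the sketch below enables predictive optimization for a whole catalog and then opts one schema out. The names main and main.staging are hypothetical placeholders.

```sql
-- Hypothetical names: `main` is a catalog, `main.staging` is a schema in it.

-- Enable for the catalog; its schemas and their tables inherit the setting.
ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION;

-- Override the inherited setting for one schema.
ALTER SCHEMA main.staging DISABLE PREDICTIVE OPTIMIZATION;
```

In this sketch, tables in main.staging are excluded while every other schema in main stays enabled.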
Check whether predictive optimization is enabled
The Predictive Optimization field is a Unity Catalog property that indicates whether predictive optimization is enabled. If predictive optimization is inherited from a parent object, this is indicated in the field value.
Use the following syntax to see if predictive optimization is enabled:
DESCRIBE (CATALOG | SCHEMA | TABLE) EXTENDED name
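Concrete forms of this syntax might look like the following, again with hypothetical object names. In each case, look for the Predictive Optimization field in the output.

```sql
-- Hypothetical catalog, schema, and table names; substitute your own.
DESCRIBE CATALOG EXTENDED main;
DESCRIBE SCHEMA EXTENDED main.sales;
DESCRIBE TABLE EXTENDED main.sales.orders;
```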
Use system tables to track predictive optimization
Databricks provides a system table to track the history of predictive optimization operations. See Predictive optimization system table reference.
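A query against that history could be sketched as below. The table name system.storage.predictive_optimization_operations_history and the column names are assumptions based on the system table reference; confirm them against that reference, and check that the system schema is enabled in your metastore before relying on this.

```sql
-- Assumed table and column names; verify against the system table reference.
-- Lists predictive optimization operations from the last 7 days, newest first.
SELECT
  catalog_name,
  schema_name,
  table_name,
  operation_type,    -- the maintenance operation that ran
  operation_status,
  usage_quantity,
  usage_unit,
  start_time
FROM system.storage.predictive_optimization_operations_history
WHERE start_time >= current_timestamp() - INTERVAL 7 DAYS
ORDER BY start_time DESC;
```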
Limitations
Predictive optimization does not perform maintenance operations on the following tables:
Tables loaded to a workspace as Delta Sharing recipients.
Materialized views. See Use materialized views in Databricks SQL.
Streaming tables. See Load data using streaming tables in Databricks SQL.