Predictive optimization system table reference

Preview

This feature is in Public Preview.

Note

To have access to this table, you must enable the storage schema (see Enable system table schemas) and your region must support predictive optimization (see Databricks clouds and regions).

This article outlines the predictive optimization operation history table schema and provides sample queries. Predictive optimization optimizes your data layout for peak performance and cost efficiency. The system table tracks the operation history of this feature. For information on predictive optimization, see Predictive optimization for Delta Lake.

This system table is located at system.storage.predictive_optimization_operations_history.

Delivery considerations

  • Data can take up to 24 hours to be populated.

  • Predictive optimization might run multiple operations on the same cluster. If so, the share of DBUs attributed to each of the multiple operations is approximated. This is why the usage_unit is set to ESTIMATED_DBU. Still, the total number of DBUs spent on the cluster will be accurate.

Predictive optimization table schema

The predictive optimization operation history system table uses the following schema:

Column name

Data type

Description

Example

account_id

string

ID of the account.

11e22ba4-87b9-4cc2-9770-d10b894b7118

workspace_id

string

The ID of the workspace in which predictive optimization ran the operation.

1234567890123456

start_time

timestamp

The time at which the operation started.

2023-01-09 10:00:00.000

end_time

timestamp

The time at which the operation ended.

2023-01-09 11:00:00.000

metastore_name

string

The name of the metastore to which the optimized table belongs.

metastore

catalog_name

string

The name of the catalog to which the optimized table belongs.

catalog

schema_name

string

The name of the schema to which the optimized table belongs.

schema

table_id

string

The ID of the optimized table.

138ebb4b-3757-41bb-9e18-52b38d3d2836

table_name

string

The name of the optimized table.

table1

operation_type

string

The optimization operation which was performed. The value will be COMPACTION or VACUUM.

COMPACTION

operation_id

string

The ID for the optimization operation.

4dad1136-6a8f-418f-8234-6855cfaff18f

operation_status

string

The status of the optimization operation. The value will be SUCCESSFUL or FAILED: INTERNAL_ERROR.

SUCCESSFUL

operation_metrics

map[string, string]

Additional details about the specific optimization that was performed. For COMPACTION operations: (number_of_compacted_files, amount_of_data_compacted_bytes, number_of_output_files, amount_of_output_data_bytes) For VACUUM operations: (number_of_deleted_files, amount_of_data_deleted_bytes)

{"number_of_output_files":"100","number_of_compacted_files":"1000","amount_of_output_data_bytes":"4000","amount_of_data_compacted_bytes":"10000"}

usage_unit

string

The unit of usage that this operation incurred. Can only be one value: ESTIMATED_DBU.

ESTIMATED_DBU

usage_quantity

decimal

The amount of the usage unit that was used by this operation.

2.12

Example queries

The following sections include sample queries you can use to gain insights into the predictive optimization system table. For these queries to work, you need to replace the values within curly brackets {{}} with your own parameters.

This article includes the following example queries:

How many DBUs has predictive optimization used in the last 30 days?

SELECT SUM(usage_quantity)
FROM system.storage.predictive_optimization_operations_history
WHERE
     usage_unit = "ESTIMATED_DBU"
     AND  timestampdiff(day, start_time, Now()) < 30

On which tables did predictive optimization spend the most in the last 30 days?

SELECT
     metastore_name,
     catalog_name,
     schema_name,
     table_name,
     SUM(usage_quantity) as totalDbus
FROM system.storage.predictive_optimization_operations_history
WHERE
    usage_unit = "ESTIMATED_DBU"
    AND timestampdiff(day, start_time, Now()) < 30
GROUP BY ALL
ORDER BY totalDbus DESC

On which tables is predictive optimization performing the most operations?

SELECT
     metastore_name,
     catalog_name,
     schema_name,
     table_name,
     operation_type,
     COUNT(DISTINCT operation_id) as operations
FROM system.storage.predictive_optimization_operations_history
GROUP BY ALL
ORDER BY operations DESC

For a given catalog, how many total bytes have been compacted?

SELECT
     schema_name,
     table_name,
     SUM(operation_metrics["amount_of_data_compacted_bytes"]) as bytesCompacted
FROM system.storage.predictive_optimization_operations_history
WHERE
    metastore_name = {{metastore_name}}
    AND catalog_name = {{catalog_name}}
    AND operation_type = "COMPACTION"
GROUP BY ALL
ORDER BY bytesCompacted DESC

What tables had the most bytes vacuumed?

SELECT
     metastore_name,
     catalog_name,
     schema_name,
     table_name,
     SUM(operation_metrics["amount_of_data_deleted_bytes"]) as bytesVacuumed
FROM system.storage.predictive_optimization_operations_history
WHERE operation_type = "VACUUM"
GROUP BY ALL
ORDER BY bytesVacuumed DESC

What is the success rate for operations run by predictive optimizations?

WITH operation_counts AS (
     SELECT
           COUNT(DISTINCT (CASE WHEN operation_status = "SUCCESSFUL" THEN operation_id END)) as successes,
           COUNT(DISTINCT operation_id) as total_operations
    FROM system.storage.predictive_optimization_operations_history
 )
SELECT successes / total_operations as success_rate
FROM operation_counts