OPTIMIZE [db_name.]table_name [WHERE predicate] [ZORDER BY (col_name1, col_name2, ...)]
Optimize the layout of data stored in DBFS. Optionally optimize a subset of data or colocate data by column. If you do not specify colocation, bin-packing optimization is performed.
- Bin-packing optimization is idempotent, meaning that if it is run twice on the same dataset, the second instance has no effect.
- ZOrdering is not idempotent and rearranges all of the data that matches the given filter. Therefore we suggest that you limit it to new data, using partition filters when possible.
- Optimize the subset of rows matching the given partition predicate. Only filters involving partition key attributes are supported.
- Colocate column information in the same set of files. Co-locality is used by Databricks Delta data-skipping algorithms to dramatically reduce the amount of data that needs to be read. You can specify multiple columns for
ZORDER BYas a comma-separated list. However, the effectiveness of the locality drops with each additional column.