Vacuum
Important
This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See VACUUM.
Clean up files associated with a table.
This command works differently depending on whether you're working on a Delta table or an Apache Spark table.
Vacuum a Delta table (Delta Lake on Databricks)
VACUUM [ [db_name.]table_name | path] [RETAIN num HOURS] [DRY RUN]
Recursively vacuum directories associated with the Delta table and remove data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. Files are deleted based on the time they were logically removed from the Delta transaction log plus the retention period, not on their modification timestamps on the storage system. The default threshold is 7 days.
On Delta tables, Databricks does not automatically trigger VACUUM operations. See Remove unused data files with vacuum.
If you run VACUUM on a Delta table, you lose the ability to time travel back to a version older than the specified data retention period.
RETAIN num HOURS
The retention threshold, in hours.
DRY RUN
Return a list of the files that would be deleted, without deleting them.
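For example, assuming a Delta table named events (a placeholder name) and a placeholder path, a typical sequence is to preview the deletions and then run the vacuum:

VACUUM events DRY RUN              -- list the files that would be deleted, without removing anything
VACUUM events RETAIN 168 HOURS     -- delete eligible files older than 7 days (the default threshold)
VACUUM '/data/events'              -- vacuum a Delta table by path instead of by name

Per the warning above, lowering the retention window limits how far back you can time travel, so this example keeps it at the 7-day default.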
Vacuum a Spark table (Apache Spark)
VACUUM [ [db_name.]table_name | path] [RETAIN num HOURS]
Recursively vacuum directories associated with the Spark table and remove uncommitted files older than a retention threshold. The default threshold is 7 days.
On Spark tables, Databricks automatically triggers VACUUM
operations as data is written.
RETAIN num HOURS
The retention threshold, in hours.
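As a sketch, an explicit vacuum of a non-Delta Spark table (again using the placeholder name events and a placeholder path) looks the same, minus the DRY RUN option:

VACUUM events                      -- remove uncommitted files older than the default 7-day threshold
VACUUM events RETAIN 240 HOURS     -- widen the retention window to 10 days
VACUUM '/data/events'              -- vacuum by path instead of by table name

Because Databricks already triggers VACUUM automatically as data is written to Spark tables, an explicit run is typically only needed when you want a non-default retention threshold.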