VACUUM
Clean up files associated with a table. There are different versions of this command for Apache Spark and Delta tables.
Vacuum a Spark table
Recursively vacuums directories associated with the Spark table and removes uncommitted files older
than a retention threshold. The default threshold is 7 days. Databricks automatically triggers VACUUM
operations as data is written. See Clean up uncommitted files.
Syntax
VACUUM [table_identifier | path] [RETAIN num HOURS]
table_identifier
[database_name.] table_name
: A table name, optionally qualified with a database name.
path
Path to the table files.
RETAIN num HOURS
The retention threshold, in hours. If omitted, the default of 7 days (168 hours) applies.
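Example
The statements below are a minimal sketch of the syntax above, not output from a real workspace. The table name `sales_db.events` is hypothetical, and the 240-hour value is only an illustration.

```sql
-- Hypothetical table name; substitute your own.
-- Remove uncommitted files older than the default 7-day threshold.
VACUUM sales_db.events;

-- Same table, with an explicit 10-day (240-hour) threshold.
VACUUM sales_db.events RETAIN 240 HOURS;
```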
Vacuum a Delta table (Delta Lake on Databricks)
Recursively vacuums directories associated with the Delta table and removes data files that are no longer
in the latest state of the transaction log for the table and are older than a retention threshold.
A file becomes eligible for deletion once the time at which it was logically removed from Delta’s transaction log,
plus the retention hours, has passed. Its modification timestamp on the storage system is not considered.
The default threshold is 7 days. Databricks does not automatically trigger VACUUM
operations on Delta tables. See Remove files no longer referenced by a Delta table.
If you run VACUUM on a Delta table, you lose the ability to time travel back to a
version older than the specified data retention period.
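Example
Before vacuuming, it can help to see which versions fall outside the retention period. The sketch below assumes a hypothetical Delta table named `sales_db.events` and the default 168-hour threshold. The cutoff query is only an approximation of the rule described above, since eligibility depends on when each file was logically removed from the transaction log.

```sql
-- Hypothetical table name; substitute your own.
-- List the table's versions together with their commit timestamps.
DESCRIBE HISTORY sales_db.events;

-- Approximate cutoff for the default 7-day (168-hour) threshold.
-- Files logically removed from the transaction log before this instant
-- are candidates for deletion, so versions that depend on them may no
-- longer be reachable through time travel after VACUUM runs.
SELECT current_timestamp() - INTERVAL 168 HOURS AS removal_cutoff;
```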
Syntax
VACUUM table_identifier [RETAIN num HOURS] [DRY RUN]
table_identifier
[database_name.] table_name
: A table name, optionally qualified with a database name.
delta.`<path-to-table>`
: The location of an existing Delta table.
RETAIN num HOURS
The retention threshold, in hours. If omitted, the default of 7 days (168 hours) applies.
DRY RUN
Returns a list of files to be deleted, without deleting them.
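Example
A cautious workflow is to preview the deletion with DRY RUN and then run the real command with the same threshold. This is a sketch only: the table name `sales_db.events` and the path `/data/events` are hypothetical placeholders.

```sql
-- Hypothetical table name and path; substitute your own.
-- Preview the files that would be deleted, without deleting anything.
VACUUM sales_db.events RETAIN 168 HOURS DRY RUN;

-- Run the actual cleanup with the same threshold.
VACUUM sales_db.events RETAIN 168 HOURS;

-- Address the Delta table by its location instead of by name.
VACUUM delta.`/data/events` RETAIN 168 HOURS;
```

The examples keep the threshold at the documented 7-day default because shorter values reduce how far back time travel can reach.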