Vacuum

Clean up files associated with a table. The syntax and behavior of this command differ for Apache Spark tables and Delta tables.

Vacuum a Spark table

VACUUM [ [db_name.]table_name | path] [RETAIN num HOURS]

RETAIN num HOURS

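The retention threshold, in hours. Only files older than this threshold are removed. If omitted, the default threshold of 7 days (168 hours) applies.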

Recursively vacuum directories associated with the Spark table and remove uncommitted files older than a retention threshold. The default threshold is 7 days. Databricks automatically triggers VACUUM operations as data is written. See Clean up uncommitted files.
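
As an illustration of the syntax above, the following sketch vacuums a Spark table by name and by path. The table name events, the database name default, and the path /mnt/data/events are hypothetical placeholders, and quoting the path as a string literal is an assumption about how the path argument is written.

```sql
-- Vacuum a table by name, using the default 7-day threshold.
VACUUM default.events;

-- Same table, with the threshold stated explicitly (168 hours = 7 days).
VACUUM default.events RETAIN 168 HOURS;

-- Vacuum by path instead of by table name (hypothetical path).
VACUUM '/mnt/data/events' RETAIN 168 HOURS;
```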

Vacuum a Delta table (Delta Lake on Databricks)

VACUUM [ [db_name.]table_name | path] [RETAIN num HOURS] [DRY RUN]

Recursively vacuum directories associated with the Delta table and remove files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. The default threshold is 7 days. Databricks does not automatically trigger VACUUM operations on Delta tables. See Vacuum.

If you run VACUUM on a Delta table, you lose the ability to time travel back to any version older than the specified data retention period.
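
A minimal sketch of how the retention period relates to time travel, assuming a hypothetical Delta table named events and a hypothetical path /data/events. With RETAIN 240 HOURS, files needed only by versions older than 10 days become eligible for removal, so those versions can no longer be queried afterwards.

```sql
-- Keep 10 days (240 hours) of history; older versions lose time travel.
VACUUM events RETAIN 240 HOURS;

-- The same operation on a path-based Delta table (hypothetical path).
VACUUM delta.`/data/events` RETAIN 240 HOURS;
```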

RETAIN num HOURS

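The retention threshold, in hours. Only files older than this threshold are removed. If omitted, the default threshold of 7 days (168 hours) applies.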

DRY RUN

Return a list of files that would be deleted, without deleting them.
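
One way to use DRY RUN is to preview the effect of a vacuum before running it for real. The sketch below assumes a hypothetical Delta table named events.

```sql
-- Preview: list the files that would be deleted, without removing anything.
VACUUM events RETAIN 240 HOURS DRY RUN;

-- If the list looks right, run the same command without DRY RUN.
VACUUM events RETAIN 240 HOURS;
```

Keeping the RETAIN clause identical in both statements helps ensure the preview reflects what the real run will consider for deletion.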