FSCK REPAIR TABLE

Applies to: Databricks SQL, Databricks Runtime

Removes from the transaction log of a Delta table the entries for files that can no longer be found in the underlying file system. This can happen when the files have been deleted manually.

Syntax

FSCK REPAIR TABLE table_name [DRY RUN]

Parameters

  • table_name

    Identifies an existing Delta table. The name must not include a temporal specification.

  • DRY RUN

    Shows information about the file entries that FSCK REPAIR TABLE would remove from the transaction log of the Delta table because the files can no longer be found in the underlying file system. A file entry is either a data file path or a combination of a data file path and a deletion vector file path. An entry is included in the output when its data file is missing, when its deletion vector file is missing, or when both are missing.

    By default, DRY RUN returns only the first 1000 files. To raise this limit, set the SparkSession variable spark.databricks.delta.fsck.maxNumEntriesInResult to a higher value before running the command in a notebook.
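The limit change described for DRY RUN might look like the following sketch in a notebook SQL cell. The value 5000 and the table name t are illustrative assumptions, not recommended settings.

```sql
-- Raise the cap on entries returned by DRY RUN for this session.
-- 5000 is an arbitrary illustrative value; the default is 1000.
SET spark.databricks.delta.fsck.maxNumEntriesInResult = 5000;

-- Preview the entries that would be removed; t is a placeholder table name.
FSCK REPAIR TABLE t DRY RUN;
```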

Returns

For DRY RUN, a report of the form:

  • dataFilePath STRING NOT NULL

  • dataFileMissing BOOLEAN NOT NULL

  • deletionVectorPath STRING

  • deletionVectorFileMissing BOOLEAN NOT NULL

Examples

-- Assume file1.parquet is missing and no deletion vector is expected.
> FSCK REPAIR TABLE t DRY RUN;
  dataFilePath dataFileMissing deletionVectorPath deletionVectorFileMissing
 ------------- --------------- ------------------ -------------------------
 file1.parquet            true               null                     false

-- Assume dv1.bin is missing.
> FSCK REPAIR TABLE t DRY RUN;
  dataFilePath dataFileMissing deletionVectorPath deletionVectorFileMissing
 ------------- --------------- ------------------ -------------------------
 file1.parquet           false            dv1.bin                      true
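
A common workflow is to preview first and then run the command without DRY RUN to remove the entries. This is a sketch that assumes the dry-run output has been reviewed and the listed files are truly gone. The table name t is the same placeholder used in the examples above.

```sql
-- Step 1: list the entries whose data file or deletion vector file is missing.
FSCK REPAIR TABLE t DRY RUN;

-- Step 2: remove those entries from the transaction log.
FSCK REPAIR TABLE t;
```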