CONVERT TO DELTA

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime

Converts an existing Parquet table to a Delta table in-place. This command lists all the files in the directory, creates a Delta Lake transaction log that tracks these files, and automatically infers the data schema by reading the footers of all Parquet files. The conversion process collects statistics to improve query performance on the converted Delta table. If you provide a table name, the metastore is also updated to reflect that the table is now a Delta table.

This command supports converting Iceberg tables whose underlying file format is Parquet. In this case, the converter generates the Delta Lake transaction log based on Iceberg table’s native file manifest, schema and partitioning information.

Syntax

CONVERT TO DELTA table_name [ NO STATISTICS ] [ PARTITIONED BY clause ]

Parameters

  • table_name

    Either an optionally qualified table identifier or a path to a parquet or iceberg file directory. The name must not include a temporal specification. For Iceberg tables, you can only use paths, as converting managed iceberg tables is not supported.

  • NO STATISTICS

    Bypass statistics collection during the conversion process and finish conversion faster. After the table is converted to Delta Lake, you can use OPTIMIZE ZORDER BY to reorganize the data layout and generate statistics.

  • PARTITIONED BY

    Partition the created table by the specified columns. When table_name is a path, the PARTITIONED BY is required for partitioned data. When the table_name is a qualified table identifier, PARTITIONED BY clause is optional and the partition specification are loaded from the metastore. In either approach, the conversion process aborts and throw an exception if the directory structure does not conform to the provided or loaded PARTITIONED BY specification.

    Note

    In Databricks Runtime 11.1 and below, PARTITIONED BY is a required argument for all partitioned data.

Examples

Note

You do not need to provide partitioning information for Iceberg tables or tables registered to the metastore.

CONVERT TO DELTA database_name.table_name; -- only for Parquet tables

CONVERT TO DELTA parquet.`s3://my-bucket/path/to/table`
  PARTITIONED BY (date DATE); -- if the table is partitioned

CONVERT TO DELTA iceberg.`s3://my-bucket/path/to/table`; -- uses Iceberg manifest for metadata

Caveats

Any file not tracked by Delta Lake is invisible and can be deleted when you run VACUUM. You should avoid updating or appending data files during the conversion process. After the table is converted, make sure all writes go through Delta Lake.

It is possible that multiple external tables share the same underlying Parquet directory. In this case, if you run CONVERT on one of the external tables, then you will not be able to access the other external tables because their underlying directory has been converted from Parquet to Delta Lake. To query or write to these external tables again, you must run CONVERT on them as well.

CONVERT populates the catalog information, such as schema and table properties, to the Delta Lake transaction log. If the underlying directory has already been converted to Delta Lake and its metadata is different from the catalog metadata, a convertMetastoreMetadataMismatchException is thrown.

While using Databricks Runtime, if you want CONVERT to overwrite the existing metadata in the Delta Lake transaction log, set the SQL configuration spark.databricks.delta.convert.metadataCheck.enabled to false.

Undo the conversion

If you have performed Delta Lake operations such as DELETE or OPTIMIZE that can change the data files:

  1. Run the following command for garbage collection:

VACUUM delta.`<path-to-table>` RETAIN 0 HOURS
  1. Delete the <path-to-table>/_delta_log directory.