Parquet v2

Available in Databricks Runtime 18.1 and above, Parquet v2 improves query performance and reduces storage for Delta Lake and Apache Iceberg tables by using advanced encodings, v2 data page headers, and INT64 timestamps.

How Parquet v2 works

Parquet v2 introduces improvements to the data file format that reduce storage and improve read performance:

  • Advanced encodings: Integer and string columns use new encodings that compress and decode more efficiently than the encodings used in Parquet v1.
  • V2 data page headers: Page-level statistics and indexes improve predicate pushdown and data skipping, reducing the amount of data scanned at query time.
  • INT64 timestamps: INT64 timestamps replace the legacy INT96 timestamp type, which improves column statistics as well as timestamp encoding and compression.

Enable Parquet v2

Databricks automatically upgrades compatible Unity Catalog managed tables to Parquet v2. See Automatic upgrades.

To enable Parquet v2 manually, set the parquet.format.version table property to 2.12.0, using the prefix for your table type (for example, delta.parquet.format.version for Delta Lake tables). Before enabling Parquet v2 manually, review the limitations.

To enable Parquet v2 on an existing table:

SQL
ALTER TABLE <table_name> SET TBLPROPERTIES ('delta.parquet.format.version' = '2.12.0');
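To confirm that the property took effect, you can read it back with SHOW TBLPROPERTIES and a single property key. This is a minimal sketch. <table_name> is the same placeholder as in the statement above, and the expected value is the one that statement sets.

```sql
-- Read back a single table property.
-- After ALTER TABLE ... SET TBLPROPERTIES, the value should be 2.12.0.
SHOW TBLPROPERTIES <table_name> ('delta.parquet.format.version');
```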

To enable Parquet v2 on a new table:

SQL
CREATE TABLE <table_name> (...)
TBLPROPERTIES ('delta.parquet.format.version' = '2.12.0');
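As a concrete sketch, the statements below create a hypothetical Delta table whose integer, string, and timestamp columns are the column types that the v2 encodings and INT64 timestamps target. The catalog, schema, table, and column names (main.default.events, event_id, event_type, event_ts) are illustrative assumptions. Because the property is set at creation, the rows written by the INSERT use v2 encodings.

```sql
-- Hypothetical Unity Catalog table; replace the names with your own.
CREATE TABLE main.default.events (
  event_id   BIGINT,     -- integer column
  event_type STRING,     -- string column
  event_ts   TIMESTAMP   -- written as INT64 rather than INT96 under Parquet v2
)
TBLPROPERTIES ('delta.parquet.format.version' = '2.12.0');

-- Writes after the property is set use the v2 encodings.
INSERT INTO main.default.events
VALUES (1, 'click', TIMESTAMP '2024-01-01 00:00:00');
```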

After you set the property, all subsequent writes use v2 encodings. Existing data files aren't rewritten automatically, so Parquet v1 and v2 files can coexist in the same table. To rewrite the existing files of a Delta Lake table as v2, use REORG TABLE in Databricks Runtime 18.2 and above:

SQL
REORG TABLE <table_name> APPLY (SET PARQUET (FORMAT_VERSION = '2.12.0'));
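REORG TABLE rewrites data files, so it commits a new version of the Delta table. A minimal way to check that the rewrite ran is to look at the most recent entries in the table history. This page doesn't document the exact operation name or metrics that the history shows for this command, so what to look for in the output is an assumption to verify.

```sql
-- List the latest commits. The REORG TABLE run should be among
-- the most recent entries (check the operation column).
DESCRIBE HISTORY <table_name> LIMIT 5;
```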

For the full list of table properties, see Table properties reference.

Roll back to Parquet v1

In Databricks Runtime 18.2 and above, to roll back an individual table to v1 encoding, run REORG TABLE with the SET PARQUET option:

SQL
REORG TABLE <table_name> APPLY (SET PARQUET (FORMAT_VERSION = '1.0.0'));

This command rewrites all data files using v1 encodings and resets the delta.parquet.format.version table property to 1.0.0.
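To confirm the rollback, you can inspect the table's current properties. This sketch uses DESCRIBE DETAIL, whose properties column lists the table properties. Based on the behavior described above, that column should show delta.parquet.format.version with the value 1.0.0.

```sql
-- Inspect table metadata. After the rollback, the properties column
-- should include delta.parquet.format.version with the value 1.0.0.
DESCRIBE DETAIL <table_name>;
```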

For the full syntax, see REORG TABLE.

Limitations

Parquet v2 has the following limitations:

  • Some Apache Iceberg readers in external engines might not support Parquet v2. Verify that your Iceberg readers are compatible before enabling Parquet v2 on Iceberg tables.
  • For OpenSharing, before enabling Parquet v2, verify that the reader clients for OpenSharing recipients support Parquet v2 encodings.
  • Materialized views and streaming tables aren't automatically upgraded to Parquet v2. You can manually enable Parquet v2 on these table types on Databricks Runtime 18.1 and above.
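This page doesn't show the syntax for manually enabling Parquet v2 on materialized views or streaming tables. The sketch below assumes that a materialized view accepts the same delta.parquet.format.version property through the TBLPROPERTIES clause of its CREATE statement. Treat that as an assumption to confirm before relying on it. The view, source table, and column names are hypothetical.

```sql
-- Assumption: materialized views accept the same Parquet v2 property
-- as Delta tables. All names are hypothetical placeholders.
-- Per the limitation above, this requires Databricks Runtime 18.1 and above.
CREATE MATERIALIZED VIEW main.default.daily_event_counts
TBLPROPERTIES ('delta.parquet.format.version' = '2.12.0')
AS SELECT CAST(event_ts AS DATE) AS event_date,
          COUNT(*) AS n_events
FROM main.default.events
GROUP BY CAST(event_ts AS DATE);
```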