Skip to main content

Optimize performance on the VARIANT data with shredding

Beta

This feature is in Beta.

This page describes how to use shredding to optimize query performance on tables with semi-structured data in VARIANT columns.

See VARIANT type, Variant support in Delta Lake, and Query variant data.

What is shredding?

Shredding improves the query performance of VARIANT data by storing commonly occurring fields as separate columns in the Parquet files. This process reduces the I/O required to read fields and improves compression using a columnar format instead of a binary blob.

Requirements

Databricks Runtime 17.2 or above is required to read and write shredded VARIANT tables.

Enable shredding

Enable shredding for your entire workspace through the Preview Portal. No code changes are required to read or write VARIANT data with shredding.

After you enable the feature for your workspace, shredding is automatically enabled for the following scenarios:

  • CREATE TABLE with one or more VARIANT columns.
  • CREATE AND REPLACE TABLE with one or more VARIANT columns.
  • ALTER TABLE when adding one or more VARIANT columns.

Opt out of shredding on future writes for a specific table

If you enable the shredding Beta for your workspace but want to exclude a specific table, set the enableVariantShredding table property:

SQL
ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableVariantShredding' = 'false');

Remove shredding on an existing table

To remove shredding on an existing table, drop the feature with the ALTER TABLE command. This operation also rewrites shredded VARIANT data in place to the unshredded VARIANT format and sets the enableVariantShredding table property to false.

SQL
ALTER TABLE my_table DROP FEATURE "variantShredding-preview";

Limitations

  • Shredding data introduces some overhead on writes.
  • Enabling shredding does not automatically convert existing VARIANT data in a table. It only applies to data written after the feature is enabled. To rewrite existing VARIANT data, use REORG TABLE my_table APPLY (SHRED VARIANT).
  • Shredding applies to top-level VARIANT columns or VARIANT fields in structs, excluding VARIANT data stored inside arrays or maps.