Optimize performance on the VARIANT data with shredding
This feature is in Beta.
This page describes how to use shredding to optimize query performance on tables with semi-structured data in VARIANT columns.
See VARIANT type, Variant support in Delta Lake, and Query variant data.
What is shredding?
Shredding improves the query performance of VARIANT data by storing commonly occurring fields as separate columns in the Parquet files. This process reduces the I/O required to read fields and improves compression using a columnar format instead of a binary blob.
Requirements
Databricks Runtime 17.2 or above is required to read and write shredded VARIANT tables.
Enable shredding
Enable shredding for your entire workspace through the Preview Portal. No code changes are required to read or write VARIANT data with shredding.
After you enable the feature for your workspace, shredding is automatically enabled for the following scenarios:
CREATE TABLEwith one or moreVARIANTcolumns.CREATE AND REPLACE TABLEwith one or moreVARIANTcolumns.ALTER TABLEwhen adding one or moreVARIANTcolumns.
Opt out of shredding on future writes for a specific table
If you enable the shredding Beta for your workspace but want to exclude a specific table, set the enableVariantShredding table property:
ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableVariantShredding' = 'false');
Remove shredding on an existing table
To remove shredding on an existing table, drop the feature with the ALTER TABLE command. This operation also rewrites shredded VARIANT data in place to the unshredded VARIANT format and sets the enableVariantShredding table property to false.
ALTER TABLE my_table DROP FEATURE "variantShredding-preview";
Limitations
- Shredding data introduces some overhead on writes.
- Enabling shredding does not automatically convert existing
VARIANTdata in a table. It only applies to data written after the feature is enabled. To rewrite existingVARIANTdata, useREORG TABLE my_table APPLY (SHRED VARIANT). - Shredding applies to top-level
VARIANTcolumns orVARIANTfields in structs, excludingVARIANTdata stored inside arrays or maps.