Use variant shredding to optimize performance
This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See Manage Databricks previews.
Variant shredding improves query performance on VARIANT columns by storing commonly occurring fields as separate columns in the underlying Parquet files. Shredding reduces the I/O required to read fields and improves compression by using a columnar format instead of a binary blob.
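For illustration, the following query extracts individual fields from a VARIANT column. This is the access pattern that shredding is meant to speed up. It is a sketch that assumes a hypothetical table my_events with a VARIANT column payload holding JSON-like objects. The table, column, and field names are not taken from this page.

```sql
-- Hypothetical names: my_events (table), payload (VARIANT column),
-- user.id and event_type (fields inside payload).
-- Extract nested fields with the : path operator and cast them with ::.
-- If user.id and event_type are among the shredded fields, they can be read
-- from their own Parquet columns instead of decoding the full VARIANT value.
SELECT payload:user.id::BIGINT AS user_id
FROM my_events
WHERE payload:event_type::STRING = 'click';
```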
See VARIANT type, Variant type support for Apache Iceberg and Delta Lake, and Query variant data.
Requirements
Databricks Runtime 17.2 or above is required to read and write shredded VARIANT tables.
Enable shredding
Workspace admins can enable shredding from the workspace Previews page. See Manage Databricks previews.
No code changes are required to read or write VARIANT data with shredding.
After you enable the feature for your workspace, shredding is automatically enabled on tables in the following scenarios:
- CREATE TABLE with one or more VARIANT columns.
- CREATE OR REPLACE TABLE with one or more VARIANT columns.
- ALTER TABLE when adding one or more VARIANT columns.
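The statements below sketch each of these scenarios. They assume that the Beta is already enabled for the workspace. The table and column names (my_events, my_other_table, payload, attributes) are hypothetical. The final INSERT shows that writes use ordinary VARIANT syntax, with nothing specific to shredding.

```sql
-- Hypothetical table and column names, for illustration only.
-- Scenario 1: CREATE TABLE with a VARIANT column.
CREATE TABLE my_events (
  id BIGINT,
  payload VARIANT
);

-- Scenario 2: CREATE OR REPLACE TABLE with a VARIANT column.
CREATE OR REPLACE TABLE my_events (
  id BIGINT,
  payload VARIANT
);

-- Scenario 3: ALTER TABLE that adds a VARIANT column to an existing table.
ALTER TABLE my_other_table ADD COLUMN attributes VARIANT;

-- Writing data requires no shredding-specific syntax.
INSERT INTO my_events
SELECT 1, parse_json('{"user": {"id": 42}, "event_type": "click"}');
```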
For existing tables, provided that shredding is enabled at the workspace level, you can opt in manually by setting the enableVariantShredding table property to true, or opt out by setting it to false. Prefix the property with delta. for Delta Lake tables or iceberg. for Iceberg tables:
Delta Lake:
ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableVariantShredding' = 'true');
Iceberg table:
ALTER TABLE my_table SET TBLPROPERTIES ('iceberg.enableVariantShredding' = 'true');
Verify that shredding is enabled by checking that the table property enableVariantShredding is set to true.
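One way to run this check is with SHOW TBLPROPERTIES and the same prefixed property key used to set it. This is a sketch: my_table stands in for your table name. It assumes the property is reported under the delta. or iceberg. key, matching the ALTER TABLE statements on this page.

```sql
-- Replace my_table with your table name; the expected value is true.
-- Delta Lake table:
SHOW TBLPROPERTIES my_table ('delta.enableVariantShredding');

-- Iceberg table:
SHOW TBLPROPERTIES my_table ('iceberg.enableVariantShredding');
```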
Opt out of shredding for a specific table
If you enable the shredding Beta for your workspace but want to exclude a specific table, set the enableVariantShredding table property to false:
Delta Lake:
ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableVariantShredding' = 'false');
Iceberg table:
ALTER TABLE my_table SET TBLPROPERTIES ('iceberg.enableVariantShredding' = 'false');
Remove shredding from an existing table
To remove shredding from an existing table, drop the variantShredding-preview table feature with ALTER TABLE ... DROP FEATURE. This operation also rewrites shredded VARIANT data in place to the unshredded VARIANT format and sets the enableVariantShredding table property to false.
ALTER TABLE my_table DROP FEATURE "variantShredding-preview";
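After the drop completes, you can inspect the table to confirm the result. The sketch below assumes a Delta Lake table named my_table. DESCRIBE DETAIL returns a tableFeatures column that you can check for variantShredding-preview. The property check relies on the statement above that dropping the feature sets enableVariantShredding to false.

```sql
-- Delta Lake table (my_table is a placeholder name).
-- Inspect the tableFeatures column to see whether variantShredding-preview is still listed.
DESCRIBE DETAIL my_table;

-- After the drop, the shredding property is expected to be false.
SHOW TBLPROPERTIES my_table ('delta.enableVariantShredding');
```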
Limitations
- Shredding data introduces some overhead on writes.
- Enabling shredding doesn't automatically convert existing VARIANT data in a table. It applies only to data written after the feature is enabled. To rewrite existing VARIANT data, use REORG TABLE my_table APPLY (SHRED VARIANT).
- Shredding applies to top-level VARIANT columns and to VARIANT fields in structs, but not to VARIANT data stored inside arrays or maps.
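For an existing table, opting in and rewriting existing data are separate steps. Setting the property affects only new writes, while REORG TABLE converts data that is already in the table. The following sketch combines statements shown earlier on this page for a Delta Lake table named my_table, which is a placeholder.

```sql
-- Step 1: opt the existing table in to shredding for new writes.
-- For an Iceberg table, use 'iceberg.enableVariantShredding' instead.
ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableVariantShredding' = 'true');

-- Step 2: rewrite VARIANT data that was written before shredding was enabled.
REORG TABLE my_table APPLY (SHRED VARIANT);
```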