Optimize performance on the VARIANT data with shredding
This feature is in Beta.
This page describes how to use shredding to optimize query performance on tables with semi-structured data in VARIANT
columns.
See VARIANT
type, Variant support in Delta Lake, and Query variant data.
What is shredding?
Shredding improves the query performance of VARIANT
data by storing commonly occurring fields as separate columns in the Parquet files. This process reduces the I/O required to read fields and improves compression using a columnar format instead of a binary blob.
Requirements
Databricks Runtime 17.2 or above is required to read and write shredded VARIANT tables.
Enable shredding
Enable shredding for your entire workspace through the Preview Portal. No code changes are required to read or write VARIANT
data with shredding.
After you enable the feature for your workspace, shredding is automatically enabled for the following scenarios:
CREATE TABLE
with one or moreVARIANT
columns.CREATE AND REPLACE TABLE
with one or moreVARIANT
columns.ALTER TABLE
when adding one or moreVARIANT
columns.
Opt out of shredding on future writes for a specific table
If you enable the shredding Beta for your workspace but want to exclude a specific table, set the enableVariantShredding
table property:
ALTER TABLE my_table SET TBLPROPERTIES ('delta.enableVariantShredding' = 'false');
Remove shredding on an existing table
To remove shredding on an existing table, drop the feature with the ALTER TABLE
command. This operation also rewrites shredded VARIANT
data in place to the unshredded VARIANT
format and sets the enableVariantShredding
table property to false.
ALTER TABLE my_table DROP FEATURE "variantShredding-preview";
Limitations
- Shredding data introduces some overhead on writes.
- Enabling shredding does not automatically convert existing
VARIANT
data in a table. It only applies to data written after the feature is enabled. To rewrite existingVARIANT
data, useREORG TABLE my_table APPLY (SHRED VARIANT)
. - Shredding applies to top-level
VARIANT
columns orVARIANT
fields in structs, excludingVARIANT
data stored inside arrays or maps.