What are deletion vectors?
Deletion vectors are a storage optimization feature you can enable on Delta Lake tables. By default, when a single row in a data file is deleted, the entire Parquet file containing the record must be rewritten. With deletion vectors enabled for the table, DELETE, UPDATE, and MERGE operations use deletion vectors to mark existing rows as removed or changed without rewriting the Parquet file. Subsequent reads on the table resolve the current table state by applying the deletions indicated by deletion vectors to the most recent table version.
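The following Python sketch is a toy model of this idea, not Delta Lake's implementation or on-disk format. A data file is an immutable tuple of rows, and its deletion vector is a set of deleted row positions. The sketch covers DELETE only and contrasts rewriting the file with recording positions in a deletion vector. `DataFile`, `delete_copy_on_write`, `delete_with_dv`, and `read` are hypothetical names.

```python
from dataclasses import dataclass


# Hypothetical toy model (not Delta's format): an immutable data file plus a
# deletion vector holding the positions of rows that are logically removed.
@dataclass(frozen=True)
class DataFile:
    path: str
    rows: tuple
    deleted: frozenset = frozenset()


def delete_copy_on_write(f: DataFile, predicate, new_path: str) -> DataFile:
    """Without deletion vectors: write every surviving row into a new file."""
    kept = tuple(r for i, r in enumerate(f.rows)
                 if i not in f.deleted and not predicate(r))
    return DataFile(new_path, kept)


def delete_with_dv(f: DataFile, predicate) -> DataFile:
    """With deletion vectors: keep the file's rows, only record deleted positions."""
    hits = {i for i, r in enumerate(f.rows) if i not in f.deleted and predicate(r)}
    return DataFile(f.path, f.rows, f.deleted | hits)


def read(f: DataFile) -> list:
    """Readers apply the deletion vector to resolve the current rows."""
    return [r for i, r in enumerate(f.rows) if i not in f.deleted]


original = DataFile("part-0.parquet", ("a", "b", "c", "d"))
rewritten = delete_copy_on_write(original, lambda r: r == "b", "part-1.parquet")
marked = delete_with_dv(original, lambda r: r == "b")

print(read(rewritten))  # ['a', 'c', 'd']
print(read(marked))     # ['a', 'c', 'd']
print(marked.path, sorted(marked.deleted))  # part-0.parquet [1]
```

Both reads return the same rows. Only the copy-on-write path produced a new file. The deletion-vector path left `part-0.parquet` untouched and moved the filtering work to read time.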
Databricks recommends using Databricks Runtime 14.3 LTS and above to write tables with deletion vectors, so that all optimizations are available. You can read tables with deletion vectors enabled in Databricks Runtime 12.2 LTS and above.
In Databricks Runtime 14.2 and above, tables with deletion vectors support row-level concurrency. See Write conflicts with row-level concurrency.
Note
Photon leverages deletion vectors for predictive I/O updates, accelerating DELETE, MERGE, and UPDATE operations. All clients that support reading deletion vectors can read updates that produced deletion vectors, regardless of whether predictive I/O produced these updates. See Use predictive I/O to accelerate updates.
Enable deletion vectors
Important
A workspace admin setting controls whether deletion vectors are auto-enabled for new Delta tables. See Auto-enable deletion vectors.
If your workspace uses the admin setting to auto-enable deletion vectors, deletion vectors are enabled by default when you create a new table using a SQL warehouse or Databricks Runtime 14.1 or above, according to the table types selected in that setting. Deletion vectors are not enabled by default for materialized views or streaming tables. You must enable them manually when you create the materialized view or streaming table.
To manually enable support for deletion vectors on a table or view, use the delta.enableDeletionVectors table property. You can manually enable deletion vectors on a Delta table when you create or alter the table. You can manually enable deletion vectors on a materialized view or streaming table only when you create it. You cannot use an ALTER statement to enable deletion vectors on a materialized view or streaming table.
CREATE TABLE <table-name> [options] TBLPROPERTIES ('delta.enableDeletionVectors' = true);
ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
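The rules above about which statements can set the property can be summarized as a small lookup. This Python sketch only restates the text for illustration. `ObjectType`, `ALLOWED_STATEMENTS`, and `can_enable_deletion_vectors` are hypothetical names and not a Databricks API.

```python
from enum import Enum


class ObjectType(Enum):
    DELTA_TABLE = "Delta table"
    MATERIALIZED_VIEW = "materialized view"
    STREAMING_TABLE = "streaming table"


# Hypothetical restatement of the rules above: statements that can set
# 'delta.enableDeletionVectors' = true for each kind of object.
ALLOWED_STATEMENTS = {
    ObjectType.DELTA_TABLE: {"CREATE", "ALTER"},
    ObjectType.MATERIALIZED_VIEW: {"CREATE"},  # no ALTER after creation
    ObjectType.STREAMING_TABLE: {"CREATE"},    # no ALTER after creation
}


def can_enable_deletion_vectors(obj: ObjectType, statement: str) -> bool:
    """Return True if `statement` ("CREATE" or "ALTER") may enable deletion vectors on `obj`."""
    return statement.strip().upper() in ALLOWED_STATEMENTS[obj]


print(can_enable_deletion_vectors(ObjectType.DELTA_TABLE, "alter"))      # True
print(can_enable_deletion_vectors(ObjectType.STREAMING_TABLE, "ALTER"))  # False
```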
Warning
When you enable deletion vectors, the table protocol is upgraded. After upgrading, the table will not be readable by Delta Lake clients that do not support deletion vectors. See How does Databricks manage Delta Lake feature compatibility?.
In Databricks Runtime 14.1 and above, you can drop the deletion vectors table feature to enable compatibility with other Delta clients. See Drop Delta table features.
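One way to picture the protocol upgrade is as a set of required reader features. Enabling the feature adds `deletionVectors` to the set of features a reader must understand, and a client can read the table only if it supports all of them. The Python sketch below is a simplified model under that assumption. `Protocol`, `can_read`, and the helper functions are hypothetical. The drop step also skips the file rewrites and history handling that a real feature drop involves.

```python
from dataclasses import dataclass, field


@dataclass
class Protocol:
    # Hypothetical toy model: features every reader must support to read the table.
    reader_features: set = field(default_factory=set)


def enable_deletion_vectors(p: Protocol) -> None:
    # Enabling deletion vectors upgrades the protocol with a new reader feature.
    p.reader_features.add("deletionVectors")


def drop_deletion_vectors(p: Protocol) -> None:
    # Greatly simplified stand-in for dropping the table feature.
    p.reader_features.discard("deletionVectors")


def can_read(p: Protocol, client_supports: set) -> bool:
    # A client can read the table only if it supports every required reader feature.
    return p.reader_features <= client_supports


table = Protocol()
legacy_client = set()  # a client without deletion vector support

enable_deletion_vectors(table)
print(can_read(table, legacy_client))        # False
print(can_read(table, {"deletionVectors"}))  # True
drop_deletion_vectors(table)
print(can_read(table, legacy_client))        # True
```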
Apply changes to Parquet data files
Deletion vectors indicate changes to rows as soft-deletes that logically modify existing Parquet data files in the Delta Lake table. These changes are applied physically when one of the following events causes the data files to be rewritten:
- An OPTIMIZE command is run on the table.
- Auto-compaction triggers a rewrite of a data file with a deletion vector.
- REORG TABLE ... APPLY (PURGE) is run against the table.
Events related to file compaction do not have strict guarantees for resolving changes recorded in deletion vectors. Some changes recorded in deletion vectors might not be applied if the target data files would not otherwise be candidates for file compaction. REORG TABLE ... APPLY (PURGE) rewrites all data files containing records with modifications recorded using deletion vectors. See REORG TABLE.
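To see why compaction alone gives no guarantee, consider a toy selection rule where compaction only picks small files, while REORG TABLE ... APPLY (PURGE) picks every file that has a deletion vector. The 32 MB threshold and the function names are hypothetical. Databricks' real compaction heuristics are different and are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    size_mb: int
    deleted_rows: frozenset  # toy deletion vector: positions of removed rows


SMALL_FILE_MB = 32  # hypothetical compaction threshold for this toy only


def optimize_candidates(files):
    # Toy rule: compaction picks only small files, so a large file with a
    # deletion vector can be left as-is.
    return [f for f in files if f.size_mb < SMALL_FILE_MB]


def reorg_purge_candidates(files):
    # REORG TABLE ... APPLY (PURGE) targets every file with a non-empty deletion vector.
    return [f for f in files if f.deleted_rows]


files = [
    DataFile("big.parquet", 512, frozenset({3, 7})),
    DataFile("small.parquet", 8, frozenset({0})),
    DataFile("clean.parquet", 256, frozenset()),
]
print([f.path for f in optimize_candidates(files)])     # ['small.parquet']
print([f.path for f in reorg_purge_candidates(files)])  # ['big.parquet', 'small.parquet']
```

In this toy, the deletions recorded for `big.parquet` survive compaction and are applied only by the purge.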
Note
Modified data might still exist in the old files. You can run VACUUM to physically delete the old files. REORG TABLE ... APPLY (PURGE) creates a new version of the table when it completes. This completion time is the timestamp you must consider for the retention threshold of your VACUUM operation to fully remove deleted files. See Remove unused data files with vacuum.
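The timing rule can be worked through with a small calculation. Files made obsolete by REORG TABLE ... APPLY (PURGE) pass the VACUUM retention threshold only once that threshold has elapsed since the REORG commit. The sketch assumes the default 7-day retention threshold. The function names are hypothetical, and treating the boundary instant as eligible is a toy choice.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the default VACUUM retention threshold of 7 days.
DEFAULT_RETENTION = timedelta(days=7)


def earliest_full_removal(reorg_completed_at: datetime,
                          retention: timedelta = DEFAULT_RETENTION) -> datetime:
    """Earliest time VACUUM can remove the files that REORG rewrote away."""
    return reorg_completed_at + retention


def vacuum_removes_old_files(reorg_completed_at: datetime, vacuum_at: datetime,
                             retention: timedelta = DEFAULT_RETENTION) -> bool:
    # Toy boundary choice: a VACUUM exactly at the cutoff counts as eligible.
    return vacuum_at >= earliest_full_removal(reorg_completed_at, retention)


reorg_done = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
print(earliest_full_removal(reorg_done))  # 2024-05-08 12:00:00+00:00
print(vacuum_removes_old_files(reorg_done, reorg_done + timedelta(days=3)))  # False
print(vacuum_removes_old_files(reorg_done, reorg_done + timedelta(days=8)))  # True
```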
Compatibility with Delta clients
Databricks uses deletion vectors to power predictive I/O for updates on Photon-enabled compute. See Use predictive I/O to accelerate updates.
Support for using deletion vectors for reads and writes varies by client.
The following table denotes required client versions for reading and writing Delta tables with deletion vectors enabled and specifies which write operations use deletion vectors:
Client | Write deletion vectors | Read deletion vectors |
---|---|---|
Databricks Runtime with Photon | Supports | Requires Databricks Runtime 12.2 LTS or above. |
Databricks Runtime without Photon | Supports | Requires Databricks Runtime 12.2 LTS or above. |
OSS Apache Spark with OSS Delta Lake | Supports | Requires OSS Delta 2.3.0 or above. |
Delta Sharing recipients | Writes are not supported on Delta Sharing tables | Databricks: Requires DBR 14.1 or above. Open source Apache Spark: Requires |
Note
For support with other Delta clients, see the OSS Delta Lake integrations documentation.
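The read requirements in the table reduce to a minimum-version comparison per client. This Python sketch encodes only the rows with a stated version and omits the open source Apache Spark entry for Delta Sharing. The client keys and function names are hypothetical labels, not identifiers used by Databricks or Delta Lake.

```python
# Minimum versions for *reading* deletion vectors, taken from the table above.
# Keys are hypothetical labels; the Delta Sharing OSS Spark row is not modeled.
MIN_READ_VERSION = {
    "databricks_runtime": (12, 2, 0),
    "oss_delta": (2, 3, 0),
    "delta_sharing_databricks": (14, 1, 0),
}


def parse_version(text: str) -> tuple:
    """'14.3 LTS' -> (14, 3, 0); '2.3.0' -> (2, 3, 0). Text after a space is ignored."""
    parts = [int(p) for p in text.split()[0].split(".")]
    return tuple(parts + [0] * (3 - len(parts)))  # pad so '2.3' == '2.3.0'


def can_read_deletion_vectors(client: str, version: str) -> bool:
    return parse_version(version) >= MIN_READ_VERSION[client]


print(can_read_deletion_vectors("databricks_runtime", "11.3 LTS"))  # False
print(can_read_deletion_vectors("databricks_runtime", "14.3 LTS"))  # True
print(can_read_deletion_vectors("oss_delta", "2.3"))                # True
```

Padding to three components keeps `2.3` and `2.3.0` equal under tuple comparison.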
Limitations
- UniForm does not support deletion vectors.
- You cannot use a GENERATE statement to generate a manifest file for a table that has files using deletion vectors. To generate a manifest, first run a REORG TABLE ... APPLY (PURGE) statement and then run the GENERATE statement. You must ensure that no concurrent write operations are running when you submit the REORG statement.
- You cannot incrementally generate manifest files for a table with deletion vectors enabled (for example, by setting the table property delta.compatibility.symlinkFormatManifest.enabled=true).
- If you enable deletion vectors on a materialized view or streaming table and subsequently disable deletion vectors, future writes to the view or table are prevented from using deletion vectors, but existing deletion vectors are not removed.
- You cannot downgrade the table protocol after enabling deletion vectors on a materialized view or streaming table. After enabling, the table feature for deletion vectors cannot be removed, even if you subsequently disable deletion vectors on the view or table.
- You cannot run REORG on materialized views or streaming tables to commit changes recorded in deletion vectors to the Parquet data files backing these objects. Because of this limitation, do not enable deletion vectors on materialized views or streaming tables if you must guarantee complete deletion of records (for example, for GDPR or CCPA compliance).
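The manifest limitation amounts to an ordering constraint: purge deletion vectors first, then generate the manifest. The Python sketch below models this with toy in-memory files. `DataFile`, `reorg_purge`, `generate_manifest`, and the `-purged` file naming are hypothetical and do not reflect how Databricks names or lists files. The sketch also does not model concurrent writes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str
    rows: tuple
    deleted: frozenset = frozenset()  # toy deletion vector (row positions)


def reorg_purge(files):
    # Hypothetical stand-in for REORG TABLE ... APPLY (PURGE): rewrite each file
    # that has a deletion vector, physically dropping the deleted rows.
    out = []
    for f in files:
        if f.deleted:
            kept = tuple(r for i, r in enumerate(f.rows) if i not in f.deleted)
            out.append(DataFile(f.path.replace(".parquet", "-purged.parquet"), kept))
        else:
            out.append(f)
    return out


def generate_manifest(files):
    # Mirrors the limitation above: refuse while any file still uses a deletion vector.
    if any(f.deleted for f in files):
        raise ValueError("files use deletion vectors; run REORG ... APPLY (PURGE) first")
    return "\n".join(f.path for f in files)


files = [DataFile("a.parquet", ("x", "y"), frozenset({1})), DataFile("b.parquet", ("z",))]
try:
    generate_manifest(files)
except ValueError as err:
    print("refused:", err)
print(generate_manifest(reorg_purge(files)))  # a-purged.parquet and b.parquet, one per line
```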