Deletion vectors in Databricks
Deletion vectors are a storage optimization feature that accelerates modifications to tables. Without deletion vectors, deleting a single row requires rewriting the entire Parquet file that contains that record. Deletion vectors avoid this overhead: when they are enabled, DELETE, UPDATE, and MERGE operations mark rows as modified without rewriting the Parquet file. Reads then resolve the current table state by applying the modifications recorded in deletion vectors.
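The following sketch illustrates this behavior. It assumes a hypothetical table named main.default.dv_demo in a Unity Catalog schema, and a runtime that writes deletion vectors. Whether the DELETE actually produces a deletion vector depends on how rows are laid out across files. For example, if every row in a file is deleted, the whole file can be removed instead. Treat the example as illustrative rather than guaranteed.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE main.default.dv_demo (id INT, status STRING)
TBLPROPERTIES ('delta.enableDeletionVectors' = true);

INSERT INTO main.default.dv_demo VALUES (1, 'active'), (2, 'active'), (3, 'closed');

-- With deletion vectors, the matching row can be marked as deleted
-- instead of rewriting the Parquet file that contains it.
DELETE FROM main.default.dv_demo WHERE id = 3;

-- Reads apply the recorded deletions, so only rows 1 and 2 are returned.
SELECT * FROM main.default.dv_demo ORDER BY id;

-- The operationMetrics column of the most recent commit describes what the DELETE did.
DESCRIBE HISTORY main.default.dv_demo LIMIT 1;
```

In the operationMetrics output, look for deletion-vector counters such as numDeletionVectorsAdded. That metric name is assumed from open source Delta Lake and may differ by runtime.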
Databricks recommends using Databricks Runtime 14.3 LTS and above to write tables with deletion vectors, so that you benefit from all optimizations. You can read tables with deletion vectors enabled in Databricks Runtime 12.2 LTS and above.
In Databricks Runtime 14.2 and above, tables with deletion vectors support row-level concurrency. See Write conflicts with row-level concurrency.
Photon leverages deletion vectors for predictive I/O updates, accelerating DELETE, MERGE, and UPDATE operations. All clients that support reading deletion vectors can read updates that produced deletion vectors, regardless of whether predictive I/O produced these updates. See Use predictive I/O to accelerate updates.
Enable deletion vectors
A workspace admin setting controls whether deletion vectors are auto-enabled for new tables. See Auto-enable deletion vectors.
If this workspace setting is configured, deletion vectors are enabled by default for the table types selected in the setting when you create a new table using a SQL warehouse or Databricks Runtime 14.1 or above.
Deletion vectors are not enabled by default for materialized views and streaming tables stored in the Hive metastore.
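To check whether deletion vectors were enabled on a new table, for example by the workspace setting, you can inspect its table properties. This sketch uses a hypothetical table named main.default.auto_dv_check. It also assumes that the default is recorded on the table as the delta.enableDeletionVectors property.

```sql
-- Hypothetical table. The CREATE statement does not set delta.enableDeletionVectors,
-- so any value reported below comes from defaults such as the workspace setting.
CREATE TABLE main.default.auto_dv_check (id INT);

-- Returns the property value if it is set on the table.
SHOW TBLPROPERTIES main.default.auto_dv_check ('delta.enableDeletionVectors');
```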
To manually enable or disable support for deletion vectors on any table or view (including streaming tables and materialized views), set the enableDeletionVectors table property. For tables, you can set the property when you create or alter the table, as in the following example. For materialized views and streaming tables, set it when you create them: you can't use an ALTER statement to enable or disable deletion vectors on a materialized view or streaming table.
-- For Delta tables
CREATE TABLE <table-name> [options] TBLPROPERTIES ('delta.enableDeletionVectors' = true);
ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
-- For Iceberg tables, use iceberg.enableDeletionVectors instead of delta.enableDeletionVectors
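An ALTER statement can't change this property on a materialized view or streaming table, so set it in the CREATE statement instead. The following sketch assumes two things: a hypothetical source table main.default.dv_demo, and a workspace where materialized views are available. The view name is also hypothetical.

```sql
-- Hypothetical names. The property is set at creation because ALTER
-- can't enable or disable deletion vectors on a materialized view.
CREATE MATERIALIZED VIEW main.default.dv_status_counts
TBLPROPERTIES ('delta.enableDeletionVectors' = true)
AS SELECT status, count(*) AS row_count
FROM main.default.dv_demo
GROUP BY status;
```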
When you enable deletion vectors, the table protocol is upgraded. After upgrading, the table will not be readable by clients that do not support deletion vectors. See Delta Lake feature compatibility and protocols.
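To see the protocol change, you can compare a table's details before and after you enable deletion vectors. This is a minimal sketch that assumes a hypothetical table main.default.dv_demo.

```sql
-- Hypothetical table name. The output includes minReaderVersion and minWriterVersion.
-- On runtimes that report a tableFeatures column, deletionVectors appears there once enabled.
DESCRIBE DETAIL main.default.dv_demo;
```

With deletion vectors enabled, the protocol typically shows minReaderVersion 3 and minWriterVersion 7, which are the protocol versions that use table features.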
In Databricks Runtime 14.1 and above, you can drop the deletion vectors table feature to enable compatibility with other clients. See Drop a Delta Lake table feature and downgrade table protocol.
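The following sketch drops the feature from a hypothetical table main.default.dv_demo. The exact steps depend on the Databricks Runtime version, including whether the TRUNCATE HISTORY step and its waiting period are required. Follow the linked page for your runtime.

```sql
-- Hypothetical table name. Disables deletion vectors and rewrites
-- data files that still use them.
ALTER TABLE main.default.dv_demo DROP FEATURE deletionVectors;

-- On runtimes that require it, run again after the history retention period
-- to truncate history that still references deletion vectors and downgrade the protocol.
ALTER TABLE main.default.dv_demo DROP FEATURE deletionVectors TRUNCATE HISTORY;
```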
Apply changes to Parquet data files
Deletion vectors record changes to rows as soft deletes that logically modify existing Parquet data files in the table. These changes are physically applied when one of the following events causes the data files to be rewritten:
- An OPTIMIZE command is run on the table.
- Auto-compaction triggers a rewrite of a data file with a deletion vector.
- REORG TABLE ... APPLY (PURGE) is run against the table.
Events related to file compaction do not have strict guarantees for resolving changes recorded in deletion vectors, and some changes recorded in deletion vectors might not be applied if target data files would not otherwise be candidates for file compaction. REORG TABLE ... APPLY (PURGE) rewrites all data files containing records with modifications recorded using deletion vectors. See REORG TABLE.
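The following sketch contrasts compaction with an explicit purge. It assumes a hypothetical table main.default.dv_demo.

```sql
-- Hypothetical table name.
-- OPTIMIZE applies deletion vectors only in files it selects for compaction,
-- so some soft-deleted rows can remain recorded in deletion vectors afterward.
OPTIMIZE main.default.dv_demo;

-- REORG TABLE ... APPLY (PURGE) rewrites every data file that has rows
-- recorded in deletion vectors.
REORG TABLE main.default.dv_demo APPLY (PURGE);
```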
Even after the data files are rewritten, modified data might still exist in the old files. You can run VACUUM to physically delete the old files. REORG TABLE ... APPLY (PURGE) creates a new version of the table when it completes. Use this completion timestamp when you determine the retention threshold that your VACUUM operation needs to fully remove the deleted files. See Remove unused data files with vacuum.
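The following sketch shows the order of operations. It assumes a hypothetical table main.default.dv_demo and the default 7-day (168-hour) retention threshold.

```sql
-- Hypothetical table name.
-- Find the version and timestamp of the commit created by REORG TABLE ... APPLY (PURGE).
DESCRIBE HISTORY main.default.dv_demo;

-- Once the REORG commit is older than the retention threshold, preview the
-- files that VACUUM would delete, then remove them.
VACUUM main.default.dv_demo RETAIN 168 HOURS DRY RUN;
VACUUM main.default.dv_demo RETAIN 168 HOURS;
```

RETAIN 168 HOURS matches the default retention, so it is shown only to make the threshold explicit.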
Client compatibility
Databricks uses deletion vectors to power predictive I/O for updates on Photon-enabled compute. See Use predictive I/O to accelerate updates.
Support for using deletion vectors for reads and writes varies by client.
The following table lists the client versions required to read tables with deletion vectors enabled, and indicates whether each client supports writing deletion vectors:
| Client | Write deletion vectors | Read deletion vectors |
|---|---|---|
| Databricks Runtime with Photon | Supports | Requires Databricks Runtime 12.2 LTS or above. |
| Databricks Runtime without Photon | Supports | Requires Databricks Runtime 12.2 LTS or above. |
| OSS Apache Spark with OSS Delta Lake | Supports | Requires OSS Delta 2.3.0 or above. |
| Delta Sharing recipients | Writes are not supported on Delta Sharing tables. | Databricks: Requires Databricks Runtime 14.1 or above. Open source Apache Spark: Requires |
For support with other clients, see the OSS Delta Lake integrations documentation.
Limitations
- UniForm Iceberg v2 doesn't support deletion vectors. Apache Iceberg v3 supports deletion vectors on tables with UniForm enabled. See Use Apache Iceberg v3 features.
- You cannot use a GENERATE statement to generate a manifest file for a table that has files using deletion vectors. To generate a manifest, first run a REORG TABLE ... APPLY (PURGE) statement and then run the GENERATE statement. You must ensure that no concurrent write operations are running when you submit the REORG statement.
- You cannot incrementally generate manifest files for a table with deletion vectors enabled (for example, by setting the table property delta.compatibility.symlinkFormatManifest.enabled=true).
- If you enable deletion vectors on a materialized view or streaming table and subsequently disable them, future writes to the view or table no longer use deletion vectors, but existing deletion vectors are not removed.
- You cannot downgrade the table protocol after enabling deletion vectors on a materialized view or streaming table. After enabling, the table feature for deletion vectors cannot be removed, even if you subsequently disable deletion vectors on the view or table.
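The following sketch shows the manifest workflow described in the limitations above. It assumes a hypothetical table main.default.dv_demo and that no other writes run against the table in the meantime.

```sql
-- Hypothetical table name. Run only while no concurrent writes are in progress.
-- Rewrites data files so that none of them use deletion vectors.
REORG TABLE main.default.dv_demo APPLY (PURGE);

-- With no data files using deletion vectors, the manifest can be generated.
GENERATE symlink_format_manifest FOR TABLE main.default.dv_demo;
```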