Concurrency Control

Delta Lake provides ACID transaction guarantees between reads and writes. This means that:

  • Multiple writers, across multiple clusters, can simultaneously modify a table partition and see a consistent snapshot view of the table and there will be a serial order for these writes.
  • Readers will continue to see the consistent snapshot view of the table that the Apache Spark job started with, even when the table is modified during the job.

Optimistic concurrency control

Delta Lake uses optimistic concurrency control to provide transactional guarantees between writes. Under this mechanism, writes operate in three stages:

  1. Read: Reads (if needed) the latest available version of the table to identify which files need to be modified (that is, rewritten).
  2. Write: Stages all the changes by writing new data files.
  3. Validate and commit: Before committing the changes, checks whether the proposed changes conflict with any other changes that may have been concurrently committed since the snapshot that was read. If there are no conflicts, all the staged changes are committed as a new versioned snapshot, and the write operation succeeds. However, if there are conflicts, the write operation fails with a concurrent modification exception rather than corrupting the table as would happen with open source Spark.

The isolation level of a table defines the degree to which a transaction must be isolated from modifications made by concurrent operations. For information on the isolation levels supported by Delta Lake on Databricks, see Isolation Levels.

Write conflicts

The following table describes which pairs of write operations can conflict in each isolation level.

  INSERT UPDATE, DELETE, MERGE INTO OPTIMIZE
INSERT Cannot conflict    
UPDATE, DELETE, MERGE INTO Can conflict in Serializable, cannot conflict in WriteSerializable Can conflict in Serializable and WriteSerializable  
OPTIMIZE Cannot conflict Can conflict in Serializable and WriteSerializable Can conflict in Serializable and WriteSerializable

Avoid conflicts using partitioning and disjoint command conditions

In all cases marked “can conflict”, whether the two operations will conflict depends on whether they operate on the same set of files. You can make the two sets of files disjoint by partitioning the table by the same columns as those used in the conditions of the operations. For example, the two commands UPDATE table WHERE date > '2010-01-01' ... and DELETE table WHERE date < '2010-01-01' will conflict if the table is not partitioned by date, as both can attempt to modify the same set of files. Partitioning the table by date will avoid the conflict. Hence, partitioning a table according to the conditions commonly used on the command can reduce conflicts significantly. However, partitioning a table by a column that has high cardinality can lead to other performance issues due to large number of subdirectories.

Conflict exceptions

When a transaction conflict occurs, you will observe one of the following exceptions.

ConcurrentAppendException

When a transaction adds records in a partition and a concurrent transaction also adds records in the same partition.

When the isolation level is set to Serializable, any append to a partition directory that your transaction reads would cause this error. When the isolation level is set to WriteSerializable, then your transaction will conflict only if there was an update (files were rewritten due to DELETE, UPDATE, or MERGE) to a partition your operation was reading.

This exception is typically thrown during concurrent MERGE operations. While the concurrent operations may be physically working on separate partition directories, the operations may conflict unless this separation is explicit in the MERGE condition. Consider the following example.

// Target 'deltaTable' is partitioned by date and country
deltaTable.as("t").merge(
    source.as("s"),
    "s.user_id = t.user_id AND s.date = t.date AND s.country = t.country")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()

Suppose you run the above code concurrently for different dates or countries. Since each job is working on an independent partition on the target Delta table, you don’t expect any conflicts. However the condition is not explicit enough and can scan the entire table and can conflict with concurrent operations updating any other partitions. Instead, you can rewrite your statement to run merge on a specific date and/or country as:

// Target 'deltaTable' is partitioned by date and country
deltaTable.as("t").merge(
    source.as("s"),
    "s.user_id = t.user_id AND s.date = '" + <date> + "' AND s.country = '" + <country> + "'")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()

This operation is now safe to run concurrently on different dates and countries.

ConcurrentDeleteReadException

A concurrent operation deleted a file that your operation read. Common causes are a delete, update, or merge operation that rewrites files.

ConcurrentDeleteDeleteException

A concurrent operation deleted a file that your operation also deletes. This could be caused by two concurrent <compaction> operation rewriting the same files.

MetadataChangedException

A concurrent transaction updates the metadata of a Delta table. Common causes are ALTER TABLE operations or writes to your Delta table that update the schema of the table.

ConcurrentTransactionException

If a streaming query using the same checkpoint location is started multiple times concurrently and tries to write to the Delta table at the same time. You should never have two streaming queries use the same checkpoint location and run at the same time.

ProtocolChangedException

This occurs when your Delta table is upgraded to a new version. You may need to upgrade your Delta Lake version for future operations to succeed.

See table versions for more details.