Delta Lake provides ACID transaction guarantees between reads and writes. This means that:
- Multiple writers, even across multiple clusters, can modify a table concurrently. Each writer sees a consistent snapshot view of the table, and the writes are committed in a single serial order.
- Readers continue to see a consistent snapshot view of the table as of the start of their Apache Spark job, even if the table is modified while the job runs.
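The snapshot behavior described above can be pictured with a minimal Python sketch. It assumes a toy in-memory model, not Delta Lake's actual transaction log: every commit appends a new immutable version, and a job pins the version that was current when it started. The class `VersionedTable` and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    """Hypothetical toy model: each commit appends a new immutable snapshot."""

    # Version 0 is the empty table; each entry is a tuple of rows.
    versions: list = field(default_factory=lambda: [()])

    def latest_version(self) -> int:
        return len(self.versions) - 1

    def snapshot(self, version: int) -> tuple:
        # Snapshots are immutable tuples, so a reader holding one
        # never observes rows committed after that version.
        return self.versions[version]

    def commit(self, new_rows) -> int:
        # Commits are appended one at a time, which yields a single
        # serial order of versions.
        self.versions.append(self.versions[-1] + tuple(new_rows))
        return self.latest_version()


table = VersionedTable()
table.commit(["a", "b"])

# A "job" pins the version that was current when it started.
job_version = table.latest_version()
table.commit(["c"])  # a concurrent write while the job is running

print(table.snapshot(job_version))             # ('a', 'b')
print(table.snapshot(table.latest_version()))  # ('a', 'b', 'c')
```

The job keeps reading version 1 even after version 2 is committed; new readers see version 2.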
Delta Lake uses optimistic concurrency control to provide transactional guarantees between writes. Under this mechanism, writes operate in three stages:
- Read: Reads (if needed) the latest available version of the table to identify which files need to be modified (that is, rewritten).
- Write: Stages all the changes by writing new data files.
- Validate and commit: Before committing the changes, checks whether the proposed changes conflict with any other changes committed concurrently since the snapshot that was read. If there are no conflicts, all the staged changes are committed as a new versioned snapshot, and the write operation succeeds. If there are conflicts, the write operation fails with a concurrent modification exception rather than corrupting the table, as could happen when writing with open source Spark alone.
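The three stages can be sketched in Python under a strongly simplified assumption: a transaction conflicts only if a commit made after its snapshot removed a file it read. Delta Lake's real conflict checks depend on the operation and the isolation level, so treat this as an illustration of optimistic concurrency control in general. `ToyLog`, `try_commit`, and `ConcurrentModificationError` are hypothetical names, not Delta Lake APIs.

```python
from dataclasses import dataclass


class ConcurrentModificationError(Exception):
    """Hypothetical stand-in for a concurrent modification exception."""


@dataclass(frozen=True)
class Commit:
    added: frozenset
    removed: frozenset


class ToyLog:
    """Hypothetical toy transaction log; commit i produces version i + 1."""

    def __init__(self):
        self.commits = []  # version 0 is the empty table

    def version(self):
        return len(self.commits)

    def files_at(self, version):
        # Replay commits up to `version` to get the live data files.
        files = set()
        for c in self.commits[:version]:
            files = (files - c.removed) | c.added
        return files

    def try_commit(self, read_version, read_files, added, removed):
        # Validate: did any commit after our snapshot remove a file we read?
        for c in self.commits[read_version:]:
            overlap = c.removed & read_files
            if overlap:
                raise ConcurrentModificationError(
                    f"files {sorted(overlap)} changed since version {read_version}"
                )
        # Commit: no conflict, so append the staged changes as the next version.
        self.commits.append(Commit(frozenset(added), frozenset(removed)))
        return self.version()


log = ToyLog()
log.try_commit(0, set(), {"part-0", "part-1"}, set())  # version 1

# Read: writers A and B both start from version 1 and plan to rewrite part-0.
v_a = v_b = log.version()

# Write + validate and commit for A: stage part-2 as the rewrite of part-0.
print(log.try_commit(v_a, {"part-0"}, {"part-2"}, {"part-0"}))  # 2

# B's snapshot is stale and part-0 is gone, so validation fails.
try:
    log.try_commit(v_b, {"part-0"}, {"part-3"}, {"part-0"})
except ConcurrentModificationError as e:
    print("conflict:", e)

# A blind append reads no files, so it commits even from the stale snapshot.
print(log.try_commit(v_b, set(), {"part-4"}, set()))  # 3
print(sorted(log.files_at(log.version())))  # ['part-1', 'part-2', 'part-4']
```

Writer B fails instead of overwriting A's rewrite of `part-0`, and the table is left exactly as A committed it.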
The isolation level of a table defines the degree to which a transaction must be isolated from modifications made by concurrent transactions. For information on the isolation levels supported by Delta Lake on Databricks, see Isolation Levels.