Databricks Delta is in Private Preview. Contact your account manager or go to https://databricks.com/product/databricks-delta to request access.
Databricks Delta delivers a transactional storage layer built on Apache Spark and Databricks DBFS. The core abstraction of Databricks Delta is an optimized Spark table that:
- Stores data as Parquet files in DBFS.
- Maintains a transaction log that efficiently tracks changes to the table.
You read and write data stored in the delta format using the same familiar Apache Spark SQL batch and streaming APIs that you use to work with Hive tables and DBFS directories. With the addition of the transaction log and other enhancements, Databricks Delta offers significant benefits:
- ACID transactions
  - Multiple writers can simultaneously modify a dataset and see consistent views.
  - Writers can modify a dataset without interfering with jobs reading the dataset.
- Fast read access
  - Automatic file management organizes data into large files that can be read efficiently.
  - Statistics speed up reads by 10-100x, and data skipping avoids reading irrelevant information.
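To make the idea of data skipping concrete, here is a toy model in plain Python. It is not how Databricks Delta implements statistics. It only illustrates the general technique of keeping per-file minimum and maximum values for a column and skipping files whose range cannot match a filter. The `FileStats` class, the `files_to_scan` function, and the file names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file statistics for a single integer column."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Keep only files whose [min, max] range overlaps the filter [lower, upper]."""
    return [f for f in files if f.max_value >= lower and f.min_value <= upper]


# Three hypothetical data files with non-overlapping value ranges.
files = [
    FileStats("part-0000.parquet", 1, 100),
    FileStats("part-0001.parquet", 101, 200),
    FileStats("part-0002.parquet", 201, 300),
]

# A filter such as "value BETWEEN 150 AND 180" only needs one of the three files.
selected = files_to_scan(files, 150, 180)
print([f.path for f in selected])  # ['part-0001.parquet']
```

In this sketch, two of the three files are never opened, which is where the read-time savings come from when a filter is selective.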
Databricks Delta requires Databricks Runtime 4.1 or above. If you created a Databricks Delta table using a Databricks Runtime lower than 4.1, the table version must be upgraded. For details, see Table Versioning.
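As a rough illustration of the batch APIs mentioned earlier, the following PySpark sketch writes a DataFrame in the delta format and reads it back. It assumes a cluster where Databricks Delta is enabled and an existing `spark` session. The helper names `write_delta` and `read_delta` and the path `/delta/events` are hypothetical and chosen only for this example.

```python
def write_delta(df, path, mode="append"):
    """Write a Spark DataFrame to `path` using the delta format."""
    df.write.format("delta").mode(mode).save(path)


def read_delta(spark, path):
    """Load the data stored at `path` in the delta format as a DataFrame."""
    return spark.read.format("delta").load(path)


# Example usage on a cluster (hypothetical path, assumes `spark` exists):
#
#   events = spark.range(0, 5)
#   write_delta(events, "/delta/events", mode="overwrite")
#   read_delta(spark, "/delta/events").show()
```

The only difference from reading or writing Parquet directly is the format name passed to `format(...)`. The rest is the standard `DataFrameReader` and `DataFrameWriter` API.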
- How do Databricks Delta tables compare to Hive SerDe tables?
Databricks Delta tables are managed to a greater degree. In particular, there are several Hive SerDe parameters that Databricks Delta manages on your behalf that you should never specify manually:
- Does Databricks Delta support multi-table transactions?
- Databricks Delta does not support multi-table transactions or foreign keys. Transactions are supported at the table level only.
- Does Databricks Delta support writes or reads using the Spark Streaming DStream API?
- Databricks Delta does not support the DStream API. We recommend Structured Streaming.
- What DDL and DML features does Databricks Delta not support?
- Unsupported DDL features:
  - ANALYZE TABLE PARTITION
  - ALTER TABLE [ADD|DROP] PARTITION
  - ALTER TABLE SET LOCATION
  - ALTER TABLE RECOVER PARTITIONS
  - ALTER TABLE SET SERDEPROPERTIES
  - CREATE TABLE LIKE
  - INSERT OVERWRITE DIRECTORY
- Unsupported DML features:
  - INSERT INTO [OVERWRITE] with static partitions.
  - Subqueries in the
  - Specifying a schema when reading from a table. A command such as
- What are the limitations of transactional writes?
Databricks Delta supports transactional writes from different clusters in the same workspace in Databricks Runtime 4.2 and above. All writers must be running Databricks Runtime 4.2 or above. The following features are not supported when running in this mode:
- spark-submit jobs.
- Running commands using REST APIs.
- Client-side S3 encryption.
- Server-Side Encryption with Customer-Provided Encryption Keys.
- S3 paths with credentials in a cluster that cannot access AWS Security Token Service.
You can disable multi-cluster writes by setting
false. If they are disabled, writes to a single table must originate from a single cluster.
- You cannot concurrently modify the same Databricks Delta table from different workspaces.
- Writes to a single table using Databricks Runtime versions lower than 4.2 must originate from a single cluster. To perform transactional writes from multiple clusters in the same workspace, you must upgrade to Databricks Runtime 4.2 or above.
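The FAQ above recommends Structured Streaming in place of the DStream API. As a minimal sketch of what that might look like in PySpark, the function below reads one delta location as a stream and appends it to another. It assumes a cluster with Databricks Delta enabled and an existing `spark` session. The function name `stream_delta_to_delta` and all paths are hypothetical.

```python
def stream_delta_to_delta(spark, source_path, sink_path, checkpoint_path):
    """Continuously append new data from one delta location to another."""
    # Treat the source location as a streaming source.
    stream = spark.readStream.format("delta").load(source_path)

    # Append to the sink; the checkpoint location lets the query resume after a restart.
    return (
        stream.writeStream
        .format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .start(sink_path)
    )


# Example usage on a cluster (hypothetical paths, assumes `spark` exists):
#
#   query = stream_delta_to_delta(
#       spark, "/delta/events", "/delta/events_copy", "/delta/_checkpoints/events_copy"
#   )
#   query.awaitTermination()
```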