Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
Specifically, Delta Lake offers:
- ACID transactions on Spark: Serializable isolation levels ensure that readers never see inconsistent data.
- Scalable metadata handling: Leverages Spark’s distributed processing power to handle all the metadata for petabyte-scale tables with billions of files with ease.
- Streaming and batch unification: A table in Delta Lake is a batch table as well as a streaming source and sink. Streaming data ingest, batch historic backfill, and interactive queries all work out of the box.
- Schema enforcement: Automatically handles schema variations to prevent insertion of bad records during ingestion.
- Time travel: Data versioning enables rollbacks, full historical audit trails, and reproducible machine learning experiments.
- Upserts and deletes: Supports merge, update, and delete operations to enable complex use cases such as change data capture, slowly changing dimension (SCD) operations, and streaming upserts.
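As a rough illustration of the upsert, delete, and time travel features listed above, the following PySpark sketch merges a DataFrame of updates into an existing Delta table, deletes some rows, and then reads the table as of its first version. The function name, the table path, and the `id` and `status` columns are assumptions made for this example only; it also assumes a Spark session with Delta Lake available.

```python
def upsert_delete_and_time_travel(spark, table_path, updates_df):
    """Sketch: merge updates into a Delta table, delete rows, read an old version.

    `table_path`, and the `id` and `status` columns, are hypothetical names
    chosen for illustration; adapt them to your own table.
    """
    # Imported lazily so this module loads even where Delta Lake is not installed.
    from delta.tables import DeltaTable

    target = DeltaTable.forPath(spark, table_path)

    # Upsert: update rows whose id matches an incoming row, insert the rest.
    (
        target.alias("t")
        .merge(updates_df.alias("s"), "t.id = s.id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )

    # Delete: remove rows that match a SQL predicate.
    target.delete("status = 'inactive'")

    # Time travel: read the table as it was at version 0 (its first commit).
    first_version = (
        spark.read.format("delta").option("versionAsOf", 0).load(table_path)
    )
    return first_version
```

Each of the merge and delete calls is committed as a new table version, which is what makes the earlier version readable afterwards.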
Delta Lake on Databricks allows you to configure Delta Lake based on your workload patterns and provides optimized layouts and indexes for fast interactive queries. For information on Delta Lake on Databricks, see Optimizations.
The Delta Lake Quickstart provides an overview of the basics of working with Delta Lake.
The Quickstart shows how to build a pipeline that reads JSON data into a Delta table, modifies the table, reads the table, displays the table history, and optimizes the table. For runnable notebooks that demonstrate these features, see Introductory Notebooks.
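The pipeline steps described above might look roughly like the following PySpark sketch. It is not the Quickstart's own code: the function name, the paths, and the `event_type` column are hypothetical, and the `OPTIMIZE` statement assumes an environment that supports it, such as Databricks.

```python
def run_quickstart_pipeline(spark, json_path, delta_path):
    """Sketch of the Quickstart flow: ingest, modify, read, history, optimize.

    `json_path`, `delta_path`, and the `event_type` column are hypothetical.
    """
    # Imported lazily so this module loads even where Delta Lake is not installed.
    from delta.tables import DeltaTable

    # 1. Read JSON data and write it out as a Delta table.
    events = spark.read.json(json_path)
    events.write.format("delta").mode("overwrite").save(delta_path)

    # 2. Modify the table: fix a misspelled value in place.
    table = DeltaTable.forPath(spark, delta_path)
    table.update(
        condition="event_type = 'clck'",
        set={"event_type": "'click'"},  # values are SQL expressions
    )

    # 3. Read the current state of the table.
    current = spark.read.format("delta").load(delta_path)

    # 4. Get the table history (one row per committed version).
    history = table.history()

    # 5. Optimize the table layout (assumes OPTIMIZE is supported here).
    spark.sql(f"OPTIMIZE delta.`{delta_path}`")

    return current, history
```

Calling `current.show()` and `history.show()` on the returned DataFrames would display the table contents and its commit history.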
To try out Delta Lake, see Try Databricks.
- For answers to frequently asked questions, see Frequently Asked Questions (FAQ).
- For reference information on Delta Lake SQL commands, see SQL.
- For more background on and motivation for many Delta Lake features and applications, see these blog posts:
  - New Databricks Delta Features Simplify Data Pipelines
  - Simplifying Change Data Capture with Databricks Delta
  - Building a Real-Time Attribution Pipeline with Databricks Delta
  - Processing Petabytes of Data in Seconds with Databricks Delta
  - Simplifying Streaming Stock Data Analysis Using Databricks Delta
  - Build a Mobile Gaming Events Data Pipeline with Databricks Delta