Delta Lake is an open-source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
Delta Lake on Databricks allows you to configure Delta Lake based on your workload patterns and provides optimized layouts and indexes for fast interactive queries.
This is the documentation for Delta Lake on Databricks.
- Introduction to Delta Lake
- Introductory Notebooks
- Table Batch Reads and Writes
- Table Streaming Reads and Writes
- Table Deletes, Updates, and Merges
- Table Utility Commands
- Delta Lake API Reference
- Concurrency Control
- Migrate Workloads to Delta Lake
- Best Practices
- Frequently Asked Questions (FAQ)
  - What is Delta Lake?
  - How is Delta Lake related to Apache Spark?
  - What format does Delta Lake use to store data?
  - How can I read and write data with Delta Lake?
  - Where does Delta Lake store the data?
  - Why is Delta Lake data I deleted still stored in S3?
  - Why does a table show old data after I delete Delta Lake files with rm -rf and create a new table in the same location?
  - Can I stream data directly into and from Delta tables?
  - Does Delta Lake support writes or reads using the Spark Streaming DStream API?
  - When I use Delta Lake, will I be able to port my code to other Spark platforms easily?
  - How do Delta tables compare to Hive SerDe tables?
  - What DDL and DML features does Delta Lake not support?
  - Does Delta Lake support multi-table transactions?
  - How can I change the type of a column?
  - What does it mean that Delta Lake supports multi-cluster writes?
  - What are the limitations of multi-cluster writes?
  - Can I access Delta tables outside of Databricks Runtime?
- Additional Resources