This is the documentation for Delta Lake on Databricks.
Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions and scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
Delta Lake on Databricks allows you to configure Delta Lake based on your workload patterns and provides optimized layouts and indexes for fast interactive queries.
- Introduction to Delta Lake
- Introductory Notebooks
- Table Batch Reads and Writes
- Table Streaming Reads and Writes
- Concurrency Control
- Porting Existing Workloads to Delta Lake
- Frequently Asked Questions (FAQ)
- What is Delta Lake?
- How is Delta Lake related to Apache Spark?
- What format does Delta Lake use to store data?
- How can I read and write data with Delta Lake?
- Where does Delta Lake store the data?
- Can I stream data directly into Delta Lake tables?
- Can I stream data from Delta Lake tables?
- Does Delta Lake support writes or reads using the Spark Streaming DStream API?
- When I use Delta Lake, will I be able to port my code to other Spark platforms easily?
- How do Delta Lake tables compare to Hive SerDe tables?
- Does Delta Lake support multi-table transactions?
- What DDL and DML features does Delta Lake not support?
- How can I change the type of a column?
- When should I use partitioning with Delta Lake tables?
- When should I use `Z-ORDER BY` with Delta Lake tables?
- What does it mean that Delta Lake supports multi-cluster writes?
- What are the limitations of multi-cluster writes?
- Why is Delta Lake data I deleted still stored in S3?
- Why does a table show old data after I delete Delta Lake files with `rm -rf` and create a new table in the same location?
- Can I access Delta Lake tables outside of Databricks Runtime?