Tables and views in ETL
This article provides a conceptual overview of the types of tables and views on Databricks, with an emphasis on best practices and recommendations for using them in ETL workloads.
Tables and views
A table is a structured dataset stored in a specific location. Databricks recommends using tables backed by the Delta Lake format for all tables created or updated on Databricks. Tables persist data in storage and can be queried and manipulated using SQL commands or DataFrame APIs, which support operations like insert, update, delete, and merge. See Delta table basics.
A view is a virtual table defined by a SQL query. A view does not itself store data. Instead, a view provides a way to present data from one or more tables in a specific format or abstraction. Views are useful for simplifying complex queries, encapsulating business logic, and providing a consistent interface to the underlying data without duplicating storage. See What is a view?
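The distinction between a table and a view can be illustrated with a minimal sketch. It assumes SQLite (from the Python standard library) as a stand-in SQL engine, not Databricks or Delta Lake, and the names `orders` and `region_totals` are hypothetical. The point is only that the view stores a query rather than rows, so it reflects changes to the underlying table each time it is queried.

```python
import sqlite3

# SQLite stands in for a SQL engine here; this is NOT Databricks or Delta Lake.
# The table, column, and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 10.0), ("west", 25.0), ("east", 5.0)],
)

# The view stores only its defining query, not the rows that query returns.
conn.execute(
    "CREATE VIEW region_totals AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)


def region_totals(connection):
    """Query the view and return a {region: total} dictionary."""
    rows = connection.execute("SELECT region, total FROM region_totals").fetchall()
    return dict(rows)


before = region_totals(conn)

# Tables accept insert, update, and delete; the view picks up each change
# the next time it is queried, with no copy of the data to maintain.
conn.execute("INSERT INTO orders (region, amount) VALUES ('west', 5.0)")
conn.execute("UPDATE orders SET amount = 20.0 WHERE region = 'east' AND amount = 10.0")
conn.execute("DELETE FROM orders WHERE region = 'east' AND amount = 5.0")

after = region_totals(conn)
print(before)
print(after)
```

Because the view holds no data of its own, nothing had to be rewritten between the two reads; only the `orders` table changed.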
Delta table basics
Databricks uses Delta Lake as the default format when creating tables. Tables backed by Delta Lake are also known as Delta tables. A Delta table stores data as a directory of files in cloud object storage and registers that table’s metadata to the metastore within a catalog and schema.
All Unity Catalog managed tables are Delta tables. Streaming tables and materialized views are special implementations of Delta tables.
Delta tables contain rows of data that can be queried and updated using SQL, Python, and Scala APIs. End users interact with these tables the same way they would in any other database. Tables backed by Delta Lake can also be queried by systems outside Databricks. See Access Databricks data using external systems.
While it is possible to create tables on Databricks that don’t use Delta Lake, those tables don’t provide the transactional guarantees or optimized performance of Delta tables. For more information about other table types that use formats other than Delta Lake, see Work with external tables.
Differences between Delta tables, streaming tables, and materialized views
The following table answers frequently asked questions about the differences between Delta tables, streaming tables, and materialized views.
Delta tables as described below include both managed and external tables backed by Delta Lake. Unity Catalog managed tables have optimizations and features that are not supported by external tables.
Question | Delta table | Streaming table | Materialized view |
---|---|---|---|
What is it? | Tables backed by Delta Lake, supporting ACID transactions, schema enforcement, and other Delta Lake features. | A Delta table that has been extended for declarative streaming and incremental processing use cases. | The result of a query, always pre-computed and correct. |
What use cases is it recommended for? | All operations that query or save data on Databricks. | Declarative code that does the following: | Declarative code that does the following: |
How is it populated? | Procedural code ( | Declarative code including: | Declarative queries |
What is the object type in Unity Catalog? | Table | Streaming table | Materialized view |
Who can update it? | Any writer that can update a Delta table. | Only the pipeline that defines the streaming table can update it. | Only the pipeline that defines the materialized view can update it. |
What Delta Lake features is it compatible with? | Supports all Delta Lake features. | Does not support: | Does not support: |
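To make "pre-computed" concrete, the sketch below contrasts a view, which is evaluated at query time, with a stored snapshot of the same query. It again assumes SQLite as a stand-in engine, and the names `events`, `clicks_view`, `clicks_precomputed`, and `refresh_precomputed` are hypothetical. This is not how Databricks implements materialized views; it only shows why a stored result needs something to refresh it, a role the table above assigns to the pipeline that defines the materialized view.

```python
import sqlite3

# SQLite stands in for a SQL engine; this is NOT the Databricks implementation.
# All object names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, clicks INTEGER)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [("a", 1), ("b", 2)])

QUERY = "SELECT user_id, SUM(clicks) AS clicks FROM events GROUP BY user_id"

# A view: the query runs every time the view is read.
conn.execute(f"CREATE VIEW clicks_view AS {QUERY}")


def refresh_precomputed(connection):
    """Recompute the stored result from scratch (a naive full refresh)."""
    connection.execute("DROP TABLE IF EXISTS clicks_precomputed")
    connection.execute(f"CREATE TABLE clicks_precomputed AS {QUERY}")


def read(connection, name):
    """Return {user_id: clicks} from the named view or table."""
    return dict(connection.execute(f"SELECT user_id, clicks FROM {name}").fetchall())


refresh_precomputed(conn)
conn.execute("INSERT INTO events VALUES ('a', 4)")  # new data arrives

live = read(conn, "clicks_view")          # computed at query time
stale = read(conn, "clicks_precomputed")  # stored result, not yet refreshed
refresh_precomputed(conn)
fresh = read(conn, "clicks_precomputed")  # stored result after refresh
print(live, stale, fresh)
```

In this hand-rolled version the stored result is out of date until `refresh_precomputed` runs again, which is the bookkeeping a managed materialized view is meant to take off the user's hands.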