What is Apache Iceberg in Databricks?

Preview

Unity Catalog-managed Apache Iceberg tables and foreign Iceberg tables are both available in Public Preview in Databricks Runtime 16.4 LTS and above.

Apache Iceberg is an open source table format for analytics workloads. It supports features like schema evolution, time travel, and hidden partitioning. Like Delta Lake, Iceberg provides an abstraction layer that enables ACID transactions on data stored in object storage. Databricks supports Iceberg tables that use the Apache Parquet file format. Iceberg maintains atomicity and consistency by writing new metadata files for each table change.
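To make the last point concrete, the following is a toy model, not Iceberg's actual implementation: each commit writes a new immutable metadata object and then moves a single pointer to it. The names `ToyTable`, `Metadata`, and `commit` are hypothetical, and an in-memory dictionary stands in for metadata files in object storage.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Metadata:
    """One immutable metadata version (stands in for a metadata file)."""
    version: int
    schema: Tuple[str, ...]
    data_files: Tuple[str, ...]


class ToyTable:
    """Hypothetical in-memory table; not a real Iceberg API."""

    def __init__(self, schema: Tuple[str, ...]) -> None:
        first = Metadata(version=1, schema=schema, data_files=())
        self._files: Dict[int, Metadata] = {1: first}  # every version ever written
        self._current = 1  # the single pointer that readers follow

    def current(self) -> Metadata:
        return self._files[self._current]

    def at_version(self, version: int) -> Metadata:
        # Old metadata is never modified, so earlier states stay readable.
        return self._files[version]

    def commit(self, base_version: int,
               new_files: Tuple[str, ...] = (),
               new_columns: Tuple[str, ...] = ()) -> Metadata:
        # Reject the commit if another writer moved the pointer first.
        if base_version != self._current:
            raise RuntimeError("stale base version; reload and retry")
        base = self._files[base_version]
        updated = Metadata(
            version=base.version + 1,
            schema=base.schema + new_columns,
            data_files=base.data_files + new_files,
        )
        self._files[updated.version] = updated  # write the new metadata first
        self._current = updated.version         # then move the pointer
        return updated


table = ToyTable(schema=("id", "name"))
table.commit(base_version=1, new_files=("part-0001.parquet",))
table.commit(base_version=2, new_columns=("email",))
```

In this sketch a reader that loaded version 2 keeps seeing a consistent table while version 3 is written, and a writer that starts from a stale version fails instead of silently overwriting another commit. Keeping the old versions around is also what makes reading an earlier table state possible.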

An Iceberg catalog is the top-level layer of the Iceberg table architecture. It handles operations like creating, dropping, and renaming tables. Its main responsibility is to provide the current metadata when a table is loaded. Databricks supports Iceberg tables managed by Unity Catalog (managed Iceberg tables) and by catalogs outside Unity Catalog (foreign Iceberg tables).

All Iceberg tables in Databricks follow the open Iceberg table format specification. See the Iceberg table spec.

Create Iceberg tables in Unity Catalog

Iceberg tables created in Unity Catalog are managed Iceberg tables. You can create these tables using:

Managed Iceberg tables are fully integrated with Databricks platform features. Unity Catalog manages lifecycle tasks like snapshot expiration and file compaction on these tables, and predictive optimization automates those tasks to reduce storage costs and improve query speed. Managed Iceberg tables also support liquid clustering, which improves query performance.
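As a rough illustration of creating a managed Iceberg table with liquid clustering from a notebook, the helper below only builds the SQL text. The `USING ICEBERG` and `CLUSTER BY` clauses are assumptions based on Databricks SQL syntax and should be checked against the current `CREATE TABLE` reference. The function name, table name, and column names are hypothetical.

```python
from typing import Dict, Optional, Sequence


def create_iceberg_table_sql(
    table: str,
    columns: Dict[str, str],
    cluster_by: Optional[Sequence[str]] = None,
) -> str:
    """Build a CREATE TABLE statement for a managed Iceberg table.

    Assumes the `USING ICEBERG` and `CLUSTER BY (...)` clauses; verify both
    against the Databricks SQL reference for your runtime.
    """
    column_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    statement = f"CREATE TABLE {table} ({column_defs}) USING ICEBERG"
    if cluster_by:
        statement += f" CLUSTER BY ({', '.join(cluster_by)})"
    return statement


# Hypothetical three-level name: catalog.schema.table
ddl = create_iceberg_table_sql(
    "main.sales.orders",
    {"order_id": "BIGINT", "order_date": "DATE", "amount": "DECIMAL(10, 2)"},
    cluster_by=["order_date"],
)
print(ddl)
```

In a Databricks notebook, where a `spark` session is already defined, the resulting statement could be run with `spark.sql(ddl)`.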

Read Iceberg tables managed by other catalogs

A foreign Iceberg table is an Iceberg table managed by a catalog outside Unity Catalog. The external catalog stores the table's current metadata. Databricks uses Lakehouse Federation to retrieve metadata and read the table from object storage.

Foreign Iceberg tables are read-only in Databricks and have limited platform support.

Access Iceberg tables using external systems

You can access all Iceberg tables in Unity Catalog using the Iceberg REST Catalog API. This open API supports read and write operations from external Iceberg engines across different languages and platforms. See Access Databricks tables from Apache Iceberg clients.

The REST Catalog supports credential vending, which delivers temporary credentials to external engines for accessing the underlying storage. For more information, see Unity Catalog credential vending for external system access.
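The following is a minimal sketch of the connection settings an external engine might pass to an Iceberg REST client. The endpoint path `/api/2.1/unity-catalog/iceberg-rest`, the use of the Unity Catalog catalog name as the `warehouse`, and token-based authentication are assumptions to confirm against the access documentation referenced above. The helper name, workspace URL, catalog name, and token are placeholders.

```python
from typing import Dict


def rest_catalog_properties(workspace_url: str, catalog: str, token: str) -> Dict[str, str]:
    """Assemble properties for an Iceberg REST catalog client.

    The endpoint path is an assumption; confirm it in the Databricks docs.
    """
    base = workspace_url.rstrip("/")
    return {
        "type": "rest",
        "uri": f"{base}/api/2.1/unity-catalog/iceberg-rest",
        "warehouse": catalog,  # assumed: the Unity Catalog catalog name
        "token": token,        # credential used to authenticate to the workspace
    }


props = rest_catalog_properties(
    "https://example.cloud.databricks.com/",  # placeholder workspace URL
    "main",                                   # placeholder catalog name
    "<token>",                                # placeholder credential
)
```

With PyIceberg installed, these properties could be passed as `load_catalog("unity", **props)` from `pyiceberg.catalog`, and the returned catalog's `load_table("schema.table")` would then fetch the table's current metadata.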

Iceberg table limitations

The following limitations apply to Iceberg tables in Databricks and are subject to change:

  • Iceberg tables support only the Apache Parquet file format.
  • Databricks supports versions 1 and 2 of the Apache Iceberg specification, with the following exceptions:
    • Row-level deletes, including position deletes and equality-based deletes, aren't supported.
    • Branching and tagging aren't supported. Only the main branch is accessible when reading foreign Iceberg tables.
    • Partition evolution is supported on managed Iceberg tables only when interacting from external Iceberg engines. Foreign Iceberg tables don't support partition evolution.
    • The following data types aren't supported:
      • UUID
      • Fixed(L)
      • TIME
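The data type restrictions above lend themselves to a pre-flight check before a table is created or registered. The helper below is hypothetical, not a Databricks API. It assumes column types are written as Iceberg type strings (for example `fixed[16]`) and inspects top-level columns only, not types nested inside structs, lists, or maps.

```python
import re
from typing import Dict, List

# Types listed as unsupported above: UUID, Fixed(L), TIME.
_UNSUPPORTED = re.compile(r"^(uuid|time|fixed\[\d+\])$", re.IGNORECASE)


def unsupported_columns(schema: Dict[str, str]) -> List[str]:
    """Return names of columns whose type is uuid, time, or fixed[L]."""
    return [name for name, dtype in schema.items()
            if _UNSUPPORTED.match(dtype.strip())]


schema = {
    "id": "long",
    "session": "uuid",
    "checksum": "fixed[16]",
    "opened_at": "time",
    "created_at": "timestamp",  # timestamp is not in the unsupported list
}
flagged = unsupported_columns(schema)
```

Here `flagged` contains `session`, `checksum`, and `opened_at`, which would need to be cast or dropped before the table can be used in Databricks.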

Managed Iceberg table limitations

The following limitations apply specifically to managed Iceberg tables:

  • Vector search isn't supported on managed Iceberg tables.
  • Apache Iceberg doesn't support change data feed. As a result, incremental processing isn't supported when reading managed Iceberg tables as a source for:
    • Materialized views and streaming tables
    • Lakehouse Monitoring
    • Online tables
    • Lakebase
    • Data classification

Foreign Iceberg table limitations

The following limitations apply specifically to foreign Iceberg tables:

  • Time travel is supported only for Iceberg snapshots that have been previously read in Databricks (that is, snapshots where a SELECT statement was executed).
  • Using bucket transform functions for Iceberg partitioning can degrade query performance when conditional filters are used.
  • Cloud storage tiering features, such as those in Amazon S3, aren't integrated with foreign Iceberg tables. As a result, accessing a foreign Iceberg table in Databricks can restore data that was archived to a lower-cost storage tier.
  • On dedicated access mode clusters, reads and REFRESH FOREIGN TABLE operations on Iceberg tables require ALL PRIVILEGES.