What is Apache Iceberg in Databricks?

Public Preview

Unity Catalog-managed Iceberg tables and foreign Iceberg tables are available in Public Preview in Databricks Runtime 16.4 LTS and above.

Iceberg v3 features are available in Public Preview in Databricks Runtime 18.0 and above. See Use Apache Iceberg v3 features.

Apache Iceberg is an open-source table format for analytics workloads. It supports features like schema evolution, time travel, and hidden partitioning. Like Delta Lake, Iceberg provides an abstraction layer that enables ACID transactions on data stored in object storage. Databricks supports Iceberg tables that use the Apache Parquet file format. Iceberg maintains atomicity and consistency by writing new metadata files for each table change.

An Iceberg catalog is the top-level layer of the Iceberg table architecture. It handles operations like creating, dropping, and renaming tables. Its main responsibility is to provide the current metadata when a table is loaded. Databricks supports Iceberg tables managed by Unity Catalog (managed Iceberg tables) and Iceberg tables managed by external catalogs (foreign Iceberg tables).

All Iceberg tables in Databricks follow the open Iceberg table format specification. See the Iceberg table specification.

Create Iceberg tables in Unity Catalog

Iceberg tables created in Unity Catalog are managed Iceberg tables. You can create these tables from Databricks or from external Iceberg clients connected through the Iceberg REST Catalog API.

Managed Iceberg tables are fully integrated with Databricks platform features. Unity Catalog manages lifecycle tasks like snapshot expiration and file compaction on these tables. Managed Iceberg tables also support liquid clustering, which improves query performance. Predictive optimization automates these tasks to reduce storage costs and improve query speed. Databricks recommends using Iceberg clients 1.9.2 and above to read and write to Unity Catalog.
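As an illustrative sketch (the catalog, schema, table, and column names below are placeholders, not from your workspace), creating a managed Iceberg table looks like this:

SQL
```sql
-- Create a managed Iceberg table in Unity Catalog.
-- `main.sales.orders` is a placeholder three-level name; adjust to your environment.
CREATE TABLE main.sales.orders (
  order_id BIGINT,
  customer STRING,
  order_ts TIMESTAMP
)
USING ICEBERG;
```

Unity Catalog then handles snapshot expiration and file compaction for the table as described above.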

Read Iceberg tables managed by other catalogs

A foreign Iceberg table is an Iceberg table managed by a catalog outside Unity Catalog. The external catalog stores the table's current metadata. Databricks uses Lakehouse Federation to retrieve metadata and read the table from object storage.

Foreign Iceberg tables are read-only in Databricks and have limited platform support.
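Assuming a federated catalog has already been configured (the names below are placeholders), a foreign Iceberg table is queried like any other table, and `REFRESH FOREIGN TABLE` pulls the latest metadata from the external catalog:

SQL
```sql
-- Read-only query against a foreign Iceberg table (placeholder names).
SELECT order_id, customer
FROM glue_cat.sales.orders
LIMIT 10;

-- Fetch the current table metadata from the external catalog.
REFRESH FOREIGN TABLE glue_cat.sales.orders;
```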

Access Iceberg tables using external systems

You can access all Iceberg tables in Unity Catalog using the Iceberg REST Catalog API. This open API supports read and write operations from external Iceberg engines across different languages and platforms. See Access Databricks tables from Apache Iceberg clients.

The REST Catalog supports credential vending, which delivers temporary credentials to external engines for accessing the underlying storage. For more information, see Unity Catalog credential vending for external system access.

warning

Credential vending is not supported on workspaces that use default storage. See Limitations.

Partition evolution

With partition evolution, you can change the partitioning scheme of an existing Apache Iceberg table without rewriting data. New data is written with the updated partition layout and existing data retains its original partition layout. Apache Iceberg tracks the partition specs and applies the correct filter at query time. See partition evolution for Apache Iceberg.

note

Partition evolution is supported on managed Iceberg tables through external Iceberg engines using the Iceberg REST Catalog, but not through Databricks SQL. Expression-based partition transforms such as years() and bucket() are not supported for managed Iceberg tables. See Iceberg table limitations.

To configure external access, see Access Databricks tables from Apache Iceberg clients.

The following examples show how to use partition evolution with Spark SQL and the Iceberg extension. For Apache Iceberg partition evolution syntax and supported transforms, see Apache Iceberg Spark DDL.

Add a partition field

To add a new partition field to an existing table:

SQL
ALTER TABLE catalog.schema.table ADD PARTITION FIELD column_name;

Drop a partition field

To remove an existing partition field from a table:

SQL
ALTER TABLE catalog.schema.table DROP PARTITION FIELD column_name;

Replace a partition field

To swap one partition field for another without an intermediate repartition:

SQL
ALTER TABLE catalog.schema.table REPLACE PARTITION FIELD old_column WITH new_column;
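Putting the statements together, the following sketch shows how a partition layout might evolve over a table's lifetime. Per the note above, these statements run from an external Iceberg engine through the Iceberg REST Catalog, not from Databricks SQL; all names are placeholders.

SQL
```sql
-- Initially partition the table by region.
ALTER TABLE main.sales.orders ADD PARTITION FIELD region;

-- Later, swap the partition field without rewriting existing files:
-- new writes use `country`, while old files keep the `region` layout.
ALTER TABLE main.sales.orders REPLACE PARTITION FIELD region WITH country;

-- Queries filter correctly across both layouts, because Iceberg applies
-- the matching partition spec to each data file at planning time.
SELECT count(*) FROM main.sales.orders WHERE country = 'DE';
```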

Iceberg table limitations

The following limitations apply to Iceberg tables in Databricks and are subject to change:

  • Iceberg tables support only the Apache Parquet file format.
  • Databricks supports versions 1, 2, and 3 of the Iceberg specification.
  • Iceberg v2 position deletes and equality-based deletes aren't supported. Instead, Databricks supports Iceberg v3 deletion vectors for row-level deletions.
  • Branching and tagging aren't supported. Only the main branch is accessible when reading foreign Iceberg tables.
  • Partitioning:
    • Partition evolution is supported on managed Iceberg tables only when interacting from external Iceberg engines.
    • Foreign Iceberg tables don't support partition evolution.
    • Partitioning by BINARY type isn't supported.
  • Views aren't supported.
  • The following data types aren't supported:
    • UUID
    • Fixed(L)
    • TIME
    • Nested STRUCT with required fields

Managed Iceberg table limitations

The following limitations apply specifically to managed Iceberg tables:

  • Vector search isn't supported.
  • Iceberg doesn't support change data feed. As a result, incremental processing isn't supported when reading managed Iceberg tables as a source for:
    • Materialized views and streaming tables
    • Data profiling
    • Online tables
    • Lakebase
    • Data classification
  • Managed Iceberg tables can only be created if predictive optimization is enabled for table maintenance.
  • The following table properties are managed by Unity Catalog and can't be manually set:
    • write.location-provider.impl
    • write.data.path
    • write.metadata.path
    • write.format.default
    • write.delete.format.default
  • Changing the table compression codec isn't supported. All tables use Zstd by default.
  • Partitioning by expressions (for example, years(), months(), days(), hours(), bucket()) isn't supported.
  • Features not supported in Apache Iceberg also aren't available for managed Iceberg tables. This includes Delta Lake generated columns, Constraints on Databricks, and Collation support for Delta Lake.

Foreign Iceberg table limitations

The following limitations apply specifically to foreign Iceberg tables:

  • Time travel is supported only for Iceberg snapshots that have been previously read in Databricks (that is, snapshots where a SELECT statement was executed).
  • Using bucket transform functions for Iceberg partitioning can degrade query performance when conditional filters are used.
  • Cloud storage tiering products, such as Amazon S3 storage tiers, aren't integrated with foreign Iceberg tables. Accessing a foreign Iceberg table in Databricks can trigger restoration of data archived in lower-cost storage tiers.
  • On dedicated access mode clusters, reads and REFRESH FOREIGN TABLE operations on Iceberg tables require ALL PRIVILEGES.