What is Apache Iceberg in Databricks?

Public Preview

Unity Catalog-managed Iceberg tables and foreign Iceberg tables are available in Public Preview in Databricks Runtime 16.4 LTS and above.

Iceberg v3 features are available in Public Preview in Databricks Runtime 18.0 and above. See Use Apache Iceberg v3 features.

Apache Iceberg is an open-source table format for analytics workloads. It supports features like schema evolution, time travel, and hidden partitioning. Like Delta Lake, Iceberg provides an abstraction layer that enables ACID transactions on data stored in object storage. Databricks supports Iceberg tables that use the Apache Parquet file format. Iceberg maintains atomicity and consistency by writing new metadata files for each table change.

An Iceberg catalog is the top-level layer of the Iceberg table architecture. It handles operations like creating, dropping, and renaming tables. Its main responsibility is to provide the current metadata when a table is loaded. Databricks supports Iceberg tables managed by Unity Catalog (managed Iceberg tables) and Iceberg tables managed by external catalogs (foreign Iceberg tables).

All Iceberg tables in Databricks follow the open Iceberg table format specification. See the Iceberg table specification.

Create Iceberg tables in Unity Catalog

Iceberg tables created in Unity Catalog are managed Iceberg tables. You can create these tables from Databricks or from external Iceberg clients connected through the Iceberg REST Catalog API.

Managed Iceberg tables are fully integrated with Databricks platform features. Unity Catalog manages lifecycle tasks like snapshot expiration and file compaction on these tables. Managed Iceberg tables also support liquid clustering, which improves query performance. Predictive optimization automates these tasks to reduce storage costs and improve query speed. Databricks recommends using Iceberg clients 1.9.2 and above to read and write to Unity Catalog.
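As an illustrative sketch (the catalog, schema, table, and column names below are placeholders, not from your workspace), creating a managed Iceberg table looks like this:

SQL
```sql
-- Create a managed Iceberg table in Unity Catalog.
-- `main.sales.orders` is a placeholder three-level name; adjust to your environment.
CREATE TABLE main.sales.orders (
  order_id BIGINT,
  customer STRING,
  order_ts TIMESTAMP
)
USING ICEBERG;
```

Unity Catalog then handles snapshot expiration and file compaction for the table as described above.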

Read Iceberg tables managed by other catalogs

A foreign Iceberg table is an Iceberg table managed by a catalog outside Unity Catalog. The external catalog stores the table's current metadata. Databricks uses Lakehouse Federation to retrieve metadata and read the table from object storage.

Foreign Iceberg tables are read-only in Databricks and have limited platform support.
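Assuming a federated catalog has already been configured (the names below are placeholders), a foreign Iceberg table is queried like any other table, and `REFRESH FOREIGN TABLE` pulls the latest metadata from the external catalog:

SQL
```sql
-- Read-only query against a foreign Iceberg table (placeholder names).
SELECT order_id, customer
FROM glue_cat.sales.orders
LIMIT 10;

-- Fetch the current table metadata from the external catalog.
REFRESH FOREIGN TABLE glue_cat.sales.orders;
```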

Access Iceberg tables using external systems

You can access all Iceberg tables in Unity Catalog using the Iceberg REST Catalog API. This open API supports read and write operations from external Iceberg engines across different languages and platforms. See Access Databricks tables from Apache Iceberg clients.

The REST Catalog supports credential vending, which delivers temporary credentials to external engines for accessing the underlying storage. For more information, see Unity Catalog credential vending for external system access.

warning

Credential vending is not supported on workspaces that use default storage. See Limitations.

Partition evolution

With partition evolution, you can change the partitioning scheme of an existing Apache Iceberg table without rewriting data. New data is written with the updated partition layout and existing data retains its original partition layout. Apache Iceberg tracks the partition specs and applies the correct filter at query time. See partition evolution for Apache Iceberg.

note

Partition evolution is supported on managed Iceberg tables through external Iceberg engines using the Iceberg REST Catalog, but not through Databricks SQL. Expression-based partition transforms such as years() and bucket() are not supported for managed Iceberg tables. See Iceberg table limitations.

To configure external access, see Access Databricks tables from Apache Iceberg clients.

The following examples show how to use partition evolution with Spark SQL and the Iceberg extension. For Apache Iceberg partition evolution syntax and supported transforms, see Apache Iceberg Spark DDL.

Add a partition field

To add a new partition field to an existing table:

SQL
ALTER TABLE catalog.schema.table ADD PARTITION FIELD column_name;

Drop a partition field

To remove an existing partition field from a table:

SQL
ALTER TABLE catalog.schema.table DROP PARTITION FIELD column_name;

Replace a partition field

To swap one partition field for another without an intermediate repartition:

SQL
ALTER TABLE catalog.schema.table REPLACE PARTITION FIELD old_column WITH new_column;
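Putting the statements together, the following sketch shows how a partition layout might evolve over a table's lifetime. Per the note above, these statements run from an external Iceberg engine through the Iceberg REST Catalog, not from Databricks SQL; all names are placeholders.

SQL
```sql
-- Initially partition the table by region.
ALTER TABLE main.sales.orders ADD PARTITION FIELD region;

-- Later, swap the partition field without rewriting existing files:
-- new writes use `country`, while old files keep the `region` layout.
ALTER TABLE main.sales.orders REPLACE PARTITION FIELD region WITH country;

-- Queries filter correctly across both layouts, because Iceberg applies
-- the matching partition spec to each data file at planning time.
SELECT count(*) FROM main.sales.orders WHERE country = 'DE';
```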

Iceberg table limitations

The following limitations apply to Iceberg tables in Databricks and are subject to change:

  • Iceberg tables support only the Apache Parquet file format.
  • Databricks supports versions 1, 2, and 3 of the Iceberg specification.
  • Iceberg v2 position deletes and equality-based deletes aren't supported. Instead, Databricks supports Iceberg v3 deletion vectors for row-level deletions.
  • Branching and tagging aren't supported. Only the main branch is accessible when reading foreign Iceberg tables.
  • Partitioning:
    • Partition evolution is supported on managed Iceberg tables only when interacting from external Iceberg engines.
    • Foreign Iceberg tables don't support partition evolution.
    • Partitioning by BINARY type isn't supported.
  • Views aren't supported.
  • The following data types aren't supported:
    • UUID
    • Fixed(L)
    • TIME
    • Nested STRUCT with required fields

Managed Iceberg table limitations

The following limitations apply specifically to managed Iceberg tables:

  • Vector search isn't supported.
  • Iceberg doesn't support change data feed. As a result, incremental processing isn't supported when reading managed Iceberg tables as a source for:
    • Materialized views and streaming tables
    • Data profiling
    • Online tables
    • Lakebase
    • Data classification
  • Managed Iceberg tables can only be created if predictive optimization is enabled for table maintenance.
  • The following table properties are managed by Unity Catalog and can't be manually set:
    • write.location-provider.impl
    • write.data.path
    • write.metadata.path
    • write.format.default
    • write.delete.format.default
  • Changing the table compression codec isn't supported. All tables use Zstd by default.
  • Partitioning by expressions (for example, years(), months(), days(), hours(), bucket()) isn't supported.
  • Features not supported in Apache Iceberg also aren't available for managed Iceberg tables. This includes Delta Lake generated columns, Constraints on Databricks, and Collation support for Delta Lake.

Foreign Iceberg table limitations

The following limitations apply specifically to foreign Iceberg tables:

  • Time travel is supported only for Iceberg snapshots that have been previously read in Databricks (that is, snapshots where a SELECT statement was executed).
  • Using bucket transform functions for Iceberg partitioning can degrade query performance when conditional filters are used.
  • Cloud storage tiering products, such as Amazon S3 storage tiers, aren't integrated with foreign Iceberg tables. Accessing a foreign Iceberg table in Databricks can trigger restoration of data archived in lower-cost storage tiers.
  • On dedicated access mode clusters, reads and REFRESH FOREIGN TABLE operations on Iceberg tables require ALL PRIVILEGES.