Access Databricks tables from Apache Iceberg clients

Preview

Unity Catalog Apache Iceberg REST Catalog API is in Public Preview in Databricks Runtime 16.4 LTS and above. This endpoint is recommended for reading and writing to tables from Iceberg clients.

Unity Catalog also has a read-only Iceberg REST Catalog API endpoint. This is a legacy endpoint. See Read Databricks tables from Apache Iceberg clients (legacy).

The Apache Iceberg REST catalog lets supported clients, such as Apache Spark, Apache Flink, and Trino, read from and write to Unity Catalog–registered Iceberg tables on Databricks.

For a full list of supported integrations, see Unity Catalog integrations.

Use the Unity Catalog Iceberg catalog endpoint

Unity Catalog provides an implementation of the Iceberg REST catalog API specification.

Configure access using the endpoint /api/2.1/unity-catalog/iceberg-rest. See the Iceberg REST API spec for details on using this REST API.

important

The workspace URL used for the Iceberg REST catalog endpoint must include the workspace ID. Without the workspace ID, API requests may return a 303 redirect to a login page instead of the expected response.

To find your workspace URL and workspace ID, see Workspace instance names, URLs, and IDs.

note

Databricks has introduced credential vending for some Iceberg reader clients. Databricks recommends using credential vending to control access to cloud storage locations for supported systems. See Unity Catalog credential vending for external system access.

If credential vending is unsupported for your client, you must configure access from the client to the storage location containing the files and metadata for the Delta or Iceberg table. Refer to documentation for your Iceberg client for configuration details.

Requirements

Databricks supports Iceberg REST catalog access to tables as part of Unity Catalog. You must have Unity Catalog enabled in your workspace to use these endpoints. The following table types are accessible via the Iceberg REST Catalog:

| Table type | Read | Write |
| --- | --- | --- |
| Managed Iceberg | Yes | Yes |
| Foreign Iceberg | Yes | No |
| Managed Delta (with Iceberg reads enabled) | Yes | No |
| External Delta (with Iceberg reads enabled) | Yes | No |

Foreign Iceberg tables are not automatically refreshed when read via the Iceberg REST Catalog API. To read the latest snapshot, run REFRESH FOREIGN TABLE. Credential vending is not supported on foreign Iceberg tables.

note

You must configure Delta tables to be accessible via the Iceberg REST Catalog API. See Read Delta tables with Iceberg clients.

Complete the configuration steps in the following sections to read or write Databricks tables from Iceberg clients using the Iceberg REST catalog.

note

The Iceberg specification does not allow duplicate data files in a single table snapshot. When Unity Catalog detects that a commit from an external engine would introduce duplicate data files, it blocks the commit.

Use Iceberg tables with Apache Spark

The following is an example of how to configure Apache Spark to access Databricks tables via the Iceberg REST Catalog API using OAuth authentication:

"spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",

# Configuration for accessing tables in Unity Catalog
"spark.sql.catalog.<spark-catalog-name>": "org.apache.iceberg.spark.SparkCatalog",
"spark.sql.catalog.<spark-catalog-name>.type": "rest",
"spark.sql.catalog.<spark-catalog-name>.rest.auth.type": "oauth2",
"spark.sql.catalog.<spark-catalog-name>.uri": "<workspace-url>/api/2.1/unity-catalog/iceberg-rest",
"spark.sql.catalog.<spark-catalog-name>.oauth2-server-uri": "<workspace-url>/oidc/v1/token",
"spark.sql.catalog.<spark-catalog-name>.credential": "<oauth_client_id>:<oauth_client_secret>",
"spark.sql.catalog.<spark-catalog-name>.warehouse": "<uc-catalog-name>",
"spark.sql.catalog.<spark-catalog-name>.scope": "all-apis"

Replace the following variables:

  • <uc-catalog-name>: The name of the catalog in Unity Catalog that contains your tables.
  • <spark-catalog-name>: The name you want to assign to the catalog in your Spark session.
  • <oauth_client_id>: OAuth client ID for the authenticating principal.
  • <oauth_client_secret>: OAuth client secret for the authenticating principal.
  • <workspace-url>: The Databricks workspace URL, including the workspace ID. For example, cust-success.cloud.databricks.com/?o=6280049833385130.

With these configurations, you can query tables in Unity Catalog using Apache Spark. To access tables across multiple catalogs, you must configure each catalog separately.
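
As a sketch, the configuration above can also be assembled programmatically before building a SparkSession. The helper below is hypothetical (not part of any Databricks or Iceberg API); it simply mirrors the property keys listed above, and the commented lines show how those properties might be applied with pyspark.

```python
# Hypothetical helper that assembles the Spark properties shown above.
# All example argument values are placeholders you must replace.
def iceberg_rest_spark_conf(spark_catalog, uc_catalog, workspace_url,
                            oauth_client_id, oauth_client_secret):
    prefix = f"spark.sql.catalog.{spark_catalog}"
    return {
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.type": "rest",
        f"{prefix}.rest.auth.type": "oauth2",
        f"{prefix}.uri": f"{workspace_url}/api/2.1/unity-catalog/iceberg-rest",
        f"{prefix}.oauth2-server-uri": f"{workspace_url}/oidc/v1/token",
        f"{prefix}.credential": f"{oauth_client_id}:{oauth_client_secret}",
        f"{prefix}.warehouse": uc_catalog,
        f"{prefix}.scope": "all-apis",
    }

conf = iceberg_rest_spark_conf(
    "uc", "main", "https://<workspace-url>", "<client-id>", "<client-secret>")

# With pyspark installed and real values above, apply the properties
# when building the session, then query the configured catalog:
# from pyspark.sql import SparkSession
# builder = SparkSession.builder
# for key, value in conf.items():
#     builder = builder.config(key, value)
# spark = builder.getOrCreate()
# spark.sql("SELECT * FROM uc.<uc-schema-name>.<uc-table-name>").show()
```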

When you query tables in Unity Catalog using Spark configurations, keep the following in mind:

  • You need "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" only if you are running Iceberg-specific stored procedures.

  • Databricks uses cloud object storage for all tables. In addition to the iceberg-spark-runtime JAR, you must add the bundle for your cloud provider as a Spark package:

    • AWS: org.apache.iceberg:iceberg-aws-bundle:<iceberg-version>
    • Azure: org.apache.iceberg:iceberg-azure-bundle:<iceberg-version>
    • GCP: org.apache.iceberg:iceberg-gcp-bundle:<iceberg-version>

    For details, see the documentation for the Iceberg AWS integration for Spark.

    note

    These configurations are not required when accessing Iceberg tables from Databricks. Loading external Iceberg JARs onto Databricks clusters is not supported.

Access Databricks tables with Snowflake

Snowflake provides two options for accessing tables through the Iceberg REST catalog: via Snowflake's catalog-linked databases, or via external tables.

For both options, first configure a Snowflake catalog integration:

SQL
CREATE OR REPLACE CATALOG INTEGRATION <catalog-integration-name>
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<uc-schema-name>'
  REST_CONFIG = (
    CATALOG_URI = '<workspace-url>/api/2.1/unity-catalog/iceberg-rest',
    WAREHOUSE = '<uc-catalog-name>',
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<token>'
  )
  ENABLED = TRUE;

Replace the following variables:

  • <catalog-integration-name>: The name you want to assign to the catalog integration in Snowflake.
  • <uc-schema-name>: The name of the schema in Unity Catalog you need to access.
  • <uc-catalog-name>: The name of the catalog in Unity Catalog you need to access.
  • <workspace-url>: The Databricks workspace URL, including the workspace ID. For example, https://cust-success.cloud.databricks.com/?o=6280049833385130 or https://adb-1234567890123456.12.azuredatabricks.net.
  • <token>: Personal access token (PAT) for the principal configuring the integration.

Catalog-linked databases

Snowflake's catalog-linked databases automatically sync with Unity Catalog to detect schemas and Iceberg tables. This eliminates the need for manual metadata refresh.

After configuring a Snowflake catalog integration, refer to the Snowflake documentation to create a catalog-linked database to access your tables.

important

Attempting to write from Snowflake to read-only Databricks tables can result in errors. Refer to the Snowflake documentation for supported operations.

External tables

Alternatively, you can create external tables after creating a Snowflake catalog integration. This approach requires manually refreshing metadata to see updates.

SQL
CREATE OR REPLACE ICEBERG TABLE my_table
  CATALOG = '<catalog-integration-name>'
  CATALOG_TABLE_NAME = '<uc-table-name>';
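
Because external tables do not auto-sync, the metadata refresh can be scripted. The sketch below assumes the snowflake-connector-python package and uses Snowflake's ALTER ICEBERG TABLE ... REFRESH statement; all connection parameters are placeholders, and the live calls are commented out because they require a real Snowflake account.

```python
# Statement that refreshes Snowflake's cached metadata for the external table.
refresh_stmt = "ALTER ICEBERG TABLE my_table REFRESH"

# Requires: pip install snowflake-connector-python
# import snowflake.connector
# with snowflake.connector.connect(
#     account="<account-identifier>", user="<user>", password="<password>",
#     database="<database>", schema="<schema>",
# ) as conn:
#     conn.cursor().execute(refresh_stmt)
```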

Use Databricks tables with PyIceberg

To use PyIceberg to access Databricks tables, you must install PyIceberg with the required dependencies. PyIceberg requires pyarrow for table operations such as reading data and inspecting table metadata. Install PyIceberg with the pyarrow extra:

Bash
pip install "pyiceberg[pyarrow]"

note

If you do not install pyarrow, operations such as describing or reading tables fail. For the full list of optional dependencies, see the PyIceberg documentation.

The following is an example of the configuration settings to allow PyIceberg to access Databricks tables by connecting to the Iceberg REST Catalog in Unity Catalog:

YAML
catalog:
  unity_catalog:
    uri: https://<workspace-url>/api/2.1/unity-catalog/iceberg-rest
    warehouse: <uc-catalog-name>
    token: <token>

Replace the following variables:

  • <workspace-url>: The Databricks workspace URL, including the workspace ID. For example, cust-success.cloud.databricks.com/?o=6280049833385130.
  • <uc-catalog-name>: The name of the catalog in Unity Catalog you need to access.
  • <token>: Personal access token (PAT) for the principal configuring the integration.

See the documentation for the PyIceberg REST catalog configuration.
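
The same settings can be passed to PyIceberg in code instead of a YAML file. The sketch below builds the catalog properties with placeholder values; the live calls are commented out because they require a reachable workspace, and the catalog name, schema, and table names are illustrative.

```python
# Catalog properties matching the YAML configuration above.
# Replace the placeholders with your workspace URL, catalog name, and token.
props = {
    "uri": "https://<workspace-url>/api/2.1/unity-catalog/iceberg-rest",
    "warehouse": "<uc-catalog-name>",
    "token": "<token>",
}

# With pyiceberg[pyarrow] installed and real values above,
# you can load the catalog and scan a table:
# from pyiceberg.catalog import load_catalog
# catalog = load_catalog("unity_catalog", **props)
# table = catalog.load_table("<uc-schema-name>.<uc-table-name>")
# print(table.scan().to_arrow())
```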

REST API curl example

You can also use a REST API call like the one in this curl example to load a table:

Bash
curl -X GET -H "Authorization: Bearer $OAUTH_TOKEN" -H "Accept: application/json" \
https://<workspace-instance>/api/2.1/unity-catalog/iceberg-rest/v1/catalogs/<uc_catalog_name>/namespaces/<uc_schema_name>/tables/<uc_table_name>

You should then receive a response like this:

{
  "metadata-location": "s3://bucket/path/to/iceberg/table/metadata/file",
  "metadata": <iceberg-table-metadata-json>,
  "config": {
    "expires-at-ms": "<epoch-ts-in-millis>",
    "s3.access-key-id": "<temporary-s3-access-key-id>",
    "s3.session-token": "<temporary-s3-session-token>",
    "s3.secret-access-key": "<temporary-secret-access-key>",
    "client.region": "<aws-bucket-region-for-metadata-location>"
  }
}

note

The expires-at-ms field in the response indicates when the credentials expire; the default lifetime is one hour. For better performance, have the client cache the credentials until shortly before the expiration time rather than requesting new credentials for every operation.
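
A client-side cache keyed on expires-at-ms can be sketched as follows. The class is a hypothetical helper, not part of any Databricks or Iceberg client library: fetch is any callable that returns the "config" dict from a loadTable response, and skew_ms refreshes slightly early to avoid using credentials at the edge of their lifetime.

```python
import time


class VendedCredentialCache:
    """Hypothetical cache for vended storage credentials.

    fetch is a callable returning the "config" dict from a loadTable
    response, which includes an "expires-at-ms" epoch timestamp in
    milliseconds. Credentials are refreshed skew_ms early.
    """

    def __init__(self, fetch, skew_ms=60_000):
        self._fetch = fetch
        self._skew_ms = skew_ms
        self._config = None

    def get(self):
        now_ms = int(time.time() * 1000)
        if (self._config is None
                or now_ms >= int(self._config["expires-at-ms"]) - self._skew_ms):
            self._config = self._fetch()  # only hits the API near expiry
        return self._config
```

With this helper, repeated reads reuse the same temporary storage keys until roughly a minute before they expire, instead of calling the REST endpoint on every table access.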