Enable external data access to streaming tables and materialized views

Preview

If you have enabled external data access to Unity Catalog, then you can additionally add external data access to your pipeline datasets. This enables external Delta and Iceberg clients to access your datasets through the Unity Catalog and Iceberg catalog REST APIs, without requiring a full data copy.

External data access for pipeline datasets is supported only for datasets managed by Lakeflow pipelines.

Capabilities

Using external data access for pipeline datasets exposes the same data available in Databricks, without creating a duplicate of the data. This gives the following characteristics for performance and functionality:

No data copy required: External access is enabled without duplicating the full dataset.
External access via APIs: Read materialized views and streaming tables using Delta Lake or Iceberg APIs.
Read-after-write consistency: External readers see up-to-date data as soon as a refresh of the dataset completes, so reads are not stale.
Single table object: Datasets appear externally as managed tables with the same name as the source dataset within Unity Catalog APIs.
Low cost: Because the full dataset is not copied, the overhead for providing external access is low.

Requirements

The requirements for your datasets are:

External access must be enabled on the schema: Your workspace must be enrolled in the External data access for pipeline datasets public preview, and external data access must be enabled for the schema that contains your datasets. See Enable external data access to Unity Catalog.
Unity Catalog: Your streaming tables and materialized views must use Unity Catalog.
Databricks Runtime version: You must use Databricks Runtime 17.3 or above.

The requirements for your clients are:

Delta API version: The client must support Delta Lake APIs 4.0.0 or above, including deletion vectors, and must use the Unity Catalog REST APIs for access.
Iceberg API version: Alternatively, the client can access the datasets using Iceberg catalog REST APIs that support the Iceberg v3 specification.
Unity Catalog privileges: The principal reading the datasets externally must have the EXTERNAL USE SCHEMA privilege on the schema, and SELECT privilege on the table.
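
As a rough sketch, the two privileges listed above might be granted with statements like the following. The `<principal>` placeholder is hypothetical and stands for the user, group, or service principal that reads externally. The other placeholders follow the naming used elsewhere on this page. Depending on your setup, the principal may also need USE CATALOG and USE SCHEMA on the parent objects. That is an assumption beyond what this page lists.

```sql
-- Sketch only: <principal> is a hypothetical placeholder for the external reader.

-- Allow the principal to access the schema from external clients.
GRANT EXTERNAL USE SCHEMA ON SCHEMA <uc-catalog>.<uc-schema> TO `<principal>`;

-- Allow the principal to read the streaming table or materialized view.
GRANT SELECT ON TABLE <uc-catalog>.<uc-schema>.<uc-table-name> TO `<principal>`;
```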

Note

If your client does not meet these requirements, you can use compatibility mode instead. Compatibility mode supports all Delta and Iceberg clients, but requires creating a full copy of the dataset.

How to enable access for a dataset

There are three steps to enable external access for a dataset.

In your dataset definition, add the following TBLPROPERTIES. This is only required for Iceberg v3 readers. If you only have Delta readers, you can skip this step.

Property	Use
`'delta.columnMapping.mode' = 'name'`	Column mapping is required for Iceberg.
`'delta.universalFormat.enabledFormats' = 'iceberg'`	Enable UniForm for Iceberg.
`'delta.enableIcebergCompatV3' = 'true'`	Use Iceberg V3 for UniForm.
`'delta.enableChangeDataFeed' = 'false'`	Change data feed is not compatible with external access, so this must be `false`.

For example, you can update the definition of a materialized view in Lakeflow pipelines by adding the following TBLPROPERTIES to your query:

SQL
CREATE OR REFRESH MATERIALIZED VIEW view_name
  TBLPROPERTIES(
   ...
   'delta.columnMapping.mode' = 'name',
   'delta.enableIcebergCompatV3' = 'true',
   'delta.universalFormat.enabledFormats' = 'iceberg',
   'delta.enableChangeDataFeed' = 'false')
...
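
The same properties can be set on a streaming table definition. The following is a minimal sketch, not a definitive definition: `orders_st` and `raw_orders` are hypothetical names, and it assumes the source is a table that can be read with `STREAM`.

```sql
-- Sketch only: orders_st and raw_orders are hypothetical names.
CREATE OR REFRESH STREAMING TABLE orders_st
  TBLPROPERTIES (
    'delta.columnMapping.mode' = 'name',                  -- required for Iceberg
    'delta.enableIcebergCompatV3' = 'true',               -- use Iceberg v3 for UniForm
    'delta.universalFormat.enabledFormats' = 'iceberg',   -- enable UniForm for Iceberg
    'delta.enableChangeDataFeed' = 'false'                -- CDF must stay disabled
  )
AS SELECT * FROM STREAM(<uc-catalog>.<uc-schema>.raw_orders);
```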

To see the properties of your dataset, you can use the DESCRIBE EXTENDED SQL statement.
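
For instance, assuming the materialized view from the example above lives in your catalog and schema, a check might look like this. The table properties should appear in the detailed table information at the end of the output, although the exact layout can vary.

```sql
-- view_name matches the materialized view in the example above;
-- replace the catalog and schema placeholders with your own names.
DESCRIBE EXTENDED <uc-catalog>.<uc-schema>.view_name;
```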

Apply the Iceberg properties to the pipeline. This is only required for Iceberg v3 readers. If you only have Delta readers, you can skip this step.

- Triggered pipelines: Run the pipeline once.
- Continuous pipelines: Stop and restart the pipeline.

In your pipeline configuration, set pipelines.externalMetadata.enabled to true. You can do this from either the pipeline settings UI or the pipeline configuration JSON.

Pipeline settings UI

1. Open your pipeline and click Settings.
2. Under Configuration, add a key-value pair: Key pipelines.externalMetadata.enabled, Value true.
3. Click Save.

Pipeline configuration JSON

In the configuration section of your pipeline JSON, add:

JSON
{ "configuration": { "pipelines.externalMetadata.enabled": "true" } }

After saving the configuration, run or restart the pipeline to apply the changes:

- Triggered pipelines: Run the pipeline once.
- Continuous pipelines: Stop and restart the pipeline.

Reading data from external clients

The following sections describe how to read your dataset from different clients and environments.

Use Unity REST API with the Spark Delta Reader

Use Apache Spark™ version 4.0 or later. You can download it from https://spark.apache.org/downloads.html.

Based on your cloud provider, run the following command to start a Spark SQL shell with Delta 4.0 and Unity Catalog.

AWS

Shell
bin/spark-sql \
    --packages org.apache.spark:spark-hadoop-cloud_2.13:4.0.0,io.unitycatalog:unitycatalog-spark_2.13:0.3.1 \
    --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
    --conf spark.sql.catalog.spark_catalog=io.unitycatalog.spark.UCSingleCatalog \
    --conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
    --conf spark.sql.catalog.<uc-catalog-name>=io.unitycatalog.spark.UCSingleCatalog \
    --conf spark.sql.catalog.<uc-catalog-name>.uri=<workspace_url> \
    --conf spark.sql.catalog.<uc-catalog-name>.token=<PAT> \
    --conf spark.sql.defaultCatalog=<uc-catalog-name>

Azure

Shell
bin/spark-sql \
    --packages org.apache.hadoop:hadoop-azure:3.3.6,io.unitycatalog:unitycatalog-spark_2.13:0.3.1 \
    --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
    --conf spark.sql.catalog.spark_catalog=io.unitycatalog.spark.UCSingleCatalog \
    --conf spark.sql.catalog.<uc-catalog-name>=io.unitycatalog.spark.UCSingleCatalog \
    --conf spark.sql.catalog.<uc-catalog-name>.uri=<workspace_url> \
    --conf spark.sql.catalog.<uc-catalog-name>.token=<PAT> \
    --conf spark.sql.defaultCatalog=<uc-catalog-name>

GCP

Shell
bin/spark-sql \
    --packages io.unitycatalog:unitycatalog-spark_2.13:0.3.1 \
    --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
    --conf spark.sql.catalog.spark_catalog=io.unitycatalog.spark.UCSingleCatalog \
    --conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
    --conf spark.hadoop.fs.AbstractFileSystem.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS \
    --conf spark.sql.catalog.<uc-catalog-name>=io.unitycatalog.spark.UCSingleCatalog \
    --conf spark.sql.catalog.<uc-catalog-name>.uri=<workspace_url> \
    --conf spark.sql.catalog.<uc-catalog-name>.token=<PAT> \
    --conf spark.sql.defaultCatalog=<uc-catalog-name>

From the SQL shell, you can now access your dataset with Spark SQL. For example:
Shell
```
spark-sql ()> SELECT * FROM <uc-catalog>.<uc-schema>.<uc-table-name>;
```

Use the Snowflake Iceberg Reader

Within Snowflake, you can use the Iceberg Reader. This requires Iceberg v3 support in Snowflake.

Set up the Iceberg REST catalog in Apache Spark.

Shell
bin/spark-shell \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.8.0,org.apache.iceberg:iceberg-aws-bundle:1.8.0 \
  --conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
  --conf spark.sql.catalog.<uc-catalog-name>=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.<uc-catalog-name>.type=rest \
  --conf spark.sql.catalog.<uc-catalog-name>.uri=<workspace-url>/api/2.1/unity-catalog/iceberg-rest \
  --conf spark.sql.catalog.<uc-catalog-name>.token=<PAT> \
  --conf spark.sql.catalog.<uc-catalog-name>.warehouse=<uc-catalog-name>

Set up the Iceberg REST catalog in Snowflake.

SQL
CREATE OR REPLACE CATALOG INTEGRATION my_uc_int
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<uc-schema-name>'
  REST_CONFIG = (
    CATALOG_URI = '<workspace-url>/api/2.1/unity-catalog/iceberg-rest'
    CATALOG_NAME = '<uc-catalog-name>'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<PAT>'
  )
  ENABLED = TRUE;

CREATE OR REPLACE ICEBERG TABLE my_table
  CATALOG = 'my_uc_int'
  CATALOG_TABLE_NAME = '<uc-table-name>';
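
Once the Iceberg table exists in Snowflake, reading it might look like the sketch below. The refresh statement is an assumption: it applies when Snowflake has not yet picked up the latest snapshot from the catalog, and you may not need it if automatic refresh is configured for the table.

```sql
-- my_table is the Snowflake Iceberg table created in the example above.

-- Assumption: a manual refresh may be needed to see the latest pipeline update.
ALTER ICEBERG TABLE my_table REFRESH;

-- Read a sample of rows to confirm that the dataset is accessible.
SELECT * FROM my_table LIMIT 10;
```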

Access your dataset from Spark SQL.

Shell
spark-sql ()> SELECT * FROM <uc-catalog>.<uc-schema>.<uc-table-name>;

Use the Iceberg REST catalog with Spark Iceberg reader

Use Apache Spark™ version 4.0 or later. You can download it from https://spark.apache.org/downloads.html.

In AWS, run the following command to start a Spark SQL shell with Iceberg v3.

Shell
bin/spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-4.0_2.13:1.10.0,org.apache.iceberg:iceberg-aws-bundle:1.10.0 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.<uc-catalog-name>=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.<uc-catalog-name>.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
  --conf spark.sql.catalog.<uc-catalog-name>.type=rest \
  --conf spark.sql.catalog.<uc-catalog-name>.uri=<workspace_url>/api/2.1/unity-catalog/iceberg-rest \
  --conf spark.sql.catalog.<uc-catalog-name>.token='<PAT>' \
  --conf spark.sql.catalog.<uc-catalog-name>.warehouse=<uc-catalog-name> \
  --conf spark.sql.iceberg.vectorization.enabled=false

Access your dataset from Spark SQL.

Shell
spark-sql ()> SELECT * FROM <uc-catalog>.<uc-schema>.<uc-table-name>;
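
If the query fails, a couple of simpler statements can help separate catalog connectivity problems from table-level problems. This is only a suggested sketch using standard Spark SQL, run from the same shell, and assumes the catalog was configured as shown above.

```sql
-- Confirm that the catalog connection works and that the schema is visible.
SHOW TABLES IN <uc-catalog>.<uc-schema>;

-- Confirm that the dataset itself can be read, without returning every row.
SELECT COUNT(*) FROM <uc-catalog>.<uc-schema>.<uc-table-name>;
```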

Migrate from compatibility mode

If you currently share a dataset using compatibility mode, you can migrate to external data access:

1. Enable this feature by following the steps in How to enable access for a dataset.
2. Disable compatibility mode. See Disable Compatibility Mode.

Limitations

The following are known limitations with external data access for streaming tables and materialized views.

External writes: External writes to pipeline datasets are not supported.
Path-based access: External readers that require path-based access (reading directly through a storage location instead of the Unity Catalog API interface) are not supported. If you need path-based access, use compatibility mode, which supports it but requires a full copy of the dataset.
Security features: Row-level security and column-level masking are not supported for external reads.
Time travel or CDF: Time travel and change data feed (CDF) are not supported through external data access. CDF must be disabled when UniForm Iceberg is enabled.
Catalog commits (beta): Catalog commits are not compatible with external data access. To use external data access on a streaming table, you must first disable catalog commits. Catalog commits are not available for materialized views.
Ingestion pipelines: Streaming tables created with Lakeflow Connect do not support enabling Iceberg table properties, and are only available with Delta readers.
Fabric: Reading from Microsoft Fabric is not supported.
Snowflake Iceberg reader: You must be using the Iceberg v3 reader in Snowflake to read pipeline datasets.
Standalone MVs and STs: This feature is only supported for materialized views and streaming tables managed by a pipeline. Standalone materialized views and streaming tables are not supported. Contact your Databricks account team if you need external access for standalone materialized views and streaming tables.
