Work with foreign tables

Foreign tables, sometimes referred to as federated tables, are tables registered using Unity Catalog as part of a foreign catalog. The data and metadata of a foreign table are managed by an external system, and Unity Catalog adds data governance to queries against these tables.

Databricks supports the following methods for registering foreign tables:

Important

All tables in a foreign catalog are foreign tables, and foreign tables must reside in a foreign catalog.

For backwards compatibility with legacy Apache Spark and Databricks workloads, foreign tables in a federated Hive metastore return metadata from the Hive metastore, including whether the table is a Hive managed table or a Hive external table.

Why use a foreign table?

Foreign tables provide flexibility when integrating Databricks with existing data systems or migrating from legacy systems.

Many foreign tables serve as a temporary solution for direct access to data not managed by Databricks, because they give quick access without requiring data migration or code refactoring for upstream ETL workflows. Databricks recommends migrating datasets that drive production workloads or are queried frequently to Unity Catalog managed tables, as managed tables provide the best performance and have many built-in optimizations.

Lakehouse Federation provides a complementary solution for loading data from external data systems not supported by LakeFlow Connect. Databricks recommends using materialized views to replicate foreign tables to Unity Catalog. See Load data from foreign tables with materialized views.
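As a rough illustration of the materialized view recommendation, the following Python sketch builds a CREATE MATERIALIZED VIEW statement that selects everything from a foreign table. The helper names and the catalog, schema, and table names are hypothetical and exist only for this example. The sketch only produces the SQL text; how you submit it depends on your workspace.

```python
from typing import Tuple

# (catalog, schema, table) parts of a three-level Unity Catalog name.
TableName = Tuple[str, str, str]


def quote_ident(part: str) -> str:
    """Wrap one identifier part in backticks, doubling any embedded backtick."""
    return "`" + part.replace("`", "``") + "`"


def qualified(name: TableName) -> str:
    """Join the quoted catalog, schema, and table parts with dots."""
    return ".".join(quote_ident(part) for part in name)


def materialized_view_sql(source: TableName, target: TableName) -> str:
    """Build a statement that replicates a foreign table into a materialized view.

    `source` is assumed to be a table in a foreign catalog and `target`
    a name in a standard Unity Catalog catalog. Both are illustrative.
    """
    return (
        f"CREATE MATERIALIZED VIEW {qualified(target)}\n"
        f"AS SELECT * FROM {qualified(source)}"
    )


if __name__ == "__main__":
    # Hypothetical names: a foreign catalog `pg_cat` and a target in `main`.
    print(materialized_view_sql(("pg_cat", "sales", "orders"),
                                ("main", "replicas", "orders_mv")))
```

In practice you would likely replace `SELECT *` with an explicit column list and filters so the view replicates only the data your workloads need.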

Create or write to foreign tables

If you have sufficient privileges and your workspace has been configured with an internal federated Hive metastore, you can create or write to foreign tables backed by that metastore. Foreign tables in external federated Hive metastores and all foreign tables backed by Lakehouse Federation are read-only.

Databricks does not manage the metadata, data, or semantics for writes to foreign tables. Foreign tables might be backed by an ACID-compliant format such as Delta Lake, but foreign tables do not provide the transactional guarantees of Unity Catalog managed tables.

Most Databricks optimizations for query performance, enhanced write speed, data skipping, and metadata-only queries require Delta Lake and Unity Catalog. Databricks recommends comparing read and write query performance between foreign tables and Unity Catalog managed tables using the latest Databricks Runtime version to evaluate latency and cost differences.
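One way to approach the comparison above is a small timing harness that runs the same query against each table several times and reports the median wall-clock latency. This is a minimal sketch, not a Databricks tool: `run_query` is a hypothetical callable you supply, and the table names in the usage comment are placeholders.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[str], object], sql: str,
               repeats: int = 5) -> List[float]:
    """Run `sql` `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(sql)  # must fully execute the query, not just plan it
        durations.append(time.perf_counter() - start)
    return durations


def compare_latency(run_query: Callable[[str], object],
                    queries: Dict[str, str],
                    repeats: int = 5) -> Dict[str, float]:
    """Return the median latency in seconds for each labeled query."""
    return {
        label: statistics.median(time_query(run_query, sql, repeats))
        for label, sql in queries.items()
    }


# Example wiring (assumes a notebook where a `spark` session is available;
# the table names are hypothetical):
#
#   results = compare_latency(
#       lambda q: spark.sql(q).collect(),
#       {
#           "foreign": "SELECT count(*) FROM pg_cat.sales.orders",
#           "managed": "SELECT count(*) FROM main.sales.orders",
#       },
#   )
```

Wall-clock latency is only part of the picture. Caching, cluster warm-up, and load on the external system all affect the numbers, so treat the results as a starting point and evaluate cost separately.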