Work with external tables
In Unity Catalog, an external table stores its data files in cloud object storage within your cloud tenant. Unity Catalog continues to manage the table's metadata, ensuring full data governance across all queries. However, it does not manage the data's lifecycle, optimization, storage location, or layout.
When you define a Unity Catalog external table, you must specify a storage location. This location is an external location registered in Unity Catalog. When you drop an external table, Unity Catalog removes the table metadata but does not delete the underlying data files.
This article focuses on Unity Catalog external tables. External tables in the legacy Hive metastore have different behaviors. See Database objects in the legacy Hive metastore.
When to use external tables
Databricks recommends using external tables for the following use cases:
- You need to register a table backed by existing data that is not compatible with Unity Catalog managed tables.
- You also require direct access to the data from non-Databricks clients that do not support other external access patterns. Unity Catalog privileges are not enforced when users access data files from external systems. See Access Databricks data using external systems.
In most cases, Databricks recommends using Unity Catalog managed tables to take advantage of automatic table optimization, faster query performance, and reduced costs. To migrate external tables to managed tables, see Convert an external table to a managed Unity Catalog table.
If you update external table metadata using a non-Databricks client or using path-based access from within Databricks, that metadata does not automatically sync state with Unity Catalog. Databricks recommends against such metadata updates, but if you do perform one, you must run MSCK REPAIR TABLE <table-name> SYNC METADATA to bring the schema in Unity Catalog up to date. See REPAIR TABLE.
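For example, if a non-Databricks client rewrote the table's files or schema, a repair command like the following (using a hypothetical table name) brings the Unity Catalog schema back in sync:

```sql
-- Hypothetical table name; refreshes Unity Catalog metadata after an
-- out-of-band change made outside Unity Catalog.
MSCK REPAIR TABLE main.default.sales_events SYNC METADATA;
```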
File formats for external tables
External tables can use the following file formats:
- DELTA
- CSV
- JSON
- AVRO
- PARQUET
- ORC
- TEXT
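For non-Delta formats, specify the format with a USING clause when you register the table. The following is a sketch using hypothetical catalog, schema, and bucket names; it assumes an external location already covers the S3 path:

```sql
-- Hypothetical names; registers existing Parquet files as an external table.
CREATE TABLE main.default.raw_events
USING PARQUET
LOCATION 's3://my-bucket/raw/events/';
```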
Create an external table
You can create external tables using SQL commands or DataFrame write operations.
Before you begin
Before you create an external table, you must configure an external location that grants access to your cloud storage.
Databricks recommends using the AWS CloudFormation Quickstart template to create external locations. This approach automatically configures both the storage credential and external location, significantly simplifying the setup process. For detailed instructions, see Create a storage credential and external location for S3 using AWS CloudFormation.
To create an external table, you must meet the following permission requirements:
- The CREATE EXTERNAL TABLE privilege on an external location that grants access to the LOCATION accessed by the external table.
- The USE CATALOG privilege on the table's parent catalog.
- The USE SCHEMA privilege on the table's parent schema.
- The CREATE TABLE privilege on the table's parent schema.
When an S3 external location is associated with multiple metastores, avoid granting write access to tables that use that S3 location because writes from different metastores to the same external table can cause consistency issues. However, reading from the same S3 external location across multiple metastores is safe.
SQL command examples
Use one of the following command examples in a notebook or the SQL query editor to create an external table.
In the following examples, replace the placeholder values:
- <catalog>: The name of the catalog that will contain the table.
- <schema>: The name of the schema that will contain the table.
- <table-name>: A name for the table.
- <column-specification>: The name and data type for each column.
- <bucket-path>: The path to the cloud storage bucket where the table will be created.
- <table-directory>: A directory where the table will be created. Use a unique directory for each table.
Table paths must contain only standard ASCII characters (letters A–Z, a–z, digits 0–9, and common symbols like /, _, -).
CREATE TABLE <catalog>.<schema>.<table-name>
(
<column-specification>
)
LOCATION 's3://<bucket-path>/<table-directory>';
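As an illustration of the path restriction noted above, a small Python helper (hypothetical, not part of any Databricks API) can check that a candidate table path contains only ASCII characters:

```python
def is_valid_table_path(path: str) -> bool:
    """Return True if the path uses only standard ASCII characters."""
    return path.isascii()

print(is_valid_table_path("s3://my-bucket/tables/events"))   # True
print(is_valid_table_path("s3://my-bucket/tablés/events"))   # False
```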
For more information about table creation parameters, see CREATE TABLE.
DataFrame write operations
You can also create external tables from query results or DataFrame write operations. In SQL, use the LOCATION clause in a CREATE TABLE AS SELECT statement to specify the external storage path. With the DataFrame API, pass the storage path as the path option when you call saveAsTable.
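A minimal PySpark sketch of the DataFrame approach, using hypothetical catalog, schema, and bucket names; it assumes a Databricks notebook where spark is already defined and an external location covers the S3 path:

```python
# Hypothetical names; replace with your catalog, schema, and external location path.
df = spark.createDataFrame(
    [(1, "alpha"), (2, "beta")],
    schema="id INT, label STRING",
)

# Providing the path option makes the resulting table external rather than managed.
(
    df.write.format("delta")
    .option("path", "s3://my-bucket/tables/example_table")
    .saveAsTable("main.default.example_table")
)
```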
Drop an external table
To drop a table, you must be its owner or have the MANAGE privilege on the table. To drop an external table, run the following SQL command:
DROP TABLE IF EXISTS catalog_name.schema_name.table_name;
Unity Catalog does not delete the underlying data in cloud storage when you drop an external table. You must directly delete the underlying data files if you need to remove data associated with the table.
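For example, you can delete the remaining files with the AWS CLI. The bucket path below is hypothetical; verify the path carefully before deleting, because this removes the data permanently:

```shell
# Recursively deletes all files under the table's directory.
aws s3 rm s3://my-bucket/tables/example_table/ --recursive
```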
Example notebook: Create external tables
You can use the following example notebook to create a catalog, schema, and external table, and to manage permissions on them.
The following notebook demonstrates how to create and manage external tables on AWS. For a simplified setup experience, Databricks recommends using the AWS CloudFormation Quickstart to create the external location before running this notebook.