Unity Catalog managed tables in Databricks for Delta Lake and Apache Iceberg
Unity Catalog managed tables are generally available for Delta Lake tables. For Apache Iceberg tables, this feature is in Public Preview and available in Databricks Runtime 16.4 LTS and above.
This page describes Unity Catalog managed tables in Delta Lake and Apache Iceberg, the default and recommended table type in Databricks. These tables are fully governed and optimized by Unity Catalog, offering better performance, operational advantages, and lower storage and compute costs than external and foreign tables, because managed tables learn from your read and write patterns. Unity Catalog manages all read, write, storage, and optimization responsibilities for managed tables.
Data files for managed tables are stored in the managed storage location of the schema or catalog that contains them. See Specify a managed storage location in Unity Catalog.
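For example, a minimal sketch of assigning a managed storage location when creating a schema; the catalog name, schema name, and bucket path below are hypothetical:

```sql
-- Create a schema whose managed tables store data under a specific path
CREATE SCHEMA main.sales
MANAGED LOCATION 's3://my-bucket/sales-schema';
```

Managed tables created in this schema then store their data files under that location without you specifying a path per table.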
Databricks recommends using managed tables to take advantage of:
- Reduced storage and compute costs.
- Faster query performance across all client types.
- Automatic table maintenance and optimization.
- Secure access for non-Databricks clients via open APIs.
- Support for Delta Lake and Iceberg formats.
- Automatic upgrades to the latest platform features.
Managed tables support interoperability by allowing access from Delta Lake and Iceberg clients. Through open APIs and credential vending, Unity Catalog enables external engines such as Trino, DuckDB, Apache Spark, Daft, and Iceberg REST catalog-integrated engines like Dremio to access managed tables. Delta Sharing, an open source protocol, enables secure, governed data sharing with external partners and platforms.
You can work with managed tables across all languages and products supported in Databricks. You need certain privileges to create, update, delete, or query managed tables. See Manage privileges in Unity Catalog.
All reads and writes to managed tables must use fully qualified table names, including catalog and schema names where they exist (for example, catalog_name.schema_name.table_name).
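For example, using hypothetical catalog, schema, and table names:

```sql
-- Query a managed table by its fully qualified three-level name
SELECT * FROM main.sales.orders LIMIT 10;
```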
This page focuses on Unity Catalog managed tables. For managed tables in the legacy Hive metastore, see Database objects in the legacy Hive metastore.
Why use Unity Catalog managed tables?
Unity Catalog managed tables automatically optimize storage costs and query speeds using AI-driven technologies like automatic clustering, file size compaction, and intelligent statistics collection. These tables simplify data management with features like automatic vacuuming and metadata caching, while ensuring interoperability with Delta and Iceberg third-party tools.
Feature | Benefits |
---|---|
Predictive optimization | Optimizes your data layout and compute using AI by automatically sizing compute, bin-packing jobs for maximum efficiency, and logging the results so that you can observe what happened. Predictive optimization automatically runs maintenance operations such as OPTIMIZE and VACUUM, reducing both compute and storage costs. See Predictive optimization for Unity Catalog managed tables. |
Automatic liquid clustering | Data is automatically clustered in the most efficient way, based on table query access patterns, which increases query speeds for all clients (Databricks and non-Databricks). See Automatic liquid clustering. |
Automatic statistics | Statistics collection boosts query performance by enabling efficient data skipping and join strategies. By automatically gathering essential statistics, such as minimum and maximum values for columns, Databricks can determine which files are irrelevant and exclude them during query execution, reducing computational overhead. Unity Catalog external tables generate statistics for the first 32 columns by default, while Unity Catalog managed tables dynamically collect statistics for the columns most relevant to query workloads. |
Metadata caching | In-memory caching of transaction metadata minimizes requests to the transaction log stored in cloud storage, which enhances query performance. |
File size optimization | Databricks automatically compacts data files to the right size by learning from data collected across thousands of production deployments. Databricks automatically determines the target file size and adjusts writes to conform to it, which can improve query performance and save storage costs. See Configure Delta Lake to control data file size. |
Automatic data deletion | If you DROP a managed table, the data is deleted automatically from cloud storage after 7 days, reducing storage costs. For external tables, you must manually go to your storage bucket and delete the files. |
Create a managed table
Use the following syntax to create an empty managed table using SQL. Replace the placeholder values:
- <catalog-name>: The name of the catalog that will contain the table.
- <schema-name>: The name of the schema that will contain the table.
- <table-name>: A name for the table.
- <column-specification>: Each column's name and data type.
```sql
-- Create a managed Delta table
CREATE TABLE <catalog-name>.<schema-name>.<table-name>
(
  <column-specification>
);
```

```sql
-- Create a managed Iceberg table
CREATE TABLE <catalog-name>.<schema-name>.<table-name>
(
  <column-specification>
)
USING iceberg;
```
To maintain performance on reads and writes, Databricks periodically runs operations to optimize managed Iceberg table metadata. This task is performed using serverless compute, which has MODIFY permission on the Iceberg table. This operation writes only to the table's metadata, and the compute retains permission on the table only for the duration of the job.
To create an Iceberg table, explicitly specify USING iceberg. Otherwise, Databricks creates a Delta Lake table by default.
You can create managed tables from query results or DataFrame write operations. The following articles demonstrate some of the many patterns you can use to create a managed table on Databricks:
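For instance, a common pattern is CREATE TABLE AS SELECT (CTAS); the catalog, schema, and table names below are hypothetical:

```sql
-- Create a managed table from the results of a query (CTAS)
CREATE TABLE main.sales.recent_orders AS
SELECT *
FROM main.sales.orders
WHERE order_date >= '2024-01-01';
```

Because no USING clause is specified, this creates a managed Delta table. From a DataFrame, writing with a fully qualified table name (for example, df.write.saveAsTable in PySpark) likewise creates a managed table.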
Required permissions
To create a managed table, ensure you have:
- USE CATALOG permission on the table's parent catalog.
- USE SCHEMA permission on the table's parent schema.
- CREATE TABLE permission on the table's parent schema.
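The privileges above can be granted with SQL; the catalog, schema, and group names below are hypothetical:

```sql
-- Grant the privileges required to create managed tables in main.sales
GRANT USE CATALOG ON CATALOG main TO `data_engineers`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `data_engineers`;
GRANT CREATE TABLE ON SCHEMA main.sales TO `data_engineers`;
```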
Drop a managed table
You must be the table owner or have the MANAGE privilege to drop a managed table. To drop a managed table, run the following SQL command:

```sql
DROP TABLE IF EXISTS catalog_name.schema_name.table_name;
```
Unity Catalog supports the UNDROP TABLE command to recover dropped managed tables for 7 days. After 7 days, Databricks marks the underlying data for deletion from your cloud tenant and removes the files during automated table maintenance. See UNDROP.
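For example, to recover a table dropped within the last 7 days:

```sql
-- Restore a recently dropped managed table by its fully qualified name
UNDROP TABLE catalog_name.schema_name.table_name;
```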