Convert to Unity Catalog managed table from external table
This feature is in Public Preview and is only available to participating customers at this time. To participate in the preview, apply by filling out this form.
This page describes using the SET MANAGED command to convert an external table to a Unity Catalog managed table in Databricks.
SET MANAGED overview
The SET MANAGED feature simplifies converting an external table to a Unity Catalog managed table in Databricks. It avoids the complexity of manually running DEEP CLONE followed by dropping the external table, a process that can cause significant downtime and lead to loss of table history and settings.
The benefits of the SET MANAGED feature include:
- Minimizing reader and writer downtime.
- Handling concurrent writes during conversion.
- Retaining table history.
- Keeping the same table configurations, including the same name, settings, permissions, and views.
- Ability to roll back a converted managed table to an external table.
Prerequisites
To use the table conversion feature, you must meet the following requirements:
- All readers and writers of the external tables must use name-based access. For example:
  SELECT * FROM catalog_name.schema_name.table_name;
  Path-based access is not supported and will fail after the table has been converted. For example:
  SELECT * FROM delta.`s3://path/to/table`;
- You must use Databricks Runtime 17.0 or above to run the SET MANAGED or UNSET MANAGED commands.
- Databricks readers and writers must use Databricks Runtime 15.4 LTS or above. If your readers or writers use 14.3 LTS or below, see Alternative option for readers and writers on Databricks Runtime 14.3 LTS or below.
- External (non-Databricks) clients must support reads to Unity Catalog managed tables. See Read tables with Delta clients.
- Deactivate the UniForm table property on external tables that have it enabled:
  ALTER TABLE catalog.schema.my_external_table
  UNSET TBLPROPERTIES ('delta.universalFormat.enabledFormats');
  After you have converted your table, you can re-enable UniForm:
  ALTER TABLE table_name SET TBLPROPERTIES (
    'delta.enableIcebergCompatV2' = 'true',
    'delta.universalFormat.enabledFormats' = 'iceberg');
To avoid conflicts, cancel any existing OPTIMIZE jobs (liquid clustering, compaction, ZORDER) operating on your table, and do not schedule any jobs while you convert your external tables to managed tables.
Convert from external to managed table
The SET MANAGED command is available in Databricks Runtime 17.0 or above.
To convert your Unity Catalog external table to a Unity Catalog managed table, run the following command:
ALTER TABLE catalog.schema.my_external_table SET MANAGED;
If the command is interrupted while copying data, you can restart it, and it will continue from where it left off.
Databricks recommends not running multiple SET MANAGED commands concurrently on the same table, as doing so can lead to an inconsistent table state.
After your table has been converted, you must:
- Re-enable UniForm if you had disabled it on your table.
- Restart any streaming jobs (read or write) using the external table.
- Ensure that your readers and writers work with the managed table.
Predictive optimization is automatically enabled, unless you have manually disabled it. See Check whether predictive optimization is enabled.
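Putting the steps together, the following is a minimal end-to-end sketch of the workflow for a UniForm-enabled external table; the table name catalog.schema.my_external_table is a placeholder:
-- Sketch only; the table name is a placeholder.
-- 1. Disable UniForm before converting:
ALTER TABLE catalog.schema.my_external_table
UNSET TBLPROPERTIES ('delta.universalFormat.enabledFormats');
-- 2. Convert the external table to a Unity Catalog managed table:
ALTER TABLE catalog.schema.my_external_table SET MANAGED;
-- 3. Re-enable UniForm after the conversion completes:
ALTER TABLE catalog.schema.my_external_table SET TBLPROPERTIES (
  'delta.enableIcebergCompatV2' = 'true',
  'delta.universalFormat.enabledFormats' = 'iceberg');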
Check conversion
You can confirm that your external table has been converted to a managed table:
DESCRIBE EXTENDED catalog_name.schema_name.table_name
If the table has been converted, the Type row under col_name shows MANAGED under data_type.
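Alternatively, here is a quick sketch that checks the table type programmatically through Unity Catalog's information schema; the catalog, schema, and table names are placeholders:
-- Returns MANAGED after a successful conversion (EXTERNAL after a rollback).
SELECT table_type
FROM system.information_schema.tables
WHERE table_catalog = 'catalog_name'
  AND table_schema = 'schema_name'
  AND table_name = 'table_name';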
Alternative option for readers and writers on Databricks Runtime 14.3 LTS or below
Databricks recommends that you upgrade all readers and writers to Databricks Runtime 15.4 LTS or above to take advantage of the SET MANAGED command, including the ability to retain table history.
You can still use the SET MANAGED command if you have readers or writers on Databricks Runtime 14.3 LTS or below. However, after converting to a managed table, you cannot time travel to historical commits by timestamp; you can only do so by version. If you roll back to an external table within the 14-day window, time travel to historical commits made before the conversion is re-enabled.
In all cases, regardless of Databricks Runtime version, after rolling back to a Unity Catalog external table, time travel by timestamp does not work for any commits made to the converted Unity Catalog managed table between the end of the conversion and the rollback.
To support these readers and writers, drop the inCommitTimestamp table feature:
ALTER TABLE <table_name> DROP FEATURE inCommitTimestamp;
Roll back to an external table
The UNSET MANAGED command is available in Databricks Runtime 17.0 or above.
After converting an external table to a managed table, you can roll back within 14 days.
If you roll back, commits made between conversion and rollback are time-travelable by version but not by timestamp. Seven days after rollback, the data in the managed location is deleted.
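For illustration, a sketch assuming a hypothetical rolled-back table catalog.schema.my_table and a version 42 that was committed while the table was managed:
-- Time travel by version still works for commits made between conversion and rollback:
SELECT * FROM catalog.schema.my_table VERSION AS OF 42;
-- Time travel by timestamp does not work for those same commits:
SELECT * FROM catalog.schema.my_table TIMESTAMP AS OF '2025-06-01T00:00:00';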
To roll back to an external table, run the following command:
ALTER TABLE catalog.schema.my_managed_table UNSET MANAGED;
If the rollback command is interrupted, you can rerun it to retry, similar to the SET MANAGED command.
Check rollback
You can confirm that your conversion has been rolled back:
DESCRIBE EXTENDED catalog_name.schema_name.table_name
If the rollback succeeded, the Type row under col_name shows EXTERNAL under data_type.
As with conversion, you must restart your streaming jobs after rolling back to an external table.
Downtime and data copy times
Downtime can occur when readers or writers access the table during conversion. However, compared to the DEEP CLONE command, the SET MANAGED command minimizes or eliminates downtime for readers and minimizes downtime for writers.
The conversion copies data in two steps. Readers and writers are unaffected during the initial step, when the table data and the Delta log are copied. During the second step, writes to the Unity Catalog external location are blocked, and commits made to the external location during the first data copy are moved over. This second data copy step incurs downtime for writers, and for readers on Databricks Runtime 15.4 LTS or below; readers on Databricks Runtime 16.1 or above experience no downtime during this step.
After this, readers and writers are shifted to the Unity Catalog managed location, and the new managed table location is registered in Unity Catalog.
Estimated downtime is:
| Table size | Recommended cluster size | Time for data copy | Reader and writer downtime |
|---|---|---|---|
| 100 GB or less | 32-core / DBSQL small | ~6 min or less | ~1-2 min or less |
| 1 TB | 64-core / DBSQL medium | ~30 min | ~1-2 min |
| 10 TB | 256-core / DBSQL x-large | ~1.5 hrs | ~1-5 min |
The estimates assume a throughput rate of 0.5-2 GB per CPU core per minute.
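For example, at the conservative end of that range (0.5 GB per core per minute), a 64-core cluster copies about 32 GB per minute, so a 1 TB table takes roughly 30 minutes, consistent with the estimate above.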
The downtime can vary, and the conversion's performance depends on factors such as the file size, number of files, and number of commits.
Known limitations
Converting external to managed tables has the following limitations:
- Path-based and streaming clients: Path-based reads and streaming clients can silently stop updating, so check that these continue to work after conversion. You must also restart any streaming jobs after conversion.
- Table history constraints after rollback: For readers and writers on Databricks Runtime 15.4 LTS or above, table history for commits made post-conversion but before rollback is time-travelable by version but not by timestamp.
- Multiple cloud regions: If the default managed location of your Unity Catalog metastore, catalog, or schema is in a different cloud region from the storage location of the external table being converted, you can incur additional cross-region data transfer costs. The cloud provider imposes these charges outside of Databricks' control.
To check the locations of your schema, catalog, and metastore, you can use the following commands:
DESC SCHEMA EXTENDED <catalog_name>.<schema_name>;
DESC CATALOG EXTENDED <catalog_name>;
SELECT * FROM system.information_schema.metastores;
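To compare these against the storage location of the external table itself, you can check the Location row in its extended description; the table name below is a placeholder:
-- The Location row shows the external table's storage path.
DESCRIBE EXTENDED catalog_name.schema_name.my_external_table;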