
Convert to Unity Catalog managed table from external table

Preview

This feature is in Public Preview and is only available to participating customers at this time. To participate in the preview, apply by filling out this form.

This page describes using the SET MANAGED command to convert an external table to a Unity Catalog managed table in Databricks.

SET MANAGED overview

The SET MANAGED feature simplifies converting an external table to a Unity Catalog managed table in Databricks. It avoids the complexity of manually running DEEP CLONE followed by dropping the external table, a process that can cause significant downtime and lead to loss of table history and settings.

The benefits of the SET MANAGED feature include:

  • Minimizing reader and writer downtime.
  • Handling concurrent writes during conversion.
  • Retaining table history.
  • Keeping the same table configurations, including the same name, settings, permissions, and views.
  • Ability to roll back a converted managed table to an external table.
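
For comparison, the manual migration that SET MANAGED replaces looks roughly like the following sketch (table names are illustrative). This approach drops and recreates the table, which loses table history and requires re-granting permissions:

SQL
-- Illustrative manual migration via DEEP CLONE (not recommended):
CREATE TABLE catalog.schema.my_table_managed
DEEP CLONE catalog.schema.my_external_table;

DROP TABLE catalog.schema.my_external_table;

ALTER TABLE catalog.schema.my_table_managed
RENAME TO catalog.schema.my_external_table;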

Prerequisites

To use the table conversion feature, the following requirements must be met:

  • All readers and writers of the external table must use name-based access. For example:

    SQL
    SELECT * FROM catalog_name.schema_name.table_name;

    Path-based access is not supported and will fail after the table has been converted. For example:

    SQL
    SELECT * FROM delta.`s3://path/to/table`;
  • You must use Databricks Runtime 17.0 or above to run the commands SET MANAGED or UNSET MANAGED.

  • Databricks readers and writers must use Databricks Runtime 15.4 LTS or above. If your readers or writers use 14.3 LTS or below, see Alternative option for readers and writers on Databricks Runtime 14.3 LTS or below.

  • External (non-Databricks) clients must support reads to Unity Catalog managed tables. See Read tables with Delta clients.

  • Deactivate the UniForm table property on external tables that have it enabled:

    SQL
    ALTER TABLE catalog.schema.my_external_table
    UNSET TBLPROPERTIES ('delta.universalFormat.enabledFormats');

    After you have converted your table, you can re-enable UniForm:

    SQL
    ALTER TABLE table_name SET TBLPROPERTIES (
      'delta.enableIcebergCompatV2' = 'true',
      'delta.universalFormat.enabledFormats' = 'iceberg');
important

To avoid conflicts, cancel any existing OPTIMIZE command jobs (liquid clustering, compaction, ZORDER) operating on your table, and do not schedule any jobs while you convert your external tables to managed tables.

Convert from external to managed table

important

The SET MANAGED command is available in Databricks Runtime 17.0 or above.

To convert your Unity Catalog external table to a managed table, run the following command:

SQL
ALTER TABLE catalog.schema.my_external_table SET MANAGED;

If the command is interrupted while copying data, you can restart it, and it resumes from where it left off.

warning

Databricks recommends against running multiple SET MANAGED commands concurrently on the same table; doing so can lead to an inconsistent table state.

After your table has been converted, you must:

  • Re-enable UniForm if you disabled it before conversion.
  • Restart any streaming jobs (read or write) that use the table.
  • Verify that your readers and writers work with the managed table.

Predictive optimization is automatically enabled unless you have manually disabled it. See Check whether predictive optimization is enabled.

Check conversion

You can confirm that your external table has been converted to a managed table:

SQL
DESCRIBE EXTENDED catalog_name.schema_name.table_name

If the table has been converted, the Type row in the col_name column shows MANAGED in the data_type column.
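
For example, the relevant portion of the DESCRIBE EXTENDED output looks like this (abridged; other rows omitted):

    col_name    data_type
    ...
    Type        MANAGED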

Alternative option for readers and writers on Databricks Runtime 14.3 LTS or below

Databricks recommends that you upgrade all readers and writers to Databricks Runtime 15.4 LTS or above to take advantage of the SET MANAGED command, including the ability to retain table history.

You can still use the SET MANAGED command if you have readers or writers on Databricks Runtime 14.3 LTS or below. However, after converting to a managed table, you cannot time travel to historical commits by timestamp; you can only do so by version. If you roll back to an external table within the 14-day window, time travel to historical commits made before conversion is re-enabled.

Regardless of Databricks Runtime version, after rolling back to a Unity Catalog external table, time travel by timestamp does not work for any commits made to the converted managed table between conversion and rollback.

To support readers and writers on Databricks Runtime 14.3 LTS or below, drop the in-commit timestamp table feature:

SQL
ALTER TABLE <table_name> DROP FEATURE inCommitTimestamp;

Roll back to an external table

important

The UNSET MANAGED command is available in Databricks Runtime 17.0 or above.

After converting an external table to a managed table, you can roll back within 14 days.

If you roll back, commits made to the external location between conversion and rollback are time-travelable by version but not by timestamp. Seven days after rollback, the data in the managed location is deleted.

To roll back to an external table, run the following command:

SQL
ALTER TABLE catalog.schema.my_managed_table UNSET MANAGED;

If the rollback command is interrupted, you can rerun it to retry, similar to the SET MANAGED command.

Check rollback

You can confirm that your conversion has been rolled back:

SQL
DESCRIBE EXTENDED catalog_name.schema_name.table_name

If the table has been rolled back, the Type row in the col_name column shows EXTERNAL in the data_type column.

As with conversion, you must restart your streaming jobs after rolling back to an external table.

Downtime and data copy times

Downtime can occur when readers or writers access the table during conversion. However, compared to the DEEP CLONE approach, the SET MANAGED command minimizes or eliminates downtime for readers and minimizes downtime for writers. Readers and writers are unaffected during the initial step, in which the table data and the Delta log are copied.

During the second step, writes to the Unity Catalog external location are blocked, and commits made to the external location during the first data copy are moved over. This second data copy step incurs downtime for writers, and for readers on Databricks Runtime 15.4 LTS or below.

After this, readers and writers are shifted to the Unity Catalog managed location, and the new managed table location is registered in Unity Catalog. Readers on Databricks Runtime 16.1 or above experience effectively no downtime.

Estimated downtime is:

| Table size | Recommended cluster size | Time for data copy | Reader and writer downtime |
|---|---|---|---|
| 100 GB or less | 32-core / DBSQL small | ~6 min or less | ~1-2 min or less |
| 1 TB | 64-core / DBSQL medium | ~30 min | ~1-2 min |
| 10 TB | 256-core / DBSQL x-large | ~1.5 hrs | ~1-5 min |

The estimates assume a throughput rate of 0.5-2 GB per CPU core per minute. For example, at 0.5 GB/core/minute, a 64-core cluster copies 1 TB in roughly 30 minutes (1,000 GB ÷ 32 GB/minute).

note

The downtime can vary, and the conversion's performance depends on factors such as file size, number of files, and number of commits.

Known limitations

Converting external to managed tables has the following limitations:

  • Path-based and streaming clients: Path-based reads and streaming clients can silently stop updating, so check that these continue to work after conversion. You must also restart any streaming jobs after conversion.

  • Table history constraints after rollback: For readers/writers in Databricks Runtime 15.4 LTS or above, table history for commits made post-conversion but before rollback will be time-travelable by version but not by timestamp.

  • Multiple cloud regions: If the default managed location of your Unity Catalog metastore, catalog, or schema is in a different cloud region from the storage location of the external table being converted, you can incur additional cross-region data transfer costs. The cloud provider imposes these charges outside of Databricks' control.

    To check the locations of your schema, catalog, and metastore, you can use the following commands:

    SQL
    DESC SCHEMA EXTENDED <catalog_name>.<schema_name>;

    DESC SCHEMA EXTENDED <catalog_name>;

    SELECT * FROM system.information_schema.metastores;