This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See What is Unity Catalog?.
August 25, 2022
Unity Catalog is now generally available on Databricks.
This article describes Unity Catalog as of the date of its GA release. It focuses primarily on the features and updates added to Unity Catalog since the Public Preview. For current information about Unity Catalog, see What is Unity Catalog?. For release notes that describe updates to Unity Catalog since GA, see Databricks platform release notes and Databricks Runtime release notes versions and compatibility.
As of August 25, 2022:
Your Databricks account can have only one metastore per region.
A metastore can have up to 1,000 catalogs.
A catalog can have up to 10,000 schemas.
A schema can have up to 10,000 tables.
For current Unity Catalog quotas, see Resource quotas.
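One way to see how close a metastore is to the schema and table limits above is to count objects through the metastore-scoped information_schema in the system catalog. This is a minimal sketch, not an official quota check: it assumes the standard information_schema views and columns (schemata.catalog_name, schemata.schema_name, tables.table_catalog, tables.table_schema), and the results include only objects that the querying user is permitted to see.

```sql
-- Schemas per catalog, to compare against the 10,000-schema limit.
SELECT catalog_name, count(schema_name) AS schema_count
FROM system.information_schema.schemata
GROUP BY catalog_name
ORDER BY schema_count DESC;

-- Tables per schema, to compare against the 10,000-table limit.
SELECT table_catalog, table_schema, count(table_name) AS table_count
FROM system.information_schema.tables
GROUP BY table_catalog, table_schema
ORDER BY table_count DESC;
```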
As of August 25, 2022:
All managed Unity Catalog tables store data with Delta Lake.
External Unity Catalog tables and external locations support Delta Lake, JSON, CSV, Avro, Parquet, ORC, and text data.
For current Unity Catalog supported table formats, see Supported data file formats.
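The difference between managed and external tables can be sketched as follows. The catalog, schema, and table names (main, default, orders_managed, orders_external) and the path placeholder are hypothetical, and the external table assumes the path is covered by an external location on which you hold the required privileges.

```sql
-- Managed table: no LOCATION clause, so Unity Catalog manages the
-- storage and the data is stored as Delta Lake.
CREATE TABLE main.default.orders_managed (
  order_id BIGINT,
  amount   DOUBLE
);

-- External table over existing Parquet files at a path you manage
-- (hypothetical placeholder path).
CREATE TABLE main.default.orders_external (
  order_id BIGINT,
  amount   DOUBLE
)
USING PARQUET
LOCATION '<your-location-path>';
```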
Use the Databricks account console UI to:
Manage the metastore lifecycle (create, update, delete, and view Unity Catalog-managed metastores)
Assign and remove metastores for workspaces
Unity Catalog requires clusters that run Databricks Runtime 11.1 or above. Unity Catalog is supported by default on all SQL warehouse compute versions.
Earlier versions of Databricks Runtime supported preview versions of Unity Catalog. Clusters running on earlier versions of Databricks Runtime do not provide support for all Unity Catalog GA features and functionality.
Unity Catalog requires one of the following access modes when you create a new cluster:
Shared access mode (languages: SQL or Python): A secure cluster that can be shared by multiple users. Cluster users are fully isolated so that they cannot see each other’s data and credentials.
Single user access mode (languages: SQL, Scala, Python, R): A secure cluster that can be used exclusively by a specified single user.
For more information about cluster access modes, see Access modes.
For information about updated Unity Catalog functionality in later Databricks Runtime versions, see the release notes for those versions.
information_schema is fully supported for Unity Catalog data assets. Each metastore includes a catalog named system that includes a metastore-scoped information_schema. See Information schema. You can use information_schema to answer questions like the following:
“Count the number of tables per catalog”
SELECT table_catalog, count(table_name) FROM system.information_schema.tables GROUP BY 1 ORDER BY 2 DESC
“Show me all of the tables that have been altered in the last 24 hours”
SELECT table_name, table_owner, created_by, last_altered, last_altered_by, table_catalog FROM system.information_schema.tables WHERE last_altered >= now() - INTERVAL 24 HOURS
Structured Streaming workloads are now supported with Unity Catalog. For details and limitations, see Unity Catalog limitations.
User-defined SQL functions are now fully supported on Unity Catalog. For information about how to create and use SQL UDFs, see CREATE FUNCTION (SQL and Python).
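As a brief illustration, a SQL UDF registered in Unity Catalog is addressed by its three-level name (catalog.schema.function). The function name and the main.default namespace below are hypothetical, and creating the function assumes you hold the required privileges on that schema.

```sql
-- Hypothetical scalar SQL UDF that converts Celsius to Fahrenheit.
CREATE FUNCTION main.default.to_fahrenheit(celsius DOUBLE)
RETURNS DOUBLE
RETURN celsius * 9 / 5 + 32;

-- Call the function by its fully qualified name.
SELECT main.default.to_fahrenheit(100.0) AS fahrenheit;
```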
Standard data definition language (DDL) commands are now supported in Spark SQL for external locations, including the following:
CREATE | DROP | ALTER | DESCRIBE | SHOW EXTERNAL LOCATION
You can also manage and view permissions for external locations in SQL with GRANT, REVOKE, and SHOW. See External locations.
CREATE EXTERNAL LOCATION `<your-location-name>` URL '<your-location-path>' WITH (CREDENTIAL `<your-credential-name>`);
GRANT READ FILES, WRITE FILES, CREATE EXTERNAL TABLE ON EXTERNAL LOCATION `<your-location-name>` TO `finance`;
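The remaining external location commands might be used as in the sketch below. It reuses the placeholder location name and the finance principal from the example above; `<new-owner>` is a hypothetical principal, and the exact clauses available can vary by Databricks Runtime version.

```sql
-- List all external locations in the metastore.
SHOW EXTERNAL LOCATIONS;

-- Inspect one external location (URL, credential, owner).
DESCRIBE EXTERNAL LOCATION `<your-location-name>`;

-- View, then narrow, the permissions granted on it.
SHOW GRANTS ON EXTERNAL LOCATION `<your-location-name>`;
REVOKE WRITE FILES ON EXTERNAL LOCATION `<your-location-name>` FROM `finance`;

-- Transfer ownership (hypothetical principal), then drop the location
-- when it is no longer needed.
ALTER EXTERNAL LOCATION `<your-location-name>` OWNER TO `<new-owner>`;
DROP EXTERNAL LOCATION IF EXISTS `<your-location-name>`;
```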
As of August 25, 2022, Unity Catalog had the following limitations. For current limitations, see Unity Catalog limitations.
Scala, R, and workloads using the Machine Learning Runtime are supported only on clusters using the single user access mode. Workloads in these languages do not support the use of dynamic views for row-level or column-level security.
Shallow clones are not supported when using Unity Catalog as the source or target of the clone.
Bucketing is not supported for Unity Catalog tables. Commands that try to create a bucketed table in Unity Catalog throw an exception.
Writing to the same path or Delta Lake table from workspaces in multiple regions can lead to unreliable performance if some clusters access Unity Catalog and others do not.
Overwrite mode for DataFrame write operations into Unity Catalog is supported only for Delta tables, not for other file formats. The user must have the CREATE privilege on the parent schema and must be the owner of the existing object.
Streaming currently has the following limitations:
It is not supported in clusters using shared access mode. For streaming workloads, you must use single user access mode.
Asynchronous checkpointing is not yet supported.
On Databricks Runtime version 11.2 and below, streaming queries that last more than 30 days on all-purpose or jobs clusters will throw an exception. For long-running streaming queries, configure automatic job retries or use Databricks Runtime 11.3 and above.
Referencing Unity Catalog tables from Delta Live Tables pipelines is currently not supported.
Groups previously created in a workspace cannot be used in Unity Catalog GRANT statements. This is to ensure a consistent view of groups that can span across workspaces. To use groups in GRANT statements, create your groups in the account console and update any automation for principal or group management (such as SCIM, Okta and Microsoft Entra ID (formerly Azure Active Directory) connectors, and Terraform) to reference account endpoints instead of workspace endpoints.
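For illustration, a grant to an account-level group could look like the following sketch. The group name data-engineers and the table main.default.orders are hypothetical; the group is assumed to have been created in the account console rather than in a workspace.

```sql
-- Grant read access on a table to an account-level group
-- (hypothetical group and table names).
GRANT SELECT ON TABLE main.default.orders TO `data-engineers`;

-- Confirm which principals hold privileges on the table.
SHOW GRANTS ON TABLE main.default.orders;
```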
Unity Catalog requires the E2 version of the Databricks platform. All new Databricks accounts and most existing accounts are on E2.
As of August 25, 2022, Unity Catalog was available in the following regions. For the list of currently supported regions, see Databricks clouds and regions.