Unity Catalog General Availability

August 25, 2022

Unity Catalog is now generally available for all workloads.

For supported regions, see the Unity Catalog availability regions section of these release notes.

For more information about Unity Catalog, see Overview of Unity Catalog.

A single metastore per available region is supported

  • A metastore can have up to 1000 catalogs.

  • A catalog can have up to 10,000 schemas.

  • A schema can have up to 10,000 tables.

For full Unity Catalog quotas, see Resource quotas for Unity Catalog.

Unity Catalog supports the following storage formats

  • All managed Unity Catalog tables store data in the Delta Lake format.

  • External Unity Catalog tables and external locations support Delta Lake, JSON, CSV, Avro, Parquet, ORC, and text data. A brief example follows this list.
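
The following sketch illustrates the distinction. The catalog, schema, table, and bucket names are hypothetical placeholders, not values from this release note:

-- Managed table: data is always stored in the Delta Lake format.
CREATE TABLE main.default.sales_managed (id INT, amount DOUBLE);

-- External table: declares its data format and storage path explicitly.
CREATE TABLE main.default.sales_external (id INT, amount DOUBLE)
  USING CSV
  LOCATION 's3://<your-bucket>/sales/';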

Manage Unity Catalog resources from the accounts console

Use the Databricks accounts console UI to manage Unity Catalog resources, including metastores and account-level users and groups.

Supported cluster types

Unity Catalog has two supported access modes when you create a new cluster:

  • Shared

    • Languages: SQL or Python

    • A secure cluster that can be shared by multiple users. Cluster users are fully isolated so that they cannot see each other’s data and credentials.

  • Single user

    • Languages: SQL, Scala, Python, R

    • A secure cluster that can be used exclusively by a specified single user.

All SQL warehouses are created in shared access mode. To enable Unity Catalog for Databricks SQL, select the Preview channel when configuring a SQL warehouse. See Databricks SQL release notes.

For more information about cluster access modes, see Create clusters and SQL warehouses that can access Unity Catalog.

System tables

information_schema is fully supported for Unity Catalog data assets. Each metastore includes a catalog named system that contains a metastore-scoped information_schema. See Information schema. You can use information_schema to answer questions like the following:

“Count the number of tables per catalog”

SELECT table_catalog, count(table_name)
FROM system.information_schema.tables
GROUP BY 1
ORDER BY 2 DESC

“Show me all of the tables that have been altered in the last 24 hours”

SELECT table_name, table_owner, created_by, last_altered, last_altered_by, table_catalog
FROM system.information_schema.tables
WHERE datediff(now(), last_altered) < 1

Structured Streaming support

Structured Streaming workloads are now supported with Unity Catalog. For details and limitations, see the limitations section of these release notes. See Using Unity Catalog with Structured Streaming.

SQL functions

User-defined SQL functions are now fully supported on Unity Catalog. For information about how to create and use SQL UDFs, see CREATE FUNCTION (SQL).
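
As a quick illustration, a SQL UDF can be registered in a Unity Catalog schema and then called like a built-in function. A minimal sketch, using hypothetical catalog and schema names (main.default):

-- Register a simple scalar SQL UDF in a Unity Catalog schema.
CREATE FUNCTION main.default.to_fahrenheit(celsius DOUBLE)
  RETURNS DOUBLE
  RETURN celsius * 9 / 5 + 32;

-- Call the function with its three-level name.
SELECT main.default.to_fahrenheit(20.0);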

SQL syntax for external locations in Unity Catalog

Standard data definition language (DDL) commands are now supported in Spark SQL for external locations, including the following:

CREATE | DROP | ALTER | DESCRIBE | SHOW EXTERNAL LOCATION

You can also manage and view permissions on external locations with GRANT, REVOKE, and SHOW in SQL. See External locations (Databricks SQL).

Example syntax:

CREATE EXTERNAL LOCATION <your_location_name>
  URL '<your_location_path>'
  WITH (CREDENTIAL <your_credential_name>);

GRANT READ FILES
  ON EXTERNAL LOCATION <your_location_name>
  TO <group>;
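
You can inspect and revoke these grants with SQL as well. A brief sketch using the same placeholder names:

-- List external locations and show details for one of them.
SHOW EXTERNAL LOCATIONS;
DESCRIBE EXTERNAL LOCATION <your_location_name>;

-- Remove a previously granted privilege.
REVOKE READ FILES
  ON EXTERNAL LOCATION <your_location_name>
  FROM <group>;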

Unity Catalog GA limitations

  • Scala, R, and workloads using the Machine Learning Runtime are supported only on clusters using the single user access mode. Workloads in these languages do not support the use of dynamic views for row-level or column-level security.

  • Shallow clones are not supported when using Unity Catalog as the source or target of the clone.

  • Bucketing is not supported for Unity Catalog tables. Commands that try to create a bucketed table in Unity Catalog throw an exception.

  • Writing to the same path or Delta Lake table from workspaces in multiple regions can lead to unreliable performance if some clusters access Unity Catalog and others do not.

  • Overwrite mode for DataFrame write operations into Unity Catalog is supported only for Delta tables, not for other file formats. The user must have the CREATE privilege on the parent schema and must be the owner of the existing object.

  • Streaming currently has the following limitations:

    • It is not supported in clusters using shared access mode. For streaming workloads, you must use single user access mode.

    • Asynchronous checkpointing is not yet supported.

    • Streaming queries lasting more than 30 days on all-purpose or jobs clusters will throw an exception. For long-running streaming queries, configure automatic job retries.

  • Referencing Unity Catalog tables from Delta Live Tables pipelines is currently not supported.

  • Groups previously created in a workspace cannot be used in Unity Catalog GRANT statements. This ensures a consistent view of groups that can span workspaces. To use groups in GRANT statements, create your groups in the account console and update any automation for principal or group management (such as SCIM, Okta and AAD connectors, and Terraform) to reference account endpoints instead of workspace endpoints. A GRANT example using an account-level group appears after this list.

  • Unity Catalog requires the E2 version of the Databricks platform. All new Databricks accounts and most existing accounts are on E2. If you are unsure which account type you have, contact your Databricks representative.
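
As referenced above, a GRANT statement must name an account-level group. A minimal sketch; the catalog, schema, table, and group names are hypothetical:

-- `data-analysts` must be an account-level group; workspace-local groups are rejected.
GRANT SELECT
  ON TABLE main.default.sales
  TO `data-analysts`;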

Unity Catalog availability regions

When Unity Catalog reached general availability, it was available in the following regions:

  • us-east-1

  • us-east-2

  • us-west-2

  • ap-northeast-1

  • ap-northeast-2

  • ap-south-1

  • ap-southeast-1

  • ap-southeast-2

  • ca-central-1

  • eu-central-1

  • eu-west-1

  • eu-west-2

To use Unity Catalog in another region, contact your account representative.