Create and manage catalogs

This article shows how to create and manage catalogs in Unity Catalog. A catalog contains schemas (databases), and a schema contains tables and views.
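
A catalog is the top level of Unity Catalog's three-level namespace, so a table is referenced as <catalog>.<schema>.<table>. For example, to query a table my_table in the default schema of a catalog named example (both names are hypothetical):

SELECT * FROM example.default.my_table;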

Note

To learn how to create a foreign catalog, a Unity Catalog object that mirrors a database in an external data system, see Create a foreign catalog. See also Manage and work with foreign catalogs.

Requirements

  • You must be a Databricks metastore admin or have been granted the CREATE CATALOG privilege on the metastore.

  • Your Databricks account must be on the Premium plan or above.

  • You must have a Unity Catalog metastore linked to the workspace where you perform the catalog creation.

  • The compute resource that you use to run the notebook or SQL query that creates the catalog must use a Unity Catalog-compliant access mode.

Create a catalog

To create a catalog, you can use Catalog Explorer or a SQL command.

Catalog Explorer

  1. Log in to a workspace that is linked to the metastore.

  2. Click Catalog in the sidebar.

  3. Click the Create Catalog button.

  4. (Optional) Specify the location where data for managed tables in the catalog will be stored.

    Specify a location here only if you do not want managed tables in this catalog to be stored in the default root storage location that was configured for the metastore. See Create a Unity Catalog metastore.

    The path that you specify must be defined in an external location configuration, and you must have the CREATE MANAGED STORAGE privilege on that external location. You can also use a subpath of that path. See Manage external locations and storage credentials.

  5. Click Create.

  6. (Optional) Specify the workspaces that the catalog is bound to.

    By default, the catalog is shared with all workspaces attached to the current metastore. If the catalog will contain data that should be restricted to specific workspaces, go to the Workspaces tab and add those workspaces.

    For more information, see (Optional) Assign a catalog to specific workspaces.

  7. Assign permissions for your catalog. See Unity Catalog privileges and securable objects.
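
    For example, after creating a catalog named example, the catalog owner or a metastore admin could grant a group the ability to use the catalog, create schemas, and read data, either on the Permissions tab or with SQL like the following (a minimal sketch; the group data-engineers is hypothetical):

    GRANT USE CATALOG ON CATALOG example TO `data-engineers`;
    GRANT CREATE SCHEMA ON CATALOG example TO `data-engineers`;
    GRANT SELECT ON CATALOG example TO `data-engineers`;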

SQL

  1. Run the following SQL command in a notebook or Databricks SQL editor. Items in brackets are optional. Replace the placeholder values:

    • <catalog-name>: A name for the catalog.

    • <location-path>: Optional. Provide a storage location path if you want managed tables in this catalog to be stored in a location that is different from the default that was configured for the metastore. This path must be defined in an external location configuration, and you must have the CREATE MANAGED STORAGE privilege on the external location configuration. You can use the path that is defined in the external location configuration or a subpath (for example, 's3://depts/finance' or 's3://depts/finance/product'). Requires Databricks Runtime 11.3 and above.

    • <comment>: Optional description or other comment.

    CREATE CATALOG [ IF NOT EXISTS ] <catalog-name>
       [ MANAGED LOCATION '<location-path>' ]
       [ COMMENT '<comment>' ];
    

    For example, to create a catalog named example:

    CREATE CATALOG IF NOT EXISTS example;
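
    To also specify a catalog-level managed storage location (a sketch, assuming an external location covering 's3://depts/finance' exists and you have the CREATE MANAGED STORAGE privilege on it):

    CREATE CATALOG IF NOT EXISTS example
       MANAGED LOCATION 's3://depts/finance/example';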
    
  2. Assign privileges to the catalog. See Unity Catalog privileges and securable objects.

Python

  1. Run the following SQL command in a notebook. Items in brackets are optional. Replace the placeholder values:

    • <catalog-name>: A name for the catalog.

    • <location-path>: Optional. Provide a storage location path if you want managed tables in this catalog to be stored in a location that is different from the default that was configured for the metastore. This path must be defined in an external location configuration, and you must have the CREATE MANAGED STORAGE privilege on the external location configuration. You can use the path that is defined in the external location configuration or a subpath (for example, 's3://depts/finance' or 's3://depts/finance/product'). Requires Databricks Runtime 11.3 and above.

    • <comment>: An optional comment.

    spark.sql("CREATE CATALOG [ IF NOT EXISTS ] <catalog-name> [ MANAGED LOCATION '<location-path>' ] [ COMMENT <comment> ]")
    

    For example, to create a catalog named example:

    spark.sql("CREATE CATALOG IF NOT EXISTS example")
    

    For parameter descriptions, see CREATE CATALOG.

  2. Assign privileges to the catalog. See Unity Catalog privileges and securable objects.

R

  1. Run the following SQL command in a notebook. Items in brackets are optional. Replace the placeholder values:

    • <catalog-name>: A name for the catalog.

    • <location-path>: Optional. Provide a storage location path if you want managed tables in this catalog to be stored in a location that is different from the default that was configured for the metastore. This path must be defined in an external location configuration, and you must have the CREATE MANAGED STORAGE privilege on the external location configuration. You can use the path that is defined in the external location configuration or a subpath (for example, 's3://depts/finance' or 's3://depts/finance/product'). Requires Databricks Runtime 11.3 and above.

    • <comment>: An optional comment.

    library(SparkR)
    
    sql("CREATE CATALOG [ IF NOT EXISTS ] <catalog-name> [ MANAGED LOCATION '<location-path>' ] [ COMMENT <comment> ]")
    

    For example, to create a catalog named example:

    library(SparkR)
    
    sql("CREATE CATALOG IF NOT EXISTS example")
    
  2. Assign privileges to the catalog. See Unity Catalog privileges and securable objects.

Scala

  1. Run the following SQL command in a notebook. Items in brackets are optional. Replace the placeholder values:

    • <catalog-name>: A name for the catalog.

    • <location-path>: Optional. Provide a storage location path if you want managed tables in this catalog to be stored in a location that is different from the default that was configured for the metastore. This path must be defined in an external location configuration, and you must have the CREATE MANAGED STORAGE privilege on the external location configuration. You can use the path that is defined in the external location configuration or a subpath (for example, 's3://depts/finance' or 's3://depts/finance/product'). Requires Databricks Runtime 11.3 and above.

    • <comment>: An optional comment.

    spark.sql("CREATE CATALOG [ IF NOT EXISTS ] <catalog-name> [ MANAGED LOCATION '<location-path>' ] [ COMMENT <comment> ]")
    

    For example, to create a catalog named example:

    spark.sql("CREATE CATALOG IF NOT EXISTS example")
    
  2. Assign privileges to the catalog. See Unity Catalog privileges and securable objects.

When you create a catalog, two schemas (databases) are automatically created: default and information_schema.
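
You can confirm this by listing the schemas in the new catalog, for example (using the example catalog created earlier):

SHOW SCHEMAS IN example;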

You can also create a catalog by using the Databricks Terraform provider and databricks_catalog. You can retrieve information about catalogs by using databricks_catalogs.

(Optional) Assign a catalog to specific workspaces

If you use workspaces to isolate user data access, you may want to limit catalog access to specific workspaces in your account. This is known as workspace-catalog binding. The default is to share the catalog with all workspaces attached to the current metastore.

Typical use cases for binding a catalog to specific workspaces include:

  • Ensuring that users can only access production data from a production workspace environment.

  • Ensuring that users can only process sensitive data from a dedicated workspace.

Take the example of production and development isolation. If you specify that your production data catalogs can only be accessed from production workspaces, this supersedes any individual grants that are issued to users.

Catalog-workspace binding diagram

In this diagram, prod_catalog is bound to two production workspaces. Suppose a user has been granted access to a table in prod_catalog called my_table (using GRANT SELECT ON my_table TO <user>). If the user tries to access my_table in the Dev workspace, they receive an error message. The user can access my_table only from the Prod ETL and Prod Analytics workspaces.
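
For reference, a table-level grant like the one described might look like the following (a sketch; the schema name prod_schema and the user are hypothetical):

GRANT SELECT ON TABLE prod_catalog.prod_schema.my_table TO `user@example.com`;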

Workspace-catalog bindings are respected in all areas of the platform. For example, if you query the information schema, you see only the catalogs accessible in the workspace where you issue the query. Data lineage and search UIs likewise show only the catalogs that are assigned to the workspace (whether using bindings or by default).

Configuration

To assign a catalog to specific workspaces, you can use Catalog Explorer or the Unity Catalog REST API:

Permissions required: Metastore admin or catalog owner.

Note

Metastore admins can see all catalogs in a metastore using Catalog Explorer, and catalog owners can see all catalogs they own in a metastore, regardless of whether the catalog is assigned to the current workspace. Catalogs that are not assigned to the workspace appear grayed out, and no child objects are visible or queryable.

Catalog Explorer

  1. Log in to a workspace that is linked to the metastore.

  2. Click Catalog in the sidebar.

  3. In the Catalog pane, on the left, click the catalog name.

    The main Catalog Explorer pane defaults to the Catalogs list. You can also select the catalog there.

  4. On the Workspaces tab, clear the All workspaces have access checkbox.

  5. Click Assign to workspaces and enter or find the workspace you want to assign.

To revoke access, go to the Workspaces tab, select the workspace, and click Revoke.

API

Assigning a catalog to specific workspaces requires two steps, each using a different API. In the following examples, replace <workspace-url> with your workspace instance name. To learn how to get the workspace instance name and workspace ID, see Get identifiers for workspace objects. To learn about getting access tokens, see Authentication for Databricks automation.

  1. Use the catalogs API to set the catalog’s isolation mode to ISOLATED:

    curl -L -X PATCH 'https://<workspace-url>/api/2.1/unity-catalog/catalogs/<my-catalog>' \
    -H 'Authorization: Bearer <my-token>' \
    -H 'Content-Type: application/json' \
    --data-raw '{
      "isolation_mode": "ISOLATED"
    }'
    

    The default isolation mode is OPEN, which makes the catalog accessible from all workspaces attached to the metastore.

  2. Use the workspace-bindings API to assign the workspaces to the catalog:

    curl -L -X PATCH 'https://<workspace-url>/api/2.1/unity-catalog/workspace-bindings/catalogs/<my-catalog>' \
    -H 'Authorization: Bearer <my-token>' \
    -H 'Content-Type: application/json' \
    --data-raw '{
      "assign_workspaces": [<workspace-id>, <workspace-id2>],
      "unassign_workspaces": [<workspace-id>, <workspace-id2>]
    }'
    

To list all workspace assignments for a catalog, send a GET request to the workspace-bindings API:

   curl -L -X GET 'https://<workspace-url>/api/2.1/unity-catalog/workspace-bindings/catalogs/<my-catalog>' \
   -H 'Authorization: Bearer <my-token>'

View catalog details

To view information about a catalog, you can use Catalog Explorer or a SQL command.

Catalog Explorer

  1. Log in to a workspace that is linked to the metastore.

  2. Click Catalog in the sidebar.

  3. In the Catalog pane, find the catalog and click its name.

    Some details are listed at the top of the page. Others can be viewed on the Schemas, Details, Permissions, and Workspaces tabs.

SQL

Run the following SQL command in a notebook or Databricks SQL editor. Replace the placeholder <catalog-name> with the name of the catalog.

For details, see DESCRIBE CATALOG.

DESCRIBE CATALOG <catalog-name>;

Use DESCRIBE CATALOG EXTENDED to get full details.
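
For example, to return full details, including the owner and any managed location, for the example catalog:

DESCRIBE CATALOG EXTENDED example;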

Delete a catalog

To delete (or drop) a catalog, you can use Catalog Explorer or a SQL command. To drop a catalog, you must be its owner.

You must delete all schemas in the catalog except information_schema before you can delete a catalog. This includes the auto-created default schema.
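
For example, if the only remaining schema in a catalog named example is the auto-created default schema, you could remove it with SQL before deleting the catalog (a minimal sketch; CASCADE also drops any objects inside the schema):

DROP SCHEMA example.default CASCADE;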

Catalog Explorer

  1. Log in to a workspace that is linked to the metastore.

  2. Click Catalog in the sidebar.

  3. In the Catalog pane, on the left, click the catalog you want to delete.

  4. In the detail pane, click the three-dot menu to the left of the Create database button and select Delete.

  5. On the Delete catalog dialog, click Delete.

SQL

Run the following SQL command in a notebook or Databricks SQL editor. Items in brackets are optional. Replace the placeholder <catalog-name>.

For parameter descriptions, see DROP CATALOG.

If you use DROP CATALOG without the CASCADE option, you must delete all schemas in the catalog except information_schema before you can delete the catalog. This includes the auto-created default schema.

DROP CATALOG [ IF EXISTS ] <catalog-name> [ RESTRICT | CASCADE ]

For example, to delete a catalog named vaccine and its schemas:

DROP CATALOG vaccine CASCADE

Python

Run the following SQL command in a notebook. Items in brackets are optional. Replace the placeholder <catalog-name>.

For parameter descriptions, see DROP CATALOG.

If you use DROP CATALOG without the CASCADE option, you must delete all schemas in the catalog except information_schema before you can delete the catalog. This includes the auto-created default schema.

spark.sql("DROP CATALOG [ IF EXISTS ] <catalog-name> [ RESTRICT | CASCADE ]")

For example, to delete a catalog named vaccine and its schemas:

spark.sql("DROP CATALOG vaccine CASCADE")

R

Run the following SQL command in a notebook. Items in brackets are optional. Replace the placeholder <catalog-name>.

For parameter descriptions, see DROP CATALOG.

If you use DROP CATALOG without the CASCADE option, you must delete all schemas in the catalog except information_schema before you can delete the catalog. This includes the auto-created default schema.

library(SparkR)

rsql("DROP CATALOG [ IF EXISTS ] <catalog-name> [ RESTRICT | CASCADE ]")

For example, to delete a catalog named vaccine and its schemas:

library(SparkR)

sql("DROP CATALOG vaccine CASCADE")

Scala

Run the following SQL command in a notebook. Items in brackets are optional. Replace the placeholder <catalog-name>.

For parameter descriptions, see DROP CATALOG.

If you use DROP CATALOG without the CASCADE option, you must delete all schemas in the catalog except information_schema before you can delete the catalog. This includes the auto-created default schema.

spark.sql("DROP CATALOG [ IF EXISTS ] <catalog-name> [ RESTRICT | CASCADE ]")

For example, to delete a catalog named vaccine and its schemas:

spark.sql("DROP CATALOG vaccine CASCADE")