Troubleshoot UNITY_CATALOG_INITIALIZATION_FAILED

This page describes how to diagnose and resolve the UNITY_CATALOG_INITIALIZATION_FAILED error in Databricks pipelines.

Overview

UNITY_CATALOG_INITIALIZATION_FAILED is a catch-all error that appears when Unity Catalog cannot initialize storage for a pipeline or workload during cluster startup. Despite its name, the failure is usually not caused by a Unity Catalog misconfiguration. Instead, it almost always indicates an underlying infrastructure problem: a networking issue preventing the cluster from reaching the Databricks control plane, or a permissions issue preventing access to the cloud storage backing the Unity Catalog metastore.

This error surfaces across multiple Databricks products, including Lakeflow Connect pipelines, Lakeflow Spark Declarative Pipelines, and Vector Search.

General error message

Encountered an error with Unity Catalog while setting up the pipeline on cluster [CLUSTER_ID].
Ensure that your Unity Catalog configuration is correct, and that required resources
(e.g., catalog, schema) exist and are accessible. Also verify that the cluster has
appropriate permissions to access Unity Catalog.

The error message instructs you to check your Unity Catalog configuration, but the root cause is often a networking or cloud permissions issue as described in the following sections.

Root causes and resolutions

Regional hostname bypasses PrivateLink DNS

Cause: During initialization, Unity Catalog connects to a regional Databricks hostname (for example, nvirginia.cloud.databricks.com) directly rather than through the workspace URL. In customer-managed VPCs with PrivateLink, the workspace URL is correctly routed through the PAS CNAME chain to the private VPC endpoint. However, the regional hostname bypasses this chain entirely.

If the VPC endpoint's Private DNS names option is not enabled, the regional hostname resolves to a public IP address. In VPCs without a NAT gateway (such as those with all outbound traffic routed through a firewall), this causes the connection to fail with a reset error, which surfaces as UNITY_CATALOG_INITIALIZATION_FAILED.

Resolution:

  1. In the AWS Management Console, navigate to VPC > Endpoints and open the VPC endpoint for your Databricks workspace.
  2. Verify that Private DNS names enabled is set to Yes. When enabled, AWS automatically creates a Route 53 Private Hosted Zone for cloud.databricks.com associated with your VPC, so that regional hostnames like nvirginia.cloud.databricks.com resolve to private IPs.
  3. If Private DNS names is not enabled, verify that your VPC meets the prerequisites:
    • enableDnsSupport is set to true
    • enableDnsHostnames is set to true
  4. Enable Private DNS names on the endpoint and verify that the regional hostname now resolves to a private IP within your VPC.
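To confirm step 4, you can check how the regional hostname resolves from inside the VPC. The following sketch uses only the Python standard library; the hostname is taken from the example above, and the check simply classifies the resolved address as private or public:

```python
import ipaddress
import socket

def is_private_ip(addr: str) -> bool:
    """True for RFC 1918 and other non-public addresses."""
    return ipaddress.ip_address(addr).is_private

def check_private_resolution(hostname: str) -> tuple[str, bool]:
    """Resolve hostname and report whether the result is a private IP.
    Run this from a host inside the VPC; requires network access."""
    addr = socket.gethostbyname(hostname)
    return addr, is_private_ip(addr)

# Example (from a cluster or EC2 instance inside the VPC):
# addr, private = check_private_resolution("nvirginia.cloud.databricks.com")
# A public address here means Private DNS names is not taking effect.
```

If the hostname still resolves to a public IP after enabling Private DNS names, re-check the enableDnsSupport and enableDnsHostnames prerequisites from step 3.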

For more information, see Configure classic private connectivity to Databricks and Configure DNS for AWS inbound Private Link.

Missing S3 permissions on Unity Catalog storage

Cause: The IAM role associated with serverless compute clusters lacks the necessary permissions to access the S3 bucket backing your Unity Catalog metastore. Unity Catalog initialization attempts to access the internal __unitystorage path within this bucket. A 403 response from S3 during this access causes UNITY_CATALOG_INITIALIZATION_FAILED.

Resolution:

  1. Identify the S3 bucket backing your Unity Catalog metastore. This is visible in the error logs as a path of the form s3://[BUCKET]/__unitystorage/....
  2. In the AWS Management Console, verify that the IAM role used by your serverless clusters has the following permissions on that bucket:
    • s3:GetObject
    • s3:PutObject
    • s3:DeleteObject
    • s3:ListBucket
    • s3:GetBucketLocation
  3. If you use a network connectivity configuration (NCC) for private connectivity to S3, verify that the NCC's private endpoint rule covers the UC metastore bucket and that the endpoint's connection state is Established.
  4. Restart the pipeline after making any changes.
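As a reference for step 2, the sketch below builds an IAM policy document granting exactly the five required permissions. The bucket name is a placeholder; substitute the metastore bucket you identified in step 1. Note that object-level actions apply to the bucket contents (the `/*` resource), while bucket-level actions apply to the bucket ARN itself:

```python
import json

# Placeholder bucket name; substitute your Unity Catalog metastore bucket.
METASTORE_BUCKET = "my-uc-metastore-bucket"

REQUIRED_OBJECT_ACTIONS = ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"]
REQUIRED_BUCKET_ACTIONS = ["s3:ListBucket", "s3:GetBucketLocation"]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UnityCatalogObjectAccess",
            "Effect": "Allow",
            "Action": REQUIRED_OBJECT_ACTIONS,
            # Object actions target the bucket contents.
            "Resource": f"arn:aws:s3:::{METASTORE_BUCKET}/*",
        },
        {
            "Sid": "UnityCatalogBucketAccess",
            "Effect": "Allow",
            "Action": REQUIRED_BUCKET_ACTIONS,
            # Bucket actions target the bucket itself.
            "Resource": f"arn:aws:s3:::{METASTORE_BUCKET}",
        },
    ],
}

print(json.dumps(policy, indent=2))
```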

For more information about configuring private S3 connectivity for serverless workloads, see Configure private connectivity to AWS S3 storage buckets.

Unity Catalog resources not configured correctly

Cause: The catalog, schema, or connection referenced by the pipeline does not exist or is not accessible from the workspace. This is the case described by the error message itself and is less common than the infrastructure issues above.

Resolution:

  1. Verify that the catalog and schema referenced in the pipeline exist and are accessible from the workspace. In Databricks, go to Catalog and confirm that the catalog is visible and that you have at least USE CATALOG and USE SCHEMA privileges.
  2. If you are using Lakeflow Connect, verify that the connection used by the pipeline is valid. Go to Catalog > External Data > Connections and confirm that the connection is listed and accessible.
  3. Verify that the cluster or pipeline has the necessary Unity Catalog privileges. See Manage privileges in Unity Catalog.
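The privilege check in steps 1 and 3 can also be scripted. The sketch below is purely illustrative: the `grants` rows are hypothetical sample data standing in for the output of a SHOW GRANTS statement, and the row shape is an assumption, not the actual result schema:

```python
# Hypothetical rows standing in for `SHOW GRANTS` output:
# (principal, action_type, object_type, object_key) -- shape is illustrative.
grants = [
    ("data_engineers", "USE CATALOG", "CATALOG", "main"),
    ("data_engineers", "USE SCHEMA", "SCHEMA", "main.sales"),
]

def has_privilege(grants, principal, action):
    """True if the principal holds the given privilege in the listing."""
    return any(p == principal and a == action for p, a, _, _ in grants)

# The pipeline's principal needs at least USE CATALOG and USE SCHEMA.
missing = [a for a in ("USE CATALOG", "USE SCHEMA")
           if not has_privilege(grants, "data_engineers", a)]
print("missing privileges:", missing or "none")
```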