Configure a database instance for high availability and enable readable secondary instances
This feature is in Public Preview in the following regions: us-east-1, us-west-2, eu-west-1, ap-southeast-1, ap-southeast-2, eu-central-1, us-east-2, ap-south-1.
This page describes how to configure a Lakebase database instance for high availability and outlines the associated benefits.
To enable high availability, add high-availability nodes to a database instance. If the primary compute node becomes unhealthy or unavailable, the instance fails over to a high-availability (secondary) node and promotes it to primary.
You can also enable readable secondaries, which let the high-availability nodes serve read-only workloads through a separate DNS endpoint (instance-ro-{uuid} instead of instance-{uuid}).
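Because reads and writes use different endpoints, a client has to choose a hostname per workload. The following is a minimal sketch, assuming only the instance-{uuid} / instance-ro-{uuid} naming pattern described above; the helpers ro_host_from_rw and host_for are hypothetical and not part of any Databricks tool, and any domain suffix is passed through unchanged.

```shell
#!/bin/sh
# Sketch: derive the read-only (readable secondary) hostname from the
# read-write hostname, assuming the "instance-{uuid}" / "instance-ro-{uuid}"
# naming pattern. Any domain suffix after the UUID is kept as-is.
ro_host_from_rw() {
  local rw_host="$1"
  case "$rw_host" in
    instance-ro-*)
      # Already the read-only endpoint; return it unchanged.
      printf '%s\n' "$rw_host"
      ;;
    instance-*)
      # Insert "ro-" right after the "instance-" prefix.
      printf 'instance-ro-%s\n' "${rw_host#instance-}"
      ;;
    *)
      echo "unexpected hostname: $rw_host" >&2
      return 1
      ;;
  esac
}

# Hypothetical routing helper: send read-only work to the -ro endpoint
# and everything else to the primary (read-write) endpoint.
host_for() {
  local kind="$1" rw_host="$2"
  if [ "$kind" = "read" ]; then
    ro_host_from_rw "$rw_host"
  else
    printf '%s\n' "$rw_host"
  fi
}
```

For example, host_for read instance-<uuid>.<domain> returns instance-ro-<uuid>.<domain>, while host_for write returns the original hostname. Keeping the routing decision in one place makes it easy to fall back to the primary endpoint if readable secondaries are later disabled.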
Enable high availability for a database instance
If you set the number of nodes (including the primary) to one, high availability and readable secondaries are disabled. Otherwise, the instance has one primary node, and the remaining nodes are high-availability nodes. The maximum number of high-availability nodes is three per database instance.
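The node-count rules above can be expressed as a small check. This is a minimal sketch; describe_node_count is a hypothetical helper, and it assumes that the three-node limit applies to high-availability nodes in addition to the primary, so it accepts node counts from 1 to 4.

```shell
#!/bin/sh
# Sketch: interpret a node count (primary included) using the rules above.
# Assumption: the limit of three high-availability nodes is counted in
# addition to the primary, so node counts from 1 to 4 are accepted.
describe_node_count() {
  local n="$1"
  case "$n" in
    ''|*[!0-9]*)
      echo "not a number: $n" >&2
      return 1
      ;;
  esac
  if [ "$n" -lt 1 ] || [ "$n" -gt 4 ]; then
    echo "node count out of range: $n" >&2
    return 1
  fi
  local ha_nodes=$((n - 1))
  if [ "$ha_nodes" -eq 0 ]; then
    # A single node: high availability and readable secondaries are off.
    echo "primary=1 ha_nodes=0 ha=disabled"
  else
    echo "primary=1 ha_nodes=$ha_nodes ha=enabled"
  fi
}
```

For example, a node count of 3 describes one primary and two high-availability nodes.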
You can specify the number of nodes when you create a database instance. See Create a database instance.
To change the configuration of an existing database instance, use the UI or the API as follows.
Using the UI:
- Click Compute in the workspace sidebar.
- Click the Database instances tab.
- Select the database instance you want to update.
- Click Edit in the upper-right corner.
- Enter the value for HA pool node size (including primary).
- Turn on Enable readable secondaries.
- Click Save.
Using the API (curl):
curl -s -X PATCH \
  --header "Authorization: Bearer ${DATABRICKS_TOKEN}" \
  --header "Content-Type: application/json" \
  "$DBR_URL/database/instances/my-instance" \
  -d '{"node_count": 3, "enable_readable_secondaries": true}'
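The request body must be valid JSON with plain ASCII double quotes, because typographic (curly) quotes make the body unparseable. As a minimal sketch, a hypothetical helper named build_ha_payload can build the body and reject combinations that the rules above rule out, such as readable secondaries on a single-node instance.

```shell
#!/bin/sh
# Sketch: build the JSON body for the PATCH request with plain ASCII quotes.
# build_ha_payload is a hypothetical helper, not part of any Databricks tool.
build_ha_payload() {
  local node_count="$1" readable="$2"
  case "$node_count" in
    ''|*[!0-9]*)
      echo "node_count must be a positive integer" >&2
      return 1
      ;;
  esac
  if [ "$node_count" -lt 1 ]; then
    echo "node_count must be at least 1" >&2
    return 1
  fi
  case "$readable" in
    true|false) ;;
    *)
      echo "readable must be true or false" >&2
      return 1
      ;;
  esac
  # With a single node there are no secondaries to serve reads.
  if [ "$readable" = "true" ] && [ "$node_count" -lt 2 ]; then
    echo "readable secondaries need at least one high-availability node" >&2
    return 1
  fi
  printf '{"node_count": %s, "enable_readable_secondaries": %s}\n' "$node_count" "$readable"
}
```

You could then pass the result to curl with -d "$(build_ha_payload 3 true)" in place of the literal JSON string.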
Compute resiliency
With high-availability nodes configured, your database instance's primary node is protected by high availability. If the primary node becomes unavailable, the database instance automatically fails over to a secondary node and promotes it to be the new primary node. Because compute nodes are stateless, failures don't affect your data, and your connection string remains unchanged. Service is restored within seconds to minutes, depending on the type of failure, so configure your application to handle brief disconnections and reconnect automatically.
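A common way to handle these brief disconnections is to retry with exponential backoff. The following is a minimal sketch; retry_with_backoff is a hypothetical helper, and the attempt count and initial delay are illustrative values to tune for your workload. Most applications implement the same idea in their database driver or connection pool rather than in a shell script.

```shell
#!/bin/sh
# Sketch: run a command, retrying with exponential backoff while it fails,
# for example a connection attempt made while a failover is in progress.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2" attempt=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))      # double the wait after each failure
    attempt=$((attempt + 1))
  done
}
```

For example, retry_with_backoff 5 1 psql "host=<instance-hostname> dbname=<database>" -c 'SELECT 1' retries a connectivity check up to five times, waiting 1, 2, 4, and 8 seconds between attempts; the hostname and database name are placeholders.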
Secondary nodes in your database instance are also recovered automatically, within minutes, when issues occur. If you enable readable secondaries, Databricks recommends at least two high-availability nodes, because a primary failover promotes one secondary to primary and can interrupt read-only connections to it. Your application still needs a reconnection mechanism to handle the brief downtime.
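Before applying a configuration, you can check it against this recommendation. This is a minimal sketch; check_ha_recommendation is a hypothetical helper, and it treats the node count as including the primary, as in the API request above.

```shell
#!/bin/sh
# Sketch: compare a planned configuration with the recommendation of at
# least two high-availability nodes when readable secondaries are enabled.
# node_count includes the primary, matching the API request body.
check_ha_recommendation() {
  local node_count="$1" readable="$2"
  local ha_nodes=$((node_count - 1))
  if [ "$readable" = "true" ] && [ "$ha_nodes" -lt 2 ]; then
    echo "warning: $ha_nodes high-availability node(s) with readable secondaries; at least 2 are recommended" >&2
    return 1
  fi
  echo "ok"
}
```

For example, a node count of 2 with readable secondaries enabled produces a warning, while a node count of 3 passes.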
Limitations
- Performance takes time to recover after a failover. Queries may run more slowly at first because the new primary node has no session-specific data and no local cache of frequently accessed data; performance improves as the cache is rebuilt.
- Cross-region replication is not supported. In the event of a region-wide outage, the availability depends on the cloud provider restoring service to the affected region.