Automate Unity Catalog setup using Terraform
You can automate Unity Catalog setup by using the Databricks Terraform provider. This article links to the Terraform provider's Unity Catalog deployment guide and resource reference documentation, lists the requirements (see “Before you begin”), and gives tips for validating and deploying your configurations.
Before you begin
To automate Unity Catalog setup using Terraform, you must meet the following requirements:
Your Databricks account must be on the Premium plan or above.
In AWS, you must have the ability to create Amazon S3 buckets, AWS IAM roles, AWS IAM policies, and cross-account trust relationships.
You must have at least one Databricks workspace that you want to use with Unity Catalog. See Manually create a workspace (existing Databricks accounts).
To use the Databricks Terraform provider to configure a metastore for Unity Catalog, storage for the metastore, any external storage, and all of their related access credentials, you must have the following:
An AWS account.
A Databricks on AWS account.
A service principal that has the account admin role in your Databricks account.
The Terraform CLI. See Download Terraform on the Terraform website.
The following seven environment variables (four for Databricks and three for AWS):
DATABRICKS_CLIENT_ID, set to the value of the client ID, also known as the application ID, of the service principal. See Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
DATABRICKS_CLIENT_SECRET, set to the value of the client secret of the service principal. See Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
DATABRICKS_ACCOUNT_ID, set to the value of the ID of your Databricks account. You can find this value in the corner of your Databricks account console.
TF_VAR_databricks_account_id, also set to the value of the ID of your Databricks account.
AWS_ACCESS_KEY_ID, set to the value of your AWS user’s access key ID. See Programmatic access in the AWS General Reference.
AWS_SECRET_ACCESS_KEY, set to the value of your AWS user’s secret access key. See Programmatic access in the AWS General Reference.
AWS_REGION, set to the value of the AWS Region code for your Databricks account. See Regional endpoints in the AWS General Reference.
To set these environment variables, see your operating system’s documentation.
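As one illustration, the seven variables listed above could be set for the current shell session on Linux or macOS as sketched below. Every value is a placeholder to replace with your own, and the Region code us-west-2 is only an assumed example. The sketch relies on Terraform's convention of reading an environment variable named TF_VAR_<name> as the value of the input variable <name>.

```shell
# Sketch only: placeholder values, valid for the current shell session.
# Avoid committing real secrets to scripts or version control.

# Databricks service principal credentials (OAuth M2M).
export DATABRICKS_CLIENT_ID="<service-principal-client-id>"
export DATABRICKS_CLIENT_SECRET="<service-principal-client-secret>"

# The account ID is set twice: once for the Databricks provider and once
# as a Terraform input variable (TF_VAR_<name> maps to variable <name>).
export DATABRICKS_ACCOUNT_ID="<databricks-account-id>"
export TF_VAR_databricks_account_id="$DATABRICKS_ACCOUNT_ID"

# AWS credentials and Region. "us-west-2" is an example, not a requirement.
export AWS_ACCESS_KEY_ID="<aws-access-key-id>"
export AWS_SECRET_ACCESS_KEY="<aws-secret-access-key>"
export AWS_REGION="us-west-2"
```

Deriving TF_VAR_databricks_account_id from DATABRICKS_ACCOUNT_ID keeps the two values from drifting apart.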
Note
Basic authentication using a Databricks username and password reached end of life on July 10, 2024. See End of life for Databricks-managed passwords.
To use the Databricks Terraform provider to configure all other Unity Catalog infrastructure components, you must have the following:
A Databricks workspace.
On your local development machine, you must have:
The Terraform CLI. See Download Terraform on the Terraform website.
One of the following:
Databricks CLI version 0.205 or above, configured with your Databricks personal access token by running the following command:
databricks configure --host <workspace-url> --profile <some-unique-profile-name>
See Install or update the Databricks CLI and Databricks personal access token authentication.
Note
As a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use OAuth tokens.
If you use personal access token authentication, Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
The following Databricks environment variables:
DATABRICKS_HOST, set to the value of your Databricks workspace instance URL, for example https://dbc-1234567890123456.cloud.databricks.com
DATABRICKS_CLIENT_ID, set to the value of the client ID, also known as the application ID, of the service principal. See Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
DATABRICKS_CLIENT_SECRET, set to the value of the client secret of the service principal. See Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
Alternatively, you can use a personal access token instead of a service principal’s client ID and client secret:
DATABRICKS_TOKEN, set to the value of your Databricks personal access token. See also Monitor and revoke personal access tokens.
To set these environment variables, see your operating system’s documentation.
Note
As a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use OAuth tokens.
If you use personal access token authentication, Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
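The workspace-level variables described above might be set as in the following sketch for Linux or macOS. The host URL is the example from this article and the other values are placeholders. The check_databricks_auth function is a hypothetical helper written for this illustration, not part of the Databricks CLI or the Terraform provider; it only reports which kind of credentials the current environment contains.

```shell
# Sketch only: placeholder values for the current shell session.
export DATABRICKS_HOST="https://dbc-1234567890123456.cloud.databricks.com"
export DATABRICKS_CLIENT_ID="<service-principal-client-id>"
export DATABRICKS_CLIENT_SECRET="<service-principal-client-secret>"

# Alternative to the client ID and secret: a personal access token.
# export DATABRICKS_TOKEN="<personal-access-token>"

# Hypothetical helper: prints "oauth-m2m" or "pat" depending on which
# credentials are set, and returns non-zero if something is missing.
check_databricks_auth() {
  if [ -z "${DATABRICKS_HOST:-}" ]; then
    echo "missing DATABRICKS_HOST"
    return 1
  fi
  if [ -n "${DATABRICKS_CLIENT_ID:-}" ] && [ -n "${DATABRICKS_CLIENT_SECRET:-}" ]; then
    echo "oauth-m2m"
  elif [ -n "${DATABRICKS_TOKEN:-}" ]; then
    echo "pat"
  else
    echo "missing credentials"
    return 1
  fi
}

check_databricks_auth
```

The helper checks the OAuth variables first, in line with the recommendation to prefer OAuth tokens over personal access tokens.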
Terraform provider Unity Catalog deployment guide and resource reference documentation
To learn how to deploy all prerequisites and enable Unity Catalog for a workspace, see Deploying pre-requisite resources and enabling Unity Catalog in the Databricks Terraform provider documentation.
If you already have some Unity Catalog infrastructure components in place, you can use Terraform to deploy additional Unity Catalog infrastructure components as needed. See each section of the guide referenced in the previous paragraph and the Unity Catalog section of the Databricks Terraform provider documentation.
Validate, plan, deploy, or destroy the resources
To validate the syntax of the Terraform configurations without deploying them, run the terraform validate command.
To show the actions that Terraform would take to deploy the configurations, run the terraform plan command. This command does not actually deploy the configurations.
To deploy the configurations, run the terraform apply command.
To delete the deployed resources, run the terraform destroy command.
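A typical command sequence, run from the directory that contains the Terraform configurations, might look like the sketch below. Two parts are assumptions that this article does not cover: the terraform init step, which Terraform requires once per working directory to download providers, and the saved plan file name tfplan. The run wrapper is a hypothetical helper that only prints each command unless DRY_RUN is set to 0, so the sketch can be read and tried without deploying anything.

```shell
set -eu

# DRY_RUN=1 (the default here) prints each command instead of running it.
# Set DRY_RUN=0 to execute the commands against real infrastructure.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: echo the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

run terraform init               # download providers (once per directory)
run terraform validate           # check syntax without deploying
run terraform plan -out=tfplan   # preview changes and save the plan
run terraform apply tfplan       # deploy exactly the saved plan

# To remove the deployed resources later:
# run terraform destroy
```

Saving the plan and applying that file means the deployed changes match what was reviewed, rather than a plan recomputed at apply time.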