Databricks Terraform provider

HashiCorp Terraform is a popular open source tool for creating safe and predictable cloud infrastructure across several cloud providers. You can use the Databricks Terraform provider to manage your Databricks workspaces and the associated cloud infrastructure with a flexible, powerful tool. The goal of the Databricks Terraform provider is to support all Databricks REST APIs, automating the most complicated aspects of deploying and managing your data platforms. Databricks customers use the Databricks Terraform provider to deploy and manage clusters and jobs, provision Databricks workspaces, and configure data access.
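
For example, once the provider is configured (as described in Getting started below), a cluster can be declared in a few lines of HCL. The following is a minimal sketch, not part of this article's procedures: the cluster name and autotermination value are arbitrary, and the node type and Spark version are looked up with the same data sources that the sample configuration later on this page uses.

    // Minimal sketch: declare a small, auto-terminating cluster.
    // Assumes the databricks provider has already been configured.
    data "databricks_node_type" "smallest" {
      local_disk = true
    }

    data "databricks_spark_version" "latest" {}

    resource "databricks_cluster" "shared" {
      cluster_name            = "terraform-managed-cluster"
      spark_version           = data.databricks_spark_version.latest.id
      node_type_id            = data.databricks_node_type.smallest.id
      num_workers             = 1
      autotermination_minutes = 20
    }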

Terraform resource relationship

Experimental

The Databricks Terraform provider is not formally supported by Databricks or AWS. It is maintained by Databricks field engineering teams and provided as is. There is no service level agreement (SLA). Databricks and AWS make no guarantees of any kind. If you discover an issue with the provider, file a GitHub Issue, and it will be reviewed by project maintainers as time permits.

Getting started

Complete the following steps to install and configure the command line tools that Terraform needs to operate. These tools include the Databricks CLI, the Terraform CLI, and the AWS CLI. After setting up these tools, complete the steps to create a base Terraform configuration that you can use later to manage your Databricks workspaces and the associated AWS cloud infrastructure.

Note

This procedure assumes that you have access to a Databricks workspace as a Databricks admin, access to the corresponding AWS account, and permissions in that AWS account for the actions that you want Terraform to perform.

  1. Create a Databricks personal access token to allow Terraform to call the Databricks APIs within the Databricks account. For details, see Authentication using Databricks personal access tokens.

  2. Install the Databricks command-line interface (CLI), and then configure the Databricks CLI with your Databricks personal access token by running the databricks configure --token --profile <profile name> command to create a connection profile for this Databricks personal access token. Replace <profile name> with a unique name for this connection profile. For details, see the “Set up authentication” and “Connection profiles” sections in Databricks CLI.

    databricks configure --token --profile <profile name>
    

    Tip

    Each Databricks personal access token is associated with a specific user in a Databricks account. Run the databricks configure --token --profile <profile name> command (replacing <profile name> with a unique name) for each Databricks personal access token that you want to make available for Terraform to use.
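
    The databricks configure --token --profile <profile name> command stores each connection profile in the .databrickscfg file in your home directory. An entry typically looks like the following sketch, where the host URL and token are placeholders:

    [<profile name>]
    host  = https://dbc-a1b2c3d4-e5f6.cloud.databricks.com
    token = <Databricks personal access token>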

  3. Install the Terraform CLI. For details, see Download Terraform on the Terraform website.
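
    After installing, you can confirm that the Terraform CLI is available on your PATH by printing its version:

    terraform -version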

  4. Create an AWS access key, which consists of an AWS access key ID and an AWS secret access key. For details, see Managing access keys (console) on the AWS website.

  5. Install the AWS CLI, and then configure the AWS CLI with the AWS access key by running the aws configure --profile <profile name> command. Replace <profile name> with a unique name for this connection profile. For details, see Installing, updating, and uninstalling the AWS CLI version 2 and Quick configuration with aws configure on the AWS website.

    aws configure --profile <profile name>
    

    Tip

    Each AWS access key is associated with a specific IAM user in an AWS account. Run the aws configure --profile <profile name> command (replacing <profile name> with a unique name) for each AWS access key that you want to make available for Terraform to use.

    This procedure uses the AWS CLI, along with a shared credentials/configuration file in the default location, to authenticate. For alternative authentication options, see Authentication on the Terraform Registry website.
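
    For reference, the aws configure --profile <profile name> command writes to the shared AWS credentials and configuration files in your home directory. The entries typically look like the following sketches, where all values are placeholders. In .aws/credentials:

    [<profile name>]
    aws_access_key_id = <AWS access key ID>
    aws_secret_access_key = <AWS secret access key>

    And in .aws/config:

    [profile <profile name>]
    region = <AWS Region code>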

  6. In your terminal, create an empty directory and then switch to it. (Each separate set of Terraform configuration files must be in its own directory.) For example:

    mkdir terraform_demo && cd terraform_demo
    
  7. In this empty directory, create a file named main.tf. Add the following content to this file, and then save the file:

    variable "aws_connection_profile" {
      description = "The name of the AWS connection profile to use."
      type = string
      default = "<AWS connection profile name>"
    }
    
    variable "aws_region" {
      description = "The code of the AWS Region to use."
      type = string
      default = "<AWS Region code>"
    }
    
    variable "databricks_connection_profile" {
      description = "The name of the Databricks connection profile to use."
      type = string
      default = "<Databricks connection profile name>"
    }
    
    terraform {
      required_providers {
        aws = {
          source  = "hashicorp/aws"
          version = "~> 3.27"
        }
    
        databricks = {
          source = "databrickslabs/databricks"
          version = "0.3.2"
        }
      }
    }
    
    provider "aws" {
      profile = var.aws_connection_profile
      region = var.aws_region
    }
    
    provider "databricks" {
      profile = var.databricks_connection_profile
    }
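
    Optionally, you can have Terraform normalize the spacing and alignment of this file by running terraform fmt in the same directory:

    terraform fmt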
    
  8. Replace the following values in the main.tf file, and then save the file:

    • Replace <AWS connection profile name> with the name of the AWS connection profile that you created in step 5.
    • Replace <AWS Region code> with the code of the AWS Region that you want Terraform to use (for example, us-west-2).
    • Replace <Databricks connection profile name> with the name of the Databricks connection profile that you created in step 2.
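
    As an alternative to editing the default values in main.tf, you can leave the placeholders in place and provide real values at run time, for example in a terraform.tfvars file in the same directory (Terraform loads this file automatically). The Region value below is illustrative:

    aws_connection_profile        = "<AWS connection profile name>"
    aws_region                    = "us-west-2"
    databricks_connection_profile = "<Databricks connection profile name>"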
  9. Initialize the working directory containing the main.tf file by running the terraform init command. For more information, see Command: init on the Terraform website.

    terraform init
    

    Terraform downloads the aws and databricks providers and installs them in a hidden subdirectory of your current working directory, named .terraform. The terraform init command prints out which versions of the providers were installed. Terraform also creates a lock file named .terraform.lock.hcl, which records the exact provider versions used, so that you can control when you want to update the providers for your project.
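
    If you later change the version constraints in the required_providers block and want Terraform to select and re-lock newer provider versions, you can rerun initialization with the -upgrade option:

    terraform init -upgrade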

  10. Apply the changes required to reach the desired state of the configuration by running the terraform apply command. For more information, see Command: apply on the Terraform website.

    terraform apply
    

    Because no resources have yet been specified in the main.tf file, the output is Apply complete! Resources: 0 added, 0 changed, 0 destroyed. Terraform also writes data into a file called terraform.tfstate; it stores the IDs and properties of the resources it manages in this file, so that it can update or destroy those resources going forward. To create resources, continue with Sample configuration, Next steps, or both to specify the resources that you want to create, and then run the terraform apply command again.
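
    Once you have created resources, two commands are useful for inspecting what Terraform is tracking in terraform.tfstate: terraform state list prints the addresses of all managed resources, and terraform show prints their recorded properties.

    terraform state list
    terraform show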

Sample configuration

Complete the following procedure to create a sample Terraform configuration that creates a notebook and a job to run that notebook, in an existing Databricks workspace.

Note

The following sample Terraform configuration interacts only with an existing Databricks workspace. Because of this, you do not need to configure the AWS CLI to run this sample, nor does your main.tf file need to include the aws_connection_profile and aws_region variables or the aws provider.
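
If you set up only the Databricks pieces, a pared-down main.tf for this sample might look like the following sketch, which keeps just the Databricks portions of the configuration from Getting started:

    variable "databricks_connection_profile" {
      description = "The name of the Databricks connection profile to use."
      type = string
      default = "<Databricks connection profile name>"
    }

    terraform {
      required_providers {
        databricks = {
          source = "databrickslabs/databricks"
          version = "0.3.2"
        }
      }
    }

    provider "databricks" {
      profile = var.databricks_connection_profile
    }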

  1. At the end of the main.tf file that you created in Getting started, add the following code:

    variable "resource_prefix" {
      description = "The prefix to use when naming the notebook and job"
      type = string
      default = "terraform-demo"
    }
    
    variable "email_notifier" {
      description = "The email address to send job status to"
      type = list(string)
      default = ["<Your email address>"]
    }
    
    // Get information about the Databricks user that is calling
    // the Databricks API (the one associated with "databricks_connection_profile").
    data "databricks_current_user" "me" {}
    
    // Create a simple, sample notebook. Store it in a subfolder within
    // the Databricks current user's folder. The notebook contains the
    // following basic Spark code in Python.
    resource "databricks_notebook" "this" {
      path     = "${data.databricks_current_user.me.home}/Terraform/${var.resource_prefix}-notebook.ipynb"
      language = "PYTHON"
      content_base64 = base64encode(<<-EOT
        # created from ${abspath(path.module)}
        display(spark.range(10))
        EOT
      )
    }
    
    // Create a job to run the sample notebook. The job will create
    // a cluster to run on. The cluster will use the smallest available
    // node type and run the latest version of Spark.
    
    // Get the smallest available node type to use for the cluster. Choose
    // only from among available node types with local storage.
    data "databricks_node_type" "smallest" {
      local_disk = true
    }
    
    // Get the latest Spark version to use for the cluster.
    data "databricks_spark_version" "latest" {}
    
    // Create the job, emailing notifiers about job success or failure.
    resource "databricks_job" "this" {
      name = "${var.resource_prefix}-job-${data.databricks_current_user.me.alphanumeric}"
      new_cluster {
        num_workers   = 1
        spark_version = data.databricks_spark_version.latest.id
        node_type_id  = data.databricks_node_type.smallest.id
      }
      notebook_task {
        notebook_path = databricks_notebook.this.path
      }
      email_notifications {
        on_success = var.email_notifier
        on_failure = var.email_notifier
      }
    }
    
    // Print the URL to the notebook.
    output "notebook_url" {
      value = databricks_notebook.this.url
    }
    
    // Print the URL to the job.
    output "job_url" {
      value = databricks_job.this.url
    }
    
  2. Replace <Your email address> with your email address, and save the file.

  3. Run terraform apply.

  4. Verify that the notebook and job were created: in the output of the terraform apply command, find the URLs for notebook_url and job_url and go to them. (If you need these URLs again later, you can reprint them with terraform output, as shown after this list.)

  5. Run the job: on the Jobs page, click Run Now. After the job finishes, check your email inbox.

  6. When you are done with this sample, delete the notebook and job from the Databricks workspace by running terraform destroy.

  7. Verify that the notebook and job were deleted: refresh the notebook and Jobs pages; each displays a message that the resource cannot be found.
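
The notebook_url and job_url values are Terraform output values. If you need them again at any point before you run terraform destroy, you can reprint them from the same directory:

    terraform output notebook_url
    terraform output job_url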

Troubleshooting

For Terraform-specific support, see the Latest Terraform topics on the HashiCorp Discuss website. For issues specific to the Databricks Terraform Provider, see Issues in the databrickslabs/terraform-provider-databricks GitHub repository.

Additional resources