AWS Graviton-enabled clusters

Databricks clusters support AWS Graviton instances. These instances use AWS-designed Graviton processors that are built on top of the Arm64 instruction set architecture. AWS claims that instance types with these processors have the best price-to-performance ratio of any instance type on Amazon EC2.

Availability

Databricks supports AWS Graviton-enabled clusters:

  • On Databricks Runtime 9.1 LTS and above for non-Photon, and Databricks Runtime 10.2 (Unsupported) and above for Photon.

  • In all AWS Regions. Note, however, that not all instance types are available in all Regions. If you select an instance type that is not available in the workspace's Region, cluster creation fails.

  • For the following AWS Graviton instance families:

    For non-Photon:

    For Photon:

    • General Purpose: m6gd

    • Memory Optimized: r6gd

  • For AWS Graviton2 processors only.
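
The runtime and instance-family constraints in the list above can be checked before a cluster is requested. The following Python sketch is illustrative only: the function names are hypothetical, it assumes spark_version strings shaped like 10.2.x-scala2.12, and it only knows the Photon families named above (m6gd and r6gd).

```python
from typing import Tuple

# Minimum Databricks Runtime versions taken from the Availability list.
MIN_RUNTIME = {"non_photon": (9, 1), "photon": (10, 2)}
# Photon-capable Graviton families named in the Availability list.
PHOTON_FAMILIES = {"m6gd", "r6gd"}


def parse_runtime(spark_version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '10.2.x-scala2.12'."""
    major, minor = spark_version.split(".")[:2]
    return int(major), int(minor)


def runtime_supports_graviton(spark_version: str, photon: bool) -> bool:
    """Compare the runtime version against the minimum for the chosen engine."""
    key = "photon" if photon else "non_photon"
    return parse_runtime(spark_version) >= MIN_RUNTIME[key]


def is_photon_graviton_family(node_type_id: str) -> bool:
    """Check the family prefix of a node type such as 'm6gd.large'."""
    return node_type_id.split(".")[0] in PHOTON_FAMILIES
```

A check like this does not replace the Region availability lookup; an instance type can pass it and still be unavailable in a given Region.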

Note

Delta Live Tables is not supported on Graviton-enabled clusters.

Create an AWS Graviton-enabled cluster

Use the instructions in Create a cluster to create your AWS Graviton-enabled cluster.

How you specify the cluster’s AWS Graviton instance type depends on the method you use to create the cluster. The instructions that follow are specific to each cluster creation method:

Create button or cluster UI

Follow the instructions in Create a cluster. For Databricks runtime version, select one of the runtimes as listed in the preceding Availability section. For Worker type, Driver type, or both, select one of the available AWS Graviton instance types as listed in the preceding Availability section.

Databricks REST API

  1. Set up authentication for the Databricks REST API, if you have not done so already.

  2. Use your tool of choice, such as curl or Postman, to call the Databricks REST API.

  3. Call the POST clusters/create operation in the Clusters API. For example, you can use curl to make a call similar to the following:

    curl --netrc -X POST \
    https://dbc-a1b2345c-d6e7.cloud.databricks.com/api/2.0/clusters/create \
    --data @create-cluster.json
    

    create-cluster.json:

    {
      "cluster_name": "my-cluster",
      "spark_version": "10.2.x-scala2.12",
      "node_type_id": "m6gd.large",
      "num_workers": 2
    }
    

    The preceding request payload specifies a non-Photon runtime. To specify a Photon runtime, add runtime_engine: "PHOTON" to the request payload, as follows. (Do not add photon anywhere in the spark_version field.)

    For Photon:

    {
      "cluster_name": "my-cluster",
      "spark_version": "10.2.x-scala2.12",
      "node_type_id": "m6gd.large",
      "num_workers": 2,
      "runtime_engine": "PHOTON"
    }
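
The same request can be assembled programmatically. The sketch below is a hedged illustration, not an official client: the function names are hypothetical, and it assumes bearer-token authentication with a personal access token instead of the --netrc file used by the curl example. It builds the request object without sending it.

```python
import json
import urllib.request


def build_create_payload(photon: bool = False) -> dict:
    """Build the example clusters/create payload, optionally with Photon."""
    payload = {
        "cluster_name": "my-cluster",
        "spark_version": "10.2.x-scala2.12",  # never put "photon" here
        "node_type_id": "m6gd.large",
        "num_workers": 2,
    }
    if photon:
        # Photon is selected through runtime_engine, not spark_version.
        payload["runtime_engine"] = "PHOTON"
    return payload


def build_request(host: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the clusters/create operation."""
    return urllib.request.Request(
        url=f"{host}/api/2.0/clusters/create",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen, with the workspace URL and token replaced by your own values.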
    

Databricks CLI

  1. Set up the CLI and set up authentication, if you have not done so already.

  2. Run the clusters create subcommand in the Clusters CLI. For example, you can run a command similar to the following:

    databricks clusters create --json-file create-cluster.json
    

    create-cluster.json:

    {
      "cluster_name": "my-cluster",
      "spark_version": "10.2.x-scala2.12",
      "node_type_id": "m6gd.large",
      "num_workers": 2
    }
    

    The preceding request payload specifies a non-Photon runtime. To specify a Photon runtime, add runtime_engine: "PHOTON" to the request payload, as follows. (Do not add photon anywhere in the spark_version field.)

    For Photon:

    {
      "cluster_name": "my-cluster",
      "spark_version": "10.2.x-scala2.12",
      "node_type_id": "m6gd.large",
      "num_workers": 2,
      "runtime_engine": "PHOTON"
    }
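
Because Photon must be requested through runtime_engine and never through spark_version, it can help to check a payload file before passing it to the CLI. The following Python sketch shows one possible check; the function names are hypothetical and the check covers only the rule stated above.

```python
import json


def load_payload(path: str) -> dict:
    """Read a payload file such as create-cluster.json."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def find_payload_problems(payload: dict) -> list:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if "photon" in str(payload.get("spark_version", "")).lower():
        problems.append(
            "spark_version must not contain 'photon'; "
            "set runtime_engine to PHOTON instead"
        )
    return problems


def uses_photon(payload: dict) -> bool:
    """True when the payload requests the Photon runtime engine."""
    return payload.get("runtime_engine") == "PHOTON"
```

For example, find_payload_problems(load_payload("create-cluster.json")) returns an empty list for both payloads shown in this section.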
    

Databricks Terraform provider

  1. Install and configure the command line tools that Terraform needs to operate, if you have not done so already.

  2. Create and run a Terraform configuration that creates a Databricks cluster resource. For example, you can run a minimal configuration similar to the following:

    terraform {
      required_providers {
        databricks = {
          source = "databricks/databricks"
        }
      }
    }
    
    provider "databricks" {
    }
    
    resource "databricks_cluster" "this" {
      cluster_name  = "my-cluster"
      spark_version = "10.2.x-scala2.12"
      node_type_id  = "m6gd.large"
      num_workers   = 2
    }
    

    The preceding configuration specifies a non-Photon runtime. To specify a Photon runtime, add runtime_engine = "PHOTON" to the databricks_cluster resource, as follows. (Do not add photon anywhere in the spark_version field.)

    For Photon:

    resource "databricks_cluster" "this" {
      cluster_name   = "my-cluster"
      spark_version  = "10.2.x-scala2.12"
      node_type_id   = "m6gd.large"
      num_workers    = 2
      runtime_engine = "PHOTON"
    }
    

Limitations

ARM64 ISA

  • Floating-point precision changes: typical operations such as addition, subtraction, multiplication, and division have no change in precision. For single trigonometric functions such as sin and cos, the upper bound on the precision difference compared to Intel instances is 1.11e-16.

  • Third-party support: the change in ISA may have some impact on support for third-party tools and libraries.

  • Mixed-instance clusters: Databricks does not support mixing AWS Graviton and non-AWS Graviton instance types, as each type requires a different Databricks Runtime.
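
In practice, the floating-point note above means that tests comparing sin or cos results across Graviton and Intel instances should use a tolerance instead of exact equality. The Python sketch below is one way to do that. It treats the quoted 1.11e-16 figure as an absolute difference and adds a slack factor; both choices are assumptions of this sketch, and the function name is hypothetical.

```python
import math

# Upper bound quoted above for sin/cos differences versus Intel instances.
PRECISION_BOUND = 1.11e-16


def agrees_across_architectures(a: float, b: float, slack: float = 2.0) -> bool:
    """Return True if a and b differ by at most slack * PRECISION_BOUND.

    The slack factor is an assumption: it leaves room for the quoted bound
    being a rounded figure.
    """
    return abs(a - b) <= slack * PRECISION_BOUND


reference = math.sin(1.0)
# Simulate a last-bit difference: doubles in [0.5, 1) are spaced 2**-53
# apart, which is roughly 1.11e-16.
shifted = reference + 2.0 ** -53

print(reference == shifted)                             # exact equality fails
print(agrees_across_architectures(reference, shifted))  # tolerant check passes
```

Results that feed into long accumulations can drift further than a single-call bound suggests, so a tolerance for aggregated values may need to be chosen separately.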

Unsupported features

AWS Graviton does not support the following features:

  • Databricks Runtime for Machine Learning

  • JDK 11 on ARM64 for Databricks Runtime 10 and above

  • Databricks Container Services

  • Delta Live Tables

  • Databricks SQL
