Single Node clusters

Preview

This feature is in Public Preview.

A Single Node cluster is a cluster consisting of a Spark driver and no Spark workers. Such clusters support Spark jobs and all Spark data sources, including Delta Lake. In contrast, Standard clusters require at least one Spark worker to run Spark jobs.

Single Node clusters are helpful in the following situations:

  • Running single node machine learning workloads that need Spark to load and save data
  • Lightweight exploratory data analysis (EDA)

Create a Single Node cluster

To create a Single Node cluster, select Single Node in the Cluster Mode drop-down list when configuring a cluster.

Single Node cluster mode
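
You can also create a Single Node cluster programmatically with the Clusters API by setting num_workers to 0 and the Single Node Spark configuration. Below is a minimal sketch in Python, assuming the workspace URL and a personal access token are available in the hypothetical environment variables DATABRICKS_HOST and DATABRICKS_TOKEN; the cluster name and instance type are placeholders:

    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace>.cloud.databricks.com
    token = os.environ["DATABRICKS_TOKEN"]  # personal access token

    payload = {
        "cluster_name": "single-node-eda",  # hypothetical name
        "spark_version": "7.3.x-cpu-ml-scala2.12",
        "node_type_id": "r5.4xlarge",
        "num_workers": 0,                   # no workers; the driver hosts the executor threads
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
        "autotermination_minutes": 120,
    }

    resp = requests.post(
        f"{host}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json()["cluster_id"])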

Single Node cluster properties

A Single Node cluster has the following properties (a notebook snippet to verify them follows the list):

  • Runs Spark locally with as many executor threads as logical cores on the cluster (the number of logical cores on the driver minus 1).
  • Has 0 workers, with the driver node acting as both master and worker.
  • The executor stderr, stdout, and log4j logs are in the driver log.
  • Cannot be converted to a Standard cluster. Instead, create a new cluster with the mode set to Standard.
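
To confirm these properties from a notebook attached to a Single Node cluster, you can inspect the Spark master and the local parallelism. A small sketch in Python; spark is the SparkSession that Databricks notebooks predefine:

    # `spark` is predefined in Databricks notebooks.
    sc = spark.sparkContext
    print(sc.master)              # a Single Node cluster reports a local[...] master
    print(sc.defaultParallelism)  # number of local executor threads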

Limitations

  • Single Node clusters are not recommended for large scale data processing. If you exceed the resources on a Single Node cluster, we recommend using a Standard mode cluster.

  • We do not recommend sharing Single Node clusters. Since all workloads would run on the same node, users would be more likely to run into resource conflicts. Databricks recommends Standard mode for shared clusters.

  • You cannot convert a Standard cluster to a Single Node cluster by setting the minimum number of workers to 0. Instead, create a new cluster with the mode set to Single Node.

  • Single Node clusters are not compatible with process isolation.

  • Single Node clusters do not support Databricks Container Services.

  • GPU scheduling is not enabled on Single Node clusters.

  • On Single Node clusters, Spark cannot read Parquet files with a UDT column and may return the following error message:

    The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
    

    To work around this problem, set the Spark configuration spark.databricks.io.parquet.nativeReader.enabled to false with

    spark.conf.set("spark.databricks.io.parquet.nativeReader.enabled", False)
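
    For example, a notebook cell might apply the workaround before reading the affected file; the path below is hypothetical:

    # Disable the native Parquet reader, then read the file with the UDT column.
    spark.conf.set("spark.databricks.io.parquet.nativeReader.enabled", False)
    df = spark.read.parquet("/mnt/data/udt_table")  # hypothetical path
    df.show()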
    

Single Node cluster policy

Cluster policies simplify cluster configuration for Single Node clusters.

As an illustrative example, when managing clusters for a data science team that does not have cluster creation permissions, an admin may want to authorize the team to create up to 10 Single Node interactive clusters in total. This can be done using instance pools, cluster policies, and Single Node cluster mode (an example policy definition and a scripted sketch of these steps follow the list):

  1. Create a pool. You can set max capacity to 10, enable autoscaling local storage, and choose the instance types and Databricks Runtime version. Record the pool ID from the URL.

  2. Create a cluster policy. The value in the policy for instance pool ID and node type ID should match the pool properties. You can relax the constraints to match your needs. See Manage cluster policies.

  3. Grant the cluster policy to the team members. You can use Manage users and groups to simplify user management.

    {
      "spark_conf.spark.databricks.cluster.profile": {
        "type": "fixed",
        "value": "singleNode",
        "hidden": true
      },
      "instance_pool_id": {
        "type": "fixed",
        "value": "singleNodePoolId1",
        "hidden": true
      },
      "spark_version": {
        "type": "fixed",
        "value": "7.3.x-cpu-ml-scala2.12",
        "hidden": true
      },
      "autotermination_minutes": {
        "type": "fixed",
        "value": 120,
        "hidden": true
      },
      "enable_elastic_disk": {
        "type": "fixed",
        "value": true,
        "hidden": true
      },
      "node_type_id": {
        "type": "fixed",
        "value": "r5.4xlarge",
        "hidden": true
      },
      "num_workers": {
        "type": "fixed",
        "value": 0,
        "hidden": true
      }
    }
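
These steps can also be scripted against the REST API. Below is a minimal sketch in Python, assuming the current API paths, a personal access token in the hypothetical environment variables DATABRICKS_HOST and DATABRICKS_TOKEN, the policy definition above saved to a local file, and a hypothetical group name:

    import json
    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # Step 1: create the pool with a maximum capacity of 10 instances.
    pool = requests.post(f"{host}/api/2.0/instance-pools/create", headers=headers, json={
        "instance_pool_name": "single-node-pool",  # hypothetical name
        "node_type_id": "r5.4xlarge",
        "max_capacity": 10,
        "enable_elastic_disk": True,
        "preloaded_spark_versions": ["7.3.x-cpu-ml-scala2.12"],
    }).json()

    # Step 2: create the policy. The definition is the JSON shown above, with the
    # placeholder pool ID replaced by the real one.
    with open("single_node_policy.json") as f:      # hypothetical local file
        definition = json.load(f)
    definition["instance_pool_id"]["value"] = pool["instance_pool_id"]

    policy = requests.post(f"{host}/api/2.0/policies/clusters/create", headers=headers, json={
        "name": "Single Node interactive",          # hypothetical policy name
        "definition": json.dumps(definition),
    }).json()

    # Step 3: grant CAN_USE on the policy to the team's group.
    resp = requests.patch(
        f"{host}/api/2.0/permissions/cluster-policies/{policy['policy_id']}",
        headers=headers,
        json={"access_control_list": [
            {"group_name": "data-science-team",     # hypothetical group
             "permission_level": "CAN_USE"},
        ]},
    )
    resp.raise_for_status()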
    

Single Node job cluster policy

To set up a cluster policy for jobs, you can define a similar cluster policy. Set cluster_type to a fixed value of "job", and remove any reference to autotermination_minutes. A sketch for registering the policy through the API follows the definition below.

{
  "cluster_type": {
    "type": "fixed",
    "value": "job"
  },
  "spark_conf.spark.databricks.cluster.profile": {
    "type": "forbidden",
    "hidden": true
  },
  "spark_conf.spark.master": {
    "type": "fixed",
    "value": "local[*]"
  },
  "instance_pool_id": {
    "type": "fixed",
    "value": "singleNodePoolId1",
    "hidden": true
  },
  "num_workers": {
    "type": "fixed",
    "value": 0,
    "hidden": true
  },
  "spark_version": {
    "type": "fixed",
    "value": "7.3.x-cpu-ml-scala2.12",
    "hidden": true
  },
  "enable_elastic_disk": {
    "type": "fixed",
    "value": true,
    "hidden": true
  },
  "node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  },
  "driver_node_type_id": {
    "type": "fixed",
    "value": "i3.xlarge",
    "hidden": true
  }
}
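
Once defined, the job policy is registered the same way as the interactive one, and jobs reference the returned policy ID in their cluster specification. A minimal sketch, assuming the definition above is saved to a local file; the file and policy names are hypothetical:

    import os
    import requests

    host = os.environ["DATABRICKS_HOST"]
    headers = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

    # `definition` is the job policy JSON shown above, loaded as a string.
    with open("single_node_job_policy.json") as f:  # hypothetical local file
        definition = f.read()

    resp = requests.post(f"{host}/api/2.0/policies/clusters/create", headers=headers, json={
        "name": "Single Node job",                  # hypothetical policy name
        "definition": definition,
    })
    resp.raise_for_status()
    print(resp.json()["policy_id"])  # jobs reference this ID when they define their cluster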