Single node compute

A single node compute is a computing resource consisting of an Apache Spark driver and no Spark workers. Single node compute supports Spark jobs and all Spark data sources, including Delta Lake.

Single node compute is helpful for:

  • Single-node machine learning workloads that use Spark to load and save data, as sketched after this list

  • Lightweight exploratory data analysis
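
For example, a typical single-node machine learning workload uses Spark only to read and write data, while training runs entirely on the driver. The following sketch assumes a Databricks notebook where spark is the preconfigured SparkSession; the table and column names are hypothetical placeholders.

    from sklearn.linear_model import LogisticRegression

    # Load a Delta table with Spark, then bring it onto the driver as pandas.
    pdf = spark.read.table("examples.training_data").toPandas()

    X = pdf[["feature_1", "feature_2"]]
    y = pdf["label"]

    # Train on the driver; no Spark workers are involved.
    model = LogisticRegression().fit(X, y)

    # Score and write the results back with Spark.
    pdf["prediction"] = model.predict(X)
    spark.createDataFrame(pdf).write.mode("overwrite").saveAsTable("examples.predictions")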

Create a single node compute

The simplest way to create a single node compute is to use the Personal Compute policy, a compute policy available to all users by default.

To create a single node compute from the create compute UI, select the Single node button.
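
You can also create a single node compute programmatically. The following is a minimal sketch that calls the Clusters REST API (POST /api/2.0/clusters/create) with requests; the workspace URL, token, runtime version, and node type are placeholders, and the spark_conf and custom_tags values shown are the settings commonly associated with single node compute — confirm them against the Clusters API reference for your workspace.

    import requests

    host = "https://<your-workspace-url>"
    token = "<personal-access-token>"

    payload = {
        "cluster_name": "single-node-example",
        "spark_version": "<runtime-version>",
        "node_type_id": "<node-type>",
        "num_workers": 0,  # no Spark workers; the driver runs everything
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }

    response = requests.post(
        f"{host}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {token}"},
        json=payload,
    )
    response.raise_for_status()
    print(response.json()["cluster_id"])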

Single node compute properties

Single node compute has the following properties, which you can verify with the snippet after this list:

  • Runs Spark locally.

  • The driver acts as both master and worker, with no worker nodes.

  • Spawns one executor thread per logical core in the compute, minus 1 core for the driver.

  • All stderr, stdout, and log4j log output is saved in the driver log.

  • A single node compute can’t be converted to a multi node compute.
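
You can observe these properties from a notebook attached to the compute. The snippet below is an illustrative check, assuming spark is the notebook's SparkSession; the exact values depend on the node type.

    # On single node compute the master is a local-mode URL (a local[...] value).
    print(spark.sparkContext.master)

    # The number of threads available for scheduling Spark tasks on the driver node.
    print(spark.sparkContext.defaultParallelism)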

Limitations

  • Large-scale data processing will exhaust the resources on a single node compute. For these workloads, Databricks recommends using a multi node compute.

  • Single node compute is not designed to be shared. To avoid resource conflicts, Databricks recommends using a multi node compute when the compute must be shared.

  • A multi node compute can’t be scaled to 0 workers. Use a single node compute instead.

  • Single node compute is not compatible with process isolation.

  • GPU scheduling is not enabled on single node compute.

  • On single node compute, Spark cannot read Parquet files with a UDT column. The following error message results:

    The Spark driver has stopped unexpectedly and is restarting. Your notebook will be automatically reattached.
    

    To work around this problem, disable the native Parquet reader:

    spark.conf.set("spark.databricks.io.parquet.nativeReader.enabled", False)
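
    Note that spark.conf.set applies only to the current Spark session. To make the change persist across restarts, the same setting can also be added to the compute's Spark configuration.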