Run your Databricks job with serverless compute for workflows

Preview

Serverless compute for workflows is in Public Preview. For information on eligibility and enablement, see Enable serverless compute public preview.

Important

Because the public preview of serverless compute for workflows does not support controlling egress traffic, your jobs have full access to the internet.

Serverless compute for workflows allows you to run your Databricks job without configuring and deploying infrastructure. With serverless compute, you focus on implementing your data processing and analysis pipelines, and Databricks efficiently manages compute resources, including optimizing and scaling compute for your workloads. Autoscaling and Photon are automatically enabled for the compute resources that run your job.

Serverless compute for workflows auto-optimization automatically optimizes compute by selecting appropriate resources such as instance types, memory, and processing engines based on your workload. Auto-optimization also automatically retries failed jobs.

Databricks automatically upgrades the Databricks Runtime version to support enhancements and upgrades to the platform while ensuring the stability of your Databricks jobs. To see the current Databricks Runtime version used by serverless compute for workflows, see Serverless compute release notes.

Because cluster creation permission is not required, all workspace users can use serverless compute to run their workflows.

This article describes using the Databricks Jobs UI to create and run jobs that use serverless compute. You can also automate creating and running jobs that use serverless compute with the Jobs API, Databricks Asset Bundles, and the Databricks SDK for Python.
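
For example, the following sketch uses the Databricks SDK for Python to create and trigger a job with a single notebook task on serverless compute. It assumes the databricks-sdk package is installed, authentication is already configured (for example through environment variables or a .databrickscfg profile), and a notebook exists at the hypothetical path shown; in a workspace where the preview is enabled, leaving out any cluster configuration on the task is what selects serverless compute.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()  # picks up the workspace URL and credentials from your environment

# No new_cluster or existing_cluster_id on the task, so the job runs on serverless compute.
job = w.jobs.create(
    name="serverless-example",
    tasks=[
        jobs.Task(
            task_key="process_data",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/Users/you@example.com/process_data"  # hypothetical notebook
            ),
        )
    ],
)

# Trigger a run and wait for it to finish.
run = w.jobs.run_now(job_id=job.job_id).result()
print(f"Run of job {job.job_id} finished with state {run.state.result_state}")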

Requirements

  • Your Databricks workspace must have Unity Catalog enabled.

  • Because serverless compute for workflows uses shared access mode, your workloads must support this access mode.

  • Your Databricks workspace must be in a supported region. See Databricks clouds and regions.

Create a job using serverless compute

Serverless compute is supported with the notebook, Python script, dbt, and Python wheel task types. By default, serverless compute is selected as the compute type when you create a new job and add one of these supported task types.

Create serverless task

Databricks recommends using serverless compute for all job tasks. You can also specify different compute types for tasks in a job, which might be required if a task type is not supported by serverless compute for workflows.

Configure an existing job to use serverless compute

You can switch an existing job to use serverless compute for supported task types when you edit the job. To switch to serverless compute, either:

  • In the Job details side panel, click Swap under Compute, click New, enter or update any settings, and click Update.

  • Click the down caret in the Compute drop-down menu and select Serverless.

Switch task to serverless compute

Schedule a notebook using serverless compute

In addition to using the Jobs UI to create and schedule a job using serverless compute, you can create and run a job that uses serverless compute directly from a Databricks notebook. See Create and manage scheduled notebook jobs.
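
If you prefer to script the schedule rather than set it in the notebook UI, the following sketch (again using the Databricks SDK for Python, with a placeholder notebook path and cron expression) attaches a cron schedule when the job is created:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="nightly-serverless-notebook",
    # Quartz cron syntax: run every day at 06:00 in the given time zone.
    schedule=jobs.CronSchedule(
        quartz_cron_expression="0 0 6 * * ?",
        timezone_id="Europe/Amsterdam",
    ),
    tasks=[
        jobs.Task(
            task_key="nightly_refresh",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/Users/you@example.com/nightly_refresh"  # hypothetical notebook
            ),
        )
    ],
)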

Set Spark configuration parameters

You can set the following Spark configuration parameters, but only at the session level, by configuring them in a notebook that is part of the job. See Get and set Apache Spark configuration properties in a notebook.

  • spark.sql.legacy.timeParserPolicy

  • spark.sql.session.timeZone

Configure notebook environments and dependencies

To manage library dependencies and environment configuration for a notebook task, add the configuration to a cell in the notebook. The following example uses %pip install to install Python libraries from a requirements.txt file, PyPI, a Unity Catalog volume, a workspace file, and a URL, and sets the spark.sql.session.timeZone session configuration:

%pip install -r ./requirements.txt
%pip install simplejson
%pip install /Volumes/my/python.whl
%pip install /Workspace/my/python.whl
%pip install https://some-distro.net/popular.whl
spark.conf.set('spark.sql.session.timeZone', 'Europe/Amsterdam')

To set the same environment across multiple notebooks, you can use a single notebook to configure the environment and then use the %run magic command to run that notebook from any notebook that requires the environment configuration. See Use %run to import a notebook.
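
For example, a shared setup notebook (called env_setup here purely for illustration) could hold only the environment cells:

%pip install -r ./requirements.txt
spark.conf.set('spark.sql.session.timeZone', 'Europe/Amsterdam')

Each notebook that needs the same environment then starts with a single cell:

%run ./env_setup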

Configure environments and dependencies for non-notebook tasks

For other supported task types, such as Python script, Python wheel, or dbt tasks, a default environment includes installed Python libraries. To see the list of installed libraries, see the Installed Python libraries section in the release notes for the Databricks Runtime version on which your serverless compute for workflows deployment is based. To see the current Databricks Runtime version used by serverless compute for workflows, see Serverless compute release notes. If a task requires a library that is not installed, you can install Python libraries from workspace files, Unity Catalog volumes, or public package repositories. To add a library when you create or edit a task:

  1. In the Environment and Libraries dropdown menu, click the edit icon next to the Default environment or click + Add new environment.

    Edit default environment
  2. In the Configure environment dialog, click + Add library.

  3. Select the type of dependency from the dropdown menu under Libraries.

  4. In the File Path text box, enter the path to the library.

  • For a Python Wheel in a workspace file, the path should be absolute and start with /Workspace/.

  • For a Python Wheel in a Unity Catalog volume, the path should be /Volumes/<catalog>/<schema>/<volume>/<path>.whl.

  • For a requirements.txt file, select PyPI and enter -r /path/to/requirements.txt.

    Add task libraries
  5. Click Confirm, or click + Add library to add another library.

  6. If you’re adding a task, click Create task. If you’re editing a task, click Save task.
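
If you define jobs through the Jobs API rather than the UI, the equivalent of these steps is a job-level environment that tasks reference by key. The following sketch posts a Jobs API 2.1 create request with the Python requests library; the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, the paths, and the package names are placeholders, and the exact environment spec fields should be verified against the Jobs API reference for your workspace.

import os
import requests

host = os.environ["DATABRICKS_HOST"]    # e.g. https://<workspace-url>
token = os.environ["DATABRICKS_TOKEN"]

payload = {
    "name": "serverless-python-script",
    # The dependencies list accepts the same formats as the UI: a wheel in a Unity Catalog
    # volume or workspace file, a PyPI package, or "-r <path>" for a requirements.txt file.
    "environments": [
        {
            "environment_key": "default",
            "spec": {
                "client": "1",
                "dependencies": [
                    "/Volumes/my_catalog/my_schema/my_volume/my_library.whl",
                    "simplejson",
                    "-r /Workspace/my_project/requirements.txt",
                ],
            },
        }
    ],
    "tasks": [
        {
            "task_key": "run_script",
            "environment_key": "default",  # ties the task to the environment defined above
            "spark_python_task": {"python_file": "/Workspace/my_project/main.py"},
        }
    ],
}

response = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
response.raise_for_status()
print(response.json())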

Configure serverless compute auto-optimization to disallow retries

Serverless compute for workflows auto-optimization automatically optimizes the compute used to run your jobs and retries failed jobs. Auto-optimization is enabled by default, and Databricks recommends leaving it enabled to ensure critical workloads run successfully at least once. However, if you have workloads that must be executed at most once, for example, jobs that are not idempotent, you can turn off auto-optimization when adding or editing a task:

  1. Next to Retries, click Add (or the edit icon if a retry policy already exists).

  2. In the Retry Policy dialog, uncheck Enable serverless auto-optimization (may include additional retries).

  3. Click Confirm.

  4. If you’re adding a task, click Create task. If you’re editing a task, click Save task.
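
If you manage job definitions through the Jobs API instead of the UI, recent API versions expose a task-level disable_auto_optimization flag that, to our knowledge, corresponds to this checkbox; verify the field name against the Jobs API reference before relying on it. A minimal task definition for an at-most-once workload might look like this:

# Hypothetical task entry for a Jobs API payload; the notebook path is a placeholder.
task = {
    "task_key": "charge_customers",
    "notebook_task": {"notebook_path": "/Workspace/my_project/charge_customers"},
    # Opt this task out of serverless auto-optimization so it is not retried automatically.
    "disable_auto_optimization": True,
    # Explicitly set max_retries to 0 so no regular task retries are configured either.
    "max_retries": 0,
}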

Monitor the cost of jobs that use serverless compute for workflows

You can monitor the cost of jobs that use serverless compute for workflows by querying the billable usage system table. This table is updated to include user and workload attributes about serverless costs. See Billable usage system table reference.
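
For example, you can run a query like the following from a notebook to aggregate DBU usage per job. The column names and the 'JOBS' value for billing_origin_product reflect the system.billing.usage schema as of this writing; adjust them if your table differs, and note that you need access to the system.billing schema.

usage = spark.sql("""
    SELECT
      usage_metadata.job_id AS job_id,
      usage_date,
      sku_name,
      SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE billing_origin_product = 'JOBS'
      AND usage_metadata.job_id IS NOT NULL
    GROUP BY usage_metadata.job_id, usage_date, sku_name
    ORDER BY usage_date DESC
""")
display(usage)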

View details for your Spark queries

Serverless compute for workflows has a new interface for viewing detailed runtime information for your Spark statements, such as metrics and query plans. See View query insights.

Limitations

For a list of serverless compute for workflows limitations, see Serverless compute limitations in the serverless compute release notes.