Best practices for configuring classic Databricks jobs

Learn general recommendations about features and configurations that can benefit classic Databricks jobs.

Classic jobs require you to create and tailor configurations of compute resources, policies, and performance options that fit the needs of your data transformation scenarios. Specific recommendations for the size and type of compute resources vary by workload. Review these best practices before you configure your classic workflows to avoid unnecessary costs or poor performance.

In some cases, serverless compute may be a simpler solution for your scenarios. Serverless compute for jobs manages all infrastructure, eliminating the following considerations. See Run your Databricks job with serverless compute for workflows.

note

Structured Streaming workflows have specific configuration recommendations. See Production considerations for Structured Streaming.

Best practices

Enable Photon Acceleration for common use cases

Databricks recommends enabling Photon Acceleration, using recent Databricks Runtime versions, and using compute configured for Unity Catalog.
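For example, Photon and the runtime version are set on the job's new cluster definition. The following is a minimal sketch of a new_cluster fragment in the Jobs API payload format; the spark_version and node_type_id values are illustrative placeholders, so substitute a recent runtime and an instance type available in your workspace.

```python
# Illustrative new_cluster fragment for a Jobs API payload (values are placeholders).
photon_job_cluster = {
    "spark_version": "15.4.x-scala2.12",  # a recent Databricks Runtime version (placeholder)
    "node_type_id": "i3.xlarge",          # example instance type; choose one your workspace offers
    "runtime_engine": "PHOTON",           # enable Photon Acceleration
    "num_workers": 4,
}
```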

Use standard access mode (formerly shared access mode)

Databricks recommends using standard access mode for jobs. See Access modes.
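As a sketch, access mode is expressed through the data_security_mode field on the cluster specification. The value shown below is the one that corresponds to standard (formerly shared) access mode in the classic API; confirm the exact value for your API version, and treat the other fields as placeholders.

```python
# Illustrative fragment: request standard access mode for a job cluster.
standard_mode_job_cluster = {
    "spark_version": "15.4.x-scala2.12",     # placeholder runtime version
    "node_type_id": "i3.xlarge",             # placeholder instance type
    "num_workers": 2,
    "data_security_mode": "USER_ISOLATION",  # standard (formerly shared) access mode
}
```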

Use cluster policies

Databricks recommends that workspace admins define cluster policies for jobs and enforce these policies for all users who configure jobs.

Cluster policies allow workspace admins to set cost controls and limit users’ configuration options. For details on configuring cluster policies, see Create and manage compute policies.

Databricks provides a default policy configured for jobs. Admins can make this policy available to other workspace users. See Job Compute.
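As an illustration, a policy definition constrains the fields users can set when they configure job clusters. The following sketch uses the cluster policy definition syntax; the specific limits and instance types are assumptions, not recommended values.

```python
# Illustrative cluster policy definition (limits and values are examples only).
job_policy_definition = {
    "cluster_type": {"type": "fixed", "value": "job"},                            # job clusters only
    "autoscale.max_workers": {"type": "range", "maxValue": 10},                   # cap cluster size
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"]}, # allowed instance types
    "runtime_engine": {"type": "fixed", "value": "PHOTON", "hidden": True},       # always use Photon
}
```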

Use autoscaling

Configure autoscaling so that long-running tasks can dynamically add and remove worker nodes during job runs. See Enable autoscaling.
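For example, autoscaling replaces a fixed num_workers value with a range. This is a minimal sketch of the autoscale block in a new cluster specification; the worker counts are placeholders to tune for your workload.

```python
# Illustrative new_cluster fragment using autoscaling instead of a fixed worker count.
autoscaling_job_cluster = {
    "spark_version": "15.4.x-scala2.12",               # placeholder runtime version
    "node_type_id": "i3.xlarge",                       # placeholder instance type
    "autoscale": {"min_workers": 2, "max_workers": 8}, # workers added and removed during the run
}
```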

Use a pool to reduce cluster start times

Compute pools allow you to reserve compute resources from your cloud provider. Pools help decrease new job cluster start times and ensure compute resource availability. See Pool configuration reference.
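As a sketch, a job cluster draws its instances from a pool by referencing the pool's ID instead of a node type. The pool ID below is a hypothetical placeholder for a pool you have already created.

```python
# Illustrative fragment: attach a job cluster to a pre-created instance pool.
pooled_job_cluster = {
    "spark_version": "15.4.x-scala2.12",                # placeholder runtime version
    "instance_pool_id": "<your-pool-id>",               # hypothetical placeholder for an existing pool
    "autoscale": {"min_workers": 2, "max_workers": 8},
}
```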

Use spot instances

Configure spot instances for workloads with relaxed latency requirements to optimize costs. See Spot instances.
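On AWS, spot usage is configured through aws_attributes on the cluster specification. The sketch below keeps the driver on an on-demand instance and falls back to on-demand workers if spot capacity is unavailable; the surrounding values are placeholders.

```python
# Illustrative fragment: spot workers with an on-demand driver and fallback.
spot_job_cluster = {
    "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
    "node_type_id": "i3.xlarge",          # placeholder instance type
    "num_workers": 4,
    "aws_attributes": {
        "availability": "SPOT_WITH_FALLBACK",  # use spot, fall back to on-demand
        "first_on_demand": 1,                   # keep the driver on on-demand capacity
    },
}
```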

Configure availability zones

Specify an availability zone (AZ) if your organization has purchased reserved instances, or use Auto-AZ to retry in other availability zones if AWS returns insufficient capacity errors. See Availability zones.
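As an illustration, the zone is also set through aws_attributes. The sketch below uses Auto-AZ; replace "auto" with a specific zone ID if you have reserved instances in that zone. The other values are placeholders.

```python
# Illustrative fragment: availability zone selection for a job cluster.
az_job_cluster = {
    "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
    "node_type_id": "i3.xlarge",          # placeholder instance type
    "num_workers": 4,
    "aws_attributes": {
        "zone_id": "auto",                # Auto-AZ; or a specific zone such as "us-west-2a"
    },
}
```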

Should all-purpose compute ever be used for jobs?

There are numerous reasons that Databricks recommends against using all-purpose compute for jobs, including the following:

  • Databricks bills for all-purpose compute at a different rate than jobs compute.
  • Jobs compute terminates automatically after a job run is complete. All-purpose compute supports auto-termination, which is tied to inactivity rather than the end of a job run.
  • All-purpose compute is often shared across teams of users. Jobs scheduled against all-purpose compute often have increased latency due to competition for compute resources.
  • Many recommendations for optimizing jobs compute configuration are not appropriate for the type of ad-hoc queries and interactive workloads run on all-purpose compute.

The following are use cases in which you might choose to use all-purpose compute for jobs:

  • You are iteratively developing or testing new jobs. Start-up times for jobs compute can make iterative development tedious. All-purpose compute allows you to apply changes and run your job quickly.
  • You have short-lived jobs that must run frequently or on a specific schedule. There is no start-up time for all-purpose compute that is already running. Consider the costs associated with idle time if you use this pattern.

Serverless compute for jobs is the recommended substitute for most task types you might consider running against all-purpose compute.