Jobs
A job is a way of running a notebook or JAR either immediately or on a scheduled basis. The other way to run a notebook is interactively in the notebook UI.
You can create and run jobs using the UI, the CLI, and by invoking the Jobs API. You can monitor job run results in the UI, using the CLI, by querying the API, and through email alerts. This article focuses on performing job tasks using the UI. For the other methods, see Jobs CLI and Jobs API.
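For example, here is a minimal sketch of checking recent run results for one job through the Jobs API with Python; the workspace URL, access token, and job ID are placeholders you would replace with your own values.
import requests

# Placeholders: your workspace URL, a personal access token, and the job ID to inspect.
resp = requests.get(
    "https://<databricks-instance>/api/2.0/jobs/runs/list",
    headers={"Authorization": "Bearer <personal-access-token>"},
    params={"job_id": 123, "limit": 5},
)
for run in resp.json().get("runs", []):
    state = run["state"]
    # result_state is present only for completed runs; fall back to the life cycle state.
    print(run["run_id"], state.get("result_state", state["life_cycle_state"]))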
Important
- A workspace is limited to 1000 concurrent job runs. A 429 Too Many Requests response is returned when you request a run that cannot be started immediately.
- The number of jobs a workspace can create in an hour is limited to 5000 (includes “run now” and “runs submit”). This limit also affects jobs created by the REST API and notebook workflows.
View jobs
Click the Jobs icon in the sidebar. The Jobs page displays, listing all defined jobs, the cluster definition, the schedule (if any), and the result of the last run.
In the Jobs list, you can filter jobs:
- Using keywords.
- Selecting only jobs you own or jobs you have access to. Access to this filter depends on Jobs access control being enabled.
You can also click any column header to sort the list of jobs (either descending or ascending) by that column. By default, the page is sorted on job names in ascending order.

Create a job
Click + Create Job. The job detail page displays.
Enter a name in the text field with the placeholder text Untitled.
Specify the task type: click Select Notebook, Set JAR, or Configure spark-submit.
Notebook
- Select a notebook and click OK.
- Next to Parameters, click Edit. Specify key-value pairs or a JSON string representing key-value pairs. These parameters set the values of notebook widgets; see the widget sketch after the task descriptions below.
JAR: Upload a JAR, specify the main class and arguments, and click OK. To learn more about JAR jobs, see JAR job tips.
spark-submit: Specify the main class, path to the library JAR, arguments, and click Confirm. To learn more about spark-submit, see the Apache Spark documentation.
Note
The following Databricks features are not available for spark-submit jobs:
- Cluster autoscaling. To learn more about autoscaling, see Cluster autoscaling.
- Databricks Utilities. If you want to use Databricks Utilities, use JAR jobs instead.
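A notebook run by such a job reads these parameters through Databricks Utilities widgets. The following is a minimal sketch to place in the notebook itself; the widget name run_date and its default value are hypothetical.
# dbutils is available automatically in Databricks notebooks.
dbutils.widgets.text("run_date", "2021-01-01")   # default used when you run the notebook interactively
run_date = dbutils.widgets.get("run_date")       # a job parameter named run_date overrides the default
print("Processing data for", run_date)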
In the Dependent Libraries field, optionally click Add and specify dependent libraries. Dependent libraries are automatically attached to the cluster on launch. Follow the recommendations in Library dependencies for specifying dependencies.
Important
If you have configured a library to install automatically on all clusters, or if in the next step you select an existing terminated cluster that already has libraries installed, job execution does not wait for library installation to complete. If a job requires a certain library, you should attach the library to the job in the Dependent Libraries field.
In the Cluster field, click Edit and specify the cluster on which to run the job. In the Cluster Type drop-down, choose New Job Cluster or Existing All-Purpose Cluster.
Note
Keep the following in mind when you choose a cluster type:
- For production-level jobs or jobs that are important to complete, we recommend that you select New Job Cluster.
- You can run spark-submit jobs only on new clusters.
- When you run a job on a new cluster, the job is treated as a data engineering (job) workload subject to the job workload pricing. When you run a job on an existing cluster, the job is treated as a data analytics (all-purpose) workload subject to all-purpose workload pricing.
- If you select a terminated existing cluster and the job owner has Can Restart permission, Databricks starts the cluster when the job is scheduled to run.
- Existing clusters work best for tasks such as updating dashboards at regular intervals.
New Job Cluster - complete the cluster configuration.
- In the cluster configuration, select a runtime version. For help with selecting a runtime version, see Databricks Runtime and Databricks Light.
- To decrease new cluster start time, select a pool in the cluster configuration.
If you want to take advantage of automatic availability zones (Auto-AZ), you must enable Auto-AZ with the Clusters API by setting aws_attributes.zone_id = "auto" (see the sketch below). See also Availability zones.
Existing All-Purpose Cluster - in the drop-down, select the existing cluster.
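The following is a minimal sketch of a new cluster definition with Auto-AZ enabled, expressed as a Python fragment for the Clusters or Jobs API; the runtime version, instance type, and worker count are placeholder values.
new_cluster = {
    "spark_version": "7.3.x-scala2.12",     # placeholder Databricks Runtime version
    "node_type_id": "i3.xlarge",            # placeholder instance type
    "num_workers": 2,
    "aws_attributes": {"zone_id": "auto"},  # let Databricks pick the availability zone
}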
In the Schedule field, optionally click Edit and schedule the job. See Run a job.
Optionally click Advanced and specify advanced job options. See Advanced job options.
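The same job definition can also be created programmatically. The following is a minimal sketch of an equivalent Jobs API call in Python; the job name, notebook path, library, cluster settings, workspace URL, and token are placeholders, not values taken from this article.
import requests

job_spec = {
    "name": "nightly-report",                               # placeholder job name
    "new_cluster": {
        "spark_version": "7.3.x-scala2.12",                 # placeholder runtime version
        "node_type_id": "i3.xlarge",                        # placeholder instance type
        "num_workers": 2,
    },
    "libraries": [{"pypi": {"package": "simplejson"}}],     # optional dependent library
    "notebook_task": {
        "notebook_path": "/Users/someone@example.com/my-notebook",
        "base_parameters": {"run_date": "2021-01-01"},      # sets the notebook widget values
    },
}
resp = requests.post(
    "https://<databricks-instance>/api/2.0/jobs/create",
    headers={"Authorization": "Bearer <personal-access-token>"},
    json=job_spec,
)
print(resp.json())  # returns the new job_id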
View job details
On the Jobs page, click a job name in the Name column. The job details page shows configuration parameters, active runs, and completed runs.

Databricks maintains a history of your job runs for up to 60 days. If you need to preserve job runs, we recommend that you export job run results before they expire. For more information, see Export job run results.
On the job runs page, you can view the standard error, standard output, and log4j output for a job run by clicking the Logs link in the Spark column.
Run a job
You can run a job on a schedule or immediately.
Schedule a job
To define a schedule for the job:
Click Edit next to Schedule.
The Schedule Job dialog displays.
Specify the schedule granularity, starting time, and time zone. Optionally select the Show Cron Syntax checkbox to display and edit the schedule in Quartz Cron Syntax.
Note
- Databricks enforces a minimum interval of 10 seconds between subsequent runs triggered by the schedule of a job regardless of the seconds configuration in the cron expression.
- You can choose a time zone that observes daylight saving time or a UTC time. If you select a zone that observes daylight saving time, an hourly job will be skipped or may appear to not fire for an hour or two when daylight saving time begins or ends. If you want jobs to run at every hour (absolute time), choose a UTC time.
- The job scheduler, like the Spark batch interface, is not intended for low latency jobs. Due to network or cloud issues, job runs may occasionally be delayed up to several minutes. In these situations, scheduled jobs will run immediately upon service availability.
Click Confirm.
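For illustration, a schedule defined through the Jobs API uses the same Quartz cron syntax as the Show Cron Syntax view. The cron expression and time zone below are examples only.
# Quartz cron fields: second, minute, hour, day-of-month, month, day-of-week.
schedule = {
    "quartz_cron_expression": "0 30 7 * * ?",  # every day at 07:30
    "timezone_id": "UTC",                      # choose UTC for absolute-time schedules
}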
Run a job with different parameters
You can use Run Now with Different Parameters to re-run a job specifying different parameters or different values for existing parameters.
In the Active runs table, click Run Now with Different Parameters. The dialog varies depending on whether you are running a notebook job or a spark-submit job.
Notebook - A UI that lets you set key-value pairs or a JSON object displays. You can use this dialog to set the values of widgets.
spark-submit - A dialog containing the list of parameters displays. For example, you could run the SparkPi estimator described in Create a job with 100 instead of the default 10 partitions.
Specify the parameters. The provided parameters are merged with the default parameters for the triggered run. If you delete keys, the default parameters are used.
Click Run.
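You can trigger the same kind of re-run programmatically with the Jobs API run-now request. The following Python sketch assumes a notebook job; the workspace URL, token, job ID, and the run_date parameter are placeholders.
import requests

resp = requests.post(
    "https://<databricks-instance>/api/2.0/jobs/run-now",
    headers={"Authorization": "Bearer <personal-access-token>"},
    # notebook_params are merged with the job's default parameters for this run only.
    json={"job_id": 123, "notebook_params": {"run_date": "2021-02-01"}},
)
print(resp.json())  # contains the run_id of the triggered run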
Notebook job tips
Total notebook cell output (the combined output of all notebook cells) is subject to a 20MB size limit. Additionally, individual cell output is subject to an 8MB size limit. If total cell output exceeds 20MB in size, or if the output of an individual cell is larger than 8MB, the run will be canceled and marked as failed. If you need help finding cells that are near or beyond the limit, run the notebook against an all-purpose cluster and use this notebook autosave technique.
JAR job tips
There are some caveats you need to be aware of when you run a JAR job.
Output size limits
Note
Available in Databricks Runtime 6.3 and above.
Job output, such as log output emitted to stdout, is subject to a 20MB size limit. If the total output has a larger size, the run will be canceled and marked as failed.
To avoid encountering this limit, you can prevent stdout from being returned from the driver to Databricks by setting the spark.databricks.driver.disableScalaOutput Spark configuration to true. By default, the flag value is false. The flag controls cell output for Scala JAR jobs and Scala notebooks. If the flag is enabled, Spark does not return job execution results to the client. The flag does not affect the data that is written in the cluster’s log files. Setting this flag is recommended only for job clusters for JAR jobs, because it disables notebook results.
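As a sketch, the flag can be set in the job cluster’s Spark configuration, for example through the spark_conf field of a new cluster definition in a Jobs API payload; note that configuration values are passed as strings.
# Fragment of a job cluster definition; suppresses stdout returned from the driver to Databricks.
spark_conf = {"spark.databricks.driver.disableScalaOutput": "true"}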
Use try-finally blocks for job cleanup
Consider a JAR that consists of two parts:
- jobBody(), which contains the main part of the job.
- jobCleanup(), which has to be executed after jobBody(), whether that function succeeded or threw an exception.
As an example, jobBody() may create tables, and you can use jobCleanup() to drop these tables.
The safe way to ensure that the cleanup method is called is to put a try-finally block in the code:
try {
jobBody()
} finally {
jobCleanup()
}
You should not try to clean up using sys.addShutdownHook(jobCleanup) or the following code:
val cleanupThread = new Thread { override def run = jobCleanup() }
Runtime.getRuntime.addShutdownHook(cleanupThread)
Due to the way the lifetime of Spark containers is managed in Databricks, the shutdown hooks are not run reliably.
Configure JAR job parameters
JAR jobs are parameterized with an array of strings.
- In the UI, you input the parameters in the Arguments text box, which are split into an array by applying POSIX shell parsing rules. For more information, reference the shlex documentation.
- In the API, you input the parameters as a standard JSON array. For more information, reference SparkJarTask. To access these parameters, inspect the String array passed into your main function.
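For example, here is a hedged sketch of a SparkJarTask fragment as it might appear in a Jobs API payload; the main class and argument values are hypothetical.
spark_jar_task = {
    "main_class_name": "com.example.ReportJob",  # hypothetical class containing main()
    "parameters": ["--input", "dbfs:/data/events", "--partitions", "100"],  # passed as the args array to main()
}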
View job run details
A job run details page contains job output and links to logs:

You can view job run details from the Jobs page and the Clusters page.
- Click the Jobs icon. In the Run column of the Completed in past 60 days table, click the run number link.
- Click the Clusters icon. In a job row in the Job Clusters table, click the Job Run link.
Export job run results
You can export notebook run results and job run logs for all job types.
Export notebook run results
You can persist job runs by exporting their results. For notebook job runs, you can export a rendered notebook that can later be imported into your Databricks workspace.
In the job detail page, click a job run name in the Run column.
Click Export to HTML.
Export job run logs
You can also export the logs for your job run. To automate this process, you can set up your job so that it automatically delivers logs to DBFS or S3 through the Job API. For more information, see the NewCluster and ClusterLogConf fields in the Job Create API call.
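As a minimal sketch, log delivery is configured in the cluster_log_conf field of the new cluster definition; the destination path below is a placeholder, and an s3 block with a destination and region can be used instead of dbfs.
# Deliver driver and executor logs for each run to this DBFS location.
cluster_log_conf = {"dbfs": {"destination": "dbfs:/cluster-logs/nightly-report"}}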
Library dependencies
The Spark driver has certain library dependencies that cannot be overridden. These libraries take priority over any of your own libraries that conflict with them.
To get the full list of the driver library dependencies, run the following command inside a notebook attached to a cluster of the same Spark version (or the cluster with the driver you want to examine).
%sh
ls /databricks/jars
Manage library dependencies
A good rule of thumb when dealing with library dependencies while creating JARs for jobs is to list Spark and Hadoop as provided dependencies. In Maven, add Spark and Hadoop as provided dependencies, as shown in the following example.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.3.0</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.1</version>
<scope>provided</scope>
</dependency>
In sbt, add Spark and Hadoop as provided dependencies, as shown in the following example.
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0" % "provided"
libraryDependencies += "org.apache.hadoop" %% "hadoop-core" % "1.2.1" % "provided"
Tip
Specify the correct Scala version for your dependencies based on the version you are running.
Advanced job options
Maximum concurrent runs
The maximum number of runs that can be run in parallel. On starting a new run, Databricks skips the run if the job has already reached its maximum number of active runs. Set this value higher than the default of 1 if you want to be able to perform multiple runs of the same job concurrently. This is useful for example if you trigger your job on a frequent schedule and want to allow consecutive runs to overlap with each other, or if you want to trigger multiple runs that differ by their input parameters.
Alerts
Email alerts sent in case of job failure, success, or timeout. You can set up alerts for job start, job success, and job failure (including skipped jobs), and provide multiple comma-separated email addresses for each alert type. You can also opt out of alerts for skipped job runs.

Integrate these email alerts with your favorite notification tools.
Timeout
The maximum completion time for a job. If the job does not complete in this time, Databricks sets its status to “Timed Out”.
Retries
Policy that determines when and how many times failed runs are retried.

Note
If you configure both Timeout and Retries, the timeout applies to each retry.
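For reference, these advanced options correspond to fields in the Jobs API job settings. The following Python fragment is a hedged sketch with placeholder values and email addresses.
advanced_settings = {
    "max_concurrent_runs": 3,               # allow up to 3 overlapping runs of this job
    "timeout_seconds": 3600,                # each run, and each retry, times out after one hour
    "max_retries": 2,                       # retry a failed run up to twice
    "min_retry_interval_millis": 60000,     # wait at least one minute between retries
    "retry_on_timeout": False,              # do not retry runs that timed out
    "email_notifications": {
        "on_start": ["ops@example.com"],
        "on_success": ["ops@example.com"],
        "on_failure": ["ops@example.com", "oncall@example.com"],
        "no_alert_for_skipped_runs": True,
    },
}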
Control access to jobs
Job access control enables job owners and administrators to grant fine-grained permissions on their jobs. With job access control, job owners can choose which other users or groups can view the results of the job. Owners can also choose who can manage runs of their job (that is, invoke Run Now and Cancel).
See Jobs access control for details.