Run jobs continuously
Use continuous mode to run a job continuously, so that a new run starts as soon as the previous run finishes. Databricks recommends continuous mode for always-on streaming workloads.
Continuous mode replaces the legacy recommendation for Structured Streaming workloads of configuring jobs with an unlimited retry policy and a maximum of one concurrent run.
Serverless compute for jobs does not support continuous mode.
Configure a job to run in continuous mode
To configure a job to run in continuous mode, do the following:
- In your Databricks workspace's sidebar, click Jobs & Pipelines.
- Optionally, select the Jobs and Owned by me filters.
- Click your job's Name link.
- Click Add trigger in the Job details panel, and select Continuous in Trigger type.
- Optionally, select a Task retry mode. Select On failure to retry failed tasks within a job, or Never to retry only at the job level. For continuous mode, Task retry mode defaults to On failure.
  Note: For an existing job, you might need to click Configure retry mode first, and then select a task retry mode.
- Click Save.
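If you prefer to configure the trigger programmatically instead of in the UI, the following sketch shows one way to create a job with a continuous trigger through the Jobs 2.1 REST API. The job name, notebook path, cluster ID, and the environment variables holding the workspace URL and token are placeholders, not values from this article.

```python
import os
import requests

# Assumed environment variables: workspace URL (including https://) and a personal access token.
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# Minimal job definition: one notebook task plus a continuous trigger.
# "continuous": {"pause_status": "UNPAUSED"} runs the job in continuous mode.
job_spec = {
    "name": "always-on-streaming-job",  # hypothetical job name
    "tasks": [
        {
            "task_key": "stream",
            "notebook_task": {"notebook_path": "/Workspace/Users/me/streaming_notebook"},  # placeholder path
            "existing_cluster_id": "1234-567890-abcde123",  # placeholder cluster ID
        }
    ],
    "continuous": {"pause_status": "UNPAUSED"},
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```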
To stop a continuous job, click the Pause button. Click Resume to restart the job in continuous mode.

Note the following behaviors of continuous jobs:
- There can be only one running instance of a continuous job.
- A delay exists between a run finishing and a new run starting. This delay should be less than 60 seconds.
- You cannot use task dependencies with a continuous job.
- You cannot use retry policies in a continuous job. Instead, continuous jobs automatically retry the entire job on failure using an exponential backoff algorithm.
- You can additionally configure retries at the task level by setting the Task retry mode to On failure.
- Select Run now to trigger a new job run on a paused continuous job.
- To have your continuous job pick up a new configuration, cancel the existing run. A new run automatically starts. You can also click Restart run to restart the job run with the updated configuration.
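The Pause and Resume buttons described above correspond to the pause status of the job's continuous trigger. The sketch below shows one way to toggle that state programmatically; it assumes the Jobs 2.1 update endpoint accepts a partial new_settings payload, and the job ID is a placeholder.

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]
headers = {"Authorization": f"Bearer {token}"}

def set_continuous_pause_status(job_id: int, status: str) -> None:
    """Pause ("PAUSED") or resume ("UNPAUSED") a continuous job by updating its trigger."""
    resp = requests.post(
        f"{host}/api/2.1/jobs/update",
        headers=headers,
        json={
            "job_id": job_id,
            # Partial update: only the continuous trigger's pause status is changed.
            "new_settings": {"continuous": {"pause_status": status}},
        },
    )
    resp.raise_for_status()

# Placeholder job ID.
set_continuous_pause_status(123456789, "PAUSED")    # roughly equivalent to clicking Pause
set_continuous_pause_status(123456789, "UNPAUSED")  # roughly equivalent to clicking Resume
```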
How are failures handled for continuous jobs?
Failures are managed using an exponential backoff algorithm.
When Task retry mode is set to On failure, failed tasks are retried with an exponentially increasing delay until the maximum number of allowed retries is reached (three for a single-task job). After the maximum retries are reached, the run is canceled and a new run is triggered. For jobs with multiple tasks, a failed task triggers a new run only if no other tasks are running, or if all other uncompleted tasks are also in a failed or retry state.
Consecutive failures at a job level are also managed using exponential backoff, which allows continuous jobs to run without pausing and return to a healthy state when recoverable failures occur.
When a continuous job exceeds the allowable threshold for consecutive failures, the following describes how subsequent job runs are managed:
- The job is restarted after a retry period set by the system.
- If the next job run fails, the retry period is increased, and the job is restarted after this new retry period.
- For each subsequent job run failure, the retry period is increased up to a maximum retry period set by the system. After reaching the maximum retry period, the job continues to be retried using the maximum retry period. There is no limit on the number of retries for a continuous job.
- If the job run completes successfully and starts a new run, or if the run exceeds a threshold without failure, the job is considered healthy, and the backoff sequence resets.
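The actual retry periods are chosen by the system and are not documented here, but the shape of the sequence described above can be illustrated with a toy sketch. The base delay, growth factor, and cap below are hypothetical values used only to show how the delay grows, reaches a ceiling, and stays there until a healthy run resets the sequence.

```python
# Toy illustration of the backoff pattern described above; the real delays are system-defined.
BASE_DELAY_S = 60    # hypothetical initial retry period
MAX_DELAY_S = 3600   # hypothetical maximum retry period
MULTIPLIER = 2       # hypothetical growth factor

def next_retry_delay(consecutive_failures: int) -> int:
    """Delay before restarting the job after the given number of consecutive failures."""
    delay = BASE_DELAY_S * (MULTIPLIER ** (consecutive_failures - 1))
    # The delay grows until it reaches the cap, then every further failure reuses the cap.
    return min(delay, MAX_DELAY_S)

# A successful run, or a run that stays healthy long enough, resets the failure count to zero.
for failures in range(1, 9):
    print(failures, next_retry_delay(failures))
# 1 60, 2 120, 3 240, ... then pinned at 3600 once the cap is reached
```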
You can restart a continuous job that is in the exponential backoff state from the Jobs UI, or by passing the job ID to the POST /api/2.1/jobs/run-now request in the Jobs 2.1 API or the POST /api/2.0/jobs/run-now request in the Jobs 2.0 API.
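For example, a minimal sketch of the run-now call using the Jobs 2.1 API, with a placeholder job ID and credentials read from assumed environment variables:

```python
import os
import requests

host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

# Trigger a new run of the continuous job by ID; this also restarts a job in the backoff state.
resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123456789},  # placeholder job ID
)
resp.raise_for_status()
print("Started run:", resp.json()["run_id"])
```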