Updating from Jobs API 2.1 to 2.2
This article details updates and enhancements to the functionality in version 2.2 of the Jobs API and includes information to help you update your existing API clients to work with this new version. To learn about the changes between versions 2.0 and 2.1 of the API, see Updating from Jobs API 2.0 to 2.1.
Because the Jobs API 2.2 version enhances the existing support for paginating large result sets, Databricks recommends using Jobs API 2.2 for your API scripts and clients, particularly when responses might include a large number of tasks.
In addition to the changes included in version 2.1 of the Databricks Jobs API, version 2.2 has the following enhancements:
Jobs are queued by default
Job queueing is an optional feature that prevents job runs from being skipped when resources are unavailable for the run. Job queueing is supported in the 2.0, 2.1, and 2.2 versions of the Jobs API, with the following differences in the default handling of queueing:
For jobs created with the Jobs API 2.2, queueing is enabled by default. You can turn off queueing by setting the queue field to false in request bodies when you create or update a job.
For jobs created with the 2.0 and 2.1 versions of the Jobs API, queueing is not enabled by default. With these versions, you must enable queueing by setting the queue field to true in request bodies when you create or update a job.
You can enable or disable queueing when you create a job, partially update a job, or update all job settings.
See Job queueing.
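As a rough illustration, the following Python sketch builds a create-job request body that opts out of queueing under API 2.2. The queue field is modeled here as an object with an enabled flag; confirm the exact schema, and the endpoint path, against the Jobs API reference. The job name and task values are made up.

```python
# Hypothetical sketch of a request body for POST /api/2.2/jobs/create.
# In API 2.2 queueing is on by default; the queue field shown here
# (an object with an "enabled" flag) is an assumption to verify against
# the Jobs API reference.

def build_create_job_payload(name, tasks, queue_enabled=True):
    """Return a create-job request body with queueing explicitly set."""
    return {
        "name": name,
        "tasks": tasks,
        # Set enabled=False to opt out of the 2.2 default queueing behavior.
        "queue": {"enabled": queue_enabled},
    }

body = build_create_job_payload(
    "nightly-etl",
    [{"task_key": "ingest", "notebook_task": {"notebook_path": "/etl/ingest"}}],
    queue_enabled=False,
)
```

The same queue field applies when you update a job's settings, not only at creation time.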
Support for paging long task and task run lists
To support jobs with a large number of tasks or task runs, Jobs API 2.2 changes how large result sets are returned for the following requests:
List jobs: See Changes to the List jobs and List job runs requests.
List job runs: See Changes to the List jobs and List job runs requests.
The Jobs API 2.2 changes pagination for these requests as follows:
Fields representing lists of elements such as tasks, parameters, job_clusters, or environments are limited to 100 elements per response. If more than 100 values are available, the response body includes a next_page_token field containing a token to retrieve the next page of results.
Pagination is added for the responses to the Get a single job and Get a single job run requests. Pagination for the responses to the List jobs and List job runs requests was added with Jobs API 2.1.
The following is an example response body from a Get a single job request for a job with more than 100 tasks. To demonstrate the token-based paging functionality, this example omits most fields included in the response body:
{
"job_id": 11223344,
"settings": {
"tasks": [
{
"task_key": "task-1"
},
{
"task_key": "task-2"
},
{
"task_key": "task-..."
},
{
"task_key": "task-100"
}
]
},
"next_page_token": "Z29...E="
}
To retrieve the next set of results, set the page_token query parameter in the next request to the value returned in the next_page_token field. For example, /api/2.2/jobs/get?job_id=11223344&page_token=Z29...E=.
If no more results are available, the next_page_token field is not included in the response.
The following sections provide more detail on the updates to each of the list and get requests.
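The paging loop described above can be sketched in Python. The fetch_page function below is a stub standing in for a real HTTP call to /api/2.2/jobs/get; in a real client it would issue a GET request with the job_id and, when present, the page_token query parameter.

```python
# Sketch of the token-based paging loop for Get a single job.
# fetch_page is a stub that simulates two pages of results; the last page
# omits next_page_token, which signals that no more results are available.

def fetch_page(job_id, page_token=None):
    pages = {
        None: {
            "settings": {"tasks": [{"task_key": f"task-{i}"} for i in range(1, 101)]},
            "next_page_token": "tok-2",
        },
        "tok-2": {"settings": {"tasks": [{"task_key": "task-101"}]}},
    }
    return pages[page_token]

def all_tasks(job_id):
    """Collect every task for a job by following next_page_token."""
    tasks, token = [], None
    while True:
        resp = fetch_page(job_id, page_token=token)
        tasks.extend(resp.get("settings", {}).get("tasks", []))
        token = resp.get("next_page_token")
        if token is None:  # absent token means no more pages
            break
    return tasks
```

The same loop shape applies to any of the 2.2 endpoints that return a next_page_token.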
Changes to the List jobs and List job runs requests
For the List jobs and List job runs requests, the has_more parameter at the root level of the response object is removed. Instead, use the existence of the next_page_token field to determine if more results are available. Otherwise, the functionality to paginate results remains unchanged.
To prevent large response bodies, the top-level tasks and job_clusters arrays for each job are omitted from responses by default. To include these arrays for each job in the response body for these requests, add the expand_tasks=true query parameter to the request. When expand_tasks is enabled, a maximum of 100 elements are returned in the tasks and job_clusters arrays. If either of these arrays has more than 100 elements, a has_more field (not to be confused with the removed root-level has_more field) inside the job object is set to true.
However, only the first 100 elements are accessible. You cannot retrieve additional tasks or clusters after the first 100 with the List jobs request. To fetch more elements, use the requests that return a single job or a single job run: Get a single job and Get a single run.
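A client can use the per-job has_more flag to decide when the listing response is incomplete and a fallback to Get a single job is needed. The sketch below assumes has_more sits directly on each job object in the listing response, as described above; get_job_tasks is a hypothetical helper that pages /api/2.2/jobs/get (for example, a loop like the one shown earlier).

```python
# Sketch: choosing between the tasks array returned by List jobs
# (capped at 100 elements) and a full fetch via Get a single job.
# get_job_tasks is a hypothetical helper injected by the caller.

def tasks_for_listed_job(job, get_job_tasks):
    """Return the complete task list for one job from a List jobs response."""
    if job.get("has_more"):
        # More than 100 tasks: only Get a single job can page through them all.
        return get_job_tasks(job["job_id"])
    return job.get("settings", {}).get("tasks", [])
```

Passing the fetcher in as an argument keeps the decision logic testable without any network access.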
Get a single job
In Jobs API 2.2, the Get a single job request to retrieve details about a single job now supports pagination of the tasks and job_clusters fields when the size of either field exceeds 100 elements. Use the next_page_token field at the object root to determine if more results are available. The value of this field is then used as the value for the page_token query parameter in subsequent requests. Array fields with fewer than 100 elements in one page will be empty on subsequent pages.
Get a single run
In Jobs API 2.2, the Get a single run request to retrieve details about a single run now supports pagination of the tasks and job_clusters fields when the size of either field exceeds 100 elements. Use the next_page_token field at the object root to determine if more results are available. The value of this field is then used as the value for the page_token query parameter in subsequent requests. Array fields with fewer than 100 elements in one page will be empty on subsequent pages.
Jobs API 2.2 also adds the only_latest query parameter to this endpoint to enable showing only the latest run attempts in the tasks array. When the only_latest parameter is true, any runs superseded by a retry or a repair are omitted from the response.
When the run_id refers to a ForEach task run, a field named iterations is present in the response. The iterations field is an array containing details for all runs of the ForEach task's nested task and has the following properties:
The schema of each object in the iterations array is the same as that of the objects in the tasks array.
If the only_latest query parameter is set to true, only the latest run attempts are included in the iterations array.
Pagination is applied to the iterations array instead of the tasks array.
The tasks array is still included in the response and includes the ForEach task run.
To learn more about the ForEach task, see the ForEach task documentation.
For example, see the following response for a ForEach task with some fields omitted:
{
"job_id": 53,
"run_id": 759600,
"number_in_job": 7,
"original_attempt_run_id": 759600,
"state": {
"life_cycle_state": "TERMINATED",
"result_state": "SUCCESS",
"state_message": ""
},
"cluster_spec": {},
"start_time": 1595943854860,
"setup_duration": 0,
"execution_duration": 0,
"cleanup_duration": 0,
"trigger": "ONE_TIME",
"creator_user_name": "user@databricks.com",
"run_name": "process_all_numbers",
"run_type": "JOB_RUN",
"tasks": [
{
"run_id": 759600,
"task_key": "process_all_numbers",
"description": "Process all numbers",
"for_each_task": {
"inputs": "[ 1, 2, ..., 101 ]",
"concurrency": 10,
"task": {
"task_key": "process_number_iteration"
"notebook_task": {
"notebook_path": "/Users/user@databricks.com/process_single_number",
"base_parameters": {
"number": "{{input}}"
}
}
},
"stats": {
"task_run_stats": {
"total_iterations": 101,
"scheduled_iterations": 101,
"active_iterations": 0,
"failed_iterations": 0,
"succeeded_iterations": 101,
"completed_iterations": 101
}
}
},
"state": {
"life_cycle_state": "TERMINATED",
"result_state": "SUCCESS",
"state_message": ""
}
}
],
"iterations": [
{
"run_id": 759601,
"task_key": "process_number_iteration",
"notebook_task": {
"notebook_path": "/Users/user@databricks.com/process_single_number",
"base_parameters": {
"number": "{{input}}"
}
},
"state": {
"life_cycle_state": "TERMINATED",
"result_state": "SUCCESS",
"state_message": ""
}
},
{
"run_id": 759602,
"task_key": "process_number_iteration",
"notebook_task": {
"notebook_path": "/Users/user@databricks.com/process_single_number",
"base_parameters": {
"number": "{{input}}"
}
},
"state": {
"life_cycle_state": "TERMINATED",
"result_state": "SUCCESS",
"state_message": ""
}
}
],
"format": "MULTI_TASK",
"next_page_token": "eyJ..x9"
}
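Because pagination for a ForEach run applies to the iterations array rather than tasks, a client collects iterations across pages. The sketch below stubs the HTTP call; in a real client fetch_run would issue GET /api/2.2/jobs/runs/get with the run_id and page_token parameters.

```python
# Sketch: paging the iterations array of a ForEach task run.
# fetch_run is a stub simulating two pages; the final page omits
# next_page_token, ending the loop.

def fetch_run(run_id, page_token=None):
    pages = {
        None: {"iterations": [{"run_id": 759601}], "next_page_token": "p2"},
        "p2": {"iterations": [{"run_id": 759602}]},
    }
    return pages[page_token]

def all_iterations(run_id):
    """Collect every iteration of a ForEach run by following next_page_token."""
    out, token = [], None
    while True:
        resp = fetch_run(run_id, page_token=token)
        out.extend(resp.get("iterations", []))
        token = resp.get("next_page_token")
        if token is None:
            break
    return out
```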