Spark Submit Task Deprecation Notice & Migration Guide
The Spark Submit task is deprecated and pending removal. Use of this task type is disallowed for new use cases and strongly discouraged for existing customers. See Spark Submit (legacy) for the original documentation of this task type. Keep reading for migration instructions.
Why is Spark Submit being deprecated?
The Spark Submit task type is being deprecated due to technical limitations and feature gaps compared to the JAR, Notebook, and Python script tasks. Those task types offer better integration with Databricks features, improved performance, and greater reliability.
Deprecation measures
Databricks is implementing the following measures in connection with the deprecation:
- Restricted creation: Starting in November 2025, only users who have used Spark Submit tasks in the preceding month can create new Spark Submit tasks. If you need an exception, contact your account support.
- API deprecation notices: API requests that attempt to create or edit a Spark Submit task may be rejected at random in order to surface a deprecation notice. Retry the request with the same parameters until it succeeds (see the retry sketch after this list).
- DBR version restrictions: Spark Submit usage is restricted to existing DBR versions and maintenance releases. Existing DBR versions with Spark Submit will continue to receive security and bugfix maintenance releases until the feature is shut down completely. DBR 17.3+ and 18.x+ will not support this task type.
- UI warnings: Warnings appear throughout the Databricks UI where Spark Submit tasks are in use, and communications are sent to workspace administrators of accounts with existing usage.
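If your automation creates or edits Spark Submit tasks through the Jobs API, a simple retry loop is enough to get past these rejections. The following is a minimal sketch using the Databricks Python SDK; it assumes the rejection surfaces as a DatabricksError, and the commented-out jobs.reset call, job ID, and settings are illustrative placeholders rather than values from your workspace.

import sys
import time

from databricks.sdk import WorkspaceClient
from databricks.sdk.errors import DatabricksError

client = WorkspaceClient()  # reads DATABRICKS_HOST and DATABRICKS_TOKEN from the environment

def call_with_retry(api_call, max_attempts=5):
    """Retry a Jobs API call with identical parameters until it succeeds."""
    for attempt in range(1, max_attempts + 1):
        try:
            return api_call()
        except DatabricksError as err:
            # Rejections that only serve the deprecation notice are transient;
            # a production version should inspect the error before retrying.
            print(f"Attempt {attempt} rejected: {err}", file=sys.stderr)
            time.sleep(2 * attempt)
    raise RuntimeError("Request still rejected after retries")

# Example (placeholders): retry an edit of an existing job definition.
# result = call_with_retry(lambda: client.jobs.reset(job_id=123, new_settings=new_settings))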
Migrate JVM workloads to JAR tasks
For JVM workloads, migrate your Spark Submit tasks to JAR tasks. JAR tasks provide better feature support and integration with Databricks.
Follow these steps to migrate (a code sketch of the argument mapping follows this list):
- Create a new JAR task in your job.
- From your Spark Submit task parameters, identify the first three arguments. They generally follow this pattern: ["--class", "org.apache.spark.mainClassName", "dbfs:/path/to/jar_file.jar"]
- Remove the --class parameter.
- Set the main class name (for example, org.apache.spark.mainClassName) as the Main class for your JAR task.
- Provide the path to your JAR file (for example, dbfs:/path/to/jar_file.jar) in the JAR task configuration.
- Copy any remaining arguments from your Spark Submit task to the JAR task parameters.
- Run the JAR task and verify it works as expected.
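As an illustration of the mapping above, the following sketch splits a Spark Submit parameter list into the Main class, the JAR path, and the remaining arguments for a JAR task. The helper function and the trailing --input argument are hypothetical, included only to show where each piece of the original parameter list ends up.

def parse_spark_submit_parameters(parameters):
    """Split Spark Submit parameters following the ["--class", "<main class>", "<jar path>", ...] pattern."""
    if len(parameters) < 3 or parameters[0] != "--class":
        raise ValueError("Parameters do not follow the expected --class pattern")
    main_class = parameters[1]       # becomes the Main class of the JAR task
    jar_path = parameters[2]         # the JAR file referenced by the JAR task
    remaining_args = parameters[3:]  # copied into the JAR task parameters
    return main_class, jar_path, remaining_args

# Example based on the pattern shown in the steps above:
main_class, jar_path, remaining_args = parse_spark_submit_parameters(
    ["--class", "org.apache.spark.mainClassName", "dbfs:/path/to/jar_file.jar", "--input", "/data"]
)
# main_class     -> "org.apache.spark.mainClassName"
# jar_path       -> "dbfs:/path/to/jar_file.jar"
# remaining_args -> ["--input", "/data"]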
For detailed information on configuring JAR tasks, see JAR task.
Migrate R workloads
If you're launching an R script directly from a Spark Submit task, multiple migration paths are available.
Option A: Use Notebook tasks
Migrate your R script to a Databricks notebook. Notebook tasks support a full set of features, including cluster autoscaling, and provide better integration with the Databricks platform.
Option B: Bootstrap R scripts from a Notebook task
Use a Notebook task to bootstrap your R scripts. Create a notebook with the following code and pass the path to your R file as a job parameter, modifying the notebook to add any other parameters your R script needs. A sketch of the corresponding Notebook task configuration follows the snippet:
# Read the path to the R script from the script_path job parameter (widget)
dbutils.widgets.text("script_path", "", "Path to script")
script_path <- dbutils.widgets.get("script_path")

# Run the R script
source(script_path)
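The notebook above reads the path to the R file from the script_path widget, so the Notebook task only needs to pass that path as a base parameter. Here is a minimal sketch using the Databricks Python SDK; the job name, notebook path, cluster ID, and script path are placeholders for your own values.

from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

client = WorkspaceClient()  # reads DATABRICKS_HOST and DATABRICKS_TOKEN from the environment

# Placeholder paths and IDs; replace them with your own notebook, cluster, and R script.
created = client.jobs.create(
    name="bootstrap-r-script",
    tasks=[
        jobs.Task(
            task_key="run_r_script",
            existing_cluster_id="<cluster-id>",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/Users/you@example.com/bootstrap_r",
                base_parameters={"script_path": "/Workspace/Users/you@example.com/my_script.R"},
            ),
        )
    ],
)
print(f"Created job {created.job_id}")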
Find jobs that use Spark Submit tasks
You can use the following Python script to identify all jobs in your workspace (among those viewable by you) that contain Spark Submit tasks. This helps you inventory affected jobs and plan your migration. You need a valid personal access token (or other supported token) and your workspace URL.
#!/usr/bin/env python3
"""
Requirements:
databricks-sdk>=0.20.0
Usage:
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-token"
python list_spark_submit_jobs.py
Output:
CSV format with columns: job_id, owner_email, job_name
Note:
DATABRICKS_HOST must be the bare workspace URL, without the ?o=<workspace-id> query parameter.
Incorrect: export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com/?o=12345678910"
"""
import csv
import os
import sys
from databricks.sdk import WorkspaceClient
def main():
    # Get credentials from environment
    workspace_url = os.environ.get("DATABRICKS_HOST")
    token = os.environ.get("DATABRICKS_TOKEN")
    if not workspace_url or not token:
        print("Error: Set DATABRICKS_HOST and DATABRICKS_TOKEN environment variables", file=sys.stderr)
        sys.exit(1)

    # Initialize client
    client = WorkspaceClient(host=workspace_url, token=token)

    # Scan workspace for jobs with Spark Submit tasks
    print("Scanning workspace for jobs with Spark Submit tasks... (this will take a while)", file=sys.stderr)
    jobs_with_spark_submit = []
    total_jobs = 0
    for job in client.jobs.list(expand_tasks=True):
        total_jobs += 1
        # Check if job has any Spark Submit tasks
        if job.settings and job.settings.tasks:
            has_spark_submit = any(
                task.spark_submit_task is not None
                for task in job.settings.tasks
            )
            if has_spark_submit:
                job_name = job.settings.name or f"Unnamed Job {job.job_id}"
                owner_email = job.creator_user_name or "Unknown"
                jobs_with_spark_submit.append({
                    'job_id': job.job_id,
                    'owner_email': owner_email,
                    'job_name': job_name
                })

    # Print summary to stderr
    print(f"Scanned {total_jobs} jobs total", file=sys.stderr)
    print(f"Found {len(jobs_with_spark_submit)} jobs with Spark Submit tasks", file=sys.stderr)
    print("", file=sys.stderr)

    # Output CSV to stdout
    if jobs_with_spark_submit:
        writer = csv.DictWriter(
            sys.stdout,
            fieldnames=['job_id', 'owner_email', 'job_name'],
            quoting=csv.QUOTE_MINIMAL
        )
        writer.writeheader()
        writer.writerows(jobs_with_spark_submit)
    else:
        print("No jobs with Spark Submit tasks found.", file=sys.stderr)

if __name__ == "__main__":
    main()
Need help?
If you need additional help, please contact your account support.