Migrate from classic compute to serverless compute

Migrate your workloads from classic compute to serverless compute. Serverless compute handles provisioning, scaling, runtime upgrades, and optimization automatically.

Most classic workloads can migrate with minimal or no code changes, and this page focuses on those workloads. Some features, such as `df.cache()`, are not yet supported on serverless but will not require code changes once they become available. Certain workloads that depend on R or Scala notebooks require classic compute and cannot migrate to serverless. For a full list of current limitations, see Serverless compute limitations.

Migration steps

To migrate your workloads from classic compute to serverless compute, follow these steps:

  1. Check prerequisites: Verify that your workspace, networking, and cloud storage access meet the requirements. See Before you begin.
  2. Update code: Make any necessary code and configuration changes. See Update your code.
  3. Test your workloads: Validate compatibility and correctness before cutting over. See Test your workloads.
  4. Choose a performance mode: Select the performance mode that best matches your workload requirements. See Choose a performance mode.
  5. Migrate in phases: Roll out serverless incrementally, starting with new and low-risk workloads. See Migrate in phases.
  6. Monitor costs: Track serverless DBU consumption and set up alerts. See Monitor costs.

Before you begin

Before you begin migrating, you might need to update some legacy configurations in your workspace.

| Prerequisite | Action | Details |
| --- | --- | --- |
| Workspace is enabled for Unity Catalog | Migrate from Hive metastore if needed | Upgrade a Databricks workspace to Unity Catalog |
| Networking configured | Replace VPC peering with NCCs, Private Link, or firewall rules | Serverless compute plane networking |
| Cloud storage access | Replace legacy data access patterns with Unity Catalog external locations | Connect to cloud object storage using Unity Catalog |

Update your code

The following sections list the code and configuration changes required to make your workloads compatible with serverless.

Data access

Legacy data access patterns are not supported on serverless. Update your code to use Unity Catalog instead.

| Classic pattern | Serverless replacement | Details |
| --- | --- | --- |
| DBFS paths (`dbfs:/...`) | Unity Catalog volumes | What are Unity Catalog volumes? |
| Hive metastore tables | Unity Catalog tables (or HMS Federation) | Upgrade a Databricks workspace to Unity Catalog |
| Storage account credentials | Unity Catalog external locations | Connect to cloud object storage using Unity Catalog |
| Custom JDBC JARs | Lakehouse Federation | What is query federation? |

Warning: DBFS access is limited on serverless. Update all `dbfs:/` paths to Unity Catalog volumes before migrating. For more information, see Migrate files stored in DBFS.

Example: Replace DBFS paths and Hive Metastore references

```python
# Classic
df = spark.read.csv("dbfs:/mnt/datalake/data.csv", header=True)
df.write.parquet("dbfs:/mnt/output/results")
df = spark.table("my_database.my_table")

# Serverless
df = spark.read.csv("/Volumes/main/sales/raw_data/data.csv", header=True)
df.write.parquet("/Volumes/main/analytics/output/results")
df = spark.table("main.my_database.my_table")  # three-level namespace
```

APIs and code

Certain APIs and code patterns are not supported on serverless. Use the following table to determine whether your code needs updating.

| Classic pattern | Serverless replacement | Details |
| --- | --- | --- |
| RDD APIs (`sc.parallelize`, `rdd.map`) | DataFrame APIs | Compare Spark Connect to Spark Classic |
| `df.cache()`, `df.persist()` | Remove caching calls | Serverless compute limitations |
| `spark.sparkContext`, `sqlContext` | Use `spark` (SparkSession) directly | Compare Spark Connect to Spark Classic |
| Hive variables (`${var}`) | SQL `DECLARE VARIABLE` or Python f-strings | DECLARE VARIABLE |
| Unsupported Spark configs | Remove unsupported configs. Serverless auto-tunes most settings. | Configure Spark properties for serverless notebooks and jobs |

Example: Replace RDD operations with DataFrames

```python
from pyspark.sql import functions as F

# sc.parallelize + rdd.map
# Classic: rdd = sc.parallelize([1, 2, 3]); rdd.map(lambda x: x * 2).collect()
df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
result = df.select((F.col("value") * 2).alias("value")).collect()

# rdd.flatMap
# Classic: sc.parallelize(["hello world"]).flatMap(lambda l: l.split(" ")).collect()
df = spark.createDataFrame([("hello world",)], ["line"])
words = df.select(F.explode(F.split("line", " ")).alias("word")).collect()

# rdd.groupByKey
# Classic: rdd.groupByKey().mapValues(list).collect()
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
grouped = df.groupBy("key").agg(F.collect_list("value").alias("values")).collect()

# rdd.mapPartitions → applyInPandas
import pandas as pd

def process_group(pdf: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"total": [pdf["id"].sum()]})

result = (spark.range(100).repartition(4)
    .groupBy(F.spark_partition_id())
    .applyInPandas(process_group, schema="total long")
    .collect())

# sc.textFile → spark.read.text
df = spark.read.text("/Volumes/catalog/schema/volume/file.txt")
```

Example: Replace SparkContext and caching

```python
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

# sc.broadcast → broadcast join
result = main_df.join(broadcast(lookup_df), "key")

# sc.accumulator → DataFrame aggregation
total = df.agg(F.sum("amount")).collect()[0][0]

# sqlContext.sql → spark.sql
result = spark.sql("SELECT * FROM main.db.table")

# df.cache() → remove caching calls
# Materialize expensive intermediate results to Delta as a workaround:
df = spark.read.parquet(path)
expensive_df = df.filter("status = 'active'")
expensive_df.write.format("delta").mode("overwrite").saveAsTable("main.scratch.temp")
result = spark.table("main.scratch.temp")
```
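Example: Replace Hive variable substitution

Hive-style `${var}` substitution can be replaced with SQL session variables or plain Python f-strings. The sketch below assumes a hypothetical `main.db.orders` table:

```python
status = "active"
min_amount = 100

# Classic: SET var.status = active; ... WHERE status = '${var.status}'
# Serverless: build the statement with an f-string instead
query = f"""
SELECT * FROM main.db.orders
WHERE status = '{status}' AND amount >= {min_amount}
"""
# df = spark.sql(query)

# Alternatively, use SQL session variables:
#   DECLARE VARIABLE status STRING DEFAULT 'active';
#   SELECT * FROM main.db.orders WHERE status = session.status;
```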

Libraries and environments

You can manage libraries and environments at the workspace level using base environments and at the notebook level using the notebook's serverless environment.

| Classic pattern | Serverless replacement | Details |
| --- | --- | --- |
| Init scripts | Serverless environments | Configure the serverless environment |
| Cluster-scoped libraries | Notebook-scoped or environment libraries | Configure the serverless environment |
| Maven/JAR libraries | JAR task support for jobs; PyPI for notebooks | JAR task for jobs |
| Docker containers | Serverless environments for library needs | Configure the serverless environment |

Pin Python packages in requirements.txt for reproducible environments. See Best practices for serverless compute.
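For example, a pinned `requirements.txt` might look like the following (package names and versions are illustrative):

```text
# requirements.txt: pin exact versions for reproducible serverless environments
pandas==2.2.2
pyarrow==16.1.0
requests==2.32.3
```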

Streaming

Streaming workloads are supported on serverless, but certain triggers are not supported. Update your code to use the supported triggers.

| Spark trigger | Supported | Notes |
| --- | --- | --- |
| `Trigger.AvailableNow()` | Yes | Recommended |
| `Trigger.Once()` | Yes | Deprecated. Use `Trigger.AvailableNow()` instead. |
| `Trigger.ProcessingTime(interval)` | No | Returns `INFINITE_STREAMING_TRIGGER_NOT_SUPPORTED` |
| `Trigger.Continuous(interval)` | No | Use Lakeflow Spark Declarative Pipelines continuous mode instead |
| Default (not setting `.trigger()`) | No | Omitting `.trigger()` defaults to `ProcessingTime("0 seconds")`, which is not supported on serverless. Always set `.trigger(availableNow=True)` explicitly. |

For continuous streaming, migrate to Spark Declarative Pipelines in continuous mode or use continuous-schedule jobs with AvailableNow. For large sources, set maxFilesPerTrigger or maxBytesPerTrigger to prevent out-of-memory errors.

Example: Fix streaming triggers

```python
# Classic (not supported on serverless; the default trigger is ProcessingTime)
query = df.writeStream.format("delta").outputMode("append").start()

# Serverless (explicit AvailableNow trigger)
query = (df.writeStream.format("delta").outputMode("append")
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start(output_path))
query.awaitTermination()

# With OOM prevention for large sources
query = (spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 100)
    .option("maxBytesPerTrigger", "10g")
    .load(input_path)
    .writeStream.format("delta")
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start(output_path))
```

Test your workloads

  1. Quick compatibility test: Run the workload on classic compute with Standard access mode and Databricks Runtime 14.3 or above. If the run succeeds, the workload can migrate to serverless without any code changes.
  2. A/B comparison (recommended for production): Run the same workload on classic (control) and serverless (experiment). Diff output tables and verify correctness. Iterate until outputs match.
  3. Temporary configs: You can temporarily set supported Spark configs during testing. Remove them once stable.
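The A/B comparison in step 2 boils down to a multiset difference between the two output tables. On a cluster you would typically diff the DataFrames directly with `exceptAll`; the sketch below illustrates the same check over collected rows, using hypothetical results:

```python
from collections import Counter

def diff_outputs(classic_rows, serverless_rows):
    """Multiset difference between two collected result sets."""
    classic, serverless = Counter(classic_rows), Counter(serverless_rows)
    return classic - serverless, serverless - classic

# On a cluster, the equivalent DataFrame check is:
#   classic_df.exceptAll(serverless_df)     # rows only in the classic output
#   serverless_df.exceptAll(classic_df)     # rows only in the serverless output

only_classic, only_serverless = diff_outputs(
    [("order-1", 100), ("order-2", 250)],   # hypothetical classic output
    [("order-1", 100), ("order-2", 250)],   # hypothetical serverless output
)
assert not only_classic and not only_serverless  # outputs match
```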

Choose a performance mode

Serverless jobs and pipelines support two performance modes: standard and performance-optimized. The performance mode you choose depends on your workload requirements.

| Mode | Availability | Startup | Best for |
| --- | --- | --- | --- |
| Standard | Jobs, Lakeflow Spark Declarative Pipelines | 4-6 minutes | Cost-sensitive batch workloads |
| Performance-optimized | Notebooks, Jobs, Lakeflow Spark Declarative Pipelines | Seconds | Interactive, latency-sensitive workloads |

Migrate in phases

  1. New workloads: Start all new notebooks and jobs on serverless.
  2. Low-risk workloads: Migrate PySpark/SQL workloads already on standard access mode and Databricks Runtime 14.3 or above.
  3. Complex workloads: Migrate workloads needing code changes (RDD rewrites, DBFS updates, trigger fixes).
  4. Remaining workloads: Review periodically as capabilities expand.

Monitor costs

Serverless billing is based on DBU consumption, not cluster uptime. Validate cost expectations with representative workloads before migrating at scale.
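As a starting point, serverless usage can be queried from the `system.billing.usage` system table. The sketch below is hedged: the column names and the `%SERVERLESS%` SKU filter reflect the current system-table schema and may need adjusting for your workspace:

```python
# Daily serverless DBU consumption over the last 30 days.
# Assumption: serverless SKU names contain 'SERVERLESS' and usage is
# recorded in the system.billing.usage system table.
usage_sql = """
SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE sku_name LIKE '%SERVERLESS%'
  AND usage_date >= current_date() - INTERVAL 30 DAYS
GROUP BY usage_date, sku_name
ORDER BY usage_date
"""
# daily_dbus = spark.sql(usage_sql)  # run in any notebook or job
```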

Additional resources

You can also refer to the following blog posts for more information: