Migrate from classic compute to serverless compute
Migrate your workloads from classic compute to serverless compute. Serverless compute handles provisioning, scaling, runtime upgrades, and optimization automatically.
Most classic workloads can migrate with minimal or no code changes, and this page focuses on those workloads. Some features, such as df.cache(), are not yet supported on serverless but will not require code changes once they become available. Workloads that depend on R or Scala notebooks require classic compute and cannot migrate to serverless. For a full list of current limitations, see Serverless compute limitations.
Migration steps
To migrate your workloads from classic compute to serverless compute, follow these steps:
- Check prerequisites: Verify that your workspace, networking, and cloud storage access meet the requirements. See Before you begin.
- Update code: Make any necessary code and configuration changes. See Update your code.
- Test your workloads: Validate compatibility and correctness before cutting over. See Test your workloads.
- Choose a performance mode: Select the performance mode that best matches your workload requirements. See Choose a performance mode.
- Migrate in phases: Roll out serverless incrementally, starting with new and low-risk workloads. See Migrate in phases.
- Monitor costs: Track serverless DBU consumption and set up alerts. See Monitor costs.
Before you begin
Before you begin migrating, you might need to update some legacy configurations in your workspace.
| Prerequisite | Action | Details |
|---|---|---|
| Workspace is enabled for Unity Catalog | Migrate from Hive Metastore if needed | |
| Networking configured | Replace VPC peering with NCCs, Private Link, or firewall rules | |
| Cloud storage access | Replace legacy data access patterns, including DBFS mounts that use instance profiles, with Unity Catalog external locations | |
Update your code
The following sections list the code and configuration changes required to make your workloads compatible with serverless.
Data access
Legacy data access patterns are not supported on serverless. Update your code to use Unity Catalog instead.
| Classic pattern | Serverless replacement | Details |
|---|---|---|
| DBFS paths (dbfs:/) | Unity Catalog volumes | |
| Hive Metastore tables | Unity Catalog tables (or HMS Federation) | |
| IAM instance profiles | Unity Catalog external locations | |
| Custom JDBC JARs | Lakehouse Federation | |
DBFS access is limited on serverless. Update all dbfs:/ paths to Unity Catalog volumes before migrating. For more information, see Migrate files stored in DBFS.
Example: Replace DBFS paths and Hive Metastore references
```python
# Classic
df = spark.read.csv("dbfs:/mnt/datalake/data.csv", header=True)
df.write.parquet("dbfs:/mnt/output/results")
df = spark.table("my_database.my_table")

# Serverless
df = spark.read.csv("/Volumes/main/sales/raw_data/data.csv", header=True)
df.write.parquet("/Volumes/main/analytics/output/results")
df = spark.table("main.my_database.my_table")  # three-level namespace
```
APIs and code
Certain APIs and code patterns are not supported on serverless. Use the following table to determine whether your code needs updating.
| Classic pattern | Serverless replacement | Details |
|---|---|---|
| RDD APIs (sc.parallelize, rdd.map) | DataFrame APIs | |
| df.cache(), df.persist() | Remove caching calls | |
| sc, sqlContext | Use spark (SparkSession) | |
| Hive variables (${var}) | SQL session variables | |
| Unsupported Spark configs | Remove unsupported configs. Serverless auto-tunes most settings. | Configure Spark properties for serverless notebooks and jobs |
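Example: Replace Hive variable substitution with parameter markers
The Hive variable row above can typically be handled with named parameter markers, which spark.sql accepts on Spark 3.4 and above. This is a minimal sketch; the table name, column, and function name are hypothetical.

```python
def active_rows(spark, status):
    # Classic: SET myvar=active; SELECT * FROM t WHERE status = '${myvar}'
    # Serverless: pass values through the args parameter instead of
    # string substitution (also avoids SQL injection).
    return spark.sql(
        "SELECT * FROM main.my_database.my_table WHERE status = :status",
        args={"status": status},
    )
```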
Example: Replace RDD operations with DataFrames
```python
from pyspark.sql import functions as F
import pandas as pd

# sc.parallelize + rdd.map
# Classic: rdd = sc.parallelize([1, 2, 3]); rdd.map(lambda x: x * 2).collect()
df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
result = df.select((F.col("value") * 2).alias("value")).collect()

# rdd.flatMap
# Classic: sc.parallelize(["hello world"]).flatMap(lambda l: l.split(" ")).collect()
df = spark.createDataFrame([("hello world",)], ["line"])
words = df.select(F.explode(F.split("line", " ")).alias("word")).collect()

# rdd.groupByKey
# Classic: rdd.groupByKey().mapValues(list).collect()
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
grouped = df.groupBy("key").agg(F.collect_list("value").alias("values")).collect()

# rdd.mapPartitions → applyInPandas
def process_group(pdf: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"total": [pdf["id"].sum()]})

result = (spark.range(100).repartition(4)
    .groupBy(F.spark_partition_id())
    .applyInPandas(process_group, schema="total long")
    .collect())

# sc.textFile → spark.read.text
df = spark.read.text("/Volumes/catalog/schema/volume/file.txt")
```
Example: Replace SparkContext and caching
from pyspark.sql.functions import broadcast
# sc.broadcast → broadcast join
result = main_df.join(broadcast(lookup_df), "key")
# sc.accumulator → DataFrame aggregation
total = df.agg(F.sum("amount")).collect()[0][0]
# sqlContext.sql → spark.sql
result = spark.sql("SELECT * FROM main.db.table")
# df.cache() → remove caching calls
# Materialize expensive intermediate results to Delta as a workaround:
df = spark.read.parquet(path)
result = df.filter("status = 'active'")
expensive_df.write.format("delta").mode("overwrite").saveAsTable("main.scratch.temp")
result = spark.table("main.scratch.temp")
Libraries and environments
You can manage libraries and environments at the workspace level using base environments and at the notebook level using the notebook's serverless environment.
| Classic pattern | Serverless replacement | Details |
|---|---|---|
| Init scripts | Serverless environments | |
| Cluster-scoped libraries | Notebook-scoped or environment libraries | |
| Maven/JAR libraries | JAR task support for jobs; PyPI for notebooks | |
| Docker containers | Serverless environments for library needs | |
Pin Python packages in requirements.txt for reproducible environments. See Best practices for serverless compute.
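For example, a pinned requirements.txt might look like the following. The packages and versions shown are placeholders; pin the versions your workload actually uses.

```text
# requirements.txt — pin exact versions for reproducible serverless environments
pandas==2.1.4
requests==2.31.0
```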
Streaming
Streaming workloads are supported on serverless, but certain triggers are not supported. Update your code to use the supported triggers.
| Spark trigger | Supported | Notes |
|---|---|---|
| availableNow=True | Yes | Recommended |
| once=True | Yes | Deprecated. Use availableNow=True instead. |
| processingTime | No | Returns an error |
| continuous | No | Use Lakeflow Spark Declarative Pipelines continuous mode instead |
| Default (not setting a trigger) | No | Omitting the trigger falls back to a processing-time trigger, which is not supported. Set availableNow=True explicitly. |
For continuous streaming, migrate to Lakeflow Spark Declarative Pipelines in continuous mode or run jobs on a continuous schedule with the AvailableNow trigger. For large sources, set maxFilesPerTrigger or maxBytesPerTrigger to prevent out-of-memory errors.
Example: Fix streaming triggers
```python
# Classic (not supported on serverless — default trigger is ProcessingTime)
query = df.writeStream.format("delta").outputMode("append").start()

# Serverless (explicit AvailableNow trigger)
query = (df.writeStream.format("delta").outputMode("append")
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start(output_path))
query.awaitTermination()

# With OOM prevention for large sources
query = (spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 100)
    .option("maxBytesPerTrigger", "10g")
    .load(input_path)
    .writeStream.format("delta")
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start(output_path))
```
Test your workloads
- Quick compatibility test: Run the workload on classic compute with Standard access mode and Databricks Runtime 14.3 or above. If the run succeeds, the workload can likely migrate to serverless with no code changes.
- A/B comparison (recommended for production): Run the same workload on classic (control) and serverless (experiment). Diff output tables and verify correctness. Iterate until outputs match.
- Temporary configs: You can temporarily set supported Spark configs during testing. Remove them once stable.
Choose a performance mode
Serverless jobs and pipelines support two performance modes: standard and performance-optimized. The performance mode you choose depends on your workload requirements.
| Mode | Availability | Startup | Best for |
|---|---|---|---|
| Standard | Jobs, Lakeflow Spark Declarative Pipelines | 4-6 minutes | Cost-sensitive batch |
| Performance-optimized | Notebooks, Jobs, Lakeflow Spark Declarative Pipelines | Seconds | Interactive, latency-sensitive |
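If you configure jobs through the Jobs API or an asset bundle, the performance mode is selected with a job-level setting. The fragment below is a hypothetical minimal sketch; the job name is a placeholder, and the performance_target field name and values should be verified against the Jobs API reference.

```json
{
  "name": "nightly-etl",
  "performance_target": "STANDARD"
}
```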
Migrate in phases
- New workloads: Start all new notebooks and jobs on serverless.
- Low-risk workloads: Migrate PySpark/SQL workloads already on standard access mode and Databricks Runtime 14.3 or above.
- Complex workloads: Migrate workloads needing code changes (RDD rewrites, DBFS updates, trigger fixes).
- Remaining workloads: Review periodically as capabilities expand.
Monitor costs
Serverless billing is based on DBU consumption, not cluster uptime. Validate cost expectations with representative workloads before migrating at scale.
- Use serverless usage policies for cost attribution.
- Use system tables for dashboards and alerts.
- Set account budget alerts.
- Use the pre-configured usage dashboard for an overview of serverless spending.
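Example: Query serverless DBU consumption from system tables
A dashboard or alert can be driven by a query like the following against the system.billing.usage system table. This is a sketch: the SKU-name filter assumes serverless products are identifiable by SERVERLESS in the SKU name, which you should confirm against your account's SKUs.

```python
def serverless_dbus_by_day(spark):
    # Daily serverless DBU totals for the last 30 days
    # (assumption: serverless SKU names contain 'SERVERLESS').
    return spark.sql("""
        SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
        FROM system.billing.usage
        WHERE sku_name LIKE '%SERVERLESS%'
          AND usage_date >= current_date() - INTERVAL 30 DAYS
        GROUP BY usage_date, sku_name
        ORDER BY usage_date
    """)
```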
Additional resources
- Best practices for serverless compute: Optimization tips for serverless workloads
- Serverless compute limitations: Full list of current limitations and unsupported features
- Configure the serverless environment: Manage libraries and dependencies
- Supported Spark configurations: Spark configs available on serverless
- Spark Connect vs. classic Spark: Behavioral differences in serverless architecture
- Serverless network security: NCCs, Private Link, and firewall configuration
- Serverless compute release notes: Track new capabilities as they ship
- Unity Catalog upgrade guide: Migrate from Hive Metastore to Unity Catalog
You can also refer to the following blog posts for more information:
- What is serverless computing?: Overview of serverless capabilities and customer results
- Evolution of data engineering: How serverless compute is transforming notebooks and Lakeflow jobs