Migrate from classic compute to serverless compute
Migrate your workloads from classic compute to serverless compute. Serverless compute handles provisioning, scaling, runtime upgrades, and optimization automatically.
Most classic workloads can migrate with minimal or no code changes, and this page focuses on those workloads. Some features, such as df.cache(), are not yet supported on serverless but will not require code changes once they become available. Workloads that depend on R or Scala notebooks require classic compute and cannot migrate to serverless. For a full list of current limitations, see Serverless compute limitations.
Migration steps
To migrate your workloads from classic compute to serverless compute, follow these steps:
- Check prerequisites: Verify that your workspace, networking, and cloud storage access meet the requirements. See Before you begin.
- Update code: Make any necessary code and configuration changes. See Update your code.
- Test your workloads: Validate compatibility and correctness before cutting over. See Test your workloads.
- Choose a performance mode: Select the performance mode that best matches your workload requirements. See Choose a performance mode.
- Migrate in phases: Roll out serverless incrementally, starting with new and low-risk workloads. See Migrate in phases.
- Monitor costs: Track serverless DBU consumption and set up alerts. See Monitor costs.
Before you begin
Before you begin migrating, you might need to update some legacy configurations in your workspace.
| Prerequisite | Action | Details |
|---|---|---|
| Workspace is enabled for Unity Catalog | Migrate from Hive Metastore if needed | |
| Networking configured | Replace VPC peering with NCCs, Private Link, or firewall rules | |
| Cloud storage access | Replace legacy data access patterns, including DBFS mounts that use instance profiles, with Unity Catalog external locations | |
Update your code
The following sections list the code and configuration changes required to make your workloads compatible with serverless.
Data access
Legacy data access patterns are not supported on serverless. Update your code to use Unity Catalog instead.
| Classic pattern | Serverless replacement | Details |
|---|---|---|
| DBFS paths (dbfs:/) | Unity Catalog volumes | |
| Hive Metastore tables | Unity Catalog tables (or HMS Federation) | |
| IAM instance profiles | Unity Catalog external locations | |
| Custom JDBC JARs | Lakehouse Federation | |
DBFS access is limited on serverless. Update all dbfs:/ paths to Unity Catalog volumes before migrating. For more information, see Migrate files stored in DBFS.
Example: Replace DBFS paths and Hive Metastore references
```python
# Classic
df = spark.read.csv("dbfs:/mnt/datalake/data.csv", header=True)
df.write.parquet("dbfs:/mnt/output/results")
df = spark.table("my_database.my_table")

# Serverless
df = spark.read.csv("/Volumes/main/sales/raw_data/data.csv", header=True)
df.write.parquet("/Volumes/main/analytics/output/results")
df = spark.table("main.my_database.my_table")  # three-level namespace
```
APIs and code
Certain APIs and code patterns are not supported on serverless. Use the following table to determine whether your code needs updating.
| Classic pattern | Serverless replacement | Details |
|---|---|---|
| RDD APIs (sc.parallelize, rdd.map) | DataFrame APIs | |
| df.cache(), df.persist() | Remove caching calls | |
| sc, sqlContext | Use spark (SparkSession) | |
| Hive variables (${var}) | SQL session variables | |
| Unsupported Spark configs | Remove unsupported configs. Serverless auto-tunes most settings. | Configure Spark properties for serverless notebooks and jobs |
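Example: Replace Hive variable substitution with parameter markers
The Hive variable row above can typically be handled with named parameter markers, which spark.sql accepts on Spark 3.4 and above. This is a minimal sketch; the table name, column, and function name are hypothetical.

```python
def active_rows(spark, status):
    # Classic: SET myvar=active; SELECT * FROM t WHERE status = '${myvar}'
    # Serverless: pass values through the args parameter instead of
    # string substitution (also avoids SQL injection).
    return spark.sql(
        "SELECT * FROM main.my_database.my_table WHERE status = :status",
        args={"status": status},
    )
```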
Example: Replace RDD operations with DataFrames
```python
from pyspark.sql import functions as F
import pandas as pd

# sc.parallelize + rdd.map
# Classic: rdd = sc.parallelize([1, 2, 3]); rdd.map(lambda x: x * 2).collect()
df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
result = df.select((F.col("value") * 2).alias("value")).collect()

# rdd.flatMap
# Classic: sc.parallelize(["hello world"]).flatMap(lambda l: l.split(" ")).collect()
df = spark.createDataFrame([("hello world",)], ["line"])
words = df.select(F.explode(F.split("line", " ")).alias("word")).collect()

# rdd.groupByKey
# Classic: rdd.groupByKey().mapValues(list).collect()
df = spark.createDataFrame([("a", 1), ("b", 2), ("a", 3)], ["key", "value"])
grouped = df.groupBy("key").agg(F.collect_list("value").alias("values")).collect()

# rdd.mapPartitions → applyInPandas
def process_group(pdf: pd.DataFrame) -> pd.DataFrame:
    return pd.DataFrame({"total": [pdf["id"].sum()]})

result = (spark.range(100).repartition(4)
    .groupBy(F.spark_partition_id())
    .applyInPandas(process_group, schema="total long")
    .collect())

# sc.textFile → spark.read.text
df = spark.read.text("/Volumes/catalog/schema/volume/file.txt")
```
Example: Replace SparkContext and caching
from pyspark.sql.functions import broadcast
# sc.broadcast → broadcast join
result = main_df.join(broadcast(lookup_df), "key")
# sc.accumulator → DataFrame aggregation
total = df.agg(F.sum("amount")).collect()[0][0]
# sqlContext.sql → spark.sql
result = spark.sql("SELECT * FROM main.db.table")
# df.cache() → remove caching calls
# Materialize expensive intermediate results to Delta as a workaround:
df = spark.read.parquet(path)
result = df.filter("status = 'active'")
expensive_df.write.format("delta").mode("overwrite").saveAsTable("main.scratch.temp")
result = spark.table("main.scratch.temp")
Libraries and environments
You can manage libraries and environments at the workspace level using base environments and at the notebook level using the notebook's serverless environment.
| Classic pattern | Serverless replacement | Details |
|---|---|---|
| Init scripts | Serverless environments | |
| Cluster-scoped libraries | Notebook-scoped or environment libraries | |
| Maven/JAR libraries | JAR task support for jobs; PyPI for notebooks | |
| Docker containers | Serverless environments for library needs | |
Pin Python packages in requirements.txt for reproducible environments. See Best practices for serverless compute.
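For example, a pinned requirements.txt might look like the following. The packages and versions shown are placeholders; pin the versions your workload actually uses.

```text
# requirements.txt — pin exact versions for reproducible serverless environments
pandas==2.1.4
requests==2.31.0
```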
Streaming
Streaming workloads are supported on serverless, but certain triggers are not supported. Update your code to use the supported triggers.
| Spark trigger | Supported | Notes |
|---|---|---|
| availableNow=True | Yes | Recommended |
| once=True | Yes | Deprecated. Use availableNow=True instead. |
| processingTime | No | Returns an error |
| continuous | No | Use Lakeflow Spark Declarative Pipelines continuous mode instead |
| Default (not setting a trigger) | No | Omitting the trigger falls back to a processing-time trigger, which is not supported. Set availableNow=True explicitly. |
For continuous streaming, migrate to Lakeflow Spark Declarative Pipelines in continuous mode or run jobs on a continuous schedule with the AvailableNow trigger. For large sources, set maxFilesPerTrigger or maxBytesPerTrigger to prevent out-of-memory errors.
Example: Fix streaming triggers
```python
# Classic (not supported on serverless — default trigger is ProcessingTime)
query = df.writeStream.format("delta").outputMode("append").start()

# Serverless (explicit AvailableNow trigger)
query = (df.writeStream.format("delta").outputMode("append")
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start(output_path))
query.awaitTermination()

# With OOM prevention for large sources
query = (spark.readStream.format("delta")
    .option("maxFilesPerTrigger", 100)
    .option("maxBytesPerTrigger", "10g")
    .load(input_path)
    .writeStream.format("delta")
    .trigger(availableNow=True)
    .option("checkpointLocation", checkpoint_path)
    .start(output_path))
```
Test your workloads
- Quick compatibility test: Run the workload on classic compute with Standard access mode and Databricks Runtime 14.3 or above. If the run succeeds, the workload can likely migrate to serverless with no code changes.
- A/B comparison (recommended for production): Run the same workload on classic (control) and serverless (experiment). Diff output tables and verify correctness. Iterate until outputs match.
- Temporary configs: You can temporarily set supported Spark configs during testing. Remove them once stable.
Choose a performance mode
Serverless jobs and pipelines support two performance modes: standard and performance-optimized. The performance mode you choose depends on your workload requirements.
| Mode | Availability | Startup | Best for |
|---|---|---|---|
| Standard | Jobs, Lakeflow Spark Declarative Pipelines | 4-6 minutes | Cost-sensitive batch |
| Performance-optimized | Notebooks, Jobs, Lakeflow Spark Declarative Pipelines | Seconds | Interactive, latency-sensitive |
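If you configure jobs through the Jobs API or an asset bundle, the performance mode is selected with a job-level setting. The fragment below is a hypothetical minimal sketch; the job name is a placeholder, and the performance_target field name and values should be verified against the Jobs API reference.

```json
{
  "name": "nightly-etl",
  "performance_target": "STANDARD"
}
```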
Migrate in phases
- New workloads: Start all new notebooks and jobs on serverless.
- Low-risk workloads: Migrate PySpark/SQL workloads already on standard access mode and Databricks Runtime 14.3 or above.
- Complex workloads: Migrate workloads needing code changes (RDD rewrites, DBFS updates, trigger fixes).
- Remaining workloads: Review periodically as capabilities expand.
Monitor costs
Serverless billing is based on DBU consumption, not cluster uptime. Validate cost expectations with representative workloads before migrating at scale.
- Use serverless usage policies for cost attribution.
- Use system tables for dashboards and alerts.
- Set account budget alerts.
- Use the pre-configured usage dashboard for an overview of serverless spending.
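Example: Query serverless DBU consumption from system tables
A dashboard or alert can be driven by a query like the following against the system.billing.usage system table. This is a sketch: the SKU-name filter assumes serverless products are identifiable by SERVERLESS in the SKU name, which you should confirm against your account's SKUs.

```python
def serverless_dbus_by_day(spark):
    # Daily serverless DBU totals for the last 30 days
    # (assumption: serverless SKU names contain 'SERVERLESS').
    return spark.sql("""
        SELECT usage_date, sku_name, SUM(usage_quantity) AS dbus
        FROM system.billing.usage
        WHERE sku_name LIKE '%SERVERLESS%'
          AND usage_date >= current_date() - INTERVAL 30 DAYS
        GROUP BY usage_date, sku_name
        ORDER BY usage_date
    """)
```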
Additional resources
- Best practices for serverless compute: Optimization tips for serverless workloads
- Serverless compute limitations: Full list of current limitations and unsupported features
- Configure the serverless environment: Manage libraries and dependencies
- Supported Spark configurations: Spark configs available on serverless
- Spark Connect vs. classic Spark: Behavioral differences in serverless architecture
- Serverless network security: NCCs, Private Link, and firewall configuration
- Serverless compute release notes: Track new capabilities as they ship
- Unity Catalog upgrade guide: Migrate from Hive Metastore to Unity Catalog
You can also refer to the following blog posts for more information:
- What is serverless computing?: Overview of serverless capabilities and customer results
- Evolution of data engineering: How serverless compute is transforming notebooks and Lakeflow jobs