Serverless compute release notes

Preview

This feature is in Public Preview. For information on eligibility and enablement, see Enable serverless compute public preview.

This article explains the features and behaviors that are currently available and upcoming on serverless compute for notebooks and workflows.

Databricks periodically releases updates to serverless compute. All users get the same updates, rolled out over a short period of time. See How are releases rolled out?.

Upcoming behavioral changes

This section highlights behavioral changes coming in the next serverless compute version. When the changes are pushed to production, they are added to the release notes.

Release notes

This section includes release notes for serverless compute. Release notes are organized by year and week of year. Serverless compute always runs using the most recently released version listed here.

Serverless compute version 2024.15

April 15, 2024

Serverless compute version 2024.15 has been released into production. This is the initial serverless compute version, which roughly corresponds to Databricks Runtime 14.3 with some modifications that remove support for some non-serverless and legacy features.

This version includes the following updates:

Spark configurations not supported

To automate Spark configuration on serverless compute, Databricks has removed support for setting most Spark configurations. You can set only spark.sql.legacy.timeParserPolicy and spark.sql.session.timeZone.

Setting any other Spark configuration results in an error at runtime.
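As a minimal sketch, assuming the `spark` session that Databricks notebooks provide:

```python
# Only these two Spark configurations can be set on serverless compute:
spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")
spark.conf.set("spark.sql.session.timeZone", "America/Los_Angeles")

# Any other configuration fails, for example:
# spark.conf.set("spark.sql.shuffle.partitions", "200")  # raises an error
```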

Caching API and SQL commands not supported

Usage of DataFrame and SQL cache APIs is not supported. Using any of these APIs or SQL commands will result in an exception.

Unsupported APIs:

  • df.cache(), df.persist()

  • df.unpersist()

  • spark.catalog.cacheTable()

  • spark.catalog.uncacheTable()

  • spark.catalog.clearCache()

  • spark.catalog.isCached()

Unsupported SQL commands:

  • CACHE TABLE

  • UNCACHE TABLE

  • CLEAR CACHE
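A minimal sketch of the behavior and a workaround, assuming the notebook-provided `spark` session; the target table name is hypothetical:

```python
df = spark.read.table("samples.nyctaxi.trips")

# These raise exceptions on serverless compute:
# df.cache()
# spark.sql("CACHE TABLE trips_cache AS SELECT * FROM samples.nyctaxi.trips")

# One alternative is to materialize intermediate results as a table instead:
(df.filter("trip_distance > 10")
   .write.mode("overwrite")
   .saveAsTable("main.default.long_trips"))
```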

Global temporary views not supported

The creation of global temporary views is not supported. Using either of these commands will result in an exception:

  • CREATE GLOBAL TEMPORARY VIEW

  • CREATE OR REPLACE GLOBAL TEMPORARY VIEW

Instead, Databricks recommends using session temporary views or creating tables where cross-session data passing is required.
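A minimal sketch of the session-scoped alternative, assuming the notebook-provided `spark` session and a hypothetical table `main.default.orders`:

```python
# Fails on serverless compute:
# spark.sql("CREATE GLOBAL TEMPORARY VIEW all_orders AS SELECT * FROM main.default.orders")

# Session-scoped temporary view, visible only within the current session:
spark.sql("""
CREATE OR REPLACE TEMPORARY VIEW recent_orders AS
SELECT * FROM main.default.orders WHERE order_date >= '2024-01-01'
""")

# For cross-session access, create a table instead:
spark.sql("""
CREATE OR REPLACE TABLE main.default.recent_orders AS
SELECT * FROM main.default.orders WHERE order_date >= '2024-01-01'
""")
```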

CREATE FUNCTION (External) not supported

The CREATE FUNCTION (External) command is not supported. Using this command will result in an exception.

Instead, Databricks recommends using CREATE FUNCTION (SQL and Python) to create UDFs.
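A minimal sketch of the recommended replacement, assuming a hypothetical schema `main.default`; CREATE FUNCTION with a SQL or Python body is supported, while external (Hive) UDFs are not:

```python
# SQL UDF:
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.to_fahrenheit(c DOUBLE)
RETURNS DOUBLE
RETURN (c * 9 / 5) + 32
""")

# Python UDF:
spark.sql("""
CREATE OR REPLACE FUNCTION main.default.slugify(s STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  return s.strip().lower().replace(" ", "-")
$$
""")

spark.sql("SELECT main.default.to_fahrenheit(21.5) AS f").show()
```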

Hive SerDe tables not supported

Hive SerDe tables are not supported. Additionally, the corresponding LOAD DATA command, which loads data into a Hive SerDe table, is not supported. Using the command will result in an exception.

Support for data sources is limited to AVRO, BINARYFILE, CSV, DELTA, JSON, KAFKA, ORC, PARQUET, TEXT, and XML.
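A minimal sketch of a replacement for LOAD DATA into a Hive SerDe table, assuming hypothetical paths and table names:

```python
# Fails on serverless compute:
# spark.sql("LOAD DATA INPATH '/Volumes/main/default/raw/events.csv' INTO TABLE hive_events")

# Instead, read with a supported data source and write to a Delta table:
(spark.read.format("csv")
    .option("header", "true")
    .load("/Volumes/main/default/raw/events.csv")
    .write.mode("append")
    .saveAsTable("main.default.events"))
```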

Hive variables not supported

Hive variables (for example ${env:var}, ${configName}, ${system:var}, and spark.sql.variable) or config variable references using the ${var} syntax are not supported. Using Hive variables will result in an exception.

Instead, use DECLARE VARIABLE, SET VARIABLE, and SQL session variable references and parameter markers (? or :var) to declare, modify, and reference session state. You can also use the IDENTIFIER clause to parameterize object names in many cases.
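A minimal sketch of session variables, parameter markers, and the IDENTIFIER clause, assuming a hypothetical table `main.default.sales`:

```python
# Declare and modify a session variable:
spark.sql("DECLARE OR REPLACE VARIABLE target_region STRING DEFAULT 'EMEA'")
spark.sql("SET VAR target_region = 'APAC'")

# Reference the variable in a query:
spark.sql("SELECT * FROM main.default.sales WHERE region = target_region").show()

# Named parameter markers also work:
spark.sql("SELECT * FROM main.default.sales WHERE region = :region",
          args={"region": "APAC"}).show()

# IDENTIFIER parameterizes object names:
spark.sql("DECLARE OR REPLACE VARIABLE tbl STRING DEFAULT 'main.default.sales'")
spark.sql("SELECT count(*) FROM IDENTIFIER(tbl)").show()
```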

input_file functions are deprecated

The input_file_name(), input_file_block_length(), and input_file_block_start() functions have been deprecated. Using these functions is highly discouraged.

Instead, use the file metadata column to retrieve file metadata information.
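A minimal sketch using the `_metadata` column, assuming a hypothetical directory of Parquet files:

```python
# The _metadata column replaces the deprecated input_file_* functions:
df = (spark.read.format("parquet")
      .load("/Volumes/main/default/raw/")
      .select("*", "_metadata.file_path", "_metadata.file_name",
              "_metadata.file_block_start", "_metadata.file_block_length"))
df.show(truncate=False)
```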

Behavioral changes

Serverless compute version 2024.15 includes the following behavioral changes:

  • unhex(hexStr) bug fix: When using the unhex(hexStr) function, hexStr is always padded left to a whole byte. Previously the unhex function ignored the first half-byte. For example: unhex('ABC') now produces x'0ABC' instead of x'BC'.

  • Auto-generated column aliases are now stable: When the result of an expression is referenced without a user-specified column alias, this auto-generated alias will now be stable. The new algorithm may result in a change to the previously auto-generated names used in features like materialized views.

  • Table scans with CHAR type fields are now always padded: Delta tables, certain JDBC tables, and external data sources store CHAR data in non-padded form. When reading, Databricks will now pad the data with spaces to the declared length to ensure correct semantics.

  • Casts from BIGINT/DECIMAL to TIMESTAMP throw an exception for overflowed values: Databricks allows casting from BIGINT and DECIMAL to TIMESTAMP by treating the value as the number of seconds from the Unix epoch. Previously, Databricks returned the overflowed value, but it now throws an exception on overflow. Use try_cast to return NULL instead of an exception, as shown in the sketch after this list.

  • PySpark UDF execution has been improved to match the exact behavior of UDF execution on single-user compute: The following changes have been made:

    • UDFs with a string return type no longer implicitly convert non-string values into strings. Previously, UDFs with a return type of str would apply a str(..) wrapper to the result regardless of the actual data type of the returned value.

    • UDFs with timestamp return types no longer implicitly apply a timezone conversion to timestamps.
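The following sketch, assuming the notebook-provided `spark` session, illustrates two of these changes: using try_cast to avoid the new overflow exception, and returning strings explicitly from a str-typed UDF rather than relying on the removed implicit conversion:

```python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# try_cast returns NULL where a plain CAST of an overflowing value to
# TIMESTAMP now throws an exception:
spark.sql("SELECT try_cast(9223372036854775807 AS TIMESTAMP) AS ts").show()

# str-typed UDFs no longer wrap non-string results in str(...), so convert
# explicitly inside the UDF:
@udf(returnType=StringType())
def label(value):
    return str(value)  # explicit conversion; implicit str() no longer applied

spark.createDataFrame([(1,), (2,)], ["value"]).select(label("value")).show()
```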

System environment

Serverless compute includes the following system environment:

  • Operating System: Ubuntu 22.04.3 LTS

  • Python: 3.10.12

  • Delta Lake: 3.1.0

Installed Python libraries

| Library | Version | Library | Version | Library | Version |
| --- | --- | --- | --- | --- | --- |
| anyio | 3.5.0 | argon2-cffi | 21.3.0 | argon2-cffi-bindings | 21.2.0 |
| asttokens | 2.0.5 | astunparse | 1.6.3 | attrs | 22.1.0 |
| backcall | 0.2.0 | beautifulsoup4 | 4.11.1 | black | 22.6.0 |
| bleach | 4.1.0 | blinker | 1.4 | boto3 | 1.24.28 |
| botocore | 1.27.96 | cachetools | 5.3.2 | certifi | 2022.12.7 |
| cffi | 1.15.1 | chardet | 4.0.0 | charset-normalizer | 2.0.4 |
| click | 8.0.4 | comm | 0.1.2 | contourpy | 1.0.5 |
| cryptography | 39.0.1 | cycler | 0.11.0 | Cython | 0.29.32 |
| databricks-connect | 14.3.1 | databricks-sdk | 0.20.0 | dbus-python | 1.2.18 |
| debugpy | 1.6.7 | decorator | 5.1.1 | defusedxml | 0.7.1 |
| distlib | 0.3.8 | docstring-to-markdown | 0.11 | entrypoints | 0.4 |
| executing | 0.8.3 | facets-overview | 1.1.1 | fastjsonschema | 2.19.1 |
| filelock | 3.13.1 | fonttools | 4.25.0 | google-auth | 2.28.1 |
| googleapis-common-protos | 1.62.0 | grpcio | 1.62.0 | grpcio-status | 1.62.0 |
| httplib2 | 0.20.2 | idna | 3.4 | importlib-metadata | 4.6.4 |
| ipyflow-core | 0.0.198 | ipykernel | 6.25.0 | ipython | 8.14.0 |
| ipython-genutils | 0.2.0 | ipywidgets | 7.7.2 | jedi | 0.18.1 |
| jeepney | 0.7.1 | Jinja2 | 3.1.2 | jmespath | 0.10.0 |
| joblib | 1.2.0 | jsonschema | 4.17.3 | jupyter-client | 7.3.4 |
| jupyter-server | 1.23.4 | jupyter_core | 5.2.0 | jupyterlab-pygments | 0.1.2 |
| jupyterlab-widgets | 1.0.0 | keyring | 23.5.0 | kiwisolver | 1.4.4 |
| launchpadlib | 1.10.16 | lazr.restfulclient | 0.14.4 | lazr.uri | 1.0.6 |
| lxml | 4.9.1 | MarkupSafe | 2.1.1 | matplotlib | 3.7.0 |
| matplotlib-inline | 0.1.6 | mccabe | 0.7.0 | mistune | 0.8.4 |
| more-itertools | 8.10.0 | mypy-extensions | 0.4.3 | nbclassic | 0.5.2 |
| nbclient | 0.5.13 | nbconvert | 6.5.4 | nbformat | 5.7.0 |
| nest-asyncio | 1.5.6 | nodeenv | 1.8.0 | notebook | 6.5.2 |
| notebook_shim | 0.2.2 | numpy | 1.23.5 | oauthlib | 3.2.0 |
| packaging | 23.2 | pandas | 1.5.3 | pandocfilters | 1.5.0 |
| parso | 0.8.3 | pathspec | 0.10.3 | patsy | 0.5.3 |
| pexpect | 4.8.0 | pickleshare | 0.7.5 | Pillow | 9.4.0 |
| pip | 22.3.1 | platformdirs | 2.5.2 | plotly | 5.9.0 |
| pluggy | 1.0.0 | prometheus-client | 0.14.1 | prompt-toolkit | 3.0.36 |
| protobuf | 4.25.3 | psutil | 5.9.0 | psycopg2 | 2.9.3 |
| ptyprocess | 0.7.0 | pure-eval | 0.2.2 | py4j | 0.10.9.7 |
| pyarrow | 8.0.0 | pyarrow-hotfix | 0.5 | pyasn1 | 0.5.1 |
| pyasn1-modules | 0.3.0 | pyccolo | 0.0.52 | pycparser | 2.21 |
| pydantic | 1.10.6 | pyflakes | 3.1.0 | Pygments | 2.11.2 |
| PyGObject | 3.42.1 | PyJWT | 2.3.0 | pyodbc | 4.0.32 |
| pyparsing | 3.0.9 | pyright | 1.1.294 | pyrsistent | 0.18.0 |
| python-dateutil | 2.8.2 | python-lsp-jsonrpc | 1.1.1 | python-lsp-server | 1.8.0 |
| pytoolconfig | 1.2.5 | pytz | 2022.7 | pyzmq | 23.2.0 |
| requests | 2.28.1 | rope | 1.7.0 | rsa | 4.9 |
| s3transfer | 0.6.2 | scikit-learn | 1.1.1 | scipy | 1.10.0 |
| seaborn | 0.12.2 | SecretStorage | 3.3.1 | Send2Trash | 1.8.0 |
| setuptools | 65.6.3 | six | 1.16.0 | sniffio | 1.2.0 |
| soupsieve | 2.3.2.post1 | ssh-import-id | 5.11 | stack-data | 0.2.0 |
| statsmodels | 0.13.5 | tenacity | 8.1.0 | terminado | 0.17.1 |
| threadpoolctl | 2.2.0 | tinycss2 | 1.2.1 | tokenize-rt | 4.2.1 |
| tomli | 2.0.1 | tornado | 6.1 | traitlets | 5.7.1 |
| typing_extensions | 4.4.0 | ujson | 5.4.0 | unattended-upgrades | 0.1 |
| urllib3 | 1.26.14 | virtualenv | 20.16.7 | wadllib | 1.3.6 |
| wcwidth | 0.2.5 | webencodings | 0.5.1 | websocket-client | 0.58.0 |
| whatthepatch | 1.0.2 | wheel | 0.38.4 | widgetsnbextension | 3.6.1 |
| yapf | 0.33.0 | zipp | 1.0.0 | | |

Limitations

Serverless compute is based on the shared compute architecture. The most important limitations inherited from shared compute are listed below, along with additional serverless-specific limitations. For a full list of shared compute limitations, see Compute access mode limitations for Unity Catalog.

General limitations

  • Scala and R are not supported.

  • Only ANSI SQL is supported when writing SQL.

  • Spark RDD APIs are not supported.

  • Spark Context (sc), spark.sparkContext, and sqlContext are not supported.

  • You cannot access DBFS.

  • Databricks Container Services are not supported.

  • The web terminal is not supported.

  • Structured Streaming queries must invoke query.awaitTermination() to ensure the query completes. See the sketch after this list.

  • No query can run longer than 48 hours.

  • You must use Unity Catalog to connect to external data sources. Use external locations to access cloud storage.

  • Support for data sources is limited to AVRO, BINARYFILE, CSV, DELTA, JSON, KAFKA, ORC, PARQUET, TEXT, and XML.

  • User-defined functions (UDFs) cannot access the internet.

  • Individual rows must not exceed the maximum size of 128 MB.

  • The Spark UI is not available. Instead, use the query profile to view information about your Spark queries. See Query profile.
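A minimal sketch of the Structured Streaming requirement above, assuming the notebook-provided `spark` session; the source table and checkpoint path are hypothetical:

```python
query = (
    spark.readStream.table("main.default.events")
    .writeStream
    .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")
    .trigger(availableNow=True)
    .toTable("main.default.events_copy")
)
# Required on serverless compute: block until the streaming query finishes.
query.awaitTermination()
```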

Machine learning limitations

  • Databricks Runtime for Machine Learning and Apache Spark MLlib are not supported.

  • GPUs are not supported.

Notebooks limitations

  • Notebooks have access to 8 GB of memory, which cannot be configured.

  • Notebook-scoped libraries are not cached across development sessions.

  • Sharing TEMP tables and views between users who share a notebook is not supported.

  • Autocomplete and Variable Explorer for DataFrames in notebooks are not supported.

Workflow limitations

  • The driver size for serverless compute for workflows is currently fixed and cannot be changed.

  • Task logs are not isolated per task run. Logs will contain the output from multiple tasks.

  • Task libraries are not supported for notebook tasks. Use notebook-scoped libraries instead. See Notebook-scoped Python libraries.

Compute-specific limitations

The following compute-specific features are not supported:

  • Compute policies

  • Compute-scoped init scripts

  • Compute-scoped libraries, including custom data sources and Spark extensions. Use notebook-scoped libraries instead.

  • Compute-level data access configurations. As a consequence, accessing tables and files via HMS on cloud paths, or with DBFS mounts that have no embedded credentials, will not work.

  • Instance pools

  • Compute event logs

  • Apache Spark compute configs and environment variables

Frequently asked questions (FAQ)

How are releases rolled out?

Serverless compute is a versionless product, which means that Databricks automatically upgrades the serverless compute runtime to support enhancements and upgrades to the platform. All users get the same updates, rolled out over a short period of time.

How do I determine which version I am running?

Your serverless workloads always run on the latest released runtime version. See Release notes for the most recent version.