User-defined functions in Databricks Connect for Python

Note

This article covers Databricks Connect for Databricks Runtime 13.1 and above.

This article describes how to execute UDFs with Databricks Connect for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Databricks clusters. For the Scala version of this article, see User-defined functions in Databricks Connect for Scala.

Note

Before you begin to use Databricks Connect, you must set up the Databricks Connect client.

User-defined functions in Python work out of the box and require no additional configuration. The function is serialized by Databricks Connect and sent to the server as part of the Connect request.
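To make the idea of shipping a function concrete, here is a minimal sketch that uses only the Python standard library. It is an illustration of serializing a function by value and rebuilding it elsewhere, not a description of how Databricks Connect is implemented. The names `square`, `payload`, and `rebuilt` are hypothetical and chosen for this example.

```python
import marshal
import types


def square(x):
    return x * x


# "Client" side: turn the function's compiled code object into bytes.
payload = marshal.dumps(square.__code__)

# "Server" side: rebuild a callable from the bytes alone.
rebuilt = types.FunctionType(marshal.loads(payload), globals(), "square")

print(rebuilt(4))
```

The serialized payload contains compiled bytecode, which is specific to the Python version that produced it. That is one reason a mismatch between the interpreter that serializes a function and the one that deserializes it can cause failures.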

Note

Since the user-defined function is serialized and deserialized, the Python version used by the client must match the Python version on the Databricks cluster. To check the cluster’s Python version, see the “System Environment” section for the cluster’s Databricks Runtime version in Databricks Runtime release notes versions and compatibility.
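As a rough sketch, a client-side check like the following can catch a version mismatch before any UDF is sent. It assumes that comparing the major and minor version is sufficient, and `CLUSTER_PYTHON` is a hypothetical value that you would copy by hand from the release notes for your cluster's Databricks Runtime version. The helper names are illustrative only.

```python
import sys

# Hypothetical value: replace with the Python version listed for your
# cluster's Databricks Runtime version in the release notes.
CLUSTER_PYTHON = (3, 10)


def major_minor(version):
    """Return the (major, minor) pair of a version tuple."""
    return (version[0], version[1])


def versions_match(client, cluster):
    """Compare two version tuples on major and minor only (an assumption)."""
    return major_minor(client) == major_minor(cluster)


if not versions_match(sys.version_info, CLUSTER_PYTHON):
    print(
        "Warning: client Python is "
        f"{sys.version_info[0]}.{sys.version_info[1]}, "
        f"but the cluster is expected to run {CLUSTER_PYTHON[0]}.{CLUSTER_PYTHON[1]}."
    )
```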

The following Python program sets up a simple UDF that squares values in a column.

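from databricks.connect import DatabricksSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType


# Define a Python UDF that returns the square of its input.
@udf(returnType=IntegerType())
def square(x):
    return x * x


# Create a Spark session backed by Databricks Connect.
spark = DatabricksSession.builder.getOrCreate()

# spark.range(1, 2) produces a single row with id = 1.
df = spark.range(1, 2)
df = df.withColumn("squared", square(col("id")))

df.show()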