This article covers Databricks Connect for Databricks Runtime 13.1 and above.
This article describes how to execute UDFs with Databricks Connect for Python. Databricks Connect enables you to connect popular IDEs, notebook servers, and custom applications to Databricks clusters. For the Scala version of this article, see User-defined functions in Databricks Connect for Scala.
Before you begin to use Databricks Connect, you must set up the Databricks Connect client.
User-defined functions in Python work out of the box and require no additional configuration. The function is serialized by Databricks Connect and sent to the server as part of the Connect request.
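To make "serialized and sent to the server" concrete, here is a minimal sketch that ships a function by value using only the standard library. It is an illustration, not the mechanism Databricks Connect uses internally. The helpers `serialize_function` and `deserialize_function` are hypothetical names introduced for this example.

```python
import marshal
import types


def square(x):
    """Plain Python function standing in for a UDF body."""
    return x * x


def serialize_function(func):
    # Hypothetical helper: turn the function's compiled code into bytes.
    # marshal output is tied to the Python version that produced it.
    return marshal.dumps(func.__code__)


def deserialize_function(payload, name="restored"):
    # Hypothetical helper: rebuild a callable from the bytes, as a
    # receiving process would have to do before it can run the function.
    code = marshal.loads(payload)
    return types.FunctionType(code, {}, name)


payload = serialize_function(square)      # bytes that could travel in a request
restored = deserialize_function(payload)  # callable rebuilt on the other side
print(restored(7))  # prints 49
```

The payload contains compiled code rather than source text, which suggests why the sending and receiving interpreters need to agree on the Python version.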
Because the user-defined function is serialized on the client and deserialized on the cluster, the Python version used by the client must match the Python version on the Databricks cluster. To check the cluster’s Python version, see the “System Environment” section of the release notes for the cluster’s Databricks Runtime version, listed in Databricks Runtime release notes versions and compatibility.
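A client-side guard can catch a version mismatch before any UDF is sent. The sketch below is one possible check, assuming that comparing the major and minor version is sufficient. The helper `python_versions_match` and the `CLUSTER_PYTHON` value are hypothetical; take the real cluster version from the release notes.

```python
import sys


def python_versions_match(cluster_version, client_version=None):
    """Return True when the major and minor versions agree.

    Assumption: only major.minor is compared; patch releases are ignored.
    """
    if client_version is None:
        client_version = sys.version_info[:2]
    return tuple(client_version[:2]) == tuple(cluster_version[:2])


# Hypothetical value: replace with the version listed under
# "System Environment" for your cluster's Databricks Runtime.
CLUSTER_PYTHON = (3, 10)

if not python_versions_match(CLUSTER_PYTHON):
    print(
        f"Client Python {sys.version_info.major}.{sys.version_info.minor} "
        f"differs from cluster Python {CLUSTER_PYTHON[0]}.{CLUSTER_PYTHON[1]}"
    )
```

Running a check like this at application startup makes a mismatch visible early instead of as a deserialization error on the cluster.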
The following Python program sets up a simple UDF that squares values in a column.
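```python
from pyspark.sql.functions import col, udf
from pyspark.sql.types import IntegerType
from databricks.connect import DatabricksSession


# Declare the return type so Spark knows the type of the result column.
@udf(returnType=IntegerType())
def square(x):
    return x * x


# Create a Spark session backed by the remote Databricks cluster.
spark = DatabricksSession.builder.getOrCreate()

# spark.range(1, 2) yields a single row with id = 1.
df = spark.range(1, 2)
df = df.withColumn("squared", square(col("id")))

df.show()
```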