User Defined Functions - Python

This notebook contains examples of creating a UDF in Python and registering it for use in Spark SQL.

Register the function as a UDF

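# Plain Python function that returns the square of its input
def squared(s):
  return s * s
# Register it under the name "squaredWithPython" so Spark SQL queries can call it
sqlContext.udf.register("squaredWithPython", squared)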
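Because `squared` is an ordinary Python function, one way to check it before registering is to call it directly on plain values. A minimal sketch, assuming only standard Python (no Spark session is needed):

```python
# squared is plain Python, so it can be exercised without Spark
def squared(s):
  return s * s

# Spot-check a few inputs before registering the function as a UDF
for value, expected in [(0, 0), (3, 9), (-4, 16)]:
  assert squared(value) == expected

print(squared(12))  # 144
```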

Optionally, you can also explicitly set the return type of your UDF. If you omit it, the return type defaults to StringType.

from pyspark.sql.types import LongType

def squared_typed(s):
  return s * s

# Re-register under the same name, this time declaring a LongType (64-bit integer) result
sqlContext.udf.register("squaredWithPython", squared_typed, LongType())
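LongType is Spark SQL's signed 64-bit integer type, so a declared LongType result can only hold values from -2**63 to 2**63 - 1. The helper `fits_in_long_type` below is hypothetical (it is not a PySpark API); it is a plain-Python sketch of that range check, showing where squaring an input stops fitting.

```python
# Bounds of Spark SQL's LongType (a signed 64-bit integer)
LONG_MIN = -2**63
LONG_MAX = 2**63 - 1

def squared_typed(s):
  return s * s

# Hypothetical helper (not part of PySpark): does a Python int fit in LongType?
def fits_in_long_type(value):
  return LONG_MIN <= value <= LONG_MAX

print(fits_in_long_type(squared_typed(2**31)))  # True  (result is 2**62)
print(fits_in_long_type(squared_typed(2**32)))  # False (result is 2**64)
```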

Call the UDF in Spark SQL

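# Build a one-column table of ids 1 through 19 (the end of range() is exclusive) and register it as "test"
sqlContext.range(1, 20).registerTempTable("test")

%sql select id, squaredWithPython(id) as id_squared from test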
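As a rough plain-Python sketch (no Spark involved), the rows this query should return can be computed directly: `range(1, 20)` yields ids 1 through 19, each paired with its square. Spark does not guarantee row order without an ORDER BY, so only the set of rows is comparable.

```python
def squared(s):
  return s * s

# Mirror of sqlContext.range(1, 20): start inclusive, end exclusive
ids = range(1, 20)

# Expected rows of: select id, squaredWithPython(id) as id_squared from test
expected_rows = [(i, squared(i)) for i in ids]

print(len(expected_rows))  # 19
print(expected_rows[0])    # (1, 1)
print(expected_rows[-1])   # (19, 361)
```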

Use UDF with DataFrames

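%sql select * from test

from pyspark.sql.functions import udf
from pyspark.sql.types import LongType

# Wrap the same Python function as a DataFrame UDF with a LongType result
squared_udf = udf(squared, LongType())
df = sqlContext.table("test")
# display() is the Databricks notebook helper for rendering a DataFrame
display(df.select("id", squared_udf("id").alias("id_squared")))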