User-defined functions (UDFs) in Unity Catalog

Important

This feature is in Public Preview and requires Databricks Runtime 13.3 LTS and above.

Databricks provides a SQL-native syntax to register custom functions to schemas governed by Unity Catalog. Standard CREATE FUNCTION SQL syntax is supported for all Unity Catalog compute on Databricks. On pro or serverless SQL warehouses and in Databricks Runtime 13.3 LTS and above, you can include Python code in functions registered to Unity Catalog.

Graviton instances do not support UDFs on Unity Catalog-enabled clusters.

Note

Views registered from queries that use UDFs must use supported compute. Classic SQL warehouses and Databricks Runtime 12.2 LTS and below cannot resolve views that contain UDFs registered to Unity Catalog.

For the complete SQL language reference, see CREATE FUNCTION (SQL and Python).

For information about how Unity Catalog manages permissions on functions, see CREATE FUNCTION.

Important

Python UDFs registered as functions in Unity Catalog differ in scope and support from PySpark UDFs scoped to a notebook or SparkSession. See User-defined scalar functions - Python.

Important

UDFs and UDAFs are not supported on Graviton clusters configured with shared access mode and Unity Catalog.

Custom SQL functions in Unity Catalog

When you create a SQL function using compute configured for Unity Catalog, the function is registered to the currently active schema by default. The following example demonstrates the syntax you might use to declare a target catalog and schema for a new function:

CREATE FUNCTION target_catalog.target_schema.roll_dice()
    RETURNS INT
    LANGUAGE SQL
    NOT DETERMINISTIC
    CONTAINS SQL
    COMMENT 'Roll a single 6 sided die'
    RETURN (rand() * 6)::INT + 1;

All users with sufficient privileges on the function can then use the function in compute environments configured for Unity Catalog, as in the following example:

SELECT target_catalog.target_schema.roll_dice()

Note

You can use UDFs using LANGUAGE SQL to return tables or scalar values.

Register a Python UDF to Unity Catalog

In Databricks Runtime 13.3 LTS and above, you can use the SQL CREATE FUNCTION statement to register scalar Python UDFs to Unity Catalog.

Important

Only pro and serverless SQL warehouses support Python UDFs for Unity Catalog.

Python UDFs are designed to provide the full expressiveness of Python directly within SQL functions, allowing for customized operations such as advanced transformations, data masking, and hashing.

Python UDFs execute in a secure, isolated environment and do not have access to file systems or internal services.

Python UDFs running on serverless compute or in shared access mode allow TCP/UDP network traffic over ports 80, 443, and 53.

See Which UDFs are most efficient?.

Note

Syntax and semantics for Python UDFs in Unity Catalog differ from Python UDFs registered to the SparkSession. See User-defined scalar functions - Python.

Python UDFs for Unity Catalog use statements set off by double dollar signs ($$), as in the following code example:

CREATE FUNCTION target_catalog.target_schema.greet(s STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
  return f"Hello, {s}"
$$

The following example demonstrates using this function to return greeting statements for all names stored in the first_name column of a table named students:

SELECT target_catalog.target_schema.greet(first_name)
FROM students;

You can define any number of Python functions within a Python UDF, but must return a scalar value.

Python functions must handle NULL values independently, and all type mappings must follow Databricks SQL language mappings.

You can import standard Python libraries included by Databricks, but you cannot include custom libraries or external dependencies.

If no catalog or schema is specified, Python UDFs are registered to the current active schema.

The following example imports a library and uses multiple functions within a Python UDF:

CREATE FUNCTION roll_dice(num_dice INTEGER, num_sides INTEGER)
RETURNS INTEGER
LANGUAGE PYTHON
AS $$
  import numpy as np

  def roll_die(num_sides):
    return np.random.randint(num_sides) + 1

  def sum_dice(num_dice,num_sides):
    return sum([roll_die(num_sides) for x in range(num_dice)])

  return sum_dice(num_dice, num_sides)
$$