User-defined functions (UDFs) in Unity Catalog

Preview

User-defined functions (UDFs) in Unity Catalog extend the capabilities of SQL and Python in Databricks. You can define custom functions, use them across compute environments, and share and govern them securely.

Python UDFs registered as functions in Unity Catalog differ in scope and support from PySpark UDFs scoped to a notebook or SparkSession. See User-defined scalar functions - Python.

See CREATE FUNCTION (SQL and Python) for complete SQL language reference.

Requirements

To use UDFs in Unity Catalog, the following requirements must be met:

To use Python code in UDFs registered in Unity Catalog, you must use a serverless or pro SQL warehouse or a cluster running Databricks Runtime 13.3 LTS or above.
A view that includes a Unity Catalog Python UDF fails on classic SQL warehouses.

Creating UDFs in Unity Catalog

To create a UDF in Unity Catalog, users need USAGE and CREATE permission on the schema and USAGE permission on the catalog. See Unity Catalog for more details.

To run a UDF, users need EXECUTE permission on the UDF. Users also need USAGE permission on the schema and catalog.

The following example registers a new function to the my_schema Unity Catalog schema:

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.calculate_bmi(weight DOUBLE, height DOUBLE)
RETURNS DOUBLE
LANGUAGE SQL
RETURN
SELECT weight / (height * height);

Python UDFs for Unity Catalog enclose the function body in double dollar signs ($$). You must declare a SQL data type for each parameter and for the return value. The following example registers a UDF that calculates body mass index:

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.calculate_bmi(weight_kg DOUBLE, height_m DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
return weight_kg / (height_m ** 2)
$$;
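
Because the text between the $$ delimiters is the body of a Python function whose parameters are the declared SQL arguments, the logic can be checked locally before it is registered. The following is a minimal sketch of that idea; the wrapper name calculate_bmi_local is hypothetical and exists only for this check.

```python
def calculate_bmi_local(weight_kg: float, height_m: float) -> float:
    # Same expression as the UDF body; the wrapper name is hypothetical.
    # SQL DOUBLE arguments are assumed to arrive as Python floats.
    return weight_kg / (height_m ** 2)


# Quick local check with sample values before running CREATE FUNCTION.
bmi = calculate_bmi_local(70.0, 1.75)
print(round(bmi, 2))
```

Once the local result looks right, the same body can be pasted between the $$ delimiters unchanged.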

You can now use this Unity Catalog function in your SQL queries or PySpark code:

SQL
SELECT person_id, my_catalog.my_schema.calculate_bmi(weight_kg, height_m) AS bmi
FROM person_data;

Extend UDFs using custom dependencies

Preview

This feature is in Public Preview.

Extend the functionality of Unity Catalog Python UDFs beyond the Databricks Runtime environment by defining custom dependencies for external libraries.

Install dependencies from the following sources:

PyPI packages
Files stored in Unity Catalog volumes. The user invoking the UDF must have READ VOLUME permissions on the source volume.
Files available at public URLs. Your workspace's network security rules must allow access to public URLs.

note

To configure network security rules to allow access to public URLs from a serverless SQL warehouse, see Validate with Databricks SQL.

Serverless SQL Warehouses require the Public Preview feature Enable networking for UDFs in Serverless SQL Warehouses to be enabled to access the internet for custom dependencies.

Custom dependencies for Unity Catalog UDFs are supported on the following compute types:

Serverless notebooks and jobs
All-purpose compute using Databricks Runtime version 16.2 and above
SQL warehouse classic or pro

Use the ENVIRONMENT section of the UDF definition to specify dependencies:

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.mixed_process(data STRING)
RETURNS STRING
LANGUAGE PYTHON
ENVIRONMENT (
  dependencies = '["simplejson==3.19.3", "/Volumes/my_catalog/my_schema/my_volume/packages/custom_package-1.0.0.whl", "https://my-bucket.s3.amazonaws.com/packages/special_package-2.0.0.whl?Expires=2043167927&Signature=abcd"]',
  environment_version = 'None'
)
AS $$
import simplejson as json
import custom_package
return json.dumps(custom_package.process(data))
$$;

The ENVIRONMENT section contains the following fields:

Field	Type	Description	Example usage
`dependencies`	`STRING`	A list of comma-separated dependencies to install. Each entry is a string that conforms to the pip Requirements File Format.	`dependencies = '["simplejson==3.19.3", "/Volumes/catalog/schema/volume/packages/my_package-1.0.0.whl"]'` `dependencies = '["https://my-bucket.s3.amazonaws.com/packages/my_package-2.0.0.whl?Expires=2043167927&Signature=abcd"]'`
`environment_version`	`STRING`	Specifies the serverless environment version in which to run the UDF. Currently, only the value `None` is supported.	`environment_version = 'None'`
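
The dependencies value shown above looks like a JSON array serialized into a SQL string literal, which is easy to get wrong when quoting by hand. The following sketch generates the ENVIRONMENT clause from a Python list. The helper name build_environment_clause is hypothetical, and treating the value as JSON is an assumption based on the examples on this page.

```python
import json


def build_environment_clause(dependencies, environment_version="None"):
    """Return an ENVIRONMENT (...) clause as text (hypothetical helper)."""
    deps = list(dependencies)
    for dep in deps:
        # The list is wrapped in single quotes in SQL, so an embedded
        # single quote would end the string literal early.
        if "'" in dep:
            raise ValueError(f"dependency contains a single quote: {dep!r}")
    # json.dumps yields the bracketed, double-quoted form used in the examples.
    deps_literal = json.dumps(deps)
    return (
        "ENVIRONMENT (\n"
        f"  dependencies = '{deps_literal}',\n"
        f"  environment_version = '{environment_version}'\n"
        ")"
    )


print(build_environment_clause([
    "simplejson==3.19.3",
    "/Volumes/my_catalog/my_schema/my_volume/packages/custom_package-1.0.0.whl",
]))
```

The printed clause can be pasted into a CREATE FUNCTION statement between the LANGUAGE PYTHON line and the AS $$ line.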

Using the Unity Catalog UDF in PySpark

Python
from pyspark.sql.functions import expr

result = df.withColumn("bmi", expr("my_catalog.my_schema.calculate_bmi(weight_kg, height_m)"))
display(result)

Upgrade a session-scoped UDF

note

Syntax and semantics for Python UDFs in Unity Catalog differ from Python UDFs registered to the SparkSession. See User-defined scalar functions - Python.

Given the following session-based UDF in a Databricks notebook:

Python
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

@udf(StringType())
def greet(name):
    return f"Hello, {name}!"

# Using the session-based UDF
result = df.withColumn("greeting", greet("name"))
result.show()

To register this as a Unity Catalog function, use a SQL CREATE FUNCTION statement, as in the following example:

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.greet(name STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
return f"Hello, {name}!"
$$

Share UDFs in Unity Catalog

Permissions for UDFs are managed based on the access controls applied to the catalog, schema, or database where the UDF is registered. See Unity Catalog for more information.

Use Databricks SQL or the Databricks workspace UI to grant permissions to a user or to a group (recommended).

Permissions in the workspace UI

Find the catalog and schema where your UDF is stored and select the UDF.
Look for a Permissions option in the UDF settings. Add users or groups and specify the type of access they should have, such as EXECUTE or MANAGE.

Permissions using Databricks SQL

The following example grants a user the EXECUTE permission on a function:

SQL
GRANT EXECUTE ON FUNCTION my_catalog.my_schema.calculate_bmi TO `user@example.com`;

To remove permissions, use the REVOKE command as in the following example:

SQL
REVOKE EXECUTE ON FUNCTION my_catalog.my_schema.calculate_bmi FROM `user@example.com`;
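
When the same function has to be granted to several principals, the statements can be generated instead of typed by hand. The following is a rough sketch only: the helper name grant_execute_statements and the group name data-analysts are hypothetical, and the output is plain text to run in Databricks SQL.

```python
def grant_execute_statements(function_name, principals):
    """Build one GRANT EXECUTE statement per principal (hypothetical helper)."""
    statements = []
    for principal in principals:
        # Principals are wrapped in backticks, as in the example above,
        # so a backtick inside the name would break the quoting.
        if "`" in principal:
            raise ValueError(f"principal contains a backtick: {principal!r}")
        statements.append(
            f"GRANT EXECUTE ON FUNCTION {function_name} TO `{principal}`;"
        )
    return statements


for statement in grant_execute_statements(
    "my_catalog.my_schema.calculate_bmi",
    ["user@example.com", "data-analysts"],  # the group name is a placeholder
):
    print(statement)
```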

Best Practices for UDFs

For UDFs to be accessible to all users, Databricks recommends creating a dedicated catalog and schema with appropriate access controls.

For team-specific UDFs, use a dedicated schema within the team's catalog for storage and management.

Databricks recommends you include the following information in the UDF's docstring:

The current version number
A changelog to track modifications across versions
The UDF's purpose, parameters, and return value
An example of how to use the UDF

Here is an example of a UDF following best practices:

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.calculate_bmi(weight_kg DOUBLE, height_m DOUBLE)
RETURNS DOUBLE
COMMENT "Calculates Body Mass Index (BMI) from weight and height."
LANGUAGE PYTHON
AS $$
"""
calculate_bmi (version 1.2)

Parameters:
- weight_kg (float): Weight of the individual in kilograms.
- height_m (float): Height of the individual in meters.

Returns:
- float: The calculated BMI, or None if an input is NULL or height_m is not positive.

Example Usage:

SELECT my_catalog.my_schema.calculate_bmi(weight_kg, height_m) AS bmi FROM person_data;

Change Log:
- 1.0: Initial version.
- 1.1: Improved error handling for zero or negative height values.
- 1.2: Optimized calculation for performance.

Note: BMI is calculated as weight in kilograms divided by the square of height in meters.
"""
if weight_kg is None or height_m is None:
  return None  # NULL input yields NULL output
if height_m <= 0:
  return None  # Avoid division by zero and ensure height is positive
return weight_kg / (height_m ** 2)
$$;

UDFs for AI agent tools

Generative AI agents can use Unity Catalog UDFs as tools to perform tasks and execute custom logic.

See Create custom AI agent tools with Unity Catalog functions.

UDFs for accessing external APIs

You can use UDFs to access external APIs from SQL. The following example uses the Python requests library to make an HTTP request.

note

Python UDFs allow TCP/UDP network traffic over ports 80, 443, and 53 when they run on serverless compute or on compute configured with standard access mode.

SQL
CREATE FUNCTION my_catalog.my_schema.get_food_calories(food_name STRING)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
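import requests

# Pass the food name as a query parameter so that requests URL-encodes it
api_url = "https://example-food-api.com/nutrition"
try:
   response = requests.get(api_url, params={"food": food_name}, timeout=10)
except requests.RequestException:
   return None  # Network error or timeout

if response.status_code == 200:
   data = response.json()
   # Assuming the API returns a JSON object with a 'calories' field
   calories = data.get('calories', 0)
   return calories
else:
   return None  # API request failed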

$$;
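
The response-handling branch of a UDF like this one can be pulled into a pure function and tested without network access. The following sketch makes some assumptions: the helper name extract_calories is hypothetical, the 'calories' field is the same assumed field as in the example above, and a missing or non-numeric value is mapped to None rather than 0 so that it surfaces as NULL in SQL.

```python
def extract_calories(status_code, payload):
    """Map an HTTP status and decoded JSON body to a DOUBLE-compatible value."""
    # Any non-200 status or unexpected body shape is treated as a failed lookup.
    if status_code != 200 or not isinstance(payload, dict):
        return None
    calories = payload.get("calories")
    try:
        # The UDF declares RETURNS DOUBLE, so coerce to float.
        return float(calories)
    except (TypeError, ValueError):
        # Missing (None) or non-numeric values become NULL.
        return None


print(extract_calories(200, {"calories": 95}))
print(extract_calories(404, {}))
```

Inside the UDF, the same function could be defined at the top of the body and called with response.status_code and response.json().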

UDFs for security and compliance

Use Python UDFs to implement custom tokenization, data masking, data redaction, or encryption mechanisms.

The following example masks the identity of an email address while maintaining length and domain:

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.mask_email(email STRING)
RETURNS STRING
LANGUAGE PYTHON
AS $$
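if email is None:
  return None  # NULL input yields NULL output
parts = email.split('@', 1)
if len(parts) != 2:
  return None  # Not a recognizable email address
username, domain = parts
if len(username) <= 2:
  # Too short to keep the first and last characters without exposing the whole name
  masked_username = '*' * len(username)
else:
  masked_username = username[0] + '*' * (len(username) - 2) + username[-1]
return f"{masked_username}@{domain}"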
$$
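
The claim that masking keeps the length and the domain can be checked locally with a plain Python version of the logic. This is a sketch under stated assumptions: the name mask_email_local and the sample addresses are hypothetical, and the handling of NULL input and of usernames of one or two characters is an illustrative choice.

```python
def mask_email_local(email):
    """Local stand-in for the masking logic (hypothetical name)."""
    if email is None:
        return None  # SQL NULL is assumed to arrive as None
    parts = email.split("@", 1)
    if len(parts) != 2:
        return None  # no '@' separator found
    username, domain = parts
    if len(username) <= 2:
        # Assumption: mask short usernames entirely.
        masked = "*" * len(username)
    else:
        masked = username[0] + "*" * (len(username) - 2) + username[-1]
    return f"{masked}@{domain}"


for sample in ["sample.user@example.com", "ab@example.com", "not-an-email", None]:
    masked = mask_email_local(sample)
    # When masking succeeds, the output has the same length as the input.
    same_length = masked is not None and len(masked) == len(sample)
    print(sample, "->", masked, "| same length:", same_length)
```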

The following example applies this UDF in a dynamic view definition:

SQL
-- First, create the view
CREATE OR REPLACE VIEW my_catalog.my_schema.masked_customer_view AS
SELECT
  id,
  name,
  my_catalog.my_schema.mask_email(email) AS masked_email
FROM my_catalog.my_schema.customer_data;

-- Now you can query the view
SELECT * FROM my_catalog.my_schema.masked_customer_view;

+---+------------+------------------------+
| id|        name|           masked_email |
+---+------------+------------------------+
|  1|    John Doe|   j******e@example.com |
|  2| Alice Smith|a*********h@company.com |
|  3|   Bob Jones|    b*******s@email.org |
+---+------------+------------------------+

Limitations

You can define any number of Python functions within a Python UDF, but all must return a scalar value.
Python functions must handle NULL values independently, and all type mappings must follow Databricks SQL language mappings.
If no catalog or schema is specified, Python UDFs are registered to the current active schema.
Python UDFs execute in a secure, isolated environment and do not have access to file systems or internal services.
You cannot call more than five UDFs per query.
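
The first two limitations can be illustrated together. In the following sketch, a UDF-style body defines an inner helper function, returns a single scalar, and handles NULL explicitly. The name normalize_name_local is hypothetical, and the sketch assumes that a SQL NULL argument reaches the Python code as None.

```python
def normalize_name_local(name):
    # Helper functions may be defined inside the body; each returns a scalar.
    def collapse_whitespace(value):
        return " ".join(value.split())

    # NULL is not handled automatically, so check for None before using the value.
    if name is None:
        return None
    return collapse_whitespace(name).title()


print(normalize_name_local("  john   doe "))
print(normalize_name_local(None))
```

Without the explicit None check, the call to value.split() would raise an exception for a NULL row instead of returning NULL.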
