User-defined operators in Lakeflow Designer

Preview

This feature is in Public Preview.

Lakeflow Designer lets you create user-defined operators that appear directly in the canvas alongside built-in operators. Use them to extend Lakeflow Designer with your own business logic, calculations, or integrations.

There are three types of user-defined operators:

  • python-run-function: A standalone YAML file with inline Python stored in the workspace. Best for DataFrame-level transformations and external integrations. Permissions are managed at the workspace file level.
  • uc-udf: Wraps a Unity Catalog scalar function. Best for column-level transformations. Access is governed by Unity Catalog permissions.
  • uc-udtf: Wraps a Unity Catalog table-valued function. Best for table-level transforms like ML clustering and aggregation. Access is governed by Unity Catalog permissions.

| Feature | python-run-function | uc-udf | uc-udtf |
| --- | --- | --- | --- |
| Example use case | DataFrame transforms, API integrations, email notifications | Column-level calculations (BMI, interest rates) | ML clustering, aggregation across rows |
| Input | DataFrames | Single values | Entire table, row by row |
| Output | DataFrames | Single value | Table (multiple rows) |
| Requires Unity Catalog function | No | Yes | Yes |
| Access governance | Workspace file permissions | Unity Catalog permissions (EXECUTE, USE SCHEMA) | Unity Catalog permissions (EXECUTE, USE SCHEMA) |
| Supported languages | Python only | SQL or Python in a SQL wrapper | SQL or Python in a SQL wrapper |

How do user-defined operators work?

A user-defined operator consists of:

  • Operator logic: The code that runs when the operator executes. This can be an inline Python run() function (for python-run-function) or a Unity Catalog function (for uc-udf and uc-udtf).
  • YAML configuration: Tells Lakeflow Designer how to present the operator in the UI, including the operator's name, description, input parameters, UI widgets, and ports. All operator types use the user-defined-operator-v0.1.0 schema.
  • Registration file: An entry in .user_defined_operators.yaml that lets Lakeflow Designer discover the operator.

Operator logic

Python run function user-defined operator logic

Every python-run-function operator must define a run() function:

Python
def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
    ...

  • config: User-configured values from the UI, keyed by property name.
  • inputs: Input DataFrames, keyed by input port name.
  • spark: The active SparkSession.
  • Returns: A dictionary mapping output port names to DataFrames.

The following example filters rows from an input DataFrame:

Python
def run(config, inputs, spark):
    df = inputs["in"]
    filtered = df.filter(config["filter_expression"])
    return {"out": filtered}
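
The run() contract described above (a dictionary keyed by output port name) can be checked locally before you register an operator. The following is a minimal sketch, not part of Lakeflow Designer: call_and_check_ports and passthrough are hypothetical helpers, plain lists stand in for DataFrames, and treating undeclared keys as an error is an assumption.

```python
from typing import Any, Callable, Dict, Iterable


def call_and_check_ports(
    run: Callable[..., Dict[str, Any]],
    config: Dict[str, Any],
    inputs: Dict[str, Any],
    output_ports: Iterable[str],
    spark: Any = None,
) -> Dict[str, Any]:
    """Call run() and verify that it returns exactly the declared output ports."""
    result = run(config, inputs, spark)
    if not isinstance(result, dict):
        raise TypeError(f"run() must return a dict, got {type(result).__name__}")
    expected = set(output_ports)
    missing = expected - set(result)
    extra = set(result) - expected
    if missing or extra:
        raise ValueError(
            f"port mismatch: missing={sorted(missing)}, unexpected={sorted(extra)}"
        )
    return result


# Hypothetical stand-in operator: passes its input through unchanged.
def passthrough(config, inputs, spark):
    return {"out": inputs["in"]}


# A plain list stands in for a DataFrame; no Spark session is needed here.
checked = call_and_check_ports(passthrough, {}, {"in": [1, 2, 3]}, ["out"])
```

In a notebook, you could pass real DataFrames and the active spark session in place of the stand-ins.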

If your operator requires external pip packages, add the environment field to the YAML:

YAML
environment:
  environment_version: '1'
  dependencies:
    - requests==2.31.0
    - beautifulsoup4==4.12.0

UDF and UDTF operator logic

You can write UC functions in SQL or Python. Python functions are wrapped in a SQL CREATE FUNCTION statement:

SQL function:

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.calculate_bmi(weight_kg DOUBLE, height_m DOUBLE)
RETURNS DOUBLE
LANGUAGE SQL
RETURN
SELECT weight_kg / (height_m * height_m);

Python function (wrapped in SQL):

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.calculate_bmi(weight_kg DOUBLE, height_m DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
return weight_kg / (height_m ** 2)
$$;

UDFs process a single value at a time and return a calculated value. UDTFs process tables row by row and can maintain state across all rows. Use uc-udf for column-level transforms and uc-udtf for operations like ML clustering or aggregation.

Additionally, UDTFs require you to define three key methods: __init__(), eval(), and terminate():

Python
class MyOperator:
    def __init__(self):
        # Called before processing - initialize any values needed.
        pass

    def eval(self, row, id_column, columns, k):
        # Called once per input row - accumulate data here.
        pass

    def terminate(self):
        # Called after all rows - perform final calculations and yield results.
        pass

Note

UDTF return tables must have fixed, explicit types. You can't reference input column types in the return configuration.
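
To make the __init__(), eval(), terminate() lifecycle concrete, here is a minimal pure-Python sketch. RunningMean and drive are hypothetical names; drive only imitates the call order described above and is not how Unity Catalog invokes a UDTF. The eval() signature is also simplified to a single value per row.

```python
class RunningMean:
    """Hypothetical UDTF-style handler: averages one numeric value per row."""

    def __init__(self):
        # Called before processing: set up the state shared across rows.
        self.total = 0.0
        self.count = 0

    def eval(self, value):
        # Called once per input row: accumulate, emit nothing yet.
        if value is not None:
            self.total += value
            self.count += 1

    def terminate(self):
        # Called after all rows: yield the output rows as tuples.
        if self.count:
            yield (self.count, self.total / self.count)


def drive(handler_cls, rows):
    """Local stand-in for the engine: init, eval per row, then terminate."""
    handler = handler_cls()
    out = []
    for row in rows:
        emitted = handler.eval(row)
        if emitted is not None:
            out.extend(emitted)
    out.extend(handler.terminate())
    return out


result = drive(RunningMean, [10.0, None, 20.0, 30.0])
```

Because the output row here is always an (integer, float) pair, it maps naturally onto a return table with fixed, explicit column types.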

YAML configuration

The YAML configuration tells Lakeflow Designer how to present the operator in the UI. It defines the operator's name, description, input parameters, UI widgets, and ports. Each configuration field is a property with a type, title, and optional x-ui widget hints:

YAML
config:
  type: object
  properties:
    my_param:
      type: string
      title: My Parameter
      x-ui:
        widget: input
    my_expression:
      type: string
      title: Column
      format: expression
      x-ui:
        widget: expression
        port: in
    my_number:
      type: number
      title: Count
      default: 10
      minimum: 0
      maximum: 100
  required:
    - my_param
    - my_expression

For full details on the YAML schema, including all widget types and configuration options, see User-defined operator YAML reference.
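
As an illustration of how default, minimum, maximum, and required might interact, the sketch below resolves user-supplied values against a dictionary form of the config block above. It assumes JSON Schema-like semantics for these keywords, which this page does not confirm; resolve_config is a hypothetical helper, not a Lakeflow Designer API.

```python
from typing import Any, Dict

# Dict form of the config block above (as a YAML parser would load it).
CONFIG_SCHEMA = {
    "type": "object",
    "properties": {
        "my_param": {"type": "string", "title": "My Parameter"},
        "my_expression": {"type": "string", "title": "Column", "format": "expression"},
        "my_number": {"type": "number", "title": "Count", "default": 10,
                      "minimum": 0, "maximum": 100},
    },
    "required": ["my_param", "my_expression"],
}


def resolve_config(schema: Dict[str, Any], values: Dict[str, Any]) -> Dict[str, Any]:
    """Fill defaults, then check required keys and numeric bounds."""
    resolved = dict(values)
    for name, prop in schema["properties"].items():
        if name not in resolved and "default" in prop:
            resolved[name] = prop["default"]
    missing = [k for k in schema.get("required", []) if k not in resolved]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    for name, value in resolved.items():
        prop = schema["properties"].get(name, {})
        if prop.get("type") == "number":
            if "minimum" in prop and value < prop["minimum"]:
                raise ValueError(f"{name} is below minimum {prop['minimum']}")
            if "maximum" in prop and value > prop["maximum"]:
                raise ValueError(f"{name} is above maximum {prop['maximum']}")
    return resolved


resolved = resolve_config(CONFIG_SCHEMA, {"my_param": "abc", "my_expression": "amount"})
```

The resolved dictionary has the same shape as the config argument that a run() function receives, keyed by property name.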

Ports

Ports define the inputs and outputs for your operator:

YAML
ports:
  input:
    - name: in
      title: Input Data
      mime: application/vnd.databricks.dataframe
      required: true
      allowMultiple: false
  output:
    - name: out
      title: Output Data

YAML for Python run function operators

For python-run-function operators, the YAML file is standalone and includes a run_function field with inline Python code:

YAML
schema: user-defined-operator-v0.1.0
type: python-run-function
name: Filter Rows
id: filter_rows
version: '1.0.0'
description: Filters rows based on a SQL expression.
config:
  type: object
  properties:
    filter_expression:
      type: string
      title: Filter Expression
      x-ui:
        widget: input
  required:
    - filter_expression
ports:
  input:
    - name: in
      title: Input
  output:
    - name: out
      title: Output
run_function:
  type: inline
  code: |
    def run(config, inputs, spark):
        df = inputs["in"]
        filtered = df.filter(config["filter_expression"])
        return {"out": filtered}
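
One way to smoke-test the inline code locally is to execute the run_function.code string and call the resulting run(). The sketch below is an assumption-laden illustration, not a description of how Lakeflow Designer loads operators: OPERATOR is a hypothetical pass-through operator in dictionary form, load_run is a hypothetical helper, and plain lists stand in for DataFrames.

```python
from typing import Any, Callable, Dict

# Dict form of a python-run-function operator (hypothetical pass-through operator).
OPERATOR = {
    "type": "python-run-function",
    "id": "passthrough",
    "ports": {"input": [{"name": "in"}], "output": [{"name": "out"}]},
    "run_function": {
        "type": "inline",
        "code": (
            "def run(config, inputs, spark):\n"
            "    return {'out': inputs['in']}\n"
        ),
    },
}


def load_run(operator: Dict[str, Any]) -> Callable[..., Dict[str, Any]]:
    """Execute the inline code in a fresh namespace and return its run()."""
    spec = operator["run_function"]
    if spec.get("type") != "inline":
        raise ValueError("only inline run functions are handled in this sketch")
    namespace: Dict[str, Any] = {}
    exec(spec["code"], namespace)
    run = namespace.get("run")
    if not callable(run):
        raise ValueError("inline code does not define a callable run()")
    return run


run = load_run(OPERATOR)
output = run({}, {"in": ["row-1", "row-2"]}, None)
```

Only execute inline code that you trust, because exec runs it with the permissions of the current process.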

YAML for Unity Catalog functions

For UC-based operators, embed the YAML configuration as a comment or docstring in your function.

In SQL (use /* ... */ comment):

SQL
RETURN(/*
schema: user-defined-operator-v0.1.0
type: uc-udf
name: Calculate BMI
id: calculate_bmi
version: "1.0.0"
description: Calculates BMI from weight and height.
config:
  type: object
  properties:
    weight_kg:
      type: string
      title: Weight (in kg)
      format: expression
      x-ui:
        widget: expression
        port: in
    height_m:
      type: string
      title: Height (in meters)
      format: expression
      x-ui:
        widget: expression
        port: in
  required:
    - weight_kg
    - height_m
ports:
  input:
    - name: in
      title: Input Data
  output:
    - name: out
      title: Output
*/
  SELECT weight_kg / (height_m * height_m)
);

In Python (use """ ... """ docstring):

SQL
AS $$
"""
schema: user-defined-operator-v0.1.0
type: uc-udf
name: Calculate BMI
id: calculate_bmi
version: "1.0.0"
description: Calculates BMI from weight and height.
config:
  type: object
  properties:
    weight_kg:
      type: string
      title: Weight (in kg)
      format: expression
      x-ui:
        widget: expression
        port: in
    height_m:
      type: string
      title: Height (in meters)
      format: expression
      x-ui:
        widget: expression
        port: in
  required:
    - weight_kg
    - height_m
ports:
  input:
    - name: in
      title: Input Data
  output:
    - name: out
      title: Output
"""

return weight_kg / (height_m ** 2)
$$;

Register and deploy your operator to Lakeflow Designer

For your operator to appear in Lakeflow Designer, register it in a .user_defined_operators.yaml file:

  • Workspace level: Place the file in the root of your workspace to make the operator visible to all users.
  • User level: Place the file in your user home folder (/Workspace/Users/<user-name>/.user_defined_operators.yaml) to make operators visible only to you.

The operators: section supports file paths, Unity Catalog function references, and glob patterns. You can mix entry types:

YAML
operators:
  # File path (python-run-function operators)
  - /Workspace/Users/me/udos/my_operator.yaml
  # Glob pattern (registers all matching files)
  - /Workspace/Users/me/udos/transforms/*.yaml
  # UC function reference (uc-udf and uc-udtf operators)
  - catalog: my_catalog
    schema: my_schema
    functionName: my_function
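
The three entry types can be told apart by shape alone: a mapping is a Unity Catalog reference, and a string is either a glob pattern or a plain file path. The sketch below illustrates that distinction on a dictionary/list form of the example above. classify is a hypothetical helper, and the rule it uses to detect a glob (presence of *, ?, or [) is an assumption rather than documented Lakeflow Designer behavior.

```python
from typing import Dict, List, Union

Entry = Union[str, Dict[str, str]]

# List form of the operators: section above.
ENTRIES: List[Entry] = [
    "/Workspace/Users/me/udos/my_operator.yaml",
    "/Workspace/Users/me/udos/transforms/*.yaml",
    {"catalog": "my_catalog", "schema": "my_schema", "functionName": "my_function"},
]


def classify(entry: Entry) -> str:
    """Return 'uc-function', 'glob', or 'file' for one registration entry."""
    if isinstance(entry, dict):
        missing = {"catalog", "schema", "functionName"} - set(entry)
        if missing:
            raise ValueError(f"UC reference is missing keys: {sorted(missing)}")
        return "uc-function"
    # Treat any glob metacharacter as a pattern; plain strings are file paths.
    return "glob" if any(ch in entry for ch in "*?[") else "file"


kinds = [classify(entry) for entry in ENTRIES]
```

A check like this can catch a UC reference with a missing catalog, schema, or functionName key before the registration file is saved.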

Advanced configurations

Preview mode

Lakeflow Designer supports previews while in design mode. For operators that call external APIs or write to external systems, add an is_preview config property so you can skip side effects during preview. When preview mode is enabled, users need to explicitly click Run to execute the operator with side effects.

YAML
config:
  type: object
  properties:
    is_preview:
      type: boolean
      format: is_preview
      default: false

Lakeflow Designer automatically sets this value to true during preview. Check it in your logic to skip side effects:

Python
# In a python-run-function
if config.get("is_preview"):
    return {"out": inputs["in"]}

SQL
-- In a UC function (SQL)
CASE WHEN is_preview THEN 'preview' ELSE /* actual work */ END
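
The preview guard can be exercised without any external system by injecting the side effect as a function. In this minimal sketch, make_run and send are hypothetical names (send stands in for an email or API call), and plain strings stand in for DataFrames.

```python
from typing import Any, Callable, Dict, List


def make_run(send: Callable[[Any], None]) -> Callable[..., Dict[str, Any]]:
    """Build a run() whose side effect (send) is skipped while previewing."""

    def run(config: Dict[str, Any], inputs: Dict[str, Any], spark: Any) -> Dict[str, Any]:
        data = inputs["in"]
        if config.get("is_preview"):
            # Preview: return the data untouched and perform no external call.
            return {"out": data}
        send(data)  # Real run: perform the side effect, then pass data through.
        return {"out": data}

    return run


sent: List[Any] = []
run = make_run(sent.append)

preview_out = run({"is_preview": True}, {"in": "batch-1"}, None)   # nothing is sent
real_out = run({"is_preview": False}, {"in": "batch-2"}, None)     # send() is called once
```

Using config.get("is_preview") rather than config["is_preview"] keeps the function working if the property is absent from the configuration.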

Unity Catalog connections

For UC-based SQL operators that call external APIs, use Unity Catalog HTTP connections to securely store credentials:

SQL
CREATE CONNECTION my_api_connection TYPE HTTP OPTIONS (
host 'https://api.example.com',
port '443',
base_path '/v1/',
bearer_token 'your-token-here'
);

Then use the connection in your SQL UDF with the http_request() function. For details, see Connect to external HTTP services.

WorkspaceClient

For python-run-function operators, you can use the Databricks WorkspaceClient to access workspace resources and external APIs:

Python
def run(config, inputs, spark):
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()
    # Use w to access workspace resources

Create a complete python-run-function user-defined operator

The following steps walk through creating a python-run-function operator from scratch.

Step 1: Define the logic

Write your run() function in a notebook:

Python
from typing import Dict, Any

def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
    from pyspark.sql import functions as F
    df = inputs["in"]
    result = df.withColumn(config["column_name"], F.current_timestamp())
    return {"out": result}

Step 2: Test the function

Test the function interactively with sample data:

Python
test_df = spark.createDataFrame(
    [("Alice", 100), ("Bob", 200)],
    ["name", "amount"]
)

result = run(
    config={"column_name": "processed_at"},
    inputs={"in": test_df},
    spark=spark
)

result["out"].show()

Step 3: Create the YAML configuration

Define the operator metadata, configuration fields, and ports in a YAML file:

YAML
schema: user-defined-operator-v0.1.0
type: python-run-function
name: Add Timestamp
id: transforms.add_timestamp
version: '1.0.0'
description: Adds a timestamp column to the input DataFrame.
config:
  type: object
  properties:
    column_name:
      type: string
      title: Column Name
      default: processed_at
      x-ui:
        widget: input
  required:
    - column_name

Step 4: Combine the logic and YAML

Add the run_function and ports fields to create the complete YAML file. Save it to your workspace, for example /Workspace/Users/<user-name>/udos/add_timestamp.yaml:

YAML
schema: user-defined-operator-v0.1.0
type: python-run-function
name: Add Timestamp
id: transforms.add_timestamp
version: '1.0.0'
description: Adds a timestamp column to the input DataFrame.
config:
  type: object
  properties:
    column_name:
      type: string
      title: Column Name
      default: processed_at
      x-ui:
        widget: input
  required:
    - column_name
ports:
  input:
    - name: in
      title: Input
  output:
    - name: out
      title: Output
run_function:
  type: inline
  code: |
    from typing import Dict, Any

    def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
        from pyspark.sql import functions as F
        df = inputs["in"]
        result = df.withColumn(config["column_name"], F.current_timestamp())
        return {"out": result}

Step 5: Register the operator

Add the file path to your .user_defined_operators.yaml file:

YAML
operators:
  - /Workspace/Users/<user-name>/udos/add_timestamp.yaml

Step 6: Use the operator in Lakeflow Designer

Open Lakeflow Designer and verify the operator appears in the operator palette. Drag it onto the canvas, connect an input, configure the column name, and run a preview.

Create a complete UC user-defined operator

The following steps walk through creating a UC-based uc-udf operator.

Step 1: Define the logic

Write and test your function logic in a notebook:

Python
def double_value(input_value: float) -> float:
    if input_value is None:
        return None
    return input_value * 2
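
Because the logic is plain Python, you can test it in the notebook before wrapping it in a Unity Catalog function. The following sketch repeats the function (with an Optional type hint, added here for accuracy) and checks a few illustrative cases, including None, which is how a SQL NULL typically reaches a Python function.

```python
from typing import Optional


def double_value(input_value: Optional[float]) -> Optional[float]:
    if input_value is None:
        return None
    return input_value * 2


# Cover a typical value, zero, a negative value, and the NULL (None) case.
cases = [(5.0, 10.0), (0.0, 0.0), (-1.5, -3.0), (None, None)]
for given, expected in cases:
    assert double_value(given) == expected, (given, expected)
```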

Step 2: Create the YAML configuration

Define the operator metadata, configuration fields, and ports:

YAML
schema: user-defined-operator-v0.1.0
type: uc-udf
name: Double Value
id: math.double_value
version: '1.0.0'
description: Doubles the input value
config:
  type: object
  properties:
    input_value:
      type: string
      title: Input Value
      format: expression
      x-ui:
        widget: expression
        port: input_data
  required:
    - input_value
ports:
  input:
    - name: input_data
      title: Input
  output:
    - name: out
      title: Output

Step 3: Combine the logic and YAML

Create the Unity Catalog function with the YAML embedded as a docstring:

SQL
CREATE OR REPLACE FUNCTION main.my_schema.double_value(input_value DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
"""
schema: user-defined-operator-v0.1.0
type: uc-udf
name: Double Value
id: math.double_value
version: "1.0.0"
description: Doubles the input value
config:
  type: object
  properties:
    input_value:
      type: string
      title: Input Value
      format: expression
      x-ui:
        widget: expression
        port: input_data
  required:
    - input_value
ports:
  input:
    - name: input_data
      title: Input
  output:
    - name: out
      title: Output
"""

def double_value(input_value: float) -> float:
    if input_value is None:
        return None
    return input_value * 2

return double_value(input_value)
$$

Step 4: Test the function

SQL
SELECT main.my_schema.double_value(5) AS result;
-- Should return: 10.0 (the function returns a DOUBLE)

Step 5: Register the operator

Add the Unity Catalog function reference to your .user_defined_operators.yaml file:

YAML
operators:
  - catalog: main
    schema: my_schema
    functionName: double_value

Step 6: Use the operator in Lakeflow Designer

Open Lakeflow Designer and verify the operator appears in the operator palette. Drag it onto the canvas, connect an input, and run a preview.

Troubleshooting

| Issue | Solution |
| --- | --- |
| Operator doesn't appear in Lakeflow Designer. | Check that .user_defined_operators.yaml exists and lists your function or file path. For python-run-function operators, verify that the file path is correct and that the YAML file is accessible. |
| Schema validation fails. | Verify your YAML against the official schema at https://your-workspace.cloud.databricks.com/static/schemas/user-defined-operator-v0.1.0.json. |
| Permission denied. | For UC-based operators, verify users have EXECUTE on the function and USE SCHEMA on the schema. For python-run-function operators, verify users have read access to the YAML file. |
| python-run-function operator fails at runtime. | Check that the run() function signature matches def run(config, inputs, spark). Verify that the port names used in the code match the YAML and that the keys of the returned dictionary match the output port names. |
| UDTF returns wrong types. | UDTF return types must be explicit; you can't reference input column types. |
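
For the runtime-failure case, the signature check can be automated with the standard library. This is a minimal sketch: signature_problems is a hypothetical helper, and requiring exactly the parameter names config, inputs, and spark in that order is an assumption based on the signature shown in this article.

```python
import inspect
from typing import Callable, List


def signature_problems(run: Callable) -> List[str]:
    """Return a list of problems with run()'s parameter names (empty if none)."""
    expected = ["config", "inputs", "spark"]
    actual = list(inspect.signature(run).parameters)
    if actual != expected:
        return [f"expected parameters {expected}, found {actual}"]
    return []


def good(config, inputs, spark):
    return {"out": inputs["in"]}


def bad(inputs, config):  # wrong order and missing spark
    return {"out": inputs["in"]}


good_report = signature_problems(good)
bad_report = signature_problems(bad)
```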

Permissions

| Permission | Purpose |
| --- | --- |
| Read access to .user_defined_operators.yaml. | Discover the operator. |
| Read access to the YAML file (python-run-function only). | Load the operator definition. |
| EXECUTE on the Unity Catalog function (UC-based operators only). | Run the operator. |
| USE SCHEMA on the schema (UC-based operators only). | Access the schema where the function is created. |

Other permissions

Depending on your operator, users may require other permissions. For example, USE CONNECTION on a Unity Catalog connection for HTTP API calls.

Next steps

Explore the following tutorials:

| Example | Type | Description |
| --- | --- | --- |
| Gmail email sender | python-run-function | Send DataFrame data as a CSV email attachment via Gmail. |
| Compound interest calculator | uc-udf | Calculate future investment values using the compound interest formula. |
| K-means clustering | uc-udtf | Segment data into clusters using scikit-learn. |
| Send Slack message | uc-udf | Send notifications to Slack channels via API. |
| All UI widgets | uc-udf | Reference operator showcasing all available UI widgets. |

For a complete reference to the YAML schema, see User-defined operator YAML reference.