User-defined operators in Lakeflow Designer
This feature is in Public Preview.
Lakeflow Designer lets you create user-defined operators that appear directly in the canvas alongside built-in operators. Use them to extend Lakeflow Designer with your own business logic, calculations, or integrations.
There are three types of user-defined operators:
- python-run-function: A standalone YAML file with inline Python stored in the workspace. Best for DataFrame-level transformations and external integrations. Permissions are managed at the workspace file level.
- uc-udf: Wraps a Unity Catalog scalar function. Best for column-level transformations. Access is governed by Unity Catalog permissions.
- uc-udtf: Wraps a Unity Catalog table-valued function. Best for table-level transforms like ML clustering and aggregation. Access is governed by Unity Catalog permissions.
Feature | python-run-function | uc-udf | uc-udtf |
|---|---|---|---|
Example use case | DataFrame transforms, API integrations, email notifications | Column-level calculations (BMI, interest rates) | ML clustering, aggregation across rows |
Input | DataFrames | Single values | Entire table, row by row |
Output | DataFrames | Single value | Table (multiple rows) |
Requires Unity Catalog function | No | Yes | Yes |
Access governance | Workspace file permissions | Unity Catalog permissions | Unity Catalog permissions |
Supported languages | Python only | SQL or Python in a SQL wrapper | SQL or Python in a SQL wrapper |
How do user-defined operators work?
A user-defined operator consists of:
- Operator logic: The code that runs when the operator executes. This can be an inline Python run() function (for python-run-function) or a Unity Catalog function (for uc-udf and uc-udtf).
- YAML configuration: Tells Lakeflow Designer how to present the operator in the UI, including the operator's name, description, input parameters, UI widgets, and ports. All operator types use the user-defined-operator-v0.1.0 schema.
- Registration file: An entry in .user_defined_operators.yaml that lets Lakeflow Designer discover the operator.
Operator logic
Python run function user-defined operator logic
Every python-run-function operator must define a run() function:
def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
- config: User-configured values from the UI, keyed by property name.
- inputs: Input DataFrames, keyed by input port name.
- spark: The active SparkSession.
- Returns: A dictionary mapping output port name values to DataFrames.
The following example filters rows from an input DataFrame:
def run(config, inputs, spark):
    df = inputs["in"]
    filtered = df.filter(config["filter_expression"])
    return {"out": filtered}
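The contract described above (a dictionary keyed by output port name) can be checked locally before the operator is registered. The following is a minimal sketch, not part of Lakeflow Designer: `check_run_result` is a hypothetical helper, and a plain Python object stands in for a DataFrame because the pass-through `run()` used here never inspects it.

```python
from typing import Any, Dict, Iterable


def check_run_result(result: Any, output_ports: Iterable[str]) -> None:
    """Raise ValueError if `result` does not match the declared output ports.

    Hypothetical helper for local testing; not a Lakeflow Designer API.
    """
    if not isinstance(result, dict):
        raise ValueError(f"run() must return a dict, got {type(result).__name__}")
    expected = set(output_ports)
    actual = set(result)
    missing = expected - actual
    unexpected = actual - expected
    if missing or unexpected:
        raise ValueError(
            f"missing ports: {sorted(missing)}, unexpected ports: {sorted(unexpected)}"
        )


def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
    # Pass-through operator: returns the "in" input unchanged on the "out" port.
    return {"out": inputs["in"]}


# Any object can stand in for a DataFrame because run() only passes it through.
placeholder_df = object()
result = run(config={}, inputs={"in": placeholder_df}, spark=None)
check_run_result(result, output_ports=["out"])
```

A check like this catches a misspelled port name in a notebook, before the operator reaches the canvas.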
If your operator requires external pip packages, add the environment field to the YAML:
environment:
  environment_version: '1'
  dependencies:
    - requests==2.31.0
    - beautifulsoup4==4.12.0
UDF and UDTF operator logic
You can write UC functions in SQL or Python. Python functions are wrapped in a SQL CREATE FUNCTION statement:
SQL function:
CREATE OR REPLACE FUNCTION my_catalog.my_schema.calculate_bmi(weight_kg DOUBLE, height_m DOUBLE)
RETURNS DOUBLE
LANGUAGE SQL
RETURN
SELECT weight_kg / (height_m * height_m);
Python function (wrapped in SQL):
CREATE OR REPLACE FUNCTION my_catalog.my_schema.calculate_bmi(weight_kg DOUBLE, height_m DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
return weight_kg / (height_m ** 2)
$$;
UDFs process a single value at a time and return a calculated value. UDTFs process tables row by row and can maintain state across all rows. Use uc-udf for column-level transforms and uc-udtf for operations like ML clustering or aggregation.
Additionally, UDTFs require you to define three key methods: __init__(), eval(), and terminate():
class MyOperator:
    def __init__(self):
        # Called before processing - initialize any values needed.
        pass

    def eval(self, row, id_column, columns, k):
        # Called once per input row - accumulate data here.
        pass

    def terminate(self):
        # Called after all rows - perform final calculations and yield results.
        pass
UDTF return tables must have fixed, explicit types. You can't reference input column types in the return configuration.
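To make the three-phase lifecycle concrete, here is a minimal sketch in plain Python. The `RunningTotal` class, its `eval()` arguments, and the `drive()` harness are all hypothetical: `drive()` only imitates the order in which an engine would call the methods, and the example runs without Spark or Unity Catalog.

```python
from typing import Dict, Iterable, Iterator, List, Tuple


class RunningTotal:
    """Toy UDTF-style class: sums `amount` per `name` across all rows."""

    def __init__(self) -> None:
        # Called before processing: set up the state shared across rows.
        self.totals: Dict[str, float] = {}

    def eval(self, name: str, amount: float) -> None:
        # Called once per input row: accumulate, but emit nothing yet.
        self.totals[name] = self.totals.get(name, 0.0) + amount

    def terminate(self) -> Iterator[Tuple[str, float]]:
        # Called after the last row: yield one output row per name.
        for name in sorted(self.totals):
            yield (name, self.totals[name])


def drive(handler_cls, rows: Iterable[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Local stand-in for the engine: replays the three-phase lifecycle."""
    handler = handler_cls()
    for row in rows:
        handler.eval(*row)
    return list(handler.terminate())


rows = [("Alice", 100.0), ("Bob", 200.0), ("Alice", 50.0)]
output = drive(RunningTotal, rows)
print(output)  # [('Alice', 150.0), ('Bob', 200.0)]
```

Because results are emitted only in `terminate()`, every output row can depend on all input rows, which is what operations such as clustering or aggregation need.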
YAML configuration
The YAML configuration tells Lakeflow Designer how to present the operator in the UI. It defines the operator's name, description, input parameters, UI widgets, and ports. Each configuration field is a property with a type, title, and optional x-ui widget hints:
config:
  type: object
  properties:
    my_param:
      type: string
      title: My Parameter
      x-ui:
        widget: input
    my_expression:
      type: string
      title: Column
      format: expression
      x-ui:
        widget: expression
        port: in
    my_number:
      type: number
      title: Count
      default: 10
      minimum: 0
      maximum: 100
  required:
    - my_param
    - my_expression
For full details on the YAML schema, including all widget types and configuration options, see User-defined operator YAML reference.
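The `default`, `minimum`, `maximum`, and `required` keys above suggest how user input is constrained. The sketch below mirrors that config block as a Python dict and applies those rules by hand. `resolve_config` is a hypothetical helper written for illustration; how Lakeflow Designer itself validates values is not described here and may differ.

```python
from typing import Any, Dict

# Mirrors the `config` block above as a Python dict (x-ui hints omitted).
CONFIG_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "my_param": {"type": "string", "title": "My Parameter"},
        "my_expression": {"type": "string", "title": "Column", "format": "expression"},
        "my_number": {"type": "number", "title": "Count", "default": 10,
                      "minimum": 0, "maximum": 100},
    },
    "required": ["my_param", "my_expression"],
}


def resolve_config(schema: Dict[str, Any], values: Dict[str, Any]) -> Dict[str, Any]:
    """Apply defaults, then check required keys and numeric bounds.

    Hypothetical helper; the real validation may differ.
    """
    resolved = dict(values)
    for name, prop in schema["properties"].items():
        if name not in resolved and "default" in prop:
            resolved[name] = prop["default"]
    missing = [name for name in schema.get("required", []) if name not in resolved]
    if missing:
        raise ValueError(f"missing required properties: {missing}")
    for name, value in resolved.items():
        prop = schema["properties"].get(name, {})
        if "minimum" in prop and value < prop["minimum"]:
            raise ValueError(f"{name}={value} is below minimum {prop['minimum']}")
        if "maximum" in prop and value > prop["maximum"]:
            raise ValueError(f"{name}={value} is above maximum {prop['maximum']}")
    return resolved


config = resolve_config(CONFIG_SCHEMA, {"my_param": "abc", "my_expression": "amount"})
print(config)
```

The resolved dictionary has the same shape as the `config` argument that a `run()` function receives, so the helper can also feed local tests.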
Ports
Ports define the inputs and outputs for your operator:
ports:
  input:
    - name: in
      title: Input Data
      mime: application/vnd.databricks.dataframe
      required: true
      allowMultiple: false
  output:
    - name: out
      title: Output Data
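In the examples in this document, the `port` value of each expression widget matches the `name` of a declared input port. Assuming that relationship is required, a small check can catch mismatches before deployment. The sketch below works on the operator definition loaded as a Python dict; `find_unknown_ports` is a hypothetical helper, not a Lakeflow Designer API.

```python
from typing import Any, Dict, List, Tuple


def find_unknown_ports(operator: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (property, port) pairs whose x-ui port is not a declared input port."""
    declared = {port["name"] for port in operator.get("ports", {}).get("input", [])}
    problems = []
    properties = operator.get("config", {}).get("properties", {})
    for prop_name, prop in properties.items():
        port = prop.get("x-ui", {}).get("port")
        if port is not None and port not in declared:
            problems.append((prop_name, port))
    return problems


# Operator definition mirroring the YAML fragments above, as a Python dict.
operator = {
    "config": {
        "properties": {
            "my_expression": {
                "type": "string",
                "format": "expression",
                "x-ui": {"widget": "expression", "port": "in"},
            },
        },
    },
    "ports": {
        "input": [{"name": "in", "title": "Input Data"}],
        "output": [{"name": "out", "title": "Output Data"}],
    },
}

print(find_unknown_ports(operator))  # []
```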
YAML for Python run function operators
For python-run-function operators, the YAML file is standalone and includes a run_function field with inline Python code:
schema: user-defined-operator-v0.1.0
type: python-run-function
name: Filter Rows
id: filter_rows
version: '1.0.0'
description: Filters rows based on a SQL expression.
config:
  type: object
  properties:
    filter_expression:
      type: string
      title: Filter Expression
      x-ui:
        widget: input
  required:
    - filter_expression
ports:
  input:
    - name: in
      title: Input
  output:
    - name: out
      title: Output
run_function:
  type: inline
  code: |
    def run(config, inputs, spark):
        df = inputs["in"]
        filtered = df.filter(config["filter_expression"])
        return {"out": filtered}
YAML for Unity Catalog functions
For UC-based operators, embed the YAML configuration as a comment or docstring in your function.
In SQL (use /* ... */ comment):
RETURN(/*
schema: user-defined-operator-v0.1.0
type: uc-udf
name: Calculate BMI
id: calculate_bmi
version: "1.0.0"
description: Calculates BMI from weight and height.
config:
  type: object
  properties:
    weight_kg:
      type: string
      title: Weight (in kg)
      format: expression
      x-ui:
        widget: expression
        port: in
    height_m:
      type: string
      title: Height (in meters)
      format: expression
      x-ui:
        widget: expression
        port: in
  required:
    - weight_kg
    - height_m
ports:
  input:
    - name: in
      title: Input Data
  output:
    - name: out
      title: Output
*/
SELECT weight_kg / (height_m * height_m)
);
In Python (use """ ... """ docstring):
AS $$
"""
schema: user-defined-operator-v0.1.0
type: uc-udf
name: Calculate BMI
id: calculate_bmi
version: "1.0.0"
description: Calculates BMI from weight and height.
config:
  type: object
  properties:
    weight_kg:
      type: string
      title: Weight (in kg)
      format: expression
      x-ui:
        widget: expression
        port: in
    height_m:
      type: string
      title: Height (in meters)
      format: expression
      x-ui:
        widget: expression
        port: in
  required:
    - weight_kg
    - height_m
ports:
  input:
    - name: in
      title: Input Data
  output:
    - name: out
      title: Output
"""
return weight_kg / (height_m ** 2)
$$;
Register and deploy your operator to Lakeflow Designer
For your operator to appear in Lakeflow Designer, register it in a .user_defined_operators.yaml file:
- Workspace level: Place the file in the root of your workspace to make the operator visible to all users.
- User level: Place the file in your user home folder (/Workspace/Users/<user-name>/.user_defined_operators.yaml) to make operators visible only to you.
The operators: section supports file paths, Unity Catalog function references, and glob patterns. You can mix entry types:
operators:
  # File path (python-run-function operators)
  - /Workspace/Users/me/udos/my_operator.yaml
  # Glob pattern (registers all matching files)
  - /Workspace/Users/me/udos/transforms/*.yaml
  # UC function reference (uc-udf and uc-udtf operators)
  - catalog: my_catalog
    schema: my_schema
    functionName: my_function
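The three entry types can be told apart by shape alone: a mapping is a Unity Catalog reference, and a string is either a literal path or a glob. The sketch below illustrates that distinction on an already-parsed `operators:` list. `classify_entry` and `expand_glob` are hypothetical helpers, and `fnmatchcase` is only an approximation: its `*` also matches `/`, which may differ from how Lakeflow Designer expands patterns.

```python
from fnmatch import fnmatchcase
from typing import Any, List


def classify_entry(entry: Any) -> str:
    """Label a registration entry as 'uc-function', 'glob', or 'file'."""
    if isinstance(entry, dict):
        if {"catalog", "schema", "functionName"}.issubset(entry):
            return "uc-function"
        raise ValueError(f"incomplete Unity Catalog reference: {entry}")
    if isinstance(entry, str):
        # Treat any glob metacharacter as a sign that the entry is a pattern.
        return "glob" if any(ch in entry for ch in "*?[") else "file"
    raise ValueError(f"unsupported entry: {entry!r}")


def expand_glob(pattern: str, known_paths: List[str]) -> List[str]:
    """Return the known paths that match the pattern, sorted."""
    return sorted(path for path in known_paths if fnmatchcase(path, pattern))


# The parsed form of the registration file shown above.
entries = [
    "/Workspace/Users/me/udos/my_operator.yaml",
    "/Workspace/Users/me/udos/transforms/*.yaml",
    {"catalog": "my_catalog", "schema": "my_schema", "functionName": "my_function"},
]
kinds = [classify_entry(entry) for entry in entries]
print(kinds)  # ['file', 'glob', 'uc-function']

# Hypothetical workspace listing, used only to demonstrate pattern expansion.
workspace_files = [
    "/Workspace/Users/me/udos/my_operator.yaml",
    "/Workspace/Users/me/udos/transforms/a.yaml",
    "/Workspace/Users/me/udos/transforms/b.yaml",
]
matches = expand_glob(entries[1], workspace_files)
```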
Advanced configurations
Preview mode
Lakeflow Designer supports previews while in design mode. For operators that call external APIs or write to external systems, add an is_preview config property so you can skip side effects during preview. When preview mode is enabled, users need to explicitly click Run to execute the operator with side effects.
config:
  type: object
  properties:
    is_preview:
      type: boolean
      format: is_preview
      default: false
Lakeflow Designer automatically sets this value to true during preview. Check it in your logic to skip side effects:
# In a python-run-function
if config.get("is_preview"):
    return {"out": inputs["in"]}
# In a UC function (SQL)
CASE WHEN is_preview THEN 'preview' ELSE /* actual work */ END
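As a fuller illustration of the Python case, the sketch below puts the `is_preview` check inside a complete `run()` function and exercises both paths. The `send_notification` function and the `message` property are hypothetical stand-ins for a real side effect such as an API call, and a plain list stands in for the input DataFrame.

```python
from typing import Any, Dict, List

# Records what would have been sent, so the side effect is observable locally.
sent_messages: List[str] = []


def send_notification(message: str) -> None:
    # Hypothetical side effect; a real operator might call an external API here.
    sent_messages.append(message)


def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
    df = inputs["in"]
    if config.get("is_preview"):
        # Preview: return data for display but skip the side effect.
        return {"out": df}
    send_notification(config["message"])
    return {"out": df}


data = ["row-1", "row-2"]  # stand-in for a DataFrame

preview_result = run({"is_preview": True, "message": "done"}, {"in": data}, spark=None)
count_after_preview = len(sent_messages)

full_result = run({"is_preview": False, "message": "done"}, {"in": data}, spark=None)
count_after_run = len(sent_messages)
```

Both calls return the same output, so downstream operators can still render during preview. Only the full run records a message.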
Unity Catalog connections
For UC-based SQL operators that call external APIs, use Unity Catalog HTTP connections to securely store credentials:
CREATE CONNECTION my_api_connection TYPE HTTP OPTIONS (
host 'https://api.example.com',
port '443',
base_path '/v1/',
bearer_token 'your-token-here'
);
Then use the connection in your SQL UDF with the http_request() function. For details, see Connect to external HTTP services.
WorkspaceClient
For python-run-function operators, you can use the Databricks WorkspaceClient to access workspace resources and external APIs:
def run(config, inputs, spark):
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()
    # Use w to access workspace resources
Create a complete python-run-function user-defined operator
The following steps walk through creating a python-run-function operator from scratch.
Step 1: Define the logic
Write your run() function in a notebook:
from typing import Dict, Any

def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
    from pyspark.sql import functions as F
    df = inputs["in"]
    result = df.withColumn(config["column_name"], F.current_timestamp())
    return {"out": result}
Step 2: Test the function
Test the function interactively with sample data:
test_df = spark.createDataFrame(
    [("Alice", 100), ("Bob", 200)],
    ["name", "amount"]
)
result = run(
    config={"column_name": "processed_at"},
    inputs={"in": test_df},
    spark=spark
)
result["out"].show()
Step 3: Create the YAML configuration
Define the operator metadata, configuration fields, and ports in a YAML file:
schema: user-defined-operator-v0.1.0
type: python-run-function
name: Add Timestamp
id: transforms.add_timestamp
version: '1.0.0'
description: Adds a timestamp column to the input DataFrame.
config:
  type: object
  properties:
    column_name:
      type: string
      title: Column Name
      default: processed_at
      x-ui:
        widget: input
  required:
    - column_name
Step 4: Combine the logic and YAML
Add the run_function and ports fields to create the complete YAML file. Save it to your workspace, for example /Workspace/Users/<user-name>/udos/add_timestamp.yaml:
schema: user-defined-operator-v0.1.0
type: python-run-function
name: Add Timestamp
id: transforms.add_timestamp
version: '1.0.0'
description: Adds a timestamp column to the input DataFrame.
config:
  type: object
  properties:
    column_name:
      type: string
      title: Column Name
      default: processed_at
      x-ui:
        widget: input
  required:
    - column_name
ports:
  input:
    - name: in
      title: Input
  output:
    - name: out
      title: Output
run_function:
  type: inline
  code: |
    from typing import Dict, Any
    def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
        from pyspark.sql import functions as F
        df = inputs["in"]
        result = df.withColumn(config["column_name"], F.current_timestamp())
        return {"out": result}
Step 5: Register the operator
Add the file path to your .user_defined_operators.yaml file:
operators:
- /Workspace/Users/<user-name>/udos/add_timestamp.yaml
Step 6: Use the operator in Lakeflow Designer
Open Lakeflow Designer and verify the operator appears in the operator palette. Drag it onto the canvas, connect an input, configure the column name, and run a preview.
Create a complete UC user-defined operator
The following steps walk through creating a UC-based uc-udf operator.
Step 1: Define the logic
Write and test your function logic in a notebook:
def double_value(input_value: float) -> float:
    if input_value is None:
        return None
    return input_value * 2
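Before wrapping the logic in a Unity Catalog function, a few plain assertions in the notebook can confirm the behavior, including the null case. This is a minimal sketch: the function is repeated so that the block is self-contained, and the `Optional` type hints are an addition that reflects the `None` path.

```python
from typing import Optional


def double_value(input_value: Optional[float]) -> Optional[float]:
    # Propagate nulls instead of raising a TypeError on None * 2.
    if input_value is None:
        return None
    return input_value * 2


# Quick checks: integer input, float input, and a null input.
assert double_value(5) == 10
assert double_value(2.5) == 5.0
assert double_value(None) is None
print("all checks passed")
```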
Step 2: Create the YAML configuration
Define the operator metadata, configuration fields, and ports:
schema: user-defined-operator-v0.1.0
type: uc-udf
name: Double Value
id: math.double_value
version: '1.0.0'
description: Doubles the input value
config:
  type: object
  properties:
    input_value:
      type: string
      title: Input Value
      format: expression
      x-ui:
        widget: expression
        port: input_data
  required:
    - input_value
ports:
  input:
    - name: input_data
      title: Input
  output:
    - name: out
      title: Output
Step 3: Combine the logic and YAML
Create the Unity Catalog function with the YAML embedded as a docstring:
CREATE OR REPLACE FUNCTION main.my_schema.double_value(input_value DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
"""
schema: user-defined-operator-v0.1.0
type: uc-udf
name: Double Value
id: math.double_value
version: "1.0.0"
description: Doubles the input value
config:
  type: object
  properties:
    input_value:
      type: string
      title: Input Value
      format: expression
      x-ui:
        widget: expression
        port: input_data
  required:
    - input_value
ports:
  input:
    - name: input_data
      title: Input
  output:
    - name: out
      title: Output
"""
def double_value(input_value: float) -> float:
    if input_value is None:
        return None
    return input_value * 2
return double_value(input_value)
$$
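If you keep the YAML and the Python body in separate files, the statement above can be assembled programmatically instead of by hand. The following sketch shows one way to do that with string handling only. `build_create_function` is a hypothetical helper, the YAML passed to it is abbreviated for brevity, and the generated statement still has to be run in Databricks to create the function.

```python
import textwrap


def build_create_function(name: str, params: str, returns: str,
                          operator_yaml: str, python_body: str) -> str:
    """Assemble a CREATE FUNCTION statement with the YAML as a leading docstring."""
    yaml_text = textwrap.dedent(operator_yaml).strip()
    body_text = textwrap.dedent(python_body).strip()
    for text in (yaml_text, body_text):
        if "$$" in text:
            raise ValueError("'$$' would end the function body early")
    if '"""' in yaml_text:
        raise ValueError("triple quotes would end the docstring early")
    return (
        f"CREATE OR REPLACE FUNCTION {name}({params})\n"
        f"RETURNS {returns}\n"
        "LANGUAGE PYTHON\n"
        "AS $$\n"
        f'"""\n{yaml_text}\n"""\n'
        f"{body_text}\n"
        "$$"
    )


statement = build_create_function(
    name="main.my_schema.double_value",
    params="input_value DOUBLE",
    returns="DOUBLE",
    # Abbreviated YAML; a real operator needs the full configuration.
    operator_yaml="""
        schema: user-defined-operator-v0.1.0
        type: uc-udf
        name: Double Value
    """,
    python_body="""
        if input_value is None:
            return None
        return input_value * 2
    """,
)
print(statement)
```

The guard against `$$` and triple quotes matters because either sequence inside the embedded text would close the function body or the docstring early.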
Step 4: Test the function
SELECT main.my_schema.double_value(5) AS result;
-- Should return: 10
Step 5: Register the operator
Add the Unity Catalog function reference to your .user_defined_operators.yaml file:
operators:
  - catalog: main
    schema: my_schema
    functionName: double_value
Step 6: Use the operator in Lakeflow Designer
Open Lakeflow Designer and verify the operator appears in the operator palette. Drag it onto the canvas, connect an input, and run a preview.
Troubleshooting
Issue | Solution |
|---|---|
Operator doesn't appear in Lakeflow Designer. | Check that |
Schema validation fails. | Verify your YAML against the official schema at |
Permission denied. | For UC-based operators, verify users have |
| Check that the |
UDTF returns wrong types. | UDTF return types must be explicit — you can't reference input column types. |
Permissions
Permission | Purpose |
|---|---|
Read access to | Discover the operator. |
Read access to the YAML file ( | Load the operator definition. |
EXECUTE on the Unity Catalog function (UC-based operators only). | Run the operator. |
USE SCHEMA on the schema (UC-based operators only). | Access the schema where the function is created. |
Other permissions | Depending on your operator, users may require other permissions. For example, |
Next steps
Explore the following tutorials:
Example | Type | Description |
|---|---|---|
| Send DataFrame data as a CSV email attachment via Gmail. | |
| Calculate future investment values using the compound interest formula. | |
| Segment data into clusters using scikit-learn. | |
| Send notifications to Slack channels via API. | |
| Reference operator showcasing all available UI widgets. |
For a complete reference to the YAML schema, see User-defined operator YAML reference.