User-defined operators in Lakeflow Designer

Preview

Lakeflow Designer lets you create user-defined operators that appear directly in the canvas alongside built-in operators. Use them to extend Lakeflow Designer with your own business logic, calculations, or integrations.

There are three types of user-defined operators:

python-run-function: A standalone YAML file with inline Python stored in the workspace. Best for DataFrame-level transformations and external integrations. Permissions are managed at the workspace file level.
uc-udf: Wraps a Unity Catalog scalar function. Best for column-level transformations. Access is governed by Unity Catalog permissions.
uc-udtf: Wraps a Unity Catalog table-valued function. Best for table-level transforms like ML clustering and aggregation. Access is governed by Unity Catalog permissions.

Feature	`python-run-function`	`uc-udf`	`uc-udtf`
Example use case	DataFrame transforms, API integrations, email notifications	Column-level calculations (BMI, interest rates)	ML clustering, aggregation across rows
Input	DataFrames	Single values	Entire table, row by row
Output	DataFrames	Single value	Table (multiple rows)
Requires Unity Catalog function	No	Yes	Yes
Access governance	Workspace file permissions	Unity Catalog permissions (`EXECUTE`, `USE SCHEMA`)	Unity Catalog permissions (`EXECUTE`, `USE SCHEMA`)
Supported languages	Python only	SQL or Python in a SQL wrapper	SQL or Python in a SQL wrapper

How do user-defined operators work?

A user-defined operator consists of:

Operator logic: The code that runs when the operator executes. This can be an inline Python run() function (for python-run-function) or a Unity Catalog function (for uc-udf and uc-udtf).
YAML configuration: Tells Lakeflow Designer how to present the operator in the UI, including the operator's name, description, input parameters, UI widgets, and ports. All operator types use the user-defined-operator-v0.1.0 schema.
Registration file: An entry in .user_defined_operators.yaml that lets Lakeflow Designer discover the operator.

Operator logic

Python run function user-defined operator logic

Every python-run-function operator must define a run() function:

Python
def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:

config: User-configured values from the UI, keyed by property name.
inputs: Input DataFrames, keyed by input port name.
spark: The active SparkSession.
Returns: A dictionary that maps each output port name to a DataFrame.
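
The contract can be checked without Spark by comparing the keys that run() returns against the declared output ports. The following is a minimal sketch, not part of Lakeflow Designer: check_run_output and the output_ports list are hypothetical helpers, and plain Python lists stand in for DataFrames.

```python
from typing import Any, Dict, List


def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
    # Pass the single input through unchanged; a list stands in for a DataFrame.
    return {"out": inputs["in"]}


def check_run_output(result: Dict[str, Any], output_ports: List[str]) -> List[str]:
    """Hypothetical helper: return a list of problems (empty means keys match)."""
    problems = []
    for port in output_ports:
        if port not in result:
            problems.append(f"missing output port: {port}")
    for key in result:
        if key not in output_ports:
            problems.append(f"unexpected key: {key}")
    return problems


result = run(config={}, inputs={"in": [1, 2, 3]}, spark=None)
problems = check_run_output(result, output_ports=["out"])
```

An empty problems list means every declared output port has a matching key in the returned dictionary and no extra keys are present.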

The following example filters rows from an input DataFrame:

Python
def run(config, inputs, spark):
    df = inputs["in"]
    filtered = df.filter(config["filter_expression"])
    return {"out": filtered}

If your operator requires external pip packages, add the environment field to the YAML:

YAML
environment:
  environment_version: '1'
  dependencies:
    - requests==2.31.0
    - beautifulsoup4==4.12.0

UDF and UDTF operator logic

You can write UC functions in SQL or Python. Python functions are wrapped in a SQL CREATE FUNCTION statement:

SQL function:

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.calculate_bmi(weight_kg DOUBLE, height_m DOUBLE)
RETURNS DOUBLE
LANGUAGE SQL
RETURN
  SELECT weight_kg / (height_m * height_m);

Python function (wrapped in SQL):

SQL
CREATE OR REPLACE FUNCTION my_catalog.my_schema.calculate_bmi(weight_kg DOUBLE, height_m DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
  return weight_kg / (height_m ** 2)
$$;
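
Before wrapping the Python body in CREATE FUNCTION, it can help to check the arithmetic in plain Python. The sketch below assumes sample inputs of 70 kg and 1.75 m (illustrative values, not from the product documentation) and compares the two ways the examples above write the square.

```python
def calculate_bmi(weight_kg: float, height_m: float) -> float:
    # Same expression as the Python UDF body.
    return weight_kg / (height_m ** 2)


bmi = calculate_bmi(70.0, 1.75)

# The SQL variant writes the square as a product; the result should agree.
bmi_sql_form = 70.0 / (1.75 * 1.75)
```

Both forms evaluate to roughly 22.86 for these inputs. Neither form guards against a height of zero, so this sketch raises ZeroDivisionError in that case.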

UDFs process a single value at a time and return a calculated value. UDTFs process tables row by row and can maintain state across all rows. Use uc-udf for column-level transforms and uc-udtf for operations like ML clustering or aggregation.

Additionally, UDTFs require you to define three key methods: __init__(), eval(), and terminate():

Python
class MyOperator:
    def __init__(self):
        # Called before processing - initialize any values needed.
        pass

    def eval(self, row, id_column, columns, k):
        # Called once per input row - accumulate data here.
        pass

    def terminate(self):
        # Called after all rows - perform final calculations and yield results.
        pass

Note

UDTF return tables must have fixed, explicit types. You can't reference input column types in the return configuration.
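
The call order of the three methods can be illustrated in plain Python. This is a sketch under stated assumptions: RunningAverage and the drive() function are hypothetical, eval() takes a single value rather than the row and column arguments shown above, and no Unity Catalog function is created.

```python
class RunningAverage:
    """Plain-Python stand-in that follows the __init__/eval/terminate lifecycle."""

    def __init__(self):
        # Called once before any rows arrive.
        self.total = 0.0
        self.count = 0

    def eval(self, value):
        # Called once per input row; only accumulates state.
        self.total += value
        self.count += 1

    def terminate(self):
        # Called after the last row; yields the output rows as tuples.
        if self.count:
            yield (self.count, self.total / self.count)


def drive(handler_cls, rows):
    """Hypothetical driver that mimics the call order described above."""
    handler = handler_cls()
    for row in rows:
        handler.eval(row)
    return list(handler.terminate())


output = drive(RunningAverage, [10.0, 20.0, 30.0])
```

Because the result depends on every row, it can only be produced in terminate(). That is the kind of cross-row state a scalar UDF cannot keep.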

YAML configuration

The YAML configuration tells Lakeflow Designer how to present the operator in the UI. It defines the operator's name, description, input parameters, UI widgets, and ports. Each configuration field is a property with a type, title, and optional x-ui widget hints:

YAML
config:
  type: object
  properties:
    my_param:
      type: string
      title: My Parameter
      x-ui:
        widget: input
    my_expression:
      type: string
      title: Column
      format: expression
      x-ui:
        widget: expression
        port: in
    my_number:
      type: number
      title: Count
      default: 10
      minimum: 0
      maximum: 100
  required:
    - my_param
    - my_expression

For full details on the YAML schema, including all widget types and configuration options, see User-defined operator YAML reference.
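
To make the roles of default, required, minimum, and maximum concrete, the sketch below resolves a config dictionary against a Python dict that mirrors part of the YAML above. It assumes these keywords behave like their JSON Schema counterparts. resolve_config is a hypothetical helper, and this is not how Lakeflow Designer itself validates input.

```python
from typing import Any, Dict

# Python dict mirroring part of the YAML config block.
CONFIG_SCHEMA = {
    "properties": {
        "my_param": {"type": "string"},
        "my_number": {"type": "number", "default": 10, "minimum": 0, "maximum": 100},
    },
    "required": ["my_param"],
}


def resolve_config(schema: Dict[str, Any], values: Dict[str, Any]) -> Dict[str, Any]:
    """Apply defaults, then check required fields and numeric bounds."""
    resolved = dict(values)
    for name, prop in schema["properties"].items():
        if name not in resolved and "default" in prop:
            resolved[name] = prop["default"]
    for name in schema.get("required", []):
        if name not in resolved:
            raise ValueError(f"missing required property: {name}")
    for name, prop in schema["properties"].items():
        if name in resolved and prop.get("type") == "number":
            value = resolved[name]
            if "minimum" in prop and value < prop["minimum"]:
                raise ValueError(f"{name} is below minimum {prop['minimum']}")
            if "maximum" in prop and value > prop["maximum"]:
                raise ValueError(f"{name} is above maximum {prop['maximum']}")
    return resolved


config = resolve_config(CONFIG_SCHEMA, {"my_param": "hello"})
```

Here my_number is filled from its default because the caller did not supply it, while an out-of-range value or a missing my_param raises ValueError.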

Ports

Ports define the inputs and outputs for your operator:

YAML
ports:
  input:
    - name: in
      title: Input Data
      mime: application/vnd.databricks.dataframe
      required: true
      allowMultiple: false
  output:
    - name: out
      title: Output Data

YAML for Python run function operators

For python-run-function operators, the YAML file is standalone and includes a run_function field with inline Python code:

YAML
schema: user-defined-operator-v0.1.0
type: python-run-function
name: Filter Rows
id: filter_rows
version: '1.0.0'
description: Filters rows based on a SQL expression.
config:
  type: object
  properties:
    filter_expression:
      type: string
      title: Filter Expression
      x-ui:
        widget: input
  required:
    - filter_expression
ports:
  input:
    - name: in
      title: Input
  output:
    - name: out
      title: Output
run_function:
  type: inline
  code: |
    def run(config, inputs, spark):
        df = inputs["in"]
        filtered = df.filter(config["filter_expression"])
        return {"out": filtered}

YAML for Unity Catalog functions

For UC-based operators, embed the YAML configuration as a comment or docstring in your function.

In SQL (use /* ... */ comment):

SQL
RETURN(/*
  schema: user-defined-operator-v0.1.0
  type: uc-udf
  name: Calculate BMI
  id: calculate_bmi
  version: "1.0.0"
  description: Calculates BMI from weight and height.
  config:
    type: object
    properties:
      weight_kg:
        type: string
        title: Weight (in kg)
        format: expression
        x-ui:
          widget: expression
          port: in
      height_m:
        type: string
        title: Height (in meters)
        format: expression
        x-ui:
          widget: expression
          port: in
    required:
      - weight_kg
      - height_m
  ports:
    input:
      - name: in
        title: Input Data
    output:
      - name: out
        title: Output
    */
  SELECT weight_kg / (height_m * height_m)
);

In Python (use """ ... """ docstring):

SQL
AS $$
  """
  schema: user-defined-operator-v0.1.0
  type: uc-udf
  name: Calculate BMI
  id: calculate_bmi
  version: "1.0.0"
  description: Calculates BMI from weight and height.
  config:
    type: object
    properties:
      weight_kg:
        type: string
        title: Weight (in kg)
        format: expression
        x-ui:
          widget: expression
          port: in
      height_m:
        type: string
        title: Height (in meters)
        format: expression
        x-ui:
          widget: expression
          port: in
    required:
      - weight_kg
      - height_m
  ports:
    input:
      - name: in
        title: Input Data
    output:
      - name: out
        title: Output
  """

  return weight_kg / (height_m ** 2)
$$;

Register and deploy your operator to Lakeflow Designer

For your operator to appear in Lakeflow Designer, register it in a .user_defined_operators.yaml file:

Workspace level: Place the file in the root of your workspace to make the operator visible to all users.
User level: Place the file in your user home folder (/Workspace/Users/<user-name>/.user_defined_operators.yaml) to make operators visible only to you.

The operators: section supports file paths, Unity Catalog function references, and glob patterns. You can mix entry types:

YAML
operators:
  # File path (python-run-function operators)
  - /Workspace/Users/me/udos/my_operator.yaml
  # Glob pattern (registers all matching files)
  - /Workspace/Users/me/udos/transforms/*.yaml
  # UC function reference (uc-udf and uc-udtf operators)
  - catalog: my_catalog
    schema: my_schema
    functionName: my_function
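
The three entry shapes can be told apart by type and content. The following sketch is illustrative only: classify_entry and the candidate file names are hypothetical, and Python's fnmatch is an assumption about glob behavior (its * also matches path separators), which might differ from how Lakeflow Designer expands patterns.

```python
from fnmatch import fnmatch
from typing import Dict, List, Union

Entry = Union[str, Dict[str, str]]


def classify_entry(entry: Entry) -> str:
    """Label a registration entry as 'uc-function', 'glob', or 'file'."""
    if isinstance(entry, dict):
        return "uc-function"
    if any(ch in entry for ch in "*?["):
        return "glob"
    return "file"


entries: List[Entry] = [
    "/Workspace/Users/me/udos/my_operator.yaml",
    "/Workspace/Users/me/udos/transforms/*.yaml",
    {"catalog": "my_catalog", "schema": "my_schema", "functionName": "my_function"},
]
kinds = [classify_entry(e) for e in entries]

# fnmatch shows which (hypothetical) files the glob entry would pick up.
candidates = [
    "/Workspace/Users/me/udos/transforms/clean.yaml",
    "/Workspace/Users/me/udos/transforms/notes.txt",
]
matched = [p for p in candidates if fnmatch(p, entries[1])]
```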

Advanced configurations

Preview mode

Lakeflow Designer supports previews while in design mode. For operators that call external APIs or write to external systems, add an is_preview config property so you can skip side effects during preview. When preview mode is enabled, users need to explicitly click Run to execute the operator with side effects.

YAML
config:
  type: object
  properties:
    is_preview:
      type: boolean
      format: is_preview
      default: false

Lakeflow Designer automatically sets this value to true during preview. Check it in your logic to skip side effects:

Python
# In a python-run-function
if config.get("is_preview"):
    return {"out": inputs["in"]}

SQL
-- In a UC function (SQL)
CASE WHEN is_preview THEN 'preview' ELSE /* actual work */ END
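
The effect of the preview check can be seen by recording side effects in a list. This is a minimal sketch: send_notification is a hypothetical stand-in for an external API call, and plain lists stand in for DataFrames.

```python
from typing import Any, Dict, List

sent_messages: List[str] = []  # records side effects so they can be inspected


def send_notification(message: str) -> None:
    # Hypothetical stand-in for an external API call.
    sent_messages.append(message)


def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
    rows = inputs["in"]  # a list stands in for a DataFrame here
    if config.get("is_preview"):
        # Preview: return the data untouched and skip the external call.
        return {"out": rows}
    send_notification(f"processed {len(rows)} rows")
    return {"out": rows}


preview = run({"is_preview": True}, {"in": [1, 2]}, spark=None)
after_preview = list(sent_messages)  # snapshot taken before the full run
full = run({"is_preview": False}, {"in": [1, 2]}, spark=None)
```

Both calls return the same output, but only the full run records a message. Using config.get() rather than config["is_preview"] keeps the function working when the property is absent.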

Unity Catalog connections

For UC-based SQL operators that call external APIs, use Unity Catalog HTTP connections to securely store credentials:

SQL
CREATE CONNECTION my_api_connection TYPE HTTP OPTIONS (
  host 'https://api.example.com',
  port '443',
  base_path '/v1/',
  bearer_token 'your-token-here'
);

Then use the connection in your SQL UDF with the http_request() function. For details, see Connect to external HTTP services.

WorkspaceClient

For python-run-function operators, you can use the Databricks WorkspaceClient to access workspace resources and external APIs:

Python
def run(config, inputs, spark):
    from databricks.sdk import WorkspaceClient
    w = WorkspaceClient()
    # Use w to access workspace resources

Create a complete python-run-function user-defined operator

The following steps walk through creating a python-run-function operator from scratch.

Step 1: Define the logic

Write your run() function in a notebook:

Python
from typing import Dict, Any

def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
    from pyspark.sql import functions as F
    df = inputs["in"]
    result = df.withColumn(config["column_name"], F.current_timestamp())
    return {"out": result}

Step 2: Test the function

Test the function interactively with sample data:

Python
test_df = spark.createDataFrame(
    [("Alice", 100), ("Bob", 200)],
    ["name", "amount"]
)

result = run(
    config={"column_name": "processed_at"},
    inputs={"in": test_df},
    spark=spark
)

result["out"].show()

Step 3: Create the YAML configuration

Define the operator metadata, configuration fields, and ports in a YAML file:

YAML
schema: user-defined-operator-v0.1.0
type: python-run-function
name: Add Timestamp
id: transforms.add_timestamp
version: '1.0.0'
description: Adds a timestamp column to the input DataFrame.
config:
  type: object
  properties:
    column_name:
      type: string
      title: Column Name
      default: processed_at
      x-ui:
        widget: input
  required:
    - column_name

Step 4: Combine the logic and YAML

Add the run_function and ports fields to create the complete YAML file. Save it to your workspace, for example /Workspace/Users/<user-name>/udos/add_timestamp.yaml:

YAML
schema: user-defined-operator-v0.1.0
type: python-run-function
name: Add Timestamp
id: transforms.add_timestamp
version: '1.0.0'
description: Adds a timestamp column to the input DataFrame.
config:
  type: object
  properties:
    column_name:
      type: string
      title: Column Name
      default: processed_at
      x-ui:
        widget: input
  required:
    - column_name
ports:
  input:
    - name: in
      title: Input
  output:
    - name: out
      title: Output
run_function:
  type: inline
  code: |
    from typing import Dict, Any

    def run(config: Dict[str, Any], inputs: Dict[str, Any], spark) -> Dict[str, Any]:
        from pyspark.sql import functions as F
        df = inputs["in"]
        result = df.withColumn(config["column_name"], F.current_timestamp())
        return {"out": result}

Step 5: Register the operator

Add the file path to your .user_defined_operators.yaml file:

YAML
operators:
  - /Workspace/Users/<user-name>/udos/add_timestamp.yaml

Step 6: Use the operator in Lakeflow Designer

Open Lakeflow Designer and verify the operator appears in the operator palette. Drag it onto the canvas, connect an input, configure the column name, and run a preview.

Create a complete UC user-defined operator

The following steps walk through creating a UC-based uc-udf operator.

Step 1: Define the logic

Write and test your function logic in a notebook:

Python
def double_value(input_value: float) -> float:
    if input_value is None:
        return None
    return input_value * 2
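
A few quick checks in the same notebook can confirm the logic, including the None branch, before the function is registered. The sample inputs below are illustrative, and the Optional type hints are an addition to the original signature.

```python
from typing import Optional


def double_value(input_value: Optional[float]) -> Optional[float]:
    if input_value is None:
        return None
    return input_value * 2


# Exercise a positive value, a negative value, and a missing value.
checks = {
    "positive": double_value(5.0),
    "negative": double_value(-1.5),
    "null": double_value(None),
}
```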

Step 2: Create the YAML configuration

Define the operator metadata, configuration fields, and ports:

YAML
schema: user-defined-operator-v0.1.0
type: uc-udf
name: Double Value
id: math.double_value
version: '1.0.0'
description: Doubles the input value
config:
  type: object
  properties:
    input_value:
      type: string
      title: Input Value
      format: expression
      x-ui:
        widget: expression
        port: input_data
  required:
    - input_value
ports:
  input:
    - name: input_data
      title: Input
  output:
    - name: out
      title: Output

Step 3: Combine the logic and YAML

Create the Unity Catalog function with the YAML embedded as a docstring:

SQL
CREATE OR REPLACE FUNCTION main.my_schema.double_value(input_value DOUBLE)
RETURNS DOUBLE
LANGUAGE PYTHON
AS $$
  """
  schema: user-defined-operator-v0.1.0
  type: uc-udf
  name: Double Value
  id: math.double_value
  version: "1.0.0"
  description: Doubles the input value
  config:
    type: object
    properties:
      input_value:
        type: string
        title: Input Value
        format: expression
        x-ui:
          widget: expression
          port: input_data
    required:
      - input_value
  ports:
    input:
      - name: input_data
        title: Input
    output:
      - name: out
        title: Output
  """

  def double_value(input_value: float) -> float:
      if input_value is None:
          return None
      return input_value * 2

  return double_value(input_value)
$$;

Step 4: Test the function

SQL
SELECT main.my_schema.double_value(5) AS result;
-- Should return: 10

Step 5: Register the operator

Add the Unity Catalog function reference to your .user_defined_operators.yaml file:

YAML
operators:
  - catalog: main
    schema: my_schema
    functionName: double_value

Step 6: Use the operator in Lakeflow Designer

Open Lakeflow Designer and verify the operator appears in the operator palette. Drag it onto the canvas, connect an input, and run a preview.

Troubleshooting

Issue	Solution
Operator doesn't appear in Lakeflow Designer.	Check that `.user_defined_operators.yaml` exists and lists your function or file path. For `python-run-function` operators, verify the file path and that the YAML file is accessible.
Schema validation fails.	Verify your YAML against the official schema at `https://your-workspace.cloud.databricks.com/static/schemas/user-defined-operator-v0.1.0.json`.
Permission denied.	For UC-based operators, verify users have `EXECUTE` on the function and `USE SCHEMA` on the schema. For `python-run-function` operators, verify users have read access to the YAML file.
`python-run-function` operator fails at runtime.	Check that the `run()` function signature matches `def run(config, inputs, spark)`. Verify that port names in the code match the YAML and that the return dictionary keys match output port `name` values.
UDTF returns wrong types.	UDTF return types must be explicit. You can't reference input column types.
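
For the run() signature check in the table above, Python's inspect module can compare parameter names before the code is pasted into the YAML file. This is a sketch only: signature_problems and bad_run are hypothetical names, and the check looks at parameter names and order, not types.

```python
import inspect
from typing import Callable, List

EXPECTED_PARAMS = ["config", "inputs", "spark"]


def signature_problems(func: Callable) -> List[str]:
    """Compare a function's parameter names with the expected run() signature."""
    params = list(inspect.signature(func).parameters)
    if params == EXPECTED_PARAMS:
        return []
    return [f"expected {EXPECTED_PARAMS}, found {params}"]


def run(config, inputs, spark):
    return {"out": inputs["in"]}


def bad_run(inputs, config):
    # Wrong order and missing spark: this should be reported.
    return {"out": inputs["in"]}


ok = signature_problems(run)
bad = signature_problems(bad_run)
```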

Permissions

Permission	Purpose
Read access to `.user_defined_operators.yaml`	Discover the operator.
Read access to the YAML file (`python-run-function` only)	Load the operator definition.
`EXECUTE` on the Unity Catalog function (UC-based operators only)	Run the operator.
`USE SCHEMA` on the schema (UC-based operators only)	Access the schema where the function is created.
Other permissions	Depending on your operator, users might require other permissions. For example, `USE CONNECTION` on a Unity Catalog connection for HTTP API calls.

Next steps

Explore the following tutorials:

Example	Type	Description
Gmail email sender	`python-run-function`	Send DataFrame data as a CSV email attachment via Gmail.
Compound interest calculator	`uc-udf`	Calculate future investment values using the compound interest formula.
K-means clustering	`uc-udtf`	Segment data into clusters using scikit-learn.
Send Slack message	`uc-udf`	Send notifications to Slack channels via API.
All UI widgets	`uc-udf`	Reference operator showcasing all available UI widgets.

For a complete reference to the YAML schema, see User-defined operator YAML reference.
