Built-in operators in Lakeflow Designer

Preview

This feature is in Public Preview.

Lakeflow Designer includes built-in operators for common data preparation and transformation tasks. Open the operator menu in the side panel on the left to browse operators by category, or use Search for an operator... at the top of the panel. To open an operator's configuration pane after you add it to the canvas, double-click it, or hold the pointer over it and click the Pencil icon (Edit operator).

Source and output

Source

Imports data into Designer. The Source operator reads from a Unity Catalog table or other supported sources. It has two stages:

  1. Selecting a table or file: Search for a table or file by name, or browse by catalog and schema. You can also create a new table from this pane.
  2. Table summary: After selecting a table, the configuration pane shows the table's name, owner, and last updated time. Click Select a new data source to change the source. Changing the source invalidates the output cache for all downstream operators.

For the full range of data ingestion options, see Ingest data into Lakeflow Designer.

Output

Exports data out of Designer by writing results to a table in Unity Catalog.

In the Output configuration pane, specify:

  • Table name: The name of the table to create.
  • Output location: The catalog and schema where the table is created.

Click Run to run the Visual data prep and write results.

AI Function

Runs a built-in AI operation on your data. In the configuration pane, open Select a function and choose one of the functions below. Each function exposes options in the pane for inputs (for example, columns, prompts, labels, or languages) and outputs.

  • ai_analyze_sentiment: Performs sentiment analysis on input text.
  • ai_classify: Classifies text or parsed documents using labels you provide.
  • ai_extract: Extracts structured data from text or parsed documents using fields you define.
  • ai_fix_grammar: Corrects grammatical errors in text.
  • ai_gen: Answers a user-provided prompt against the input.
  • ai_mask: Masks specified entities in text (for example, for de-identification).
  • ai_similarity: Compares two strings and returns a semantic similarity score.
  • ai_summarize: Generates a summary of text.
  • ai_translate: Translates text into a target language you specify.

Transformations

The following operators perform transformations on your data.

Aggregate

Summarizes rows by grouping data and computing aggregate values.

  • Aggregate by: Select a column, choose an aggregation function, and provide a name for the output column. Click + Add aggregation to add more.
  • Group by: Select the columns to group by. Click + Add grouping to add more.

Supported aggregation functions: AVG, COUNT, MAX, MEAN, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.

note

Columns used in Group by are automatically included in the output.
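The Aggregate operator's semantics can be sketched in plain Python. This is an illustrative sketch, not Designer's implementation; the rows, the "region" Group by column, and the output column names ("avg_amount", "order_count") are hypothetical.

```python
from collections import defaultdict

# Hypothetical input rows; "region" is the Group by column.
rows = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 30},
    {"region": "west", "amount": 5},
]

# Group rows by the Group by column.
groups = defaultdict(list)
for row in rows:
    groups[row["region"]].append(row["amount"])

# One output row per group, with a named output column per aggregation.
# The Group by column is included in the output automatically.
output = [
    {"region": region, "avg_amount": sum(vals) / len(vals), "order_count": len(vals)}
    for region, vals in groups.items()
]
# output == [{"region": "east", "avg_amount": 20.0, "order_count": 2},
#            {"region": "west", "avg_amount": 5.0, "order_count": 1}]
```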

Combine

Merges data from two tables with matching schemas into a single output.

  • Set operation: Choose Union, Intersect, or Except.
  • Merge strategy: Choose Distinct to exclude duplicate rows from the output, or All to keep all rows including duplicates.
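The four combinations of set operation and merge strategy can be sketched with plain Python lists of rows (shown here as tuples). This is a sketch of standard SQL set semantics, not Designer's implementation, and the input rows are hypothetical.

```python
# Two hypothetical inputs with matching schemas; each tuple is a row.
a = [(1,), (1,), (2,)]
b = [(2,), (3,)]

# Union + All: every row from both inputs, duplicates kept.
union_all = a + b

# Union + Distinct: duplicates dropped, first-seen order preserved.
union_distinct = list(dict.fromkeys(a + b))

# Intersect + Distinct: rows present in both inputs, once each.
intersect = [r for r in dict.fromkeys(a) if r in b]

# Except + Distinct: rows from the first input absent from the second.
except_rows = [r for r in dict.fromkeys(a) if r not in b]
```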

Filter

Keeps only the rows that meet one or more conditions, using a graphical condition builder. For each condition, select a column, a condition type, and, where the condition type requires one, a value to match against.

Supported condition types:

  • Is equal to / Is not equal to
  • Is one of / Is not one of
  • Contains / Does not contain
  • Starts with / Does not start with
  • Ends with / Does not end with
  • Greater than / Less than
  • Is null / Is not null
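The condition builder can be sketched in plain Python. This sketch assumes that when you add multiple conditions, a row must satisfy all of them to pass; the rows and the specific conditions (Is not null on "city", Greater than on "age") are hypothetical.

```python
# Hypothetical input rows.
rows = [
    {"name": "Alice", "city": "Oslo", "age": 31},
    {"name": "Bob", "city": None, "age": 17},
    {"name": "Carol", "city": "Lima", "age": 45},
]

# Two conditions from the builder, expressed as predicates:
conditions = [
    lambda r: r["city"] is not None,  # city: Is not null (no value needed)
    lambda r: r["age"] > 18,          # age: Greater than 18
]

# Keep only rows that satisfy every condition.
kept = [r for r in rows if all(cond(r) for cond in conditions)]
# Alice and Carol pass; Bob fails both conditions.
```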

Join

Combines two input datasets into one based on matching column values (the join key).

To configure a Join:

  1. Select the two input tables to join.
  2. Specify at least one join condition by selecting matching columns from the two tables. Click + Add join expression to add more conditions.
  3. Select the join type: Full join, Inner join, Left join, or Right join.
  4. Optional: Choose which columns to include in the output. By default, all columns from both tables are included. Duplicate column names receive a table name prefix.
  5. Optional: Add custom expression columns based on the joined result.
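The join types and the duplicate-column prefixing described above can be sketched in plain Python. The tables, rows, and prefixed column names ("customers_id", "orders_id") here are hypothetical; this is a sketch of standard join semantics, not Designer's implementation.

```python
# Two hypothetical input tables; both have an "id" column.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
orders = [{"id": 10, "customer_id": 1, "total": 99}]

# Inner join: only rows whose join condition matches in both tables.
inner = []
for c in customers:
    for o in orders:
        if c["id"] == o["customer_id"]:
            # Duplicate column names receive a table name prefix.
            inner.append({"customers_id": c["id"], "name": c["name"],
                          "orders_id": o["id"], "total": o["total"]})

# Left join: additionally keeps unmatched left rows, with nulls on the right.
left = inner + [
    {"customers_id": c["id"], "name": c["name"], "orders_id": None, "total": None}
    for c in customers
    if not any(o["customer_id"] == c["id"] for o in orders)
]
```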

Limit

Restricts the row count by passing through at most the number of rows you specify.

Pivot

Reshapes tabular data in two directions. Use the tabs at the top of the configuration pane to choose the mode:

  • Rows → Columns (pivot): Turn distinct values in one column into new column headers, and fill those columns with aggregated values from another column.
  • Columns → Rows (unpivot): Fold one or more columns into rows; set names for the output key and value columns.

In Rows → Columns mode:

  • Pivot column: Choose the column whose distinct values become the new headers.
  • Value & aggregation: Choose the column whose values fill the pivoted cells, and select an aggregation function (for example, SUM, AVG, COUNT, MIN, or MAX). Configure how missing values are handled (for example, null or zero), if available in the pane.

In Columns → Rows mode, select the columns to unpivot and configure the output key and value column names.

Include columns: Use the table to choose which columns stay in the output alongside the pivoted or unpivoted values (and to drop columns you do not need before the transformation). Designer infers fixed (grouping) columns from the columns you do not assign to pivot, value, or unpivot roles.
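Both pivot directions can be sketched in plain Python. This is an illustrative sketch with hypothetical data: "quarter" is the pivot column, "sales" is the value column aggregated with SUM, and "region" is the inferred fixed (grouping) column. Note that "west" has no Q2 row, so the pivoted output has a missing cell, which is where the missing-value handling applies.

```python
# Hypothetical input rows.
rows = [
    {"region": "east", "quarter": "Q1", "sales": 10},
    {"region": "east", "quarter": "Q2", "sales": 20},
    {"region": "west", "quarter": "Q1", "sales": 5},
]

# Rows -> Columns: distinct "quarter" values become headers; SUM fills cells.
pivoted = {}
for r in rows:
    cell = pivoted.setdefault(r["region"], {"region": r["region"]})
    cell[r["quarter"]] = cell.get(r["quarter"], 0) + r["sales"]
pivot_rows = list(pivoted.values())
# [{"region": "east", "Q1": 10, "Q2": 20}, {"region": "west", "Q1": 5}]

# Columns -> Rows: fold the Q1/Q2 columns back into key/value columns,
# named "quarter" and "sales" in this sketch.
unpivoted = [
    {"region": p["region"], "quarter": q, "sales": p[q]}
    for p in pivot_rows
    for q in ("Q1", "Q2")
    if q in p
]
```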

Sort

Orders rows on one or more columns. For each column, choose ASC (ascending) or DESC (descending). Click + Add sort expression to sort by additional columns. Sorting follows standard lexical order.
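Sorting on multiple columns with mixed directions can be sketched in plain Python. The columns ("city" ASC, then "score" DESC) and rows are hypothetical; the sketch relies on Python's stable sort, applying the secondary key first and the primary key second.

```python
# Hypothetical input rows.
rows = [
    {"city": "Oslo", "score": 2},
    {"city": "Lima", "score": 2},
    {"city": "Oslo", "score": 9},
]

# Sort by city ASC, then score DESC. Because Python's sort is stable,
# sorting by the secondary key first and the primary key second gives
# the same result as a single multi-key sort.
rows.sort(key=lambda r: r["score"], reverse=True)  # score DESC
rows.sort(key=lambda r: r["city"])                 # city ASC
# rows == [Lima/2, Oslo/9, Oslo/2]
```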

SQL

Writes custom SQL code for any transformation not covered by the other operators.

Type a SQL SELECT statement in the editor. To reference the output of an input operator, use that operator's name as the table name in your query. For example:

SQL
SELECT COUNT(*)
FROM aggregate_2
WHERE 1 = 1

Click the Code icon in the editor to open the full SQL code pane and see how your statement fits into the complete workflow.

Transform

Selects, creates, or transforms columns from the input data.

In the Transform configuration pane:

  • Include or exclude columns: Use the checkboxes to choose which columns pass through to the output. Click the header checkbox to select all columns or clear the selection.
  • Rename a column: Type a new name in the Rename field next to any column.
  • Reorder columns: Drag the handle on the left side of a row to change the column order.
  • Add a custom column: Click + Add a custom column to open the expression editor. See below.

Custom columns

The expression editor lets you define new columns using either natural language or code. The editor has two input boxes and is bidirectional:

  • Description: Type a natural language description of what you want the column to do. Designer uses Genie to generate the corresponding code expression below.
  • Expression: If you prefer to write or edit code directly, click the edit expression button. Editing the expression automatically generates a natural language description.

To remove a custom column, hold the pointer over its row and click the Dash icon.

Python

Runs custom Python (PySpark) on the input data. Your code receives upstream datasets as Spark DataFrames and must assign a single DataFrame to result, which becomes this operator's output. Use the configuration pane to wire inputs and review any options the editor provides.

inputs["data"] is a list of input DataFrames, in upstream order. The operator details pane shows the name of each input, in order. For example: Available inputs: inputs["data"][0] (customers), inputs["data"][1] (sales).

A minimal pattern is to use the first input when present, or an empty DataFrame otherwise:

Python
# inputs["data"] is a list of input DataFrames

result = inputs["data"][0] if inputs["data"] else spark.createDataFrame([], "col: string")

From there, you can chain DataFrame operations (for example, select, filter, withColumn, or joins) on result before the assignment ends, or replace result with a new DataFrame built from inputs["data"].

Organization

Note

Adds a note on the canvas so you can document the workflow itself: its purpose, assumptions, caveats, or handoff context for anyone who opens the Visual data prep later. Note content supports Markdown, so you can use headings, lists, links, and emphasis where plain text is not enough. Notes do not affect how data flows through operators.

Group

Visually groups operators on the canvas without changing how data flows between them. This is helpful when a Visual data prep grows large or when you want to reflect logical stages.

To build a group:

  • Drag operators into a group: Drag one or more operators onto a group to add them to it.
  • Create a group from a selection: Select one or multiple operators, open the context menu (right-click), and choose Create new group to wrap the selection in a new group.

After operators are in a group, you can give the group a descriptive name and minimize or expand it to show or hide its contents on the canvas.

Next steps