Built-in operators in Lakeflow Designer

Preview

This feature is in Public Preview.

Lakeflow Designer includes built-in operators for common data preparation and transformation tasks. Open the operator menu in the side panel on the left to browse operators by category, or use Search for an operator... at the top of the panel. To open an operator's configuration pane after you add it to the canvas, double-click it, or hold the pointer over it and click the Pencil icon (Edit operator).

Source and output

Source

Imports data into Designer. The Source operator reads from a Unity Catalog table or other supported sources. It has two stages:

  1. Selecting a table or file: Search for a table or file by name, or browse by catalog and schema. You can also create a new table from this pane.
  2. Table summary: After selecting a table, the configuration pane shows the table's name, owner, and last updated time. Click Select a new data source to change the source. Changing the source invalidates the output cache for all downstream operators.

For the full range of data ingestion options, see Ingest data into Lakeflow Designer.

Output

Exports data out of Designer by writing results to a table in Unity Catalog.

In the Output configuration pane, specify:

  • Table name: The name of the table to create.
  • Output location: The catalog and schema where the table is created.

Click Run to run the Visual data prep and write results.

AI Function

Runs a built-in AI operation on your data. In the configuration pane, open Select a function and choose one of the functions below. Each function exposes options in the pane for inputs (for example, columns, prompts, labels, or languages) and outputs.

  • ai_analyze_sentiment: Performs sentiment analysis on input text.
  • ai_classify: Classifies text or parsed documents using labels you provide.
  • ai_extract: Extracts structured data from text or parsed documents using fields you define.
  • ai_fix_grammar: Corrects grammatical errors in text.
  • ai_gen: Answers a user-provided prompt against the input.
  • ai_mask: Masks specified entities in text (for example, for de-identification).
  • ai_similarity: Compares two strings and returns a semantic similarity score.
  • ai_summarize: Generates a summary of text.
  • ai_translate: Translates text into a target language you specify.

Transformations

The following operators perform transformations on your data.

Aggregate

Summarizes rows by grouping data and computing aggregate values.

  • Aggregate by: Select a column, choose an aggregation function, and provide a name for the output column. Click + Add aggregation to add more.
  • Group by: Select the columns to group by. Click + Add grouping to add more.

Supported aggregation functions: AVG, COUNT, MAX, MEAN, MEDIAN, MIN, PERCENTILE, STDDEV, SUM, VARIANCE.

note

Columns used in Group by are automatically included in the output.
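The Aggregate operator's semantics can be sketched in plain Python. This is an illustrative sketch, not Designer's implementation; the rows, the "region" Group by column, and the output column names ("avg_amount", "order_count") are hypothetical.

```python
from collections import defaultdict

# Hypothetical input rows; "region" is the Group by column.
rows = [
    {"region": "east", "amount": 10},
    {"region": "east", "amount": 30},
    {"region": "west", "amount": 5},
]

# Group rows by the Group by column.
groups = defaultdict(list)
for row in rows:
    groups[row["region"]].append(row["amount"])

# One output row per group, with a named output column per aggregation.
# The Group by column is included in the output automatically.
output = [
    {"region": region, "avg_amount": sum(vals) / len(vals), "order_count": len(vals)}
    for region, vals in groups.items()
]
# output == [{"region": "east", "avg_amount": 20.0, "order_count": 2},
#            {"region": "west", "avg_amount": 5.0, "order_count": 1}]
```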

Combine

Merges data from two tables with matching schemas into a single output.

  • Set operation: Choose Union, Intersect, or Except.
  • Merge strategy: Choose Distinct to exclude duplicate rows from the output, or All to keep all rows including duplicates.
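The four combinations of set operation and merge strategy can be sketched with plain Python lists of rows (shown here as tuples). This is a sketch of standard SQL set semantics, not Designer's implementation, and the input rows are hypothetical.

```python
# Two hypothetical inputs with matching schemas; each tuple is a row.
a = [(1,), (1,), (2,)]
b = [(2,), (3,)]

# Union + All: every row from both inputs, duplicates kept.
union_all = a + b

# Union + Distinct: duplicates dropped, first-seen order preserved.
union_distinct = list(dict.fromkeys(a + b))

# Intersect + Distinct: rows present in both inputs, once each.
intersect = [r for r in dict.fromkeys(a) if r in b]

# Except + Distinct: rows from the first input absent from the second.
except_rows = [r for r in dict.fromkeys(a) if r not in b]
```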

Filter

Keeps only the rows that meet one or more conditions, using a graphical condition builder. For each condition, select a column, a condition type, and, where the condition type requires one, a value to match against.

Supported condition types:

  • Is equal to / Is not equal to
  • Is one of / Is not one of
  • Contains / Does not contain
  • Starts with / Does not start with
  • Ends with / Does not end with
  • Greater than / Less than
  • Is null / Is not null
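The condition builder can be sketched in plain Python. This sketch assumes that when you add multiple conditions, a row must satisfy all of them to pass; the rows and the specific conditions (Is not null on "city", Greater than on "age") are hypothetical.

```python
# Hypothetical input rows.
rows = [
    {"name": "Alice", "city": "Oslo", "age": 31},
    {"name": "Bob", "city": None, "age": 17},
    {"name": "Carol", "city": "Lima", "age": 45},
]

# Two conditions from the builder, expressed as predicates:
conditions = [
    lambda r: r["city"] is not None,  # city: Is not null (no value needed)
    lambda r: r["age"] > 18,          # age: Greater than 18
]

# Keep only rows that satisfy every condition.
kept = [r for r in rows if all(cond(r) for cond in conditions)]
# Alice and Carol pass; Bob fails both conditions.
```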

Join

Combines two input datasets into one based on matching column values (the join key).

To configure a Join:

  1. Select the two input tables to join.
  2. Specify at least one join condition by selecting matching columns from the two tables. Click + Add join expression to add more conditions.
  3. Select the join type: Full join, Inner join, Left join, or Right join.
  4. Optional: Choose which columns to include in the output. By default, all columns from both tables are included. Duplicate column names receive a table name prefix.
  5. Optional: Add custom expression columns based on the joined result.
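The join types and the duplicate-column prefixing described above can be sketched in plain Python. The tables, rows, and prefixed column names ("customers_id", "orders_id") here are hypothetical; this is a sketch of standard join semantics, not Designer's implementation.

```python
# Two hypothetical input tables; both have an "id" column.
customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]
orders = [{"id": 10, "customer_id": 1, "total": 99}]

# Inner join: only rows whose join condition matches in both tables.
inner = []
for c in customers:
    for o in orders:
        if c["id"] == o["customer_id"]:
            # Duplicate column names receive a table name prefix.
            inner.append({"customers_id": c["id"], "name": c["name"],
                          "orders_id": o["id"], "total": o["total"]})

# Left join: additionally keeps unmatched left rows, with nulls on the right.
left = inner + [
    {"customers_id": c["id"], "name": c["name"], "orders_id": None, "total": None}
    for c in customers
    if not any(o["customer_id"] == c["id"] for o in orders)
]
```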

Limit

Restricts the row count by passing through at most the number of rows you specify.

Pivot

Reshapes tabular data in two directions. Use the tabs at the top of the configuration pane to choose the mode:

  • Rows → Columns (pivot): Turn distinct values in one column into new column headers, and fill those columns with aggregated values from another column.
  • Columns → Rows (unpivot): Fold one or more columns into rows; set names for the output key and value columns.

In Rows → Columns mode:

  • Pivot column: Choose the column whose distinct values become the new headers.
  • Value & aggregation: Choose the column whose values fill the pivoted cells, and select an aggregation function (for example, SUM, AVG, COUNT, MIN, or MAX). Configure how missing values are handled (for example, null or zero), if available in the pane.

In Columns → Rows mode, select the columns to unpivot and configure the output key and value column names.

Include columns: Use the table to choose which columns stay in the output alongside the pivoted or unpivoted values (and to drop columns you do not need before the transformation). Designer infers fixed (grouping) columns from the columns you do not assign to pivot, value, or unpivot roles.
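Both pivot directions can be sketched in plain Python. This is an illustrative sketch with hypothetical data: "quarter" is the pivot column, "sales" is the value column aggregated with SUM, and "region" is the inferred fixed (grouping) column. Note that "west" has no Q2 row, so the pivoted output has a missing cell, which is where the missing-value handling applies.

```python
# Hypothetical input rows.
rows = [
    {"region": "east", "quarter": "Q1", "sales": 10},
    {"region": "east", "quarter": "Q2", "sales": 20},
    {"region": "west", "quarter": "Q1", "sales": 5},
]

# Rows -> Columns: distinct "quarter" values become headers; SUM fills cells.
pivoted = {}
for r in rows:
    cell = pivoted.setdefault(r["region"], {"region": r["region"]})
    cell[r["quarter"]] = cell.get(r["quarter"], 0) + r["sales"]
pivot_rows = list(pivoted.values())
# [{"region": "east", "Q1": 10, "Q2": 20}, {"region": "west", "Q1": 5}]

# Columns -> Rows: fold the Q1/Q2 columns back into key/value columns,
# named "quarter" and "sales" in this sketch.
unpivoted = [
    {"region": p["region"], "quarter": q, "sales": p[q]}
    for p in pivot_rows
    for q in ("Q1", "Q2")
    if q in p
]
```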

Sort

Orders rows on one or more columns. For each column, choose ASC (ascending) or DESC (descending). Click + Add sort expression to sort by additional columns. Sorting follows standard lexical order.
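Sorting on multiple columns with mixed directions can be sketched in plain Python. The columns ("city" ASC, then "score" DESC) and rows are hypothetical; the sketch relies on Python's stable sort, applying the secondary key first and the primary key second.

```python
# Hypothetical input rows.
rows = [
    {"city": "Oslo", "score": 2},
    {"city": "Lima", "score": 2},
    {"city": "Oslo", "score": 9},
]

# Sort by city ASC, then score DESC. Because Python's sort is stable,
# sorting by the secondary key first and the primary key second gives
# the same result as a single multi-key sort.
rows.sort(key=lambda r: r["score"], reverse=True)  # score DESC
rows.sort(key=lambda r: r["city"])                 # city ASC
# rows == [Lima/2, Oslo/9, Oslo/2]
```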

SQL

Writes custom SQL code for any transformation not covered by the other operators.

Type a SQL SELECT statement in the editor. To reference the output of an input operator, use that operator's name as the table name in your query. For example:

SQL
SELECT COUNT(*)
FROM aggregate_2
WHERE 1 = 1

Click the Code icon in the editor to open the full SQL code pane and see how your statement fits into the complete workflow.

Transform

Selects, creates, or transforms columns from the input data.

In the Transform configuration pane:

  • Include or exclude columns: Use the checkboxes to choose which columns pass through to the output. Click the header checkbox to select all columns or clear the selection.
  • Rename a column: Type a new name in the Rename field next to any column.
  • Reorder columns: Drag the handle on the left side of a row to change the column order.
  • Add a custom column: Click + Add a custom column to open the expression editor. See below.

Custom columns

The expression editor lets you define new columns using either natural language or code. The editor has two input boxes and is bidirectional:

  • Description: Type a natural language description of what you want the column to do. Designer uses Genie to generate the corresponding code expression below.
  • Expression: If you prefer to write or edit code directly, click the edit expression button. Editing the expression automatically generates a natural language description.

To remove a custom column, hold the pointer over its row and click the Dash icon.

Python

Runs custom Python (PySpark) on the input data. Your code receives upstream datasets as Spark DataFrames and must assign a single DataFrame to result, which becomes this operator's output. Use the configuration pane to wire inputs and review any options the editor provides.

inputs["data"] is a list of input DataFrames, in upstream order. The operator details pane shows the name of each input, in order. For example: Available inputs: inputs["data"][0] (customers), inputs["data"][1] (sales).

A minimal pattern is to use the first input when present, or an empty DataFrame otherwise:

Python
# inputs["data"] is a list of input DataFrames

result = inputs["data"][0] if inputs["data"] else spark.createDataFrame([], "col: string")

From there, you can chain DataFrame operations (for example, select, filter, withColumn, or joins) on result before the assignment ends, or replace result with a new DataFrame built from inputs["data"].

Organization

Note

Adds a note on the canvas so you can document the workflow itself: its purpose, assumptions, caveats, or handoff context for anyone who opens the Visual data prep later. Note content supports Markdown, so you can use headings, lists, links, and emphasis where plain text is not enough. Notes do not affect how data flows through operators.

Group

Visually groups operators on the canvas without changing how data flows between them. This is helpful when a Visual data prep grows large or when you want to reflect logical stages.

To build a group:

  • Drag operators into a group: Drag one or more operators onto a group to add them to it.
  • Create a group from a selection: Select one or multiple operators, open the context menu (right-click), and choose Create new group to wrap the selection in a new group.

After operators are in a group, you can give the group a descriptive name and minimize or expand it to show or hide its contents on the canvas.

Next steps