
How to create a Visual data prep in Lakeflow Designer

This feature is in Public Preview.

Lakeflow Designer lets you build data transformation workflows on a visual, drag-and-drop canvas. This page explains how to create a Visual data prep, from adding a data source and chaining operators to previewing results and writing them to Unity Catalog.

To build a Visual data prep:

  1. Verify requirements
  2. Create a Visual data prep
  3. Add a data source
  4. Add and configure operators
  5. Connect operators
  6. Preview results
  7. Write results to Unity Catalog
  8. Schedule or run in production

Requirements

To use Lakeflow Designer, you must have:

  • A Databricks workspace with Unity Catalog enabled.
  • CAN USE permission on at least one compute resource (either serverless or all-purpose).
  • Databricks AI Assistive features enabled. If a model is not available in your region, you might also need to enable cross-geo processing.

Create a new Visual data prep

To create a new Visual data prep, click New (plus icon) in the sidebar and select Visual data prep.

Designer opens with a welcome screen where you can add a data source or explore a sample Visual data prep.

Add a data source

Every Designer workflow starts with one or more data sources. The Source operator represents a data source on the canvas.

To add a data source:

  1. Add a Source operator. From the welcome screen, click Select source operator. From the canvas, open the operator menu and select Source.
  2. In the Source configuration pane, choose how to bring in your data. You can browse for an existing table, upload a local CSV or Excel file, create a table from a file, or import from Google Drive or SharePoint.
  3. Select or configure your data source. The Source operator appears on the canvas.

You can also drag and drop a CSV or Excel file directly onto the canvas to quickly create a Source operator.

To change the source later, open the Source operator and click Select a new data source. Changing the source invalidates the output cache for all downstream operators.

For the full details on each ingestion option, see Ingest data into Lakeflow Designer.
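
The cache behavior described above (changing a source invalidates the output cache of every downstream operator) can be pictured as a graph traversal. The following is a minimal sketch, not Designer's actual implementation: the operator names, the `edges` mapping, and the `cache` dictionary are all hypothetical, and it assumes that the changed Source's own cached output is discarded as well.

```python
from collections import deque


def downstream_of(edges, start):
    """Return every operator reachable from `start` by following output -> input edges."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def invalidate(cache, edges, changed):
    """Drop cached outputs for `changed` and everything downstream of it."""
    # Assumption: the changed operator's own output is recomputed too.
    stale = downstream_of(edges, changed) | {changed}
    for name in stale:
        cache.pop(name, None)
    return stale


# Hypothetical canvas: two sources, one of which feeds a filter, an aggregate, and a join.
edges = {
    "source": ["filter"],
    "filter": ["aggregate", "join"],
    "other_source": ["join"],
    "join": ["output"],
}
cache = {name: f"cached rows for {name}"
         for name in ["source", "other_source", "filter", "aggregate", "join", "output"]}

stale = invalidate(cache, edges, "source")
remaining = set(cache)
```

In this sketch, only the operator that does not depend on the changed source keeps its cached output, which is why previews downstream of a changed source need to run again.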

Add and configure operators

To add an operator, open the operator menu in the side panel on the left side of the canvas. Click an operator to add it to the canvas, or drag an operator from the menu onto the canvas. You can also click the + button next to any existing operator to add a new operator with an automatic connection.

[Figure: The Lakeflow Designer operator menu, with drag and drop onto the canvas.]

To configure an operator, double-click it, or hold the pointer over it and click the pencil icon (Edit operator) to open the configuration pane. Set the options for that operator type, then click Apply.

For details on each available operator, see Built-in operators in Lakeflow Designer.

Connect operators

To connect two operators, click and drag from the output handle (the small circle on the right edge of an operator) to the input handle (the small circle on the left edge of the next operator). This specifies that data flows from the first operator into the second. Data flows from left to right through the Visual data prep.

[Figure: The Lakeflow Designer canvas showing a connection between two operators.]

Some operators, such as Join and Combine, accept multiple inputs.
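
Conceptually, connected operators form a directed acyclic graph: each connection says which operator's output feeds which operator's input, and a multi-input operator such as Join has more than one upstream operator. The sketch below illustrates that idea using Python's standard `graphlib` module (Python 3.9 or later). The operator names are hypothetical, and nothing here reflects how Designer stores or executes a Visual data prep internally.

```python
from graphlib import TopologicalSorter

# Hypothetical canvas: each key is an operator, and each value is the set of
# operators connected to its input handle (its upstream operators).
upstream = {
    "orders_source": set(),
    "customers_source": set(),
    "join_orders_customers": {"orders_source", "customers_source"},  # two inputs
    "filter_active": {"join_orders_customers"},
    "output_table": {"filter_active"},
}


def execution_order(upstream):
    """Return one valid order in which every operator runs after all of its inputs."""
    return list(TopologicalSorter(upstream).static_order())


order = execution_order(upstream)
for position, name in enumerate(order, start=1):
    print(position, name)
```

The relative order of the two sources is not fixed, because neither depends on the other. The join always comes after both, which mirrors the left-to-right flow on the canvas.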

Use Genie Code

At any time while editing in Lakeflow Designer, you can prompt Genie Code for help.

[Figure: The Genie Code prompt in Lakeflow Designer.]

When using Genie Code, the following buttons provide additional functionality:

  • Image icon: Upload an image to use as part of the prompt.
  • At icon: Mention objects, such as tables or files, to use as part of the prompt.
  • Speech bubble plus icon: Start a new chat thread with new agent context.
  • Reader mode icon: Open the side panel to see the conversation history and a more detailed view of what the agent is doing.

Preview results

Select any operator to see the results in the output pane at the bottom of the screen. For most operator types, the input data is on the left and output data is on the right.

[Figure: The Lakeflow Designer output pane below the canvas.]

By default, operators run on a sample of up to 1,000 rows. To run on the full dataset, click Sample dataset in the output pane and switch to Full dataset.

Warning: Running with the full dataset reruns all upstream operators with the complete, unbounded dataset and can take a long time.
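
The difference between the two modes can be illustrated with a small sketch. It assumes, purely for illustration, that the sample is the first 1,000 rows of the input; this page does not say how Designer actually selects the sample. The `preview_rows` function and its parameters are hypothetical.

```python
from itertools import islice

SAMPLE_LIMIT = 1000  # from the text: previews use a sample of up to 1,000 rows


def preview_rows(rows, full_dataset=False, limit=SAMPLE_LIMIT):
    """Return the rows an operator would process in sample or full-dataset mode."""
    if full_dataset:
        # Full dataset: every row is read, so the cost grows with the input size.
        return list(rows)
    # Sample mode: stop reading after `limit` rows (assumed here to be the first rows).
    return list(islice(rows, limit))


# Hypothetical input of 5,000 rows.
sample = preview_rows({"id": i} for i in range(5000))
full = preview_rows(({"id": i} for i in range(5000)), full_dataset=True)
small = preview_rows({"id": i} for i in range(10))

print(len(sample), len(full), len(small))
```

Because a sample can omit rare values, aggregates and filters previewed on the sample may differ from the results on the full dataset.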

Data profiling

In the output pane, you can view details about your output data. In the top-right corner of the output pane, click the sidebar icon to open the selection details. Select a subset of your data to see details about your selection.

[Figure: The sidebar showing graphs and details about the selected output data.]

Write results to Unity Catalog

Add an Output operator to write your results to a table in Unity Catalog:

  1. Open the operator menu and select Output, or click + next to your last operator and select Output.
  2. Connect the output handle of your last transformation to the Output operator's input handle if not already connected.
  3. Double-click the Output operator to open its configuration pane.
  4. Type a Table name and select the Output location (catalog and schema).
  5. Click Run.
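
Unity Catalog addresses tables with a three-level name: catalog, schema, and table. The table name and output location chosen in the steps above together identify one such table. The helper below is a small sketch of how the three parts combine into a fully qualified, backtick-quoted name that you could use to query the result from SQL. The function name and the example catalog, schema, and table names are hypothetical.

```python
def qualified_table_name(catalog, schema, table):
    """Build a backtick-quoted `catalog`.`schema`.`table` identifier."""
    parts = [catalog, schema, table]
    if not all(parts):
        raise ValueError("catalog, schema, and table must all be non-empty")
    # A backtick inside a quoted identifier is escaped by doubling it.
    return ".".join("`" + part.replace("`", "``") + "`" for part in parts)


# Hypothetical output location (catalog "main", schema "sales") and table name.
name = qualified_table_name("main", "sales", "clean_orders")
print(f"SELECT * FROM {name} LIMIT 10")
```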

Schedule or run in production

You can automate your workflows by scheduling them as jobs.

  • Schedule directly: Click the Schedule button in the top menu to create a scheduled job for your Visual data prep.
  • Add to a job: Create a Databricks job and choose your Designer Visual data prep as a task. This lets you combine that Visual data prep with other tasks in a larger pipeline.

[Figure: The Lakeflow Designer Schedule control for automating a Visual data prep as a job.]

Additional tips when working in the canvas

The following actions are available on the canvas to help you edit your Visual data prep.

  • Rename an operator: Click the text field at the top of any configuration pane to rename the operator. Descriptive names make your Visual data prep easier to understand at a glance. Some operators, such as the SQL operator, can reference the output of other operators by name.
  • Copy an operator: Hold the pointer over an operator and click the copy icon, or select an operator and press Cmd/Ctrl+C, then Cmd/Ctrl+V.
  • Auto-layout: Click the DAG horizontal icon in the bottom-left toolbar to automatically arrange all operators in a compact layout.
  • Fit view: Click the zoom-to-fit icon in the bottom-left toolbar to see all operators in the current viewport.
  • Undo and redo: Press Cmd/Ctrl+Z to undo and Cmd/Ctrl+Shift+Z to redo, or use the buttons in the upper toolbar.

Next steps