Use the Data Engineering Agent
This feature is in Public Preview.
This page introduces the Data Engineering Agent, which adds capabilities to the Databricks Assistant. To use the Data Engineering Agent, select Agent mode in the Assistant.
The Data Engineering Agent is designed specifically for Lakeflow Spark Declarative Pipelines (SDP) and the Lakeflow Pipelines Editor. It explores data, generates and runs pipeline code, and fixes errors, all from a single prompt.
What is the Data Engineering Agent?
The Data Engineering Agent is part of the Databricks Assistant's agent mode. It transforms the Assistant into an autonomous partner that can automate entire multi-step data engineering workflows in SDP and the Lakeflow Pipelines Editor.

Compared to the Assistant chat mode, agent mode has expanded capabilities: planning a solution, retrieving relevant assets, running code, using pipeline outputs to improve results, fixing errors automatically, and more.
The Data Engineering Agent can plan and generate entire pipelines end to end from scratch, or accelerate work on an existing pipeline. The agent works with you, asking you to approve its plans and confirm its next steps before proceeding. With your approval, the Data Engineering Agent can use tools to perform tasks like searching tables, editing a SQL or Python source file, running pipeline updates, and reading pipeline datasets.
The Data Engineering Agent’s access and actions are governed by your permissions: it can only access data that you have access to and perform operations that you have permission to perform.
When you turn on agent mode in the Assistant, the Assistant adapts its capabilities based on the features you are currently using in Databricks. For example, in the Lakeflow Pipelines Editor, the Assistant focuses on pipeline editing and data engineering tasks. In notebooks and the SQL Editor, the Assistant supports data exploration and analysis. See Data Science Agent for more information.
Requirements
To use the Data Engineering Agent, your workspace needs the following:
- Partner-powered AI features enabled for both the account and workspace. See Partner-powered AI features.
- Databricks Assistant Agent Mode preview enabled. See Manage Databricks previews.
Use the Data Engineering Agent
To use the Data Engineering Agent:
1. From the Lakeflow Pipelines Editor, open the Assistant side panel by clicking Assistant in the upper-right corner of your workspace.
2. In the lower-right corner, select Agent. This toggles on the Assistant’s agent mode, allowing you to interact with the Data Engineering Agent.
3. Enter a prompt for the agent. For example, you can ask it questions about your pipeline, such as “describe this pipeline”. You can also ask it to add new datasets, for example, "create silver_sales_data in a new file that reads from bronze_sales_data, cleans the data, and adds useful quality expectations." (A sketch of the kind of code this prompt can produce appears after these steps.)
   Note: The agent respects your Unity Catalog permissions, so it can only access the data and pipeline sources that you have access to.
4. As the agent generates its response, it often pauses to get your input:
   - For more complex tasks, the agent may create a step-by-step plan and ask clarifying questions. Answer the agent’s clarifying questions to help it hone its plan.
   - When the agent needs to run code or update a pipeline, it asks for your approval before proceeding. Allow or Decline its request. You can also select Allow in this thread (referring to the current Assistant conversation thread) or Always allow.
     Important: The Data Engineering Agent can generate and execute code in your pipeline. While it has guardrails to prevent dangerous actions, there is still risk. Only use it with data you trust, and review code before running it.
   - As the agent continues its work, you may be prompted to select Continue or Reject. Review the agent’s work so far, then select Continue to let the agent proceed to its next steps, or Reject to tell it to try something else.
   - To stop the agent while it is working, click the red stop icon.
5. The agent can create new files; generate text, queries, and code; run the files or pipelines; and access the output datasets to interpret the results.
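For illustration, here is the kind of SDP SQL source file the agent might create for the silver_sales_data prompt in step 3. This is a minimal sketch, not actual agent output: the column names (order_id, amount, order_ts) and the expectation names are assumptions, since the agent derives the real columns and checks from your bronze_sales_data table.

```sql
-- Illustrative sketch only. Columns order_id, amount, and order_ts are
-- assumed; the agent infers the actual schema from bronze_sales_data.
CREATE OR REFRESH MATERIALIZED VIEW silver_sales_data (
  -- Quality expectations: drop rows that fail these checks.
  CONSTRAINT valid_order_id  EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT positive_amount EXPECT (amount > 0)           ON VIOLATION DROP ROW
)
COMMENT 'Cleaned sales data derived from bronze_sales_data.'
AS SELECT
  order_id,
  CAST(amount AS DECIMAL(18, 2)) AS amount,
  CAST(order_ts AS TIMESTAMP)    AS order_ts
FROM bronze_sales_data;
```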
For the Data Engineering Agent to continue its work and take next steps, stay on the tab the agent is working in.
You can add instructions that the agent applies to most responses. For example, if you have code conventions or preferred libraries, add these guidelines to your instructions for the agent (see the sample after this paragraph). You can also create skills to extend the agent with specialized capabilities for your domain-specific tasks. For more details and other tips, see Customize and improve Databricks Assistant responses.
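As a sample, instructions might look like the following. These particular conventions are illustrative assumptions, not defaults:

```
- Write new source files in SQL rather than Python.
- Prefix table names by medallion layer: bronze_, silver_, gold_.
- Add at least one quality expectation to every silver table.
```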
Capabilities
The Data Engineering Agent can help with most pipeline development tasks. Key capabilities include:
- Data discovery: The agent can search tables in the workspace to help you find the required data for a task.
- Pipeline code edits: The agent can create and edit multiple files at a time. It keeps you informed about which files it is changing and shows you the code diff in each file, so you can review the changes individually or all together at the end.
- Pipeline execution: The agent can run individual files, run or dry-run the pipeline, or trigger a full refresh. It asks for your confirmation before proceeding.
- Understanding and improving pipeline behavior: The agent can inspect datasets and pipeline outputs to help you understand what a pipeline is doing end-to-end and why. For example, it can summarize transformations, trace how data flows into downstream tables, and highlight unexpected changes in row counts or schemas. When it surfaces potential data quality issues, the agent can help you reason about their cause and suggest where and how to address them in the pipeline.
These capabilities support common use cases such as:
- Authoring a new pipeline: The Data Engineering Agent can help with all steps of creating a new medallion architecture pipeline, from ingesting data, to standardizing and cleaning it, to transforming and analyzing it (see the sketch after this list).
- Explaining a pipeline: The agent can analyze and explain an existing pipeline to help you ramp up quickly.
- Fixing issues: When you have errors, the agent can help diagnose and fix the problems, iterating through multiple files until the issue is resolved.
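As a sketch of what authoring a new medallion pipeline can produce, the SDP SQL below outlines a minimal bronze-to-gold flow. The volume path, file format, and column names are assumptions for illustration; the agent generates these from your actual data.

```sql
-- Bronze: incrementally ingest raw files (the path and format are assumed).
CREATE OR REFRESH STREAMING TABLE bronze_sales_data
AS SELECT *
FROM STREAM read_files('/Volumes/my_catalog/my_schema/raw_sales', format => 'json');

-- Silver: standardize and clean, with a quality expectation.
CREATE OR REFRESH MATERIALIZED VIEW silver_sales_data (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT order_id, CAST(amount AS DECIMAL(18, 2)) AS amount, order_ts
FROM bronze_sales_data;

-- Gold: aggregate for analysis.
CREATE OR REFRESH MATERIALIZED VIEW gold_daily_sales
AS SELECT DATE(order_ts) AS order_date, SUM(amount) AS total_amount
FROM silver_sales_data
GROUP BY DATE(order_ts);
```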
Examples
Try the following prompts to get started:
- "Build and run a medallion architecture pipeline for fraud detection using the table transactions and customers in my_catalog.my_schema."
- "Explain every step of this pipeline."
- "Fix the failure in this pipeline."
Next steps
- Learn more about Databricks AI assistive features
- Get tips to Customize and improve Databricks Assistant responses
- Use the Data Science Agent for data discovery and exploration
- Explore the Lakeflow Pipelines Editor