Use the Data Engineering Agent
This feature is in Public Preview.
This page introduces the Data Engineering Agent, which adds capabilities to the Databricks Assistant. To use the Data Engineering Agent, select Agent mode in the Assistant.
The Data Engineering Agent is designed specifically for Lakeflow Spark Declarative Pipelines (SDP) and the Lakeflow Pipelines Editor. It explores data, generates and runs pipeline code, and fixes errors, all from a single prompt.
What is the Data Engineering Agent?
The Data Engineering Agent is part of the Databricks Assistant's agent mode. It transforms the Assistant into an autonomous partner that can automate entire multi-step data engineering workflows in SDP and the Lakeflow Pipelines Editor.

Compared to the Assistant chat mode, agent mode has expanded capabilities: planning a solution, retrieving relevant assets, running code, using pipeline outputs to improve results, fixing errors automatically, and more.
The Data Engineering Agent can plan and generate entire pipelines end to end from scratch, or accelerate work on an existing pipeline. The agent works with you, asking you to approve its plans and confirm its next steps before proceeding. With your approval, the Data Engineering Agent can use tools to perform tasks like searching tables, editing a SQL or Python source file, running pipeline updates, and reading pipeline datasets.
The Data Engineering Agent’s access and actions are governed by your permissions: it can only access data that you have access to and perform operations that you have permission to perform.
When you turn on agent mode in the Assistant, the Assistant adapts its capabilities based on the features you are currently using in Databricks. For example, in the Lakeflow Pipelines Editor, the Assistant focuses on pipeline editing and data engineering tasks. In notebooks and the SQL Editor, the Assistant supports data exploration and analysis. See Data Science Agent for more information.
Requirements
To use the Data Engineering Agent, your workspace needs the following:
- Partner-powered AI features enabled for both the account and workspace. See Partner-powered AI features.
- Databricks Assistant Agent Mode preview enabled. See Manage Databricks previews.
Use the Data Engineering Agent
To use the Data Engineering Agent:
1. From the Lakeflow Pipelines Editor, open the Assistant side panel by clicking Assistant in the upper-right corner of your workspace.
2. In the lower-right corner, select Agent. This toggles on the Assistant’s agent mode, allowing you to interact with the Data Engineering Agent.
3. Enter a prompt for the agent. For example, you can ask it questions about your pipeline, such as “describe this pipeline”. You can also ask it to add new datasets, for example, "create silver_sales_data in a new file that reads from bronze_sales_data, cleans the data, and adds useful quality expectations." (A sketch of the kind of code this prompt can produce appears after these steps.)
   Note: The agent respects your Unity Catalog permissions, so it can only access the data and pipeline sources that you have access to.
4. As the agent generates its response, it often pauses to get your input:
   - For more complex tasks, the agent may create a step-by-step plan and ask clarifying questions. Answer the agent’s clarifying questions to help it hone its plan.
   - When the agent needs to run code or update a pipeline, it asks for your approval before proceeding. Allow or Decline its request. You can also select Allow in this thread (referring to the current Assistant conversation thread) or Always allow.
     Important: The Data Engineering Agent can generate and execute code in your pipeline. While it has guardrails to prevent dangerous actions, there is still risk. Only use it with data you trust, and review code before running it.
   - As the agent continues its work, you may be prompted to select Continue or Reject. Review the agent’s work so far, then select Continue to let the agent proceed to its next steps, or Reject to tell it to try something else.
   - To stop the agent while it is working, click the red stop icon.
5. The agent can create new files; generate text, queries, and code; run the files or pipelines; and access the output datasets to interpret the results.
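For illustration, here is the kind of SDP SQL source file the agent might create for the silver_sales_data prompt in step 3. This is a minimal sketch, not actual agent output: the column names (order_id, amount, order_ts) and the expectation names are assumptions, since the agent derives the real columns and checks from your bronze_sales_data table.

```sql
-- Illustrative sketch only. Columns order_id, amount, and order_ts are
-- assumed; the agent infers the actual schema from bronze_sales_data.
CREATE OR REFRESH MATERIALIZED VIEW silver_sales_data (
  -- Quality expectations: drop rows that fail these checks.
  CONSTRAINT valid_order_id  EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW,
  CONSTRAINT positive_amount EXPECT (amount > 0)           ON VIOLATION DROP ROW
)
COMMENT 'Cleaned sales data derived from bronze_sales_data.'
AS SELECT
  order_id,
  CAST(amount AS DECIMAL(18, 2)) AS amount,
  CAST(order_ts AS TIMESTAMP)    AS order_ts
FROM bronze_sales_data;
```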
For the Data Engineering Agent to continue its work and take next steps, stay on the tab the agent is working in.
You can add instructions that the agent applies to most responses. For example, if you have code conventions or preferred libraries, add these guidelines to your instructions for the agent (see the sample after this paragraph). You can also create skills to extend the agent with specialized capabilities for your domain-specific tasks. For more details and other tips, see Customize and improve Databricks Assistant responses.
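As a sample, instructions might look like the following. These particular conventions are illustrative assumptions, not defaults:

```
- Write new source files in SQL rather than Python.
- Prefix table names by medallion layer: bronze_, silver_, gold_.
- Add at least one quality expectation to every silver table.
```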
Capabilities
The Data Engineering Agent can help with most pipeline development tasks. Key capabilities include:
- Data discovery: The agent can search tables in the workspace to help you find the required data for a task.
- Pipeline code edits: The agent can create and edit multiple files at a time. It keeps you informed about which files it is changing and shows you the code diff in each file, so you can review the changes individually or all together at the end.
- Pipeline execution: The agent can run individual files, run or dry-run the pipeline, or trigger a full refresh. It asks for your confirmation before proceeding.
- Understanding and improving pipeline behavior: The agent can inspect datasets and pipeline outputs to help you understand what a pipeline is doing end-to-end and why. For example, it can summarize transformations, trace how data flows into downstream tables, and highlight unexpected changes in row counts or schemas. When it surfaces potential data quality issues, the agent can help you reason about their cause and suggest where and how to address them in the pipeline.
These capabilities support common use cases such as:
- Authoring a new pipeline: The Data Engineering Agent can help with all steps of creating a new medallion architecture pipeline, from ingesting data, to standardizing and cleaning it, to transforming and analyzing it (see the sketch after this list).
- Explaining a pipeline: The agent can analyze and explain an existing pipeline to help you ramp up quickly.
- Fixing issues: When you have errors, the agent can help diagnose and fix the problems, iterating through multiple files until the issue is resolved.
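As a sketch of what authoring a new medallion pipeline can produce, the SDP SQL below outlines a minimal bronze-to-gold flow. The volume path, file format, and column names are assumptions for illustration; the agent generates these from your actual data.

```sql
-- Bronze: incrementally ingest raw files (the path and format are assumed).
CREATE OR REFRESH STREAMING TABLE bronze_sales_data
AS SELECT *
FROM STREAM read_files('/Volumes/my_catalog/my_schema/raw_sales', format => 'json');

-- Silver: standardize and clean, with a quality expectation.
CREATE OR REFRESH MATERIALIZED VIEW silver_sales_data (
  CONSTRAINT valid_order_id EXPECT (order_id IS NOT NULL) ON VIOLATION DROP ROW
)
AS SELECT order_id, CAST(amount AS DECIMAL(18, 2)) AS amount, order_ts
FROM bronze_sales_data;

-- Gold: aggregate for analysis.
CREATE OR REFRESH MATERIALIZED VIEW gold_daily_sales
AS SELECT DATE(order_ts) AS order_date, SUM(amount) AS total_amount
FROM silver_sales_data
GROUP BY DATE(order_ts);
```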
Examples
Try the following prompts to get started:
- "Build and run a medallion architecture pipeline for fraud detection using the table transactions and customers in my_catalog.my_schema."
- "Explain every step of this pipeline."
- "Fix the failure in this pipeline."
Next steps
- Learn more about Databricks AI assistive features
- Get tips to Customize and improve Databricks Assistant responses
- Use the Data Science Agent for data discovery and exploration
- Explore the Lakeflow Pipelines Editor