MLflow 3.0 generative AI workflow (Beta)
This feature is in Beta.
Example notebook
The example notebook creates an agent, logs it to MLflow, and evaluates its performance. The notebook installs the databricks-langchain, databricks-agents, uv, and langgraph==0.3.4 libraries. This notebook is an adaptation of the LangGraph tool-calling agent notebook. For additional example notebooks and more details on how to create agents, see Author AI agents in code.
MLflow 3.0 build and evaluate a generative AI agent notebook
Explore agent performance using the MLflow UI
The notebook code runs an initial test, querying the agent to confirm that it performs well enough to proceed with evaluation. Traces from this test are available in the MLflow UI on the model details page, on the Traces tab.
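Tracing for these interactive queries can be enabled with MLflow autologging. The snippet below is a minimal sketch that assumes the agent object from the earlier example.

```python
# Enable LangChain autologging so each invocation is captured as a trace
# on the Traces tab of the MLflow UI.
import mlflow

mlflow.langchain.autolog()

# Query the agent once as an initial test; the call is traced automatically.
response = agent.invoke({"messages": [{"role": "user", "content": "What is 6 times 7?"}]})
print(response)
```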
The evaluation run in the example notebook produces two MLflow runs, one MLflow LoggedModel, and traces from the interactive query and evaluation. All of the evaluation metrics for the agent are displayed on the model details page of the MLflow experiment.
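The evaluation step might look something like the following sketch, which assumes a small in-memory dataset and the built-in Safety and RelevanceToQuery scorers; the notebook's actual dataset and scorers may differ.

```python
# Evaluation sketch using MLflow 3 GenAI evaluation; the dataset and scorers are illustrative.
import mlflow
from mlflow.genai.scorers import RelevanceToQuery, Safety

eval_data = [
    {"inputs": {"messages": [{"role": "user", "content": "What is 6 times 7?"}]}},
    {"inputs": {"messages": [{"role": "user", "content": "Add 15 and 27."}]}},
]

def predict_fn(messages):
    # Delegate to the agent defined in the earlier sketch.
    return agent.invoke({"messages": messages})

results = mlflow.genai.evaluate(
    data=eval_data,
    predict_fn=predict_fn,
    scorers=[Safety(), RelevanceToQuery()],
)
```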
The model ID, model parameters, and metrics are also displayed on the Unity Catalog model version page.
You can now use Mosaic AI Model Serving to deploy this model to a live endpoint that can be embedded in any application. Traces from online invocations of the serving endpoint are also recorded in the Traces tab of the Unity Catalog model version, along with any offline traces from the development phase. See Deploy an agent for generative AI applications.
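Deployment itself can be done with the databricks-agents SDK. This is a minimal sketch; the Unity Catalog model name and version are placeholders.

```python
# Deployment sketch using the databricks-agents SDK; the model name and version are placeholders.
from databricks import agents

uc_model_name = "main.default.langgraph_tool_calling_agent"
deployment = agents.deploy(uc_model_name, model_version=1)
print(deployment)
```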
MLflow tracing and feedback with tool-calling agents
This short tutorial demonstrates how MLflow can capture detailed traces of a LangChain tool-calling agent as it solves mathematical problems. It illustrates the ability of MLflow to trace agent execution and store feedback about the agent's responses. Feedback, which is logged using the log_feedback API, helps you measure and improve the quality of an agent.
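A minimal feedback-logging sketch looks like the following; the assessment name, value, rationale, and reviewer ID are placeholders.

```python
# Feedback-logging sketch; the assessment name, value, rationale, and source ID are placeholders.
import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

# Use the most recent trace, or copy a trace ID from the Traces tab in the MLflow UI.
trace_id = mlflow.get_last_active_trace_id()

mlflow.log_feedback(
    trace_id=trace_id,
    name="correctness",
    value=True,
    rationale="The agent returned the right answer.",
    source=AssessmentSource(
        source_type=AssessmentSourceType.HUMAN,
        source_id="reviewer@example.com",
    ),
)
```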
MLflow 3.0 tracing and feedback with a tool-calling agent notebook
More information
See the following pages for additional information:
What's the difference between the Models tab on the MLflow experiment page and the model version page in Catalog Explorer?
The Models tab of the experiment page and the model version page in Catalog Explorer show similar information about the model. The two views have different roles in the model development and deployment lifecycle.
- The Models tab of the experiment page presents the results of logged models from an experiment on a single page. The Charts tab on this page provides visualizations to help you compare models and select the model versions to register to Unity Catalog for possible deployment.
- In Catalog Explorer, the model version page provides an overview of the model version's performance and evaluation results. This page shows model parameters, metrics, and traces across all linked environments, including different workspaces, endpoints, and experiments. This view is useful for monitoring and deployment, and works especially well with deployment jobs. The evaluation task in a deployment job creates additional metrics that appear on this page. The approver for the job can then review this page to decide whether to approve the model version for deployment.