MLflow Tracing for agents

Important

This feature is in Public Preview.

This article describes MLflow Tracing and the scenarios in which it helps you evaluate generative AI applications.

In software development, tracing involves recording sequences of events such as user sessions or request flows. In the context of AI systems, tracing usually means recording the interactions you have with an AI system. For example, a trace of a RAG application might capture the inputs and parameters of each step: a user message with its prompt, a vector lookup, and a call to the generative AI model.

What is MLflow Tracing?

Using MLflow Tracing, you can log, analyze, and compare traces across different versions of generative AI applications. It lets you debug your generative AI Python code and keep track of inputs and responses, which helps you discover the conditions or parameters that contribute to poor application performance. MLflow Tracing is tightly integrated with Databricks tools and infrastructure, so you can store and display all your traces in Databricks notebooks or the MLflow Experiment UI as you run your code.

When you develop AI systems on Databricks using LangChain or PyFunc, MLflow Tracing allows you to see all the events and intermediate outputs from each step of your agent. You can see the prompts, which models and retrievers were used, which documents were retrieved to augment the response, how long each step took, and the final output. If your model hallucinates, you can quickly inspect each step that led to the hallucination.

Why use MLflow Tracing?

MLflow Tracing provides the following benefits to help you track your development workflow.

  • Use an interactive trace visualization and investigation tool to diagnose issues in development.

  • Verify that prompt templates and guardrails are producing reasonable results.

  • Explore and minimize the latency impact of different frameworks, models, chunk sizes, and software development practices.

  • Measure application costs by tracking token usage by different models.

  • Establish benchmark (“golden”) datasets to evaluate the performance of different versions.

  • Store traces for offline review and evaluation. This requires a serving endpoint that is configured to use inference tables.

Install MLflow Tracing

MLflow Tracing is available in MLflow versions 2.13.0 and above. Install or upgrade MLflow to a version in that range.

Alternatively, run %pip install databricks-agents to install the latest version of databricks-agents, which includes a compatible MLflow version.
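To confirm that an environment meets the version requirement above, a check along the following lines can help. This is a minimal sketch that uses only the Python standard library. The helper names parse_version, supports_tracing, and installed_mlflow_supports_tracing are hypothetical, and the check looks only for a distribution named mlflow, so an environment that provides MLflow under another distribution name would report False.

```python
from importlib import metadata

# Minimum MLflow version stated in this article.
MIN_TRACING_VERSION = (2, 13, 0)


def parse_version(version: str) -> tuple:
    """Return the leading numeric components, e.g. '2.14.1rc0' -> (2, 14, 1)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def supports_tracing(version: str) -> bool:
    """True if the given version string is at least the stated minimum."""
    return parse_version(version) >= MIN_TRACING_VERSION


def installed_mlflow_supports_tracing() -> bool:
    """Check the installed 'mlflow' distribution, if there is one."""
    try:
        return supports_tracing(metadata.version("mlflow"))
    except metadata.PackageNotFoundError:
        return False


print("MLflow Tracing available:", installed_mlflow_supports_tracing())
```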

Use MLflow Tracing

You can use MLflow Tracing in your agent development workloads in the following ways:

  • Use the MLflow Tracing integration with LangChain, mlflow.langchain. Call mlflow.langchain.autolog() and then invoke your agent to automatically see traces for each step of your agent.

  • If you prefer, you can also manually add traces to specific parts of your agent using the Fluent APIs or MLflow Client APIs.

Add traces to your agents

MLflow Tracing provides two different ways to add traces to your generative AI application. See Add traces to your agents.

API | Description
--- | ---
Fluent APIs | (Recommended) Low-code APIs for instrumenting AI systems without worrying about the tree structure of the trace. MLflow determines the appropriate parent-child tree structure (spans) based on the Python stack.
MLflow Client APIs | MlflowClient implements more granular, thread-safe APIs for advanced use cases. These APIs don't manage the parent-child relationship of the spans, so you need to specify it manually to construct the desired trace structure. This requires more code but gives you better control over the trace lifecycle. Recommended for use cases that require more control, such as multi-threaded applications or callback-based instrumentation.

For API reference and code examples, see the MLflow documentation.
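As a minimal sketch of the Fluent APIs described above, the following decorates two functions with mlflow.trace so that MLflow nests the inner call as a child span based on the call stack. The functions retrieve_documents and answer_question, the toy corpus, and the span names are hypothetical stand-ins for a real agent. The import fallback exists only so the sketch still runs where MLflow is not installed, in which case nothing is traced.

```python
# Sketch: instrument a toy two-step agent with the MLflow Fluent APIs.
try:
    import mlflow

    trace = mlflow.trace  # decorator that records a span for each call
except (ImportError, AttributeError):
    # Fallback (assumption for portability): a no-op decorator with the
    # same calling convention, so the example runs without MLflow.
    def trace(func=None, **_options):
        if func is None:
            return lambda f: f
        return func


@trace(name="retrieve_documents", span_type="RETRIEVER")
def retrieve_documents(question: str) -> list[str]:
    """Hypothetical retriever: keyword match against a tiny in-memory corpus."""
    corpus = {
        "tracing": "MLflow Tracing records the inputs and outputs of each step.",
        "latency": "Large traces can add latency to a serving endpoint.",
    }
    return [text for key, text in corpus.items() if key in question.lower()]


@trace(name="answer_question", span_type="CHAIN")
def answer_question(question: str) -> str:
    """Hypothetical agent entry point.

    retrieve_documents is called from inside this traced function, so its
    span is recorded as a child of this one without any explicit wiring.
    """
    documents = retrieve_documents(question)
    context = " ".join(documents) or "No matching documents."
    return f"Q: {question} | Context: {context}"


print(answer_question("How does tracing work?"))
```

With the MLflow Client APIs, the same structure would instead be built by starting each span explicitly and passing the parent's identifiers yourself.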

Enable inference tables to collect traces

To log traces in an inference table, you must set the ENABLE_MLFLOW_TRACING environment variable in your serving endpoint configuration to True. See Add plain text environment variables. If you deployed your agent using the deploy() API, traces are automatically logged in an inference table. See Deploy an agent using deploy().
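The environment variable setting described above can be sketched as a serving endpoint request payload built in Python. The field names (served_entities, environment_vars, and so on) reflect one reading of the Databricks serving endpoint configuration format and should be checked against the current API reference. The function name and the endpoint, entity, and version values are hypothetical, and inference table settings are omitted.

```python
import json


def tracing_endpoint_config(endpoint_name: str, entity_name: str, entity_version: str) -> dict:
    """Build an illustrative endpoint payload with tracing enabled."""
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": entity_name,
                    "entity_version": entity_version,
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,
                    # Plain text environment variable that turns on tracing.
                    "environment_vars": {"ENABLE_MLFLOW_TRACING": "True"},
                }
            ]
        },
    }


# Hypothetical names, for illustration only.
payload = tracing_endpoint_config("my-agent-endpoint", "main.default.my_agent", "1")
print(json.dumps(payload, indent=2))
```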

Note

Enabling tracing might introduce some overhead to the endpoint response speed, particularly when the trace size for each inference request is large. Databricks does not guarantee any service level agreement (SLA) for the actual latency impact on your model endpoint, as it heavily depends on the environment and the model implementation. Databricks recommends testing your endpoint performance and gaining insights into the tracing overhead before deploying to a production application.

The following table provides a rough indication of the impact on inference latency for different trace sizes.

Trace size per request | Latency impact (ms)
--- | ---
~10 KB | ~1
~1 MB | 50 to 100
10 MB | 150 or more
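To gauge the overhead for your own workload, as recommended above, one option is to time the same request with and without tracing and compare medians. The following is a rough sketch using only the standard library. The helpers median_latency_ms and tracing_overhead_ms are hypothetical, and the two lambdas are stand-ins for calls to an untraced and a traced version of your model or endpoint.

```python
import statistics
import time


def median_latency_ms(call, payload, repeats=20):
    """Median wall-clock time of call(payload) in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call(payload)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def tracing_overhead_ms(untraced_call, traced_call, payload, repeats=20):
    """Difference between the traced and untraced median latencies."""
    traced = median_latency_ms(traced_call, payload, repeats)
    untraced = median_latency_ms(untraced_call, payload, repeats)
    return traced - untraced


# Stand-ins: replace these with real calls to your model or endpoint.
untraced = lambda request: len(request)
traced = lambda request: (time.sleep(0.002), len(request))[1]  # simulated extra work

overhead = tracing_overhead_ms(untraced, traced, "example request")
print(f"Estimated tracing overhead: {overhead:.2f} ms")
```

The median is used rather than the mean so that a few slow outliers, such as a cold start, do not dominate the estimate.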

Limitations

  • MLflow Tracing is available in Databricks notebooks, notebook jobs, and Model Serving.

  • LangChain autologging may not support all LangChain prediction APIs. Please refer to the MLflow documentation for the full list of supported APIs.