Add MLflow Tracing to AI agents
MLflow Tracing on Databricks lets you monitor and analyze the behavior of your generative AI agents. You can enable tracing using the following methods:
- Automatic tracing: Trace agents with a single line of code.
- Manual tracing: Instrument your agents more granularly or trace properties that automatic tracing doesn't cover.
- Combined tracing: Use automatic and manual tracing together for both convenience and fine-grained control.
Automatic tracing
MLflow autologging makes it simple to start tracing your agents. Just add the following line to your code:
mlflow.<library>.autolog()
MLflow supports autologging for many popular agent authoring libraries, including the following:
Library | Autologging version support | Autologging command |
---|---|---|
LangChain | 0.1.0 ~ Latest | mlflow.langchain.autolog() |
LangGraph | 0.1.1 ~ Latest | mlflow.langchain.autolog() |
OpenAI | 1.0.0 ~ Latest | mlflow.openai.autolog() |
OpenAI Agents SDK | 0.0.7 ~ Latest | mlflow.openai.autolog() |
LlamaIndex | 0.10.44 ~ Latest | mlflow.llama_index.autolog() |
DSPy | 2.5.17 ~ Latest | mlflow.dspy.autolog() |
Amazon Bedrock | 1.33.0 ~ Latest (boto3) | mlflow.bedrock.autolog() |
Anthropic | 0.30.0 ~ Latest | mlflow.anthropic.autolog() |
AutoGen | 0.2.36 ~ 0.2.40 | mlflow.autogen.autolog() |
Google Gemini | 1.0.0 ~ Latest | mlflow.gemini.autolog() |
CrewAI | 0.80.0 ~ Latest | mlflow.crewai.autolog() |
LiteLLM | 1.52.9 ~ Latest | mlflow.litellm.autolog() |
Groq | 0.13.0 ~ Latest | mlflow.groq.autolog() |
Mistral | 1.0.0 ~ Latest | mlflow.mistral.autolog() |
For a complete list of supported libraries and more information, see the MLflow autologging documentation.
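For example, the following minimal sketch (assuming OpenAI credentials are configured; the model name is illustrative) enables OpenAI autologging, after which every chat completion call is traced automatically with no further changes:
import mlflow
from openai import OpenAI

# One line enables automatic tracing for the OpenAI SDK
mlflow.openai.autolog()

client = OpenAI()

# This call is captured as a trace automatically
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "What is MLflow Tracing?"}],
)
print(response.choices[0].message.content)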
Disable autologging
Trace autologging is enabled by default in Databricks Runtime 15.4 ML and above for LangChain, LangGraph, OpenAI, and LlamaIndex. To disable autologging, run the following command in a notebook:
mlflow.<library>.autolog(log_traces=False)
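For example, assuming LangChain autologging is active, the following call turns off trace logging for LangChain only; autologging for other libraries is unaffected:
mlflow.langchain.autolog(log_traces=False)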
Manual tracing
MLflow Tracing APIs let you manually add traces when you want to instrument your agent more granularly or capture spans that autologging doesn't cover.
MLflow Tracing APIs are low-code APIs: you add traces without having to manage the tree structure of the trace, because MLflow determines the appropriate parent-child span relationships automatically from the Python call stack.
Trace functions using the @mlflow.trace decorator
The simplest way to manually instrument your code is to decorate a function with the @mlflow.trace decorator. The decorator creates a span scoped to the decorated function. The span records the function's inputs, outputs, latency, and any exceptions.
For example, the following code creates a span named add that captures the input arguments x and y and the return value.
import mlflow

@mlflow.trace
def add(x: int, y: int) -> int:
    return x + y
You can customize the span name and span type, and add custom attributes:
import mlflow
from mlflow.entities import SpanType

@mlflow.trace(
    # By default, the function name is used as the span name. You can override it with the `name` parameter.
    name="my_add_function",
    # Specify the span type using the `span_type` parameter.
    span_type=SpanType.TOOL,
    # Add custom attributes to the span using the `attributes` parameter. By default, MLflow only captures inputs and outputs.
    attributes={"key": "value"},
)
def add(x: int, y: int) -> int:
    return x + y
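Because MLflow derives parent-child relationships from the Python call stack, decorated functions that call each other produce nested spans automatically. A minimal sketch with illustrative function names:
import mlflow

@mlflow.trace
def fetch_data(query: str) -> str:
    # Becomes a child span: it runs while the run_pipeline span is active
    return f"results for {query}"

@mlflow.trace
def run_pipeline(query: str) -> str:
    # Root span of the trace
    data = fetch_data(query)
    return data.upper()

run_pipeline("weather in Paris")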
Trace arbitrary code blocks using a context manager
To create a span for an arbitrary block of code, not just a function, use mlflow.start_span() as a context manager that wraps the code block.
The span starts when the context is entered and ends when the context is exited. Set the span's inputs and outputs manually using the setter methods of the span object yielded by the context manager. For more information, see MLflow documentation - context handler.
import mlflow

# Example input values so the snippet runs standalone
x, y = 1, 2

with mlflow.start_span(name="my_span") as span:
    span.set_inputs({"x": x, "y": y})
    result = x + y
    span.set_outputs(result)
    span.set_attribute("key", "value")
Lower-level tracing APIs
MLflow also provides low-level APIs to explicitly control the trace tree structure. See MLflow documentation - Manual Instrumentation.
Combine autologging and manual tracing
You can combine manual tracing and autologging. MLflow merges both types of spans into a single, complete trace.
The following example combines OpenAI autologging and manual tracing:
import json

from openai import OpenAI

import mlflow
from mlflow.entities import SpanType

client = OpenAI()

# Enable OpenAI autologging to capture LLM API calls
# (not necessary on Databricks Runtime 15.4 ML and above, where OpenAI autologging is enabled by default)
mlflow.openai.autolog()

# Define the tool function. Decorate it with `@mlflow.trace` to create a span for its execution.
@mlflow.trace(span_type=SpanType.TOOL)
def get_weather(city: str) -> str:
    if city == "Tokyo":
        return "sunny"
    elif city == "Paris":
        return "rainy"
    return "unknown"

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }
]

_tool_functions = {"get_weather": get_weather}

# Define a simple tool-calling agent
@mlflow.trace(span_type=SpanType.AGENT)
def run_tool_agent(question: str):
    messages = [{"role": "user", "content": question}]

    # Invoke the model with the given question and available tools
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
    )
    ai_msg = response.choices[0].message
    messages.append(ai_msg)

    # If the model requests tool calls, invoke the function(s) with the specified arguments
    if tool_calls := ai_msg.tool_calls:
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            if tool_func := _tool_functions.get(function_name):
                args = json.loads(tool_call.function.arguments)
                tool_result = tool_func(**args)
            else:
                raise RuntimeError("The model requested an unknown tool!")
            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": tool_result,
                }
            )

        # Send the tool results to the model and get a new response
        response = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages
        )

    return response.choices[0].message.content

# Run the tool-calling agent
question = "What's the weather like in Paris today?"
answer = run_tool_agent(question)
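Running this agent produces a single trace: the run_tool_agent AGENT span is the root, with the autologged OpenAI chat completion spans and the get_weather TOOL span nested beneath it, so the entire tool-calling flow is visible in one place.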
Trace overhead latency
Traces are written asynchronously to minimize performance impact. However, tracing still adds latency to endpoint responses, especially when traces are large. Test your endpoint to understand the impact before deploying to production.
The following table estimates latency impact by trace size:
Trace size per request | Added response latency |
---|---|
~10 KB | ~1 ms |
~1 MB | 50 to 100 ms |
10 MB | 150 ms or more |
Troubleshooting
For troubleshooting and common questions, see the MLflow documentation: Tracing How-to Guide and MLflow documentation: FAQ.