Add MLflow Tracing to AI agents

MLflow Tracing on Databricks lets you monitor and analyze the behavior of your generative AI agents. You can enable tracing using the following methods:

  • Automatic tracing: Trace agents with a single line of code.
  • Manual tracing: Instrument your agents more granularly or trace properties that automatic tracing doesn't cover.
  • Combined tracing: Use automatic and manual tracing together for flexible and convenient instrumentation.

Automatic tracing

MLflow autologging makes it simple to start tracing your agents. Just add the following line to your code:

Python
mlflow.<library>.autolog()

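For example, the following is a minimal sketch that enables OpenAI autologging and runs a traced LLM call. The experiment name, model name, and prompt are illustrative.

Python
import mlflow
from openai import OpenAI

# Enable autologging for the OpenAI SDK; subsequent API calls are captured as traces.
mlflow.openai.autolog()

# Optional: group traces under a named experiment.
mlflow.set_experiment("my-agent-tracing")

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Hello!"}],
)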
MLflow supports autologging for many popular agent authoring libraries including the following:

| Library | Autologging version support | Autologging command |
| --- | --- | --- |
| LangChain | 0.1.0 ~ Latest | mlflow.langchain.autolog() |
| LangGraph | 0.1.1 ~ Latest | mlflow.langgraph.autolog() |
| OpenAI | 1.0.0 ~ Latest | mlflow.openai.autolog() |
| OpenAI Agents SDK | 0.0.7 ~ Latest | mlflow.openai.autolog() |
| LlamaIndex | 0.10.44 ~ Latest | mlflow.llama_index.autolog() |
| DSPy | 2.5.17 ~ Latest | mlflow.dspy.autolog() |
| Amazon Bedrock | 1.33.0 ~ Latest (boto3) | mlflow.bedrock.autolog() |
| Anthropic | 0.30.0 ~ Latest | mlflow.anthropic.autolog() |
| AutoGen | 0.2.36 ~ 0.2.40 | mlflow.autogen.autolog() |
| Google Gemini | 1.0.0 ~ Latest | mlflow.gemini.autolog() |
| CrewAI | 0.80.0 ~ Latest | mlflow.crewai.autolog() |
| LiteLLM | 1.52.9 ~ Latest | mlflow.litellm.autolog() |
| Groq | 0.13.0 ~ Latest | mlflow.groq.autolog() |
| Mistral | 1.0.0 ~ Latest | mlflow.mistral.autolog() |

For a complete list of supported libraries and more information, see MLflow autologging documentation.

Disable autologging

Autologging tracing is enabled by default in Databricks Runtime 15.4 ML and above for LangChain, LangGraph, OpenAI, and LlamaIndex. To disable autologging, run the following command in a notebook:

Python
mlflow.<library>.autolog(log_traces=False)
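
For example, a minimal sketch disabling trace autologging for LangChain and OpenAI; substitute the libraries your agent actually uses:

Python
import mlflow

# Keep autologging configured but stop emitting traces for these libraries.
mlflow.langchain.autolog(log_traces=False)
mlflow.openai.autolog(log_traces=False)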

Manual tracing

MLflow Tracing APIs let you manually add traces when you want to instrument your agent more granularly or capture spans that autologging doesn't cover.

MLflow Tracing APIs are low-code APIs for adding traces without having to manage the tree structure of the trace. MLflow determines the appropriate parent-child span relationships automatically using the Python stack.

Trace functions using the @mlflow.trace decorator

The simplest way to manually instrument your code is to decorate a function with the @mlflow.trace decorator. The MLflow trace decorator creates a span with the scope of the decorated function. The span records inputs, outputs, latency, and exceptions.

For example, the following code creates a span named add that captures the input arguments x and y and the output.

Python
import mlflow

@mlflow.trace
def add(x: int, y: int) -> int:
    return x + y

You can customize the span’s name, type, and add custom attributes:

Python
from mlflow.entities import SpanType

@mlflow.trace(
    # By default, the function name is used as the span name. You can override it with the `name` parameter.
    name="my_add_function",
    # Specify the span type using the `span_type` parameter.
    span_type=SpanType.TOOL,
    # Add custom attributes to the span using the `attributes` parameter. By default, MLflow only captures inputs and outputs.
    attributes={"key": "value"}
)
def add(x: int, y: int) -> int:
    return x + y
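
When decorated functions call each other, MLflow uses the Python call stack to nest the child span under its caller, as noted above. The following is a minimal sketch; the function names and span types are illustrative:

Python
import mlflow
from mlflow.entities import SpanType

@mlflow.trace(span_type=SpanType.TOOL)
def add(x: int, y: int) -> int:
    return x + y

@mlflow.trace(span_type=SpanType.CHAIN)
def add_three(x: int, y: int, z: int) -> int:
    # Each call to `add` appears as a child span under the `add_three` span.
    return add(add(x, y), z)

add_three(1, 2, 3)  # produces one trace: add_three -> add, add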

Trace arbitrary code blocks using context manager

To create a span for an arbitrary block of code, not just a function, use mlflow.start_span() as a context manager that wraps the code block.

The span starts when the context is entered and ends when the context is exited. Provide the span's inputs and outputs manually using setter methods on the span object yielded by the context manager. For more information, see MLflow documentation - context handler.

Python
with mlflow.start_span(name="my_span") as span:
    span.set_inputs({"x": x, "y": y})
    result = x + y
    span.set_outputs(result)
    span.set_attribute("key", "value")

Lower-level tracing APIs

MLflow also provides low-level APIs to explicitly control the trace tree structure. See MLflow documentation - Manual Instrumentation.

Combine autologging and manual tracing

You can combine manual tracing and autologging. MLflow merges both types of spans into a single, complete trace.

The following example combines OpenAI autologging and manual tracing:

Python
import json
from openai import OpenAI
import mlflow
from mlflow.entities import SpanType

client = OpenAI()

# Enable OpenAI autologging to capture LLM API calls
# (*Not necessary if you are using Databricks Runtime 15.4 ML and above, where OpenAI autologging is enabled by default.)
mlflow.openai.autolog()

# Define the tool function. Decorate it with `@mlflow.trace` to create a span for its execution.
@mlflow.trace(span_type=SpanType.TOOL)
def get_weather(city: str) -> str:
    if city == "Tokyo":
        return "sunny"
    elif city == "Paris":
        return "rainy"
    return "unknown"


tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    }
]

_tool_functions = {"get_weather": get_weather}

# Define a simple tool-calling agent
@mlflow.trace(span_type=SpanType.AGENT)
def run_tool_agent(question: str):
    messages = [{"role": "user", "content": question}]

    # Invoke the model with the given question and available tools
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=messages,
        tools=tools,
    )
    ai_msg = response.choices[0].message
    messages.append(ai_msg)

    # If the model requests tool calls, invoke the function(s) with the specified arguments
    if tool_calls := ai_msg.tool_calls:
        for tool_call in tool_calls:
            function_name = tool_call.function.name
            if tool_func := _tool_functions.get(function_name):
                args = json.loads(tool_call.function.arguments)
                tool_result = tool_func(**args)
            else:
                raise RuntimeError("An invalid tool is returned from the assistant!")

            messages.append(
                {
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": tool_result,
                }
            )

        # Send the tool results to the model and get a new response
        response = client.chat.completions.create(
            model="gpt-4o-mini", messages=messages
        )

    return response.choices[0].message.content

# Run the tool-calling agent
question = "What's the weather like in Paris today?"
answer = run_tool_agent(question)

Trace overhead latency

Traces are written asynchronously to minimize performance impact. However, tracing adds latency to endpoint responses, especially when traces are large. Test your endpoint to understand the impact before deploying to production.

The following table estimates latency impact by trace size:

| Trace size per request | Added response latency |
| --- | --- |
| ~10 KB | ~1 ms |
| ~1 MB | 50~100 ms |
| 10 MB | 150 ms or more |
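
To gauge the impact for your own agent, you can time a traced entry point before deployment. The following is a minimal local sketch that assumes a decorated function such as the run_tool_agent defined above; actual numbers depend on trace size and environment:

Python
import time

# Time a few calls to the traced agent entry point to estimate end-to-end latency.
durations = []
for _ in range(5):
    start = time.perf_counter()
    run_tool_agent("What's the weather like in Paris today?")
    durations.append(time.perf_counter() - start)

print(f"avg latency: {sum(durations) / len(durations) * 1000:.1f} ms")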

Troubleshooting

For troubleshooting and common questions, see the MLflow documentation: Tracing How-to Guide and MLflow documentation: FAQ.