%md # MLflow Tracing & Feedback with tool-calling agents

This short tutorial demonstrates how MLflow can capture detailed traces of a LangChain tool-calling agent as it solves mathematical problems, showcasing MLflow's ability to trace agent execution and store feedback about the agent's responses. Feedback, which is logged using MLflow's `log_feedback` API, is very helpful for measuring and improving the quality of an agent.
%pip install -U mlflow-skinny[databricks] databricks-langchain langchain-community langchain databricks-ai-bridge
dbutils.library.restartPython()
%md ## Define Mathematical Tools with MLflow Tracing

In this section we define two mathematical tools (addition and multiplication) and instrument them with MLflow's `@mlflow.trace` decorator. These functions log their inputs and outputs as MLflow spans of type `TOOL`.
import mlflow
from langchain.agents import Tool


@mlflow.trace(span_type="TOOL")
def add_numbers(input_str: str) -> str:
    """
    Adds two numbers provided as a space-separated string.
    Example input: "2 3"
    """
    a_str, b_str = input_str.split()
    result = int(a_str) + int(b_str)
    return str(result)


@mlflow.trace(span_type="TOOL")
def multiply_numbers(input_str: str) -> str:
    """
    Multiplies two numbers provided as a space-separated string.
    Example input: "4 5"
    """
    a_str, b_str = input_str.split()
    result = int(a_str) * int(b_str)
    return str(result)


# Wrap these functions as LangChain Tools
tools = [
    Tool(
        name="Addition",
        func=add_numbers,
        description="Add two numbers. Input format: 'a b', where a and b are integers",
    ),
    Tool(
        name="Multiplication",
        func=multiply_numbers,
        description="Multiply two numbers. Input format: 'a b', where a and b are integers",
    ),
]
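Before wiring the tools into an agent, it can help to sanity-check the space-separated parsing on its own. A minimal, self-contained sketch that re-declares the same parsing logic as above (without the tracing decorator):

```python
# Standalone copies of the tool logic, for a quick sanity check
def add_numbers(input_str: str) -> str:
    a_str, b_str = input_str.split()
    return str(int(a_str) + int(b_str))


def multiply_numbers(input_str: str) -> str:
    a_str, b_str = input_str.split()
    return str(int(a_str) * int(b_str))


print(add_numbers("2 3"))       # → "5"
print(multiply_numbers("4 5"))  # → "20"
```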
%md ## Define the LangChain Agent Using a Real LLM

We now configure a tool-calling agent using the `databricks-langchain` package. This agent uses the real LLM `databricks-meta-llama-3-3-70b-instruct` via `ChatDatabricks` and follows a zero-shot ReAct strategy for reasoning.

Note: Replace the `endpoint` parameter with your actual Databricks model serving endpoint.
from langchain.agents import initialize_agent
from databricks_langchain import ChatDatabricks

llm = ChatDatabricks(
    endpoint="databricks-meta-llama-3-3-70b-instruct",
    temperature=0.1,
)

agent = initialize_agent(
    tools,
    llm,
    agent="zero-shot-react-description",
    handle_parsing_errors=True,
)
%md ## Scenario 1: LLM Reasoning with MLflow Tracing and Feedback - Simple mathematical problem

In this scenario, the agent is expected to correctly solve a simple mathematical problem. The correct reasoning should produce the answer "14" for the expression "2 + 3 * 4" (multiplication first).
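As a quick check of the expected answer, Python applies the same operator precedence (multiplication before addition):

```python
print(2 + 3 * 4)  # → 14, not 20
```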
%md ### Submit the problem to the agent and obtain its answer

All operations that the agent performs to solve the problem are captured in an MLflow Trace.
import mlflow

# Enable MLflow Tracing for LangChain
mlflow.langchain.autolog()

problem = "What is 2 + 3 * 4?"
answer = agent.run(problem)
%md ### Assess the correctness of the agent's answer

The agent's answer is checked, and corresponding `correctness` feedback (`True` = correct, `False` = incorrect) is added to the Trace as an MLflow Assessment.
from mlflow.entities import AssessmentSource, AssessmentSourceType

trace_id = mlflow.get_last_active_trace_id()
is_correct = "14" in answer

mlflow.log_feedback(
    trace_id=trace_id,
    name="correctness",
    value=is_correct,
    source=AssessmentSource(
        source_type=AssessmentSourceType.LLM_JUDGE,
        source_id="my-llm-judge-version-1",
    ),
    rationale=(
        "The answer 14 is correct for the given expression."
        if is_correct
        else f"The answer {answer} is incorrect. The correct answer is 14."
    ),
)
print(f"Logged {'correct' if is_correct else 'incorrect'} feedback for trace {trace_id}")
%md ### View feedback on the Trace

All feedback on Traces can be accessed using the [Trace.info.assessments](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.entities.html#mlflow.entities.TraceInfo.assessments) Python property.
mlflow.MlflowClient().get_trace(trace_id).info.assessments
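Once assessments are retrieved, they can be summarized programmatically, for example to compute a correctness rate across many judged traces. A minimal sketch using a hypothetical list of `(name, value)` dictionaries standing in for the assessment objects returned by MLflow:

```python
def correctness_rate(assessments):
    """Fraction of 'correctness' feedback values that are truthy (None if no such feedback)."""
    values = [a["value"] for a in assessments if a["name"] == "correctness"]
    if not values:
        return None
    return sum(bool(v) for v in values) / len(values)


# Hypothetical feedback from two judged traces: one correct, one incorrect
sample = [
    {"name": "correctness", "value": True},
    {"name": "correctness", "value": False},
]
print(correctness_rate(sample))  # → 0.5
```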
%md ## Scenario 2: LLM Reasoning with MLflow Tracing and Feedback - Complex mathematical problem

In this scenario, the agent attempts to solve a more complex mathematical problem. The correct reasoning should produce the answer "14706125" for the expression "((124 + 76) + (9 * 5)) ^ 3 = ?"
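As a quick check of the expected answer, the expression can be evaluated step by step in plain Python (the `^` in the prompt denotes exponentiation, which is `**` in Python):

```python
inner = (124 + 76) + (9 * 5)  # 200 + 45 = 245
result = inner ** 3           # 245 cubed
print(result)  # → 14706125
```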
%md ### Submit the problem to the agent and obtain its answer

All operations that the agent performs to solve the problem are captured in an MLflow Trace.
import mlflow

# Enable MLflow Tracing for LangChain
mlflow.langchain.autolog()

problem = "((124 + 76) + (9 * 5)) ^ 3 = ?"
answer = agent.run(problem)
%md ### Assess the correctness of the agent's answer

The agent's answer is checked, and corresponding `correctness` feedback (`True` = correct, `False` = incorrect) is added to the Trace as an MLflow Assessment.
from mlflow.entities import AssessmentSource, AssessmentSourceType

trace_id = mlflow.get_last_active_trace_id()
is_correct = "14706125" in answer

mlflow.log_feedback(
    trace_id=trace_id,
    name="correctness",
    value=is_correct,
    source=AssessmentSource(
        source_type=AssessmentSourceType.LLM_JUDGE,
        source_id="my-llm-judge-version-1",
    ),
    rationale=(
        "The answer 14706125 is correct for the given expression."
        if is_correct
        else f"The answer {answer} is incorrect. The correct answer is 14706125."
    ),
)
print(f"Logged {'correct' if is_correct else 'incorrect'} feedback for trace {trace_id}")
%md ### View feedback on the Trace

All feedback on Traces can be accessed using the [Trace.info.assessments](https://mlflow.org/docs/latest/api_reference/python_api/mlflow.entities.html#mlflow.entities.TraceInfo.assessments) Python property.
mlflow.MlflowClient().get_trace(trace_id).info.assessments
%md ## Conclusion

In this tutorial, we:

- Defined two mathematical tools (Addition and Multiplication) and instrumented them with MLflow tracing.
- Configured a LangChain tool-calling agent using the real LLM `databricks-meta-llama-3-3-70b-instruct` via `ChatDatabricks`.
- Demonstrated two scenarios: one where the agent is likely to produce the correct result (14) for a simple mathematical problem, and one where the agent may have more difficulty solving a more complex mathematical problem.
- Logged feedback about the correctness of the agent's answer using MLflow's `mlflow.log_feedback` API for both scenarios.

This setup provides end-to-end observability and evaluation for your LLM-based agents, allowing you to track and improve performance over time.