
    mlflow3-tool-calling-agent


    MLflow Tracing & Feedback with tool calling agents

    This short tutorial demonstrates how MLflow captures detailed traces of a LangChain tool-calling agent as it solves mathematical problems, showcasing MLflow's ability to trace agent execution and store feedback about the agent's responses. Feedback, logged with MLflow's log_feedback API, is helpful for measuring and improving the quality of an agent.

    %pip install -U mlflow-skinny[databricks] databricks-langchain langchain-community langchain databricks-ai-bridge
    dbutils.library.restartPython()

    Define Mathematical Tools with MLflow Tracing

    In this section, we define two mathematical tools (addition and multiplication) and instrument them with MLflow's tracing decorator. The decorated functions log their inputs and outputs as MLflow spans of type TOOL.

    import mlflow
    from mlflow.entities import SpanType
    
    # Instrument each tool so its inputs and outputs are logged as TOOL spans
    @mlflow.trace(span_type=SpanType.TOOL)
    def add_numbers(input_str: str) -> str:
        """
        Adds two numbers provided as a space-separated string.
        Example input: "2 3"
        """
        a_str, b_str = input_str.split()
        result = int(a_str) + int(b_str)
        return str(result)
    
    @mlflow.trace(span_type=SpanType.TOOL)
    def multiply_numbers(input_str: str) -> str:
        """
        Multiplies two numbers provided as a space-separated string.
        Example input: "4 5"
        """
        a_str, b_str = input_str.split()
        result = int(a_str) * int(b_str)
        return str(result)
    
    # Wrap these functions as LangChain Tools
    from langchain.agents import Tool
    
    tools = [
        Tool(name="Addition", func=add_numbers, description="Add two numbers. Input format: 'a b', where a and b are integers"),
        Tool(name="Multiplication", func=multiply_numbers, description="Multiply two numbers. Input format: 'a b', where a and b are integers")
    ]

    Define the LangChain Agent using a Real LLM

    We now configure a tool-calling agent with the databricks-langchain package. The agent calls the "databricks-meta-llama-3-3-70b-instruct" model through ChatDatabricks and uses the Zero-Shot ReAct strategy for reasoning.

    Note: Replace the endpoint parameter with your actual Databricks endpoint.

    from langchain.agents import initialize_agent
    from databricks_langchain import ChatDatabricks
    
    llm = ChatDatabricks(
        endpoint="databricks-meta-llama-3-3-70b-instruct",
        temperature=0.1,
    )
    
    agent = initialize_agent(tools, llm, agent="zero-shot-react-description", handle_parsing_errors=True)

    Scenario 1: LLM Reasoning with MLflow Tracing and Feedback - Simple mathematical problem

    In this scenario, the agent is expected to correctly solve a simple mathematical problem. The correct reasoning should produce the answer "14" for the expression "2 + 3 * 4" (multiplication first).

    Submit the problem to the agent and obtain its answer

    All operations that the agent performs to solve the problem are captured in an MLflow Trace.

    import mlflow
    
    # Enable MLflow Tracing for LangChain
    mlflow.langchain.autolog()
    
    problem = "What is 2 + 3 * 4?"
    answer = agent.run(problem)

    Assess the correctness of the agent's answer

    The agent's answer is checked, and corresponding correctness feedback (True = correct, False = incorrect) is added to the Trace as an MLflow Assessment.

    from mlflow.entities import AssessmentSource, AssessmentSourceType
    
    trace_id = mlflow.get_last_active_trace_id()
    if "14" in answer:
        mlflow.log_feedback(
            trace_id=trace_id,
            name="correctness",
            value=True,
            source=AssessmentSource(
                source_type=AssessmentSourceType.LLM_JUDGE,
                source_id="my-llm-judge-version-1"
            ),
            rationale="The answer 14 is correct for the given expression."
        )
        print(f"Logged correct feedback for trace {trace_id}")
    else:
        mlflow.log_feedback(
            trace_id=trace_id,
            name="correctness",
            value=False,
            source=AssessmentSource(
                source_type=AssessmentSourceType.LLM_JUDGE,
                source_id="my-llm-judge-version-1"
            ),
            rationale=f"The answer {answer} is incorrect. The correct answer is 14"
        )
        print(f"Logged correct feedback for trace {trace_id}")

    View feedback on the Trace

    All feedback on Traces can be accessed using the Trace.info.assessments Python property.

    mlflow.MlflowClient().get_trace(trace_id).info.assessments
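
    For example, the loop below prints each piece of feedback on the trace. This is a minimal sketch: it assumes the Feedback fields used above (name, value, rationale), whose attribute names may vary slightly across MLflow versions.

    trace = mlflow.MlflowClient().get_trace(trace_id)
    for assessment in trace.info.assessments:
        # Each entry is the feedback logged above: its name, value, and rationale
        print(f"{assessment.name}: {assessment.value} ({assessment.rationale})")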

    Scenario 2: LLM Reasoning with MLflow Tracing and Feedback - Complex mathematical problem

    In this scenario, the agent attempts to solve a more complex mathematical problem. Correct reasoning should produce the answer "14706125" for the expression "((124 + 76) + (9 * 5)) ^ 3 = ?".
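
    As a quick sanity check, the expected value can be verified directly (note that Python uses ** for exponentiation, whereas the prompt uses ^):

    assert ((124 + 76) + (9 * 5)) ** 3 == 14706125  # 245 ** 3 = 14,706,125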

    Submit the problem to the agent and obtain its answer

    All operations that the agent performs to solve the problem are captured in an MLflow Trace.

    import mlflow
    
    # Enable MLflow Tracing for LangChain
    mlflow.langchain.autolog()
    
    problem = "((124 + 76) + (9 * 5)) ^ 3 = ?"
    answer = agent.run(problem)

    Assess the correctness of the agent's answer

    The agent's answer is checked, and corresponding correctness feedback (True = correct, False = incorrect) is added to the Trace as an MLflow Assessment.

    from mlflow.entities import AssessmentSource, AssessmentSourceType
    
    trace_id = mlflow.get_last_active_trace_id()
    if "14706125" in answer:
        mlflow.log_feedback(
            trace_id=trace_id,
            name="correctness",
            value=True,
            source=AssessmentSource(
                source_type=AssessmentSourceType.LLM_JUDGE,
                source_id="my-llm-judge-version-1"
            ),
            rationale="The answer 14706125 is correct for the given expression."
        )
        print(f"Logged correct feedback for trace {trace_id}")
    else:
        mlflow.log_feedback(
            trace_id=trace_id,
            name="correctness",
            value=False,
            source=AssessmentSource(
                source_type=AssessmentSourceType.LLM_JUDGE,
                source_id="my-llm-judge-version-1"
            ),
            rationale=f"The answer {answer} is incorrect. The correct answer is 14706125"
        )
        print(f"Logged correct feedback for trace {trace_id}")

    View feedback on the Trace

    All feedback on Traces can be accessed using the Trace.info.assessments Python property.

    mlflow.MlflowClient().get_trace(trace_id).info.assessments

    Conclusion

    In this tutorial, we:

    • Defined two mathematical tools (Addition and Multiplication) and instrumented them with MLflow tracing.
    • Configured a LangChain tool-calling agent that calls the "databricks-meta-llama-3-3-70b-instruct" model via ChatDatabricks.
    • Demonstrated two scenarios: one where the agent is likely to produce the correct result (14) for a simple mathematical problem, and one where the agent may have more difficulty with a more complex problem (14706125).
    • Logged feedback about the correctness of the agent's answer in both scenarios using the mlflow.log_feedback API.

    This setup provides end-to-end observability and evaluation for your LLM-based agents, allowing you to track and improve performance over time.
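
    As a hedged illustration of tracking quality over time, the sketch below searches recent traces in an experiment and summarizes the "correctness" feedback logged above. The experiment name is a placeholder, and the assessment attributes assumed here may differ slightly across MLflow versions.

    import mlflow
    from mlflow import MlflowClient

    client = MlflowClient()

    # Placeholder: replace with the experiment that contains your agent traces
    experiment = mlflow.get_experiment_by_name("/Shared/mlflow3-tool-calling-agent")

    # Fetch recent traces from the experiment
    traces = client.search_traces(experiment_ids=[experiment.experiment_id], max_results=100)

    # Collect the "correctness" feedback values logged with mlflow.log_feedback
    values = [
        a.value
        for t in traces
        for a in (t.info.assessments or [])
        if a.name == "correctness"
    ]
    if values:
        print(f"Correct answers: {sum(bool(v) for v in values)} / {len(values)}")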
