
    mlflow3-tool-calling-agent


    MLflow Tracing & Feedback with tool calling agents

    This short tutorial demonstrates how MLflow captures detailed traces of a LangChain tool-calling agent as it solves mathematical problems, showcasing MLflow's ability to trace agent execution and store feedback about the agent's responses. Feedback, logged with MLflow's log_feedback API, is helpful for measuring and improving the quality of an agent.

    %pip install -U mlflow-skinny[databricks] databricks-langchain langchain-community langchain databricks-ai-bridge
    dbutils.library.restartPython()

    Define Mathematical Tools with MLflow Tracing

    In this section, we define two mathematical tools (addition and multiplication) and instrument them with MLflow's tracing decorator. The decorated functions log their inputs and outputs as MLflow spans of type TOOL.

    import mlflow
    from mlflow.entities import SpanType
    
    # Instrument each tool so its inputs and outputs are logged as TOOL spans
    @mlflow.trace(span_type=SpanType.TOOL)
    def add_numbers(input_str: str) -> str:
        """
        Adds two numbers provided as a space-separated string.
        Example input: "2 3"
        """
        a_str, b_str = input_str.split()
        result = int(a_str) + int(b_str)
        return str(result)
    
    @mlflow.trace(span_type=SpanType.TOOL)
    def multiply_numbers(input_str: str) -> str:
        """
        Multiplies two numbers provided as a space-separated string.
        Example input: "4 5"
        """
        a_str, b_str = input_str.split()
        result = int(a_str) * int(b_str)
        return str(result)
    
    # Wrap these functions as LangChain Tools
    from langchain.agents import Tool
    
    tools = [
        Tool(name="Addition", func=add_numbers, description="Add two numbers. Input format: 'a b', where a and b are integers"),
        Tool(name="Multiplication", func=multiply_numbers, description="Multiply two numbers. Input format: 'a b', where a and b are integers")
    ]

    Define the LangChain Agent using a Real LLM

    We now configure a tool-calling agent with the databricks-langchain package. The agent calls the "databricks-meta-llama-3-3-70b-instruct" model through ChatDatabricks and uses the Zero-Shot ReAct strategy for reasoning.

    Note: Replace the endpoint parameter with your actual Databricks endpoint.

    from langchain.agents import initialize_agent
    from databricks_langchain import ChatDatabricks
    
    llm = ChatDatabricks(
        endpoint="databricks-meta-llama-3-3-70b-instruct",
        temperature=0.1,
    )
    
    agent = initialize_agent(tools, llm, agent="zero-shot-react-description", handle_parsing_errors=True)

    Scenario 1: LLM Reasoning with MLflow Tracing and Feedback - Simple mathematical problem

    In this scenario, the agent is expected to correctly solve a simple mathematical problem. The correct reasoning should produce the answer "14" for the expression "2 + 3 * 4" (multiplication first).

    Submit the problem to the agent and obtain its answer

    All operations that the agent performs to solve the problem are captured in an MLflow Trace.

    import mlflow
    
    # Enable MLflow Tracing for LangChain
    mlflow.langchain.autolog()
    
    problem = "What is 2 + 3 * 4?"
    answer = agent.run(problem)

    Assess the correctness of the agent's answer

    The agent's answer is checked, and corresponding correctness feedback (True = correct, False = incorrect) is added to the Trace as an MLflow Assessment.

    from mlflow.entities import AssessmentSource, AssessmentSourceType
    
    trace_id = mlflow.get_last_active_trace_id()
    if "14" in answer:
        mlflow.log_feedback(
            trace_id=trace_id,
            name="correctness",
            value=True,
            source=AssessmentSource(
                source_type=AssessmentSourceType.LLM_JUDGE,
                source_id="my-llm-judge-version-1"
            ),
            rationale="The answer 14 is correct for the given expression."
        )
        print(f"Logged correct feedback for trace {trace_id}")
    else:
        mlflow.log_feedback(
            trace_id=trace_id,
            name="correctness",
            value=False,
            source=AssessmentSource(
                source_type=AssessmentSourceType.LLM_JUDGE,
                source_id="my-llm-judge-version-1"
            ),
            rationale=f"The answer {answer} is incorrect. The correct answer is 14"
        )
        print(f"Logged correct feedback for trace {trace_id}")

    View feedback on the Trace

    All feedback on Traces can be accessed using the Trace.info.assessments Python property.

    mlflow.MlflowClient().get_trace(trace_id).info.assessments
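
    For example, the loop below prints each piece of feedback on the trace. This is a minimal sketch: it assumes the Feedback fields used above (name, value, rationale), whose attribute names may vary slightly across MLflow versions.

    trace = mlflow.MlflowClient().get_trace(trace_id)
    for assessment in trace.info.assessments:
        # Each entry is the feedback logged above: its name, value, and rationale
        print(f"{assessment.name}: {assessment.value} ({assessment.rationale})")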

    Scenario 2: LLM Reasoning with MLflow Tracing and Feedback - Complex mathematical problem

    In this scenario, the agent attempts to solve a more complex mathematical problem. Correct reasoning should produce the answer "14706125" for the expression "((124 + 76) + (9 * 5)) ^ 3 = ?".
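
    As a quick sanity check, the expected value can be verified directly (note that Python uses ** for exponentiation, whereas the prompt uses ^):

    assert ((124 + 76) + (9 * 5)) ** 3 == 14706125  # 245 ** 3 = 14,706,125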

    Submit the problem to the agent and obtain its answer

    All operations that the agent performs to solve the problem are captured in an MLflow Trace.

    import mlflow
    
    # Enable MLflow Tracing for LangChain
    mlflow.langchain.autolog()
    
    problem = "((124 + 76) + (9 * 5)) ^ 3 = ?"
    answer = agent.run(problem)

    Assess the correctness of the agent's answer

    The agent's answer is checked, and corresponding correctness feedback (True = correct, False = incorrect) is added to the Trace as an MLflow Assessment.

    from mlflow.entities import AssessmentSource, AssessmentSourceType
    
    trace_id = mlflow.get_last_active_trace_id()
    if "14706125" in answer:
        mlflow.log_feedback(
            trace_id=trace_id,
            name="correctness",
            value=True,
            source=AssessmentSource(
                source_type=AssessmentSourceType.LLM_JUDGE,
                source_id="my-llm-judge-version-1"
            ),
            rationale="The answer 14706125 is correct for the given expression."
        )
        print(f"Logged correct feedback for trace {trace_id}")
    else:
        mlflow.log_feedback(
            trace_id=trace_id,
            name="correctness",
            value=False,
            source=AssessmentSource(
                source_type=AssessmentSourceType.LLM_JUDGE,
                source_id="my-llm-judge-version-1"
            ),
            rationale=f"The answer {answer} is incorrect. The correct answer is 14706125"
        )
        print(f"Logged correct feedback for trace {trace_id}")

    View feedback on the Trace

    All feedback on Traces can be accessed using the Trace.info.assessments Python property.

    mlflow.MlflowClient().get_trace(trace_id).info.assessments

    Conclusion

    In this tutorial, we:

    • Defined two mathematical tools (Addition and Multiplication) and instrumented them with MLflow tracing.
    • Configured a LangChain tool-calling agent that calls the "databricks-meta-llama-3-3-70b-instruct" model via ChatDatabricks.
    • Demonstrated two scenarios: one where the agent is likely to produce the correct result (14) for a simple mathematical problem, and one where the agent may have more difficulty with a more complex problem (14706125).
    • Logged feedback about the correctness of the agent's answer in both scenarios using the mlflow.log_feedback API.

    This setup provides end-to-end observability and evaluation for your LLM-based agents, allowing you to track and improve performance over time.
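
    As a hedged illustration of tracking quality over time, the sketch below searches recent traces in an experiment and summarizes the "correctness" feedback logged above. The experiment name is a placeholder, and the assessment attributes assumed here may differ slightly across MLflow versions.

    import mlflow
    from mlflow import MlflowClient

    client = MlflowClient()

    # Placeholder: replace with the experiment that contains your agent traces
    experiment = mlflow.get_experiment_by_name("/Shared/mlflow3-tool-calling-agent")

    # Fetch recent traces from the experiment
    traces = client.search_traces(experiment_ids=[experiment.experiment_id], max_results=100)

    # Collect the "correctness" feedback values logged with mlflow.log_feedback
    values = [
        a.value
        for t in traces
        for a in (t.info.assessments or [])
        if a.name == "correctness"
    ]
    if values:
        print(f"Correct answers: {sum(bool(v) for v in values)} / {len(values)}")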
