Tracing Claude Code

MLflow Tracing automatically traces Claude Code conversations and agents built with the Claude Agent SDK, capturing user prompts, AI responses, tool usage, timing, and session metadata.

MLflow supports two approaches for Claude Code tracing:

  • CLI tracing: Configure tracing through the MLflow CLI to automatically trace interactive Claude Code sessions (MLflow 3.4+); a sketch of this approach follows this list
  • SDK tracing: Enable tracing programmatically for Python applications using the Claude Agent SDK (MLflow 3.5+)
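
The rest of this page focuses on the SDK approach. For the CLI approach, a minimal sketch is shown below; the mlflow autolog claude command name and the project path are assumptions here, so check the CLI help for your MLflow version:

Bash
# Enable Claude Code tracing for a project directory (placeholder path);
# assumes the "mlflow autolog claude" command shipped in MLflow 3.4+
mlflow autolog claude ~/my-claude-project

# Interactive claude sessions started in that directory are then traced automatically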

Requirements

Claude Agent SDK tracing requires:

Bash
pip install --upgrade "mlflow[databricks]>=3.5" "claude-agent-sdk>=0.1.0"

Trace Claude Code to Databricks

  1. Set Databricks and Anthropic environment variables:

    Bash
    export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
    export DATABRICKS_TOKEN="your-personal-access-token"
    export ANTHROPIC_API_KEY="your-anthropic-api-key"

    For production environments, use Mosaic AI Gateway or Databricks secrets for secure API key management.
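
    For example, in a Databricks notebook the Anthropic key can be loaded from a secret scope rather than hard-coded. This is a sketch: the scope and key names are placeholders, and dbutils is only available inside Databricks notebooks:

    Python
    import os

    # Read the Anthropic API key from a Databricks secret scope
    # (placeholder scope and key names; dbutils exists only in notebooks)
    os.environ["ANTHROPIC_API_KEY"] = dbutils.secrets.get(
        scope="my-secrets", key="anthropic-api-key"
    )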

  2. Enable autologging to trace all Claude Agent SDK interactions:

    note

    MLflow does not trace direct calls to the query function; only interactions made through ClaudeSDKClient are traced.

    Python
    import asyncio
    import mlflow.anthropic
    from claude_agent_sdk import ClaudeSDKClient

    # Enable autologging
    mlflow.anthropic.autolog()

    # Optionally configure MLflow experiment
    mlflow.set_experiment("my_claude_app")


    async def main():
        async with ClaudeSDKClient() as client:
            await client.query("What is the capital of France?")

            async for message in client.receive_response():
                print(message)


    if __name__ == "__main__":
        asyncio.run(main())

    To disable autologging, call mlflow.anthropic.autolog(disable=True).

  3. View your traces in the MLflow experiment UI in your Databricks workspace.
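
    To confirm traces were logged without opening the UI, one option is a quick programmatic check. This is a sketch: it assumes the experiment name from the example above and uses mlflow.search_traces, which returns the logged traces as a pandas DataFrame:

    Python
    import mlflow

    # Look up the experiment used in the example above and list its traces;
    # a non-empty result confirms the session was captured.
    exp = mlflow.get_experiment_by_name("my_claude_app")
    traces = mlflow.search_traces(experiment_ids=[exp.experiment_id])
    print(traces.head())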

Advanced: SDK tracing with evaluation

You can use SDK tracing with MLflow's GenAI evaluation framework:

Python
import asyncio
import pandas as pd
from claude_agent_sdk import ClaudeSDKClient

import mlflow.anthropic
from mlflow.genai import evaluate
from mlflow.genai.judges import make_judge

mlflow.anthropic.autolog()


async def run_agent(query: str) -> str:
    """Run Claude Agent SDK and return response"""
    async with ClaudeSDKClient() as client:
        await client.query(query)

        response_text = ""
        async for message in client.receive_response():
            response_text += str(message) + "\n\n"

        return response_text


def predict_fn(query: str) -> str:
    """Synchronous wrapper for evaluation"""
    return asyncio.run(run_agent(query))


relevance = make_judge(
    name="relevance",
    instructions=(
        "Evaluate if the response in {{ outputs }} is relevant to "
        "the question in {{ inputs }}. Return either 'pass' or 'fail'."
    ),
    model="openai:/gpt-4o",
)

# Create evaluation dataset
eval_data = pd.DataFrame(
    [
        {"inputs": {"query": "What is machine learning?"}},
        {"inputs": {"query": "Explain neural networks"}},
    ]
)

# Run evaluation with automatic tracing
mlflow.set_experiment("claude_evaluation")
evaluate(data=eval_data, predict_fn=predict_fn, scorers=[relevance])
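
The predict_fn wrapper is synchronous because evaluate calls the prediction function once per row of eval_data; each call runs the agent with autologging enabled, so every evaluation row is traced alongside its relevance score.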

Troubleshooting

Missing traces:

  • Verify mlflow.anthropic.autolog() is called before creating the ClaudeSDKClient
  • Check that the environment variables (DATABRICKS_HOST, DATABRICKS_TOKEN) are set correctly
  • Verify your Databricks token has not expired
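
A quick way to check the environment variable and credential items above is to point MLflow at the workspace and make a lightweight API call. This is an illustrative sketch; any authentication or connection error it raises points to the host, token, or environment variables:

Python
import os

import mlflow

# Confirm the required environment variables are present
for var in ("DATABRICKS_HOST", "DATABRICKS_TOKEN", "ANTHROPIC_API_KEY"):
    assert os.environ.get(var), f"{var} is not set"

# Point MLflow at the Databricks workspace and make a lightweight call;
# this raises if the host is wrong or the token is invalid or expired.
mlflow.set_tracking_uri("databricks")
print(mlflow.search_experiments(max_results=1))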