
Production observability with tracing

MLflow Tracing provides comprehensive observability for production GenAI apps deployed outside of Databricks by capturing execution details and sending them to your Databricks workspace, where you can view them in the MLflow UI.

Diagram: MLflow production tracing overview

How production tracing works:

  1. Your app generates traces - Each API call creates trace data (see the sketch after this list)
  2. Traces go to your Databricks MLflow tracking server - Using your workspace credentials
  3. View in MLflow UI - Analyze traces in your Databricks workspace
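
Step 1 usually amounts to instrumenting your application code with MLflow's tracing APIs. A minimal sketch, assuming a plain Python function stands in for your real request handler (the function name and return value are illustrative placeholders, not part of this guide's app):

Python
import mlflow

# Every call to a function decorated with @mlflow.trace produces one trace.
# Automatic instrumentation (for example, mlflow.openai.autolog()) can add
# child spans for LLM calls made inside the function.
@mlflow.trace
def answer_question(question: str) -> str:
    # Replace this placeholder with your real LLM or agent call
    return f"Echo: {question}"

answer_question("What is MLflow Tracing?")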

This page covers tracing apps deployed outside of Databricks. If your app is deployed using Databricks Model Serving, see Tracing with Databricks Model Serving.

Prerequisites

note

Production tracing requires MLflow 3. MLflow 2.x is not supported for production tracing.

Install the required packages. Choose between the following two options:

mlflow-tracing

  • Recommended use case: Production deployments
  • Benefits: Minimal dependencies for lean, fast deployments; performance optimized for high-volume tracing; focused on client-side tracing for production monitoring

mlflow[databricks]

  • Recommended use case: Development and experimentation
  • Benefits: Full MLflow experimentation feature set (UI, LLM-as-a-judge, dev tools, and more); includes all development tools and utilities

Python
# Install mlflow-tracing for production deployment tracing
%pip install --upgrade mlflow-tracing

# Install mlflow for experimentation and development
%pip install --upgrade "mlflow[databricks]>=3.1"

Basic tracing setup

Configure your application deployment to connect to your Databricks workspace so Databricks can collect traces.

Configure the following environment variables:

Bash
# Required: Set the Databricks workspace host and authentication token
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-databricks-token"

# Required: Set MLflow Tracking URI to "databricks" to log to Databricks
export MLFLOW_TRACKING_URI=databricks

# Required: Configure the experiment name for organizing traces (must be a workspace path)
export MLFLOW_EXPERIMENT_NAME="/Shared/production-genai-app"
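
If you prefer to configure the connection in application code rather than through the shell, the same settings can be applied with MLflow's Python API. A minimal sketch, assuming the same placeholder workspace values as above (in practice, load the token from a secret manager rather than hard-coding it):

Python
import os
import mlflow

# Equivalent to the environment variables above
os.environ["DATABRICKS_HOST"] = "https://your-workspace.cloud.databricks.com"
os.environ["DATABRICKS_TOKEN"] = "your-databricks-token"

mlflow.set_tracking_uri("databricks")
mlflow.set_experiment("/Shared/production-genai-app")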

Deployment examples

After the environment variables are set, pass them to your application. The following example shows how to pass the connection details to a Docker deployment.

For Docker deployments, pass the environment variables through the container configuration:

Dockerfile
# Dockerfile
FROM python:3.9-slim

# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application code
COPY . /app
WORKDIR /app

# Set default environment variables (can be overridden at runtime)
ENV DATABRICKS_HOST=""
ENV DATABRICKS_TOKEN=""
ENV MLFLOW_TRACKING_URI=databricks
ENV MLFLOW_EXPERIMENT_NAME="/Shared/production-genai-app"

CMD ["python", "app.py"]

Run the container with environment variables:

Bash
docker run -d \
-e DATABRICKS_HOST="https://your-workspace.cloud.databricks.com" \
-e DATABRICKS_TOKEN="your-databricks-token" \
-e MLFLOW_TRACKING_URI=databricks \
-e MLFLOW_EXPERIMENT_NAME="/Shared/production-genai-app" \
-e APP_VERSION="1.0.0" \
your-app:latest
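
The entry point the container runs (app.py in the Dockerfile above) needs no extra Databricks-specific wiring: MLflow reads DATABRICKS_HOST, DATABRICKS_TOKEN, MLFLOW_TRACKING_URI, and MLFLOW_EXPERIMENT_NAME from the environment. A minimal sketch of such an entry point (the function and message are illustrative placeholders):

Python
# app.py - minimal sketch; connection details come from the environment
# variables set in the Dockerfile or the docker run command above.
import mlflow

@mlflow.trace
def handle_request(message: str) -> str:
    # Replace with your real application logic (LLM call, agent, etc.)
    return f"Processed: {message}"

if __name__ == "__main__":
    print(handle_request("hello"))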

Verify trace collection

After deploying your app, verify that traces are collected properly:

Python
import mlflow
from mlflow.client import MlflowClient
import os

# Ensure MLflow is configured for Databricks
mlflow.set_tracking_uri("databricks")

# Check connection to MLflow server
client = MlflowClient()
try:
    # List recent experiments to verify connectivity
    experiments = client.search_experiments()
    print(f"Connected to MLflow. Found {len(experiments)} experiments.")

    # Check if traces are being logged
    traces = mlflow.search_traces(
        experiment_names=[os.getenv("MLFLOW_EXPERIMENT_NAME", "/Shared/production-genai-app")],
        max_results=5
    )
    print(f"Found {len(traces)} recent traces.")
except Exception as e:
    print(f"Error connecting to MLflow: {e}")
    print("Check your authentication and connectivity.")

Add context to traces

After basic tracing works, add context for better debugging and insights. MLflow provides standardized tags and attributes to capture the following kinds of contextual information:

  • Request tracking - Link traces to specific API calls for end-to-end debugging
  • User sessions - Group related interactions to understand user journeys
  • Environment data - Track which deployment, version, or region generated each trace
  • User feedback - Collect quality ratings and link them to specific interactions

Track request, session, and user context

Production applications need to track multiple pieces of context simultaneously: client request IDs for debugging, session IDs for multi-turn conversations, user IDs for personalization and analytics, and environment metadata for operational insights. Here's a comprehensive example showing how to track all of these in a FastAPI application:

Python
import mlflow
import os
from fastapi import FastAPI, Request, HTTPException
from pydantic import BaseModel

# Initialize FastAPI app
app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@mlflow.trace # Ensure @mlflow.trace is the outermost decorator
@app.post("/chat") # FastAPI decorator should be inner
def handle_chat(request: Request, chat_request: ChatRequest):
    # Retrieve all context from request headers
    client_request_id = request.headers.get("X-Request-ID")
    session_id = request.headers.get("X-Session-ID")
    user_id = request.headers.get("X-User-ID")

    # Update the current trace with all context and environment metadata
    # The @mlflow.trace decorator ensures an active trace is available
    mlflow.update_current_trace(
        client_request_id=client_request_id,
        tags={
            # Session context - groups traces from multi-turn conversations
            "mlflow.trace.session": session_id,
            # User context - associates traces with specific users
            "mlflow.trace.user": user_id,
            # Environment metadata - tracks deployment context
            "environment": "production",
            "app_version": os.getenv("APP_VERSION", "1.0.0"),
            "deployment_id": os.getenv("DEPLOYMENT_ID", "unknown"),
            "region": os.getenv("REGION", "us-east-1")
        }
    )

    # --- Your application logic for processing the chat message ---
    # For example, calling a language model with context
    # response_text = my_llm_call(
    #     message=chat_request.message,
    #     session_id=session_id,
    #     user_id=user_id
    # )
    response_text = f"Processed message: '{chat_request.message}'"
    # --- End of application logic ---

    # Return response
    return {
        "response": response_text
    }

# To run this example (requires uvicorn and fastapi):
# uvicorn your_file_name:app --reload
#
# Example curl request with all context headers:
# curl -X POST "http://127.0.0.1:8000/chat" \
# -H "Content-Type: application/json" \
# -H "X-Request-ID: req-abc-123-xyz-789" \
# -H "X-Session-ID: session-def-456-uvw-012" \
# -H "X-User-ID: user-jane-doe-12345" \
# -d '{"message": "What is my account balance?"}'

This combined approach provides several benefits:

  • Client Request ID: Enables end-to-end debugging by correlating traces with specific client requests across your entire system
  • Session ID (tag: mlflow.trace.session): Groups traces from multi-turn conversations, allowing you to analyze the full conversational flow
  • User ID (tag: mlflow.trace.user): Associates traces with specific users for personalization, cohort analysis, and user-specific debugging
  • Environment metadata: Tracks deployment context (environment, version, region) for operational insights and debugging across different deployments

For more information on adding context to traces, see the documentation on tracking users & sessions and tracking environments & context.

Collect user feedback

Capturing user feedback on specific interactions is essential for understanding quality and improving your GenAI application. Building on the client request ID tracking shown in the previous section, this example demonstrates how to use that ID to link feedback to specific traces.

Here's an example of implementing feedback collection in FastAPI:

Python
import mlflow
from mlflow.client import MlflowClient
from fastapi import FastAPI, HTTPException, Query, Request
from pydantic import BaseModel
from typing import Optional
from mlflow.entities import AssessmentSource

# Initialize FastAPI app
app = FastAPI()

class FeedbackRequest(BaseModel):
    is_correct: bool # True for correct, False for incorrect
    comment: Optional[str] = None

@app.post("/chat_feedback")
def handle_chat_feedback(
    request: Request,
    client_request_id: str = Query(..., description="The client request ID from the original chat request"),
    feedback: FeedbackRequest = ...
):
    """
    Collect user feedback for a specific chat interaction identified by client_request_id.
    """
    # Search for the trace with the matching client_request_id
    client = MlflowClient()
    # Get the experiment by name (using Databricks workspace path)
    experiment = client.get_experiment_by_name("/Shared/production-genai-app")
    traces = client.search_traces(
        experiment_ids=[experiment.experiment_id],
        filter_string=f"attributes.client_request_id = '{client_request_id}'",
        max_results=1
    )

    if not traces:
        # No matching trace found for this client request ID
        raise HTTPException(
            status_code=500,
            detail=f"Unable to find data for client request ID: {client_request_id}"
        )

    # Log feedback using MLflow's log_feedback API
    mlflow.log_feedback(
        trace_id=traces[0].info.trace_id,
        name="response_is_correct",
        value=feedback.is_correct,
        source=AssessmentSource(
            source_type="HUMAN",
            source_id=request.headers.get("X-User-ID")
        ),
        rationale=feedback.comment
    )

    return {
        "status": "success",
        "message": "Feedback recorded successfully",
        "trace_id": traces[0].info.trace_id,
        "client_request_id": client_request_id,
        "feedback_by": request.headers.get("X-User-ID")
    }

# Example usage:
# After a chat interaction returns a response, the client can submit feedback:
#
# curl -X POST "http://127.0.0.1:8000/chat_feedback?client_request_id=req-abc-123-xyz-789" \
# -H "Content-Type: application/json" \
# -H "X-User-ID: user-jane-doe-12345" \
# -d '{
# "is_correct": true,
# "comment": "The response was accurate and helpful"
# }'

This feedback collection approach allows you to:

  • Link feedback to specific interactions: Use the client request ID to find the exact trace and attach feedback
  • Store structured feedback: The log_feedback API creates proper Assessment objects that are visible in the MLflow UI
  • Analyze quality patterns: Query traces with their associated feedback to identify what types of interactions receive positive or negative ratings

You can later query traces with feedback using the MLflow UI or programmatically to analyze patterns and improve your application.
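
For example, here is a minimal sketch of programmatic analysis, assuming MLflow 3 exposes logged feedback on trace.info.assessments and using the same experiment path as the setup above:

Python
import mlflow
from mlflow.client import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name("/Shared/production-genai-app")

# Pull recent traces and print any response_is_correct feedback attached to them
traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    max_results=100
)
for trace in traces:
    for assessment in getattr(trace.info, "assessments", None) or []:
        if assessment.name == "response_is_correct":
            print(trace.info.trace_id, assessment.value, assessment.rationale)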

Query traces with context

Use the contextual information to analyze production behavior:

Python
import mlflow
from mlflow.client import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name("/Shared/production-genai-app")

# Query traces by user
user_traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.`mlflow.trace.user` = 'user-jane-doe-12345'",
    max_results=100
)

# Query traces by session
session_traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.`mlflow.trace.session` = 'session-123'",
    max_results=100
)
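
The query results can then be aggregated in plain Python. A small sketch, assuming the mlflow.trace.session tag was set as in the FastAPI example above, that counts how many traces each of one user's sessions produced:

Python
from collections import Counter

from mlflow.client import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name("/Shared/production-genai-app")

# One user's recent traces, grouped by the session tag on each trace
user_traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.`mlflow.trace.user` = 'user-jane-doe-12345'",
    max_results=100
)
session_counts = Counter(
    t.info.tags.get("mlflow.trace.session", "unknown") for t in user_traces
)
for session, count in session_counts.most_common(10):
    print(f"{session}: {count} traces")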
