Production observability with tracing

When moving your GenAI application from development to production, tracing plays a critical role in ensuring visibility, reliability, and quality in your live environment. This guide covers production-specific considerations for MLflow Tracing, including obtaining and using trace IDs, configuring for performance, and integrating with broader observability systems.

Prerequisites

For production deployments, it is highly recommended to install the mlflow-tracing package:

Python
%pip install --upgrade mlflow-tracing

This package is specifically optimized for production environments, offering:

  • Minimal dependencies for faster, leaner deployments.
  • Performance optimizations for high-volume tracing.

While mlflow-tracing is recommended for production, you might use the full mlflow package during development:

Bash
# For development (provides all MLflow features)
pip install --upgrade "mlflow[databricks]>=3.1"

The full mlflow[databricks] package (e.g., mlflow[databricks]>=3.1) includes all MLflow features for experimentation, such as the UI, LLM-as-a-judge evaluations, and more, along with development tools and utilities, and ensures connectivity with Databricks. In contrast, mlflow-tracing is a leaner version focused solely on the client-side tracing capabilities needed for production monitoring.

note

MLflow 3 (specifically mlflow-tracing for production) is required for production tracing. MLflow 2.x is not supported for production deployments due to performance limitations and missing features essential for production use.
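
If you want to guard against an MLflow 2.x client accidentally ending up in your deployment image, a startup check along these lines can fail fast. This is a minimal sketch; it assumes the installed package exposes mlflow.__version__ (true for the standard mlflow distribution — verify for your mlflow-tracing release):

Python
import mlflow

# Fail fast if the environment resolved to an MLflow 2.x client.
# Assumption: the installed tracing package exposes mlflow.__version__.
major_version = int(mlflow.__version__.split(".")[0])
if major_version < 3:
    raise RuntimeError(
        f"MLflow {mlflow.__version__} detected; production tracing requires MLflow 3."
    )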

Setting up tracing for production endpoints

When deploying your GenAI application to production outside of Databricks, you need to configure MLflow Tracing to send traces to your MLflow tracking server. This involves setting the appropriate environment variables and ensuring your application has the necessary permissions.

Environment variable configuration

Configure the following environment variables in your production environment:

Bash
# Required: Set the Databricks workspace host and authentication token
export DATABRICKS_HOST="https://your-workspace.cloud.databricks.com"
export DATABRICKS_TOKEN="your-databricks-token"

# Required: Set MLflow Tracking URI to "databricks" to log to Databricks
export MLFLOW_TRACKING_URI=databricks

# Required: Configure the experiment name for organizing traces (must be a workspace path)
export MLFLOW_EXPERIMENT_NAME="/Shared/production-genai-app"
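
If you prefer to apply these settings in application code rather than relying solely on environment variables, the same configuration can be done at startup. The following is a minimal sketch using standard MLflow APIs; authentication is still supplied through DATABRICKS_HOST and DATABRICKS_TOKEN:

Python
import os

import mlflow

# Equivalent programmatic configuration at application startup.
# Authentication is still read from DATABRICKS_HOST / DATABRICKS_TOKEN.
mlflow.set_tracking_uri("databricks")
mlflow.set_experiment(os.getenv("MLFLOW_EXPERIMENT_NAME", "/Shared/production-genai-app"))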

Docker deployment example

When deploying with Docker, pass environment variables through your container configuration:

Dockerfile
# Dockerfile
FROM python:3.9-slim

# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application code
COPY . /app
WORKDIR /app

# Set default environment variables (can be overridden at runtime)
ENV DATABRICKS_HOST=""
ENV DATABRICKS_TOKEN=""
ENV MLFLOW_TRACKING_URI=databricks
ENV MLFLOW_EXPERIMENT_NAME="/Shared/production-genai-app"

CMD ["python", "app.py"]

Run the container with environment variables:

Bash
docker run -d \
  -e DATABRICKS_HOST="https://your-workspace.cloud.databricks.com" \
  -e DATABRICKS_TOKEN="your-databricks-token" \
  -e MLFLOW_TRACKING_URI=databricks \
  -e MLFLOW_EXPERIMENT_NAME="/Shared/production-genai-app" \
  -e APP_VERSION="1.0.0" \
  your-app:latest

Kubernetes deployment example

For Kubernetes deployments, use ConfigMaps and Secrets:

YAML
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: databricks-config
data:
  DATABRICKS_HOST: 'https://your-workspace.cloud.databricks.com'
  MLFLOW_TRACKING_URI: databricks
  MLFLOW_EXPERIMENT_NAME: '/Shared/production-genai-app'

---
# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: databricks-secrets
type: Opaque
stringData:
  DATABRICKS_TOKEN: 'your-databricks-token'

---
# deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: genai-app
spec:
  selector:
    matchLabels:
      app: genai-app
  template:
    metadata:
      labels:
        app: genai-app
    spec:
      containers:
        - name: app
          image: your-app:latest
          envFrom:
            - configMapRef:
                name: databricks-config
            - secretRef:
                name: databricks-secrets
          env:
            - name: APP_VERSION
              value: '1.0.0'

Verifying trace collection

After deployment, verify that traces are being collected properly:

Python
import mlflow
from mlflow.client import MlflowClient
import os

# Ensure MLflow is configured for Databricks
mlflow.set_tracking_uri("databricks")

# Check connection to MLflow server
client = MlflowClient()
try:
    # List recent experiments to verify connectivity
    experiments = client.search_experiments()
    print(f"Connected to MLflow. Found {len(experiments)} experiments.")

    # Check if traces are being logged
    traces = mlflow.search_traces(
        experiment_names=[os.getenv("MLFLOW_EXPERIMENT_NAME", "/Shared/production-genai-app")],
        max_results=5,
    )
    print(f"Found {len(traces)} recent traces.")
except Exception as e:
    print(f"Error connecting to MLflow: {e}")
    print("Check your authentication and connectivity.")

Adding context to production traces

In production environments, enriching traces with contextual information is crucial for understanding user behavior, debugging issues, and improving your GenAI application. MLflow provides standardized tags and attributes to capture important production context.

Tracking request, session, user, and environment context

Production applications need to track multiple pieces of context simultaneously: client request IDs for debugging, session IDs for multi-turn conversations, user IDs for personalization and analytics, and environment metadata for operational insights. Here's a comprehensive example showing how to track all of these in a FastAPI application:

Python
import mlflow
import os
from fastapi import FastAPI, Request, HTTPException
from pydantic import BaseModel

# Initialize FastAPI app
app = FastAPI()

class ChatRequest(BaseModel):
    message: str

@mlflow.trace  # Ensure @mlflow.trace is the outermost decorator
@app.post("/chat")  # FastAPI decorator should be inner
def handle_chat(request: Request, chat_request: ChatRequest):
    # Retrieve all context from request headers
    client_request_id = request.headers.get("X-Request-ID")
    session_id = request.headers.get("X-Session-ID")
    user_id = request.headers.get("X-User-ID")

    # Update the current trace with all context and environment metadata
    # The @mlflow.trace decorator ensures an active trace is available
    mlflow.update_current_trace(
        client_request_id=client_request_id,
        tags={
            # Session context - groups traces from multi-turn conversations
            "mlflow.trace.session": session_id,
            # User context - associates traces with specific users
            "mlflow.trace.user": user_id,
            # Environment metadata - tracks deployment context
            "environment": "production",
            "app_version": os.getenv("APP_VERSION", "1.0.0"),
            "deployment_id": os.getenv("DEPLOYMENT_ID", "unknown"),
            "region": os.getenv("REGION", "us-east-1"),
        },
    )

    # --- Your application logic for processing the chat message ---
    # For example, calling a language model with context
    # response_text = my_llm_call(
    #     message=chat_request.message,
    #     session_id=session_id,
    #     user_id=user_id,
    # )
    response_text = f"Processed message: '{chat_request.message}'"
    # --- End of application logic ---

    # Return response
    return {"response": response_text}

# To run this example (requires uvicorn and fastapi):
#   uvicorn your_file_name:app --reload
#
# Example curl request with all context headers:
#   curl -X POST "http://127.0.0.1:8000/chat" \
#     -H "Content-Type: application/json" \
#     -H "X-Request-ID: req-abc-123-xyz-789" \
#     -H "X-Session-ID: session-def-456-uvw-012" \
#     -H "X-User-ID: user-jane-doe-12345" \
#     -d '{"message": "What is my account balance?"}'

This combined approach provides several benefits:

  • Client Request ID: Enables end-to-end debugging by correlating traces with specific client requests across your entire system
  • Session ID (tag: mlflow.trace.session): Groups traces from multi-turn conversations, allowing you to analyze the full conversational flow
  • User ID (tag: mlflow.trace.user): Associates traces with specific users for personalization, cohort analysis, and user-specific debugging
  • Environment metadata: Tracks deployment context (environment, version, region) for operational insights and debugging across different deployments

For more information on adding context to traces, see the documentation on tracking users & sessions and tracking environments & context.

Feedback collection

Capturing user feedback on specific interactions is essential for understanding quality and improving your GenAI application. Building on the client request ID tracking shown in the previous section, this example demonstrates how to use that ID to link feedback to specific traces.

Here's an example of implementing feedback collection in FastAPI:

Python
import mlflow
from mlflow.client import MlflowClient
from fastapi import FastAPI, HTTPException, Query, Request
from pydantic import BaseModel
from typing import Optional
from mlflow.entities import AssessmentSource

# Initialize FastAPI app
app = FastAPI()

class FeedbackRequest(BaseModel):
    is_correct: bool  # True for correct, False for incorrect
    comment: Optional[str] = None

@app.post("/chat_feedback")
def handle_chat_feedback(
    request: Request,
    client_request_id: str = Query(..., description="The client request ID from the original chat request"),
    feedback: FeedbackRequest = ...,
):
    """
    Collect user feedback for a specific chat interaction identified by client_request_id.
    """
    # Search for the trace with the matching client_request_id
    client = MlflowClient()
    # Get the experiment by name (using the Databricks workspace path)
    experiment = client.get_experiment_by_name("/Shared/production-genai-app")
    traces = client.search_traces(
        experiment_ids=[experiment.experiment_id],
        filter_string=f"attributes.client_request_id = '{client_request_id}'",
        max_results=1,
    )

    if not traces:
        raise HTTPException(
            status_code=404,
            detail=f"Unable to find data for client request ID: {client_request_id}",
        )

    # Log feedback using MLflow's log_feedback API
    mlflow.log_feedback(
        trace_id=traces[0].info.trace_id,
        name="response_is_correct",
        value=feedback.is_correct,
        source=AssessmentSource(
            source_type="HUMAN",
            source_id=request.headers.get("X-User-ID"),
        ),
        rationale=feedback.comment,
    )

    return {
        "status": "success",
        "message": "Feedback recorded successfully",
        "trace_id": traces[0].info.trace_id,
        "client_request_id": client_request_id,
        "feedback_by": request.headers.get("X-User-ID"),
    }

# Example usage:
# After a chat interaction returns a response, the client can submit feedback:
#
#   curl -X POST "http://127.0.0.1:8000/chat_feedback?client_request_id=req-abc-123-xyz-789" \
#     -H "Content-Type: application/json" \
#     -H "X-User-ID: user-jane-doe-12345" \
#     -d '{
#       "is_correct": true,
#       "comment": "The response was accurate and helpful"
#     }'

This feedback collection approach allows you to:

  • Link feedback to specific interactions: Use the client request ID to find the exact trace and attach feedback
  • Store structured feedback: The log_feedback API creates proper Assessment objects that are visible in the MLflow UI
  • Analyze quality patterns: Query traces with their associated feedback to identify what types of interactions receive positive or negative ratings

You can later query traces with feedback using the MLflow UI or programmatically to analyze patterns and improve your application.
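
For a quick programmatic spot-check that feedback landed on the intended trace, you can look the trace up again by its client request ID. The sketch below is illustrative only: the client_request_id value is the example one from the curl call above, and the way logged assessments are exposed on the trace object (trace.info.assessments here) is an assumption to verify against your installed MLflow 3 release — they are always visible in the MLflow UI regardless.

Python
import mlflow
from mlflow.client import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name("/Shared/production-genai-app")

# Find the trace that the feedback endpoint annotated (example request ID from above)
traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    filter_string="attributes.client_request_id = 'req-abc-123-xyz-789'",
    max_results=1,
)

if traces:
    trace = mlflow.get_trace(traces[0].info.trace_id)
    # Assumption: recent MLflow 3 releases attach logged assessments (including
    # Feedback created by mlflow.log_feedback) to the trace info. Verify the
    # exact attribute name for your version.
    for assessment in getattr(trace.info, "assessments", []) or []:
        print(assessment.name, getattr(assessment, "value", None), getattr(assessment, "rationale", None))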

Querying traces with context

Use the contextual information to analyze production behavior:

Python
import mlflow
from mlflow.client import MlflowClient

client = MlflowClient()
experiment = client.get_experiment_by_name("/Shared/production-genai-app")

# Query traces by user
user_traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.`mlflow.trace.user` = 'user-jane-doe-12345'",
    max_results=100,
)

# Query traces by session
session_traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.`mlflow.trace.session` = 'session-def-456-uvw-012'",
    max_results=100,
)
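
The custom environment metadata set in the FastAPI example can be queried the same way. For instance, a query that narrows traces to a particular deployment context might look like the following sketch, where environment and app_version are the example tags defined earlier in this guide:

Python
# Query traces from a specific deployment context using the custom tags
# set via mlflow.update_current_trace() in the FastAPI example
prod_traces = client.search_traces(
    experiment_ids=[experiment.experiment_id],
    filter_string="tags.environment = 'production' AND tags.app_version = '1.0.0'",
    max_results=100,
)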

Next steps

Continue your journey with these recommended actions and tutorials.

Reference guides

Explore detailed documentation for concepts and features mentioned in this guide.