Collect user feedback

Collecting and logging user feedback is essential for understanding the real-world quality of your GenAI application. MLflow provides a structured way to capture feedback as assessments on traces, enabling you to track quality over time, identify areas for improvement, and build evaluation datasets from production data.

Prerequisites

Choose the appropriate installation method based on your environment:

For production deployments, install the mlflow-tracing package:

Bash
pip install --upgrade mlflow-tracing

The mlflow-tracing package is optimized for production use with minimal dependencies and better performance characteristics.

The log_feedback API is available in both packages, so you can collect user feedback regardless of which installation method you choose.

note

MLflow 3 is required for collecting user feedback. MLflow 2.x is not supported due to performance limitations and missing features essential for production use.

Why collect user feedback?

User feedback provides ground truth about your application's performance:

  1. Real-world quality signals - Understand how actual users perceive your application's outputs
  2. Continuous improvement - Identify patterns in negative feedback to guide development
  3. Training data creation - Use feedback to build high-quality evaluation datasets
  4. Quality monitoring - Track satisfaction metrics over time and across different user segments
  5. Model fine-tuning - Leverage feedback data to improve your underlying models

Types of feedback

MLflow supports various types of feedback through its assessment system:

| Feedback Type | Description | Common Use Cases |
| --- | --- | --- |
| Binary feedback | Simple thumbs up/down or correct/incorrect | Quick user satisfaction signals |
| Numeric scores | Ratings on a scale (e.g., 1-5 stars) | Detailed quality assessment |
| Categorical feedback | Multiple choice options | Classifying issues or response types |
| Text feedback | Free-form comments | Detailed user explanations |
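
For illustration, here is a minimal sketch that logs one assessment of each type with the log_feedback API; the trace ID, assessment names, and values are placeholders, not required conventions.

Python
import mlflow
from mlflow.entities import AssessmentSource

# Placeholder trace ID and user ID for illustration only
trace_id = "<trace-id-from-your-app>"
source = AssessmentSource(source_type="HUMAN", source_id="user-123")

# Binary feedback: thumbs up / thumbs down
mlflow.log_feedback(trace_id=trace_id, name="thumbs_up", value=True, source=source)

# Numeric score: e.g. a 1-5 star rating
mlflow.log_feedback(trace_id=trace_id, name="star_rating", value=4, source=source)

# Categorical feedback: one of a fixed set of labels
mlflow.log_feedback(trace_id=trace_id, name="issue_type", value="hallucination", source=source)

# Text feedback: free-form comment
mlflow.log_feedback(trace_id=trace_id, name="comment", value="Helpful, but too verbose.", source=source)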

Understanding the Feedback data model

In MLflow, user feedback is captured using the Feedback entity, which is a type of Assessment that can be attached to traces or specific spans. The Feedback entity provides a structured way to store:

  • Value: The actual feedback (boolean, numeric, text, or structured data)
  • Source: Information about who or what provided the feedback (human user, LLM judge, or code)
  • Rationale: Optional explanation for the feedback
  • Metadata: Additional context like timestamps or custom attributes

Understanding this data model helps you design effective feedback collection systems that integrate seamlessly with MLflow's evaluation and monitoring capabilities. For detailed information about the Feedback entity schema and all available fields, see the Feedback section in the Tracing Data Model.
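
To make these fields concrete, here is a minimal sketch of a single log_feedback call that populates each of them; the trace ID, user ID, and metadata keys are arbitrary placeholders.

Python
import mlflow
from mlflow.entities import AssessmentSource

mlflow.log_feedback(
    trace_id="<trace-id>",  # the trace the feedback belongs to (placeholder)
    name="user_feedback",
    value=False,  # Value: the actual feedback
    source=AssessmentSource(  # Source: who or what provided the feedback
        source_type="HUMAN",
        source_id="user-123",
    ),
    rationale="The answer cited the wrong product version.",  # Rationale: optional explanation
    metadata={"client": "web", "app_version": "1.4.2"},  # Metadata: additional context
)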

End user feedback collection

When implementing feedback collection in production, you need to link user feedback to specific traces. There are two approaches you can use:

  1. Using client request IDs - Generate your own unique IDs when processing requests and reference them later for feedback
  2. Using MLflow trace IDs - Use the trace ID automatically generated by MLflow

Understanding the feedback collection flow

Both approaches follow a similar pattern:

  1. During the initial request: Your application either generates a unique client request ID or retrieves the MLflow-generated trace ID

  2. After receiving the response: The user can provide feedback by referencing either ID

  3. Feedback is logged: MLflow's log_feedback API creates an assessment attached to the original trace

  4. Analysis and monitoring: You can query and analyze feedback across all traces
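
The implementation below uses the second approach (returning the MLflow trace ID to the client). For the first approach, the following minimal sketch may help: it assumes the request handler is traced with @mlflow.trace, that your MLflow 3 version supports the client_request_id argument of mlflow.update_current_trace, and it uses an in-memory dictionary and a placeholder process_message function purely for illustration.

Python
import uuid

import mlflow
from mlflow.entities import AssessmentSource

# In-memory mapping from client request IDs to MLflow trace IDs; in production
# this would live in your database or cache.
request_id_to_trace_id = {}


@mlflow.trace
def handle_chat(message: str) -> dict:
    # Generate your own unique ID for this request
    client_request_id = str(uuid.uuid4())

    # Attach it to the active trace so it is visible alongside the trace
    mlflow.update_current_trace(client_request_id=client_request_id)

    response = process_message(message)  # placeholder for your application logic

    # Remember which trace this request produced
    request_id_to_trace_id[client_request_id] = mlflow.get_current_active_span().trace_id

    # Return only your own ID to the caller
    return {"response": response, "request_id": client_request_id}


def handle_feedback(client_request_id: str, is_correct: bool, user_id: str = None):
    # Resolve the client request ID back to the trace and log feedback on it
    trace_id = request_id_to_trace_id[client_request_id]
    mlflow.log_feedback(
        trace_id=trace_id,
        name="user_feedback",
        value=is_correct,
        source=AssessmentSource(source_type="HUMAN", source_id=user_id),
    )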

Implementing feedback collection

The simplest approach is to use the trace ID that MLflow automatically generates for each trace. You can retrieve this ID during request processing and return it to the client:

Backend implementation

Python
import mlflow
from fastapi import FastAPI, Query
from mlflow.entities import AssessmentSource
from pydantic import BaseModel
from typing import Optional

app = FastAPI()


class ChatRequest(BaseModel):
    message: str


class ChatResponse(BaseModel):
    response: str
    trace_id: str  # Include the trace ID in the response


@app.post("/chat", response_model=ChatResponse)
@mlflow.trace  # Trace the endpoint so an active span exists when we read the trace ID below
def chat(request: ChatRequest):
    """
    Process a chat request and return the trace ID for feedback collection.
    """
    # Your GenAI application logic here
    response = process_message(request.message)  # Replace with your actual processing logic

    # Get the current trace ID
    trace_id = mlflow.get_current_active_span().trace_id

    return ChatResponse(
        response=response,
        trace_id=trace_id,
    )


class FeedbackRequest(BaseModel):
    is_correct: bool  # True for thumbs up, False for thumbs down
    comment: Optional[str] = None


@app.post("/feedback")
def submit_feedback(
    trace_id: str = Query(..., description="The trace ID from the chat response"),
    feedback: FeedbackRequest = ...,
    user_id: Optional[str] = Query(None, description="User identifier"),
):
    """
    Collect user feedback using the MLflow trace ID.
    """
    # Log the feedback directly using the trace ID
    mlflow.log_feedback(
        trace_id=trace_id,
        name="user_feedback",
        value=feedback.is_correct,
        source=AssessmentSource(
            source_type="HUMAN",
            source_id=user_id,
        ),
        rationale=feedback.comment,
    )

    return {
        "status": "success",
        "trace_id": trace_id,
    }

Frontend implementation example

Below is an example front-end implementation for a React-based application:

JavaScript
// React example for chat with feedback
import React, { useState } from 'react';

function ChatWithFeedback() {
  const [message, setMessage] = useState('');
  const [response, setResponse] = useState('');
  const [traceId, setTraceId] = useState(null);
  const [feedbackSubmitted, setFeedbackSubmitted] = useState(false);

  // Optional: set this from your auth/session context so feedback is attributed to a user
  const userId = null;

  const sendMessage = async () => {
    try {
      const res = await fetch('/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message }),
      });

      const data = await res.json();
      setResponse(data.response);
      setTraceId(data.trace_id);
      setFeedbackSubmitted(false);
    } catch (error) {
      console.error('Chat error:', error);
    }
  };

  const submitFeedback = async (isCorrect, comment = null) => {
    if (!traceId || feedbackSubmitted) return;

    try {
      const params = new URLSearchParams({
        trace_id: traceId,
        ...(userId && { user_id: userId }),
      });

      const res = await fetch(`/feedback?${params}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          is_correct: isCorrect,
          comment: comment,
        }),
      });

      if (res.ok) {
        setFeedbackSubmitted(true);
        // Optionally show success message
      }
    } catch (error) {
      console.error('Feedback submission error:', error);
    }
  };

  return (
    <div>
      <input value={message} onChange={(e) => setMessage(e.target.value)} placeholder="Ask a question..." />
      <button onClick={sendMessage}>Send</button>

      {response && (
        <div>
          <p>{response}</p>
          <div className="feedback-buttons">
            <button onClick={() => submitFeedback(true)} disabled={feedbackSubmitted}>
              👍
            </button>
            <button onClick={() => submitFeedback(false)} disabled={feedbackSubmitted}>
              👎
            </button>
          </div>
          {feedbackSubmitted && <span>Thanks for your feedback!</span>}
        </div>
      )}
    </div>
  );
}

Key implementation details

AssessmentSource: The AssessmentSource object identifies who or what provided the feedback:

  • source_type: Can be "HUMAN" for user feedback or "LLM_JUDGE" for automated evaluation
  • source_id: Identifies the specific user or system providing feedback

Feedback storage: Feedback is stored as assessments on the trace, which means:

  • It's permanently associated with the specific interaction
  • It can be queried alongside the trace data
  • It's visible in the MLflow UI when viewing the trace

Handling different feedback types

You can extend either approach to support more complex feedback. Here's an example using trace IDs:

Python
from mlflow.entities import AssessmentSource


@app.post("/detailed-feedback")
def submit_detailed_feedback(
    trace_id: str,
    accuracy: int = Query(..., ge=1, le=5, description="Accuracy rating from 1-5"),
    helpfulness: int = Query(..., ge=1, le=5, description="Helpfulness rating from 1-5"),
    relevance: int = Query(..., ge=1, le=5, description="Relevance rating from 1-5"),
    user_id: str = Query(..., description="User identifier"),
    comment: Optional[str] = None,
):
    """
    Collect multi-dimensional feedback with separate ratings for different aspects.
    Each aspect is logged as a separate assessment for granular analysis.
    """
    # Log each dimension as a separate assessment
    dimensions = {
        "accuracy": accuracy,
        "helpfulness": helpfulness,
        "relevance": relevance,
    }

    for dimension, score in dimensions.items():
        mlflow.log_feedback(
            trace_id=trace_id,
            name=f"user_{dimension}",
            value=score / 5.0,  # Normalize to a 0-1 scale
            source=AssessmentSource(
                source_type="HUMAN",
                source_id=user_id,
            ),
            rationale=comment if dimension == "accuracy" else None,
        )

    return {
        "status": "success",
        "trace_id": trace_id,
        "feedback_recorded": dimensions,
    }

Handling feedback with streaming responses

When using streaming responses (Server-Sent Events or WebSockets), the trace ID isn't available until the stream completes. This presents a unique challenge for feedback collection that requires a different approach.

Why streaming is different

In traditional request-response patterns, you receive the complete response and trace ID together. With streaming:

  1. Tokens arrive incrementally: The response is built up over time as tokens stream from the LLM
  2. Trace completion is deferred: The trace ID is only generated after the entire stream finishes
  3. Feedback UI must wait: Users can't provide feedback until they have both the complete response and the trace ID

Backend implementation with SSE

Here's how to implement streaming with trace ID delivery at the end of the stream:

Python
import asyncio
import json
from datetime import datetime
from typing import AsyncGenerator

import mlflow
from fastapi import FastAPI
from fastapi.responses import StreamingResponse


@app.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    """
    Stream chat responses with the trace ID sent at completion.
    """

    async def generate() -> AsyncGenerator[str, None]:
        try:
            # Start MLflow trace
            with mlflow.start_span(name="streaming_chat") as span:
                # Update the trace with request metadata
                mlflow.update_current_trace(
                    tags={
                        "request_message": request.message,
                        "stream_start_time": datetime.now().isoformat(),
                    }
                )

                # Stream tokens from your LLM
                full_response = ""
                async for token in your_llm_stream_function(request.message):
                    full_response += token
                    yield f"data: {json.dumps({'type': 'token', 'content': token})}\n\n"
                    await asyncio.sleep(0.01)  # Prevent overwhelming the client

                # Log the complete response to the trace
                span.set_attribute("response", full_response)
                span.set_attribute("token_count", len(full_response.split()))

                # Get the trace ID after completion
                trace_id = span.trace_id

                # Send the trace ID as the final event
                yield f"data: {json.dumps({'type': 'done', 'trace_id': trace_id})}\n\n"

        except Exception as e:
            # Log the error to the trace if one is still active
            if mlflow.get_current_active_span():
                mlflow.update_current_trace(tags={"error": str(e)})

            yield f"data: {json.dumps({'type': 'error', 'error': str(e)})}\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # Disable proxy buffering
        },
    )

Frontend implementation for streaming

Handle the streaming events and enable feedback only after receiving the trace ID:

JavaScript
// React hook for streaming chat with feedback
import React, { useState, useCallback } from 'react';

function useStreamingChat() {
  const [isStreaming, setIsStreaming] = useState(false);
  const [streamingContent, setStreamingContent] = useState('');
  const [traceId, setTraceId] = useState(null);
  const [error, setError] = useState(null);

  const sendStreamingMessage = useCallback(async (message) => {
    // Reset state
    setIsStreaming(true);
    setStreamingContent('');
    setTraceId(null);
    setError(null);

    try {
      const response = await fetch('/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message }),
      });

      if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');

        // Keep the last incomplete line in the buffer
        buffer = lines.pop() || '';

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            try {
              const data = JSON.parse(line.slice(6));

              switch (data.type) {
                case 'token':
                  setStreamingContent((prev) => prev + data.content);
                  break;
                case 'done':
                  setTraceId(data.trace_id);
                  setIsStreaming(false);
                  break;
                case 'error':
                  setError(data.error);
                  setIsStreaming(false);
                  break;
              }
            } catch (e) {
              console.error('Failed to parse SSE data:', e);
            }
          }
        }
      }
    } catch (error) {
      setError(error.message);
      setIsStreaming(false);
    }
  }, []);

  return {
    sendStreamingMessage,
    streamingContent,
    isStreaming,
    traceId,
    error,
  };
}

// Component using the streaming hook
function StreamingChatWithFeedback() {
  const [message, setMessage] = useState('');
  const [feedbackSubmitted, setFeedbackSubmitted] = useState(false);
  const { sendStreamingMessage, streamingContent, isStreaming, traceId, error } = useStreamingChat();

  const handleSend = () => {
    if (message.trim()) {
      setFeedbackSubmitted(false);
      sendStreamingMessage(message);
      setMessage('');
    }
  };

  const submitFeedback = async (isPositive) => {
    if (!traceId || feedbackSubmitted) return;

    try {
      const response = await fetch(`/feedback?trace_id=${traceId}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          is_correct: isPositive,
          comment: null,
        }),
      });

      if (response.ok) {
        setFeedbackSubmitted(true);
      }
    } catch (error) {
      console.error('Feedback submission failed:', error);
    }
  };

  return (
    <div className="streaming-chat">
      <div className="chat-messages">
        {streamingContent && (
          <div className="message assistant">
            {streamingContent}
            {isStreaming && <span className="typing-indicator">...</span>}
          </div>
        )}
        {error && <div className="error-message">Error: {error}</div>}
      </div>

      {/* Feedback buttons - only enabled when the trace ID is available */}
      {streamingContent && !isStreaming && traceId && (
        <div className="feedback-section">
          <span>Was this response helpful?</span>
          <button onClick={() => submitFeedback(true)} disabled={feedbackSubmitted} className="feedback-btn positive">
            👍 Yes
          </button>
          <button onClick={() => submitFeedback(false)} disabled={feedbackSubmitted} className="feedback-btn negative">
            👎 No
          </button>
          {feedbackSubmitted && <span className="feedback-thanks">Thank you!</span>}
        </div>
      )}

      <div className="chat-input-section">
        <input
          type="text"
          value={message}
          onChange={(e) => setMessage(e.target.value)}
          onKeyPress={(e) => e.key === 'Enter' && !isStreaming && handleSend()}
          placeholder="Type your message..."
          disabled={isStreaming}
        />
        <button onClick={handleSend} disabled={isStreaming || !message.trim()}>
          {isStreaming ? 'Streaming...' : 'Send'}
        </button>
      </div>
    </div>
  );
}

Key considerations for streaming

When implementing feedback collection with streaming responses, keep these points in mind:

  1. Trace ID timing: The trace ID is only available after the streaming completes. Design your UI to handle this gracefully by disabling feedback controls until the trace ID is received.

  2. Event structure: Use a consistent event format with a type field to distinguish between content tokens, completion events, and errors. This makes parsing and handling events more reliable.

  3. State management: Track both the streaming content and trace ID separately. Reset all state at the start of each new interaction to prevent stale data issues.

  4. Error handling: Include error events in the stream to gracefully handle failures. Ensure errors are logged to the trace when possible for debugging.

  5. Buffer management:

    • Use the X-Accel-Buffering: no header to disable proxy buffering
    • Implement proper line buffering in the frontend to handle partial SSE messages
    • Consider implementing reconnection logic for network interruptions

  6. Performance optimization:

    • Add small delays between tokens (asyncio.sleep(0.01)) to prevent overwhelming clients
    • Batch multiple tokens if they arrive too quickly (a minimal batching sketch follows this list)
    • Consider implementing backpressure mechanisms for slow clients
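
To illustrate the batching point above, here is a minimal sketch of a helper that groups tokens into larger chunks when they arrive faster than a flush interval; the batch size and interval are arbitrary values, and the token source is the same your_llm_stream_function placeholder used in the backend example.

Python
import time
from typing import AsyncGenerator, AsyncIterator


async def batch_tokens(
    tokens: AsyncIterator[str],
    max_batch_size: int = 10,
    flush_interval: float = 0.05,
) -> AsyncGenerator[str, None]:
    """Group incoming tokens into larger chunks to reduce the number of SSE events."""
    batch = []
    last_flush = time.monotonic()

    async for token in tokens:
        batch.append(token)
        now = time.monotonic()
        # Emit a chunk when the batch is full or the flush interval has elapsed
        if len(batch) >= max_batch_size or (now - last_flush) >= flush_interval:
            yield "".join(batch)
            batch = []
            last_flush = now

    # Emit any remaining tokens when the stream ends
    if batch:
        yield "".join(batch)

In the generate() coroutine from the backend example, you could then iterate over batch_tokens(your_llm_stream_function(request.message)) and build each 'token' SSE event from a chunk instead of an individual token; the 'done' and 'error' events stay unchanged.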

Analyzing feedback data

Once you've collected feedback, you can analyze it to gain insights about your application's quality and user satisfaction.

Viewing feedback in the Trace UI

Feedback logged on a trace is visible in the MLflow UI when you open that trace.

(Screenshot: trace feedback)

Getting traces with feedback via the SDK

First, retrieve traces from a specific time window:

Python
from datetime import datetime, timedelta

from mlflow.client import MlflowClient


def get_recent_traces(experiment_name: str, hours: int = 24):
    """Get traces from the last N hours."""
    client = MlflowClient()

    # Resolve the experiment name to an ID for the search
    experiment = client.get_experiment_by_name(experiment_name)

    # Calculate cutoff time
    cutoff_time = datetime.now() - timedelta(hours=hours)
    cutoff_timestamp_ms = int(cutoff_time.timestamp() * 1000)

    # Query traces
    traces = client.search_traces(
        experiment_ids=[experiment.experiment_id],
        filter_string=f"trace.timestamp_ms > {cutoff_timestamp_ms}",
    )

    return traces

Analyzing feedback patterns via the SDK

Extract and analyze feedback from the traces:

Python
def analyze_user_feedback(traces):
    """Analyze feedback patterns from traces."""

    client = MlflowClient()

    # Initialize counters
    total_traces = len(traces)
    traces_with_feedback = 0
    positive_count = 0
    negative_count = 0

    # Process each trace
    for trace in traces:
        # Get full trace details including assessments
        trace_detail = client.get_trace(trace.info.trace_id)

        if trace_detail.data.assessments:
            traces_with_feedback += 1

            # Count positive/negative feedback
            for assessment in trace_detail.data.assessments:
                if assessment.name == "user_feedback":
                    if assessment.value:
                        positive_count += 1
                    else:
                        negative_count += 1

    # Calculate metrics
    if traces_with_feedback > 0:
        feedback_rate = (traces_with_feedback / total_traces) * 100
        positive_rate = (positive_count / traces_with_feedback) * 100
    else:
        feedback_rate = 0
        positive_rate = 0

    return {
        "total_traces": total_traces,
        "traces_with_feedback": traces_with_feedback,
        "feedback_rate": feedback_rate,
        "positive_rate": positive_rate,
        "positive_count": positive_count,
        "negative_count": negative_count,
    }


# Example usage
traces = get_recent_traces("/Shared/production-genai-app", hours=24)
results = analyze_user_feedback(traces)

print(f"Feedback rate: {results['feedback_rate']:.1f}%")
print(f"Positive feedback: {results['positive_rate']:.1f}%")
print(f"Total feedback: {results['traces_with_feedback']} out of {results['total_traces']} traces")

# Example usage
traces = get_recent_traces("/Shared/production-genai-app", hours=24)
results = analyze_user_feedback(traces)

print(f"Feedback rate: {results['feedback_rate']:.1f}%")
print(f"Positive feedback: {results['positive_rate']:.1f}%")
print(f"Total feedback: {results['traces_with_feedback']} out of {results['total_traces']} traces")

Analyzing multi-dimensional feedback

For more detailed feedback with ratings:

Python
def analyze_ratings(traces):
    """Analyze rating-based feedback."""

    client = MlflowClient()
    ratings_by_dimension = {}

    for trace in traces:
        trace_detail = client.get_trace(trace.info.trace_id)

        if trace_detail.data.assessments:
            for assessment in trace_detail.data.assessments:
                # Look for rating assessments
                if assessment.name.startswith("user_") and assessment.name != "user_feedback":
                    dimension = assessment.name.replace("user_", "")

                    if dimension not in ratings_by_dimension:
                        ratings_by_dimension[dimension] = []

                    ratings_by_dimension[dimension].append(assessment.value)

    # Calculate averages
    average_ratings = {}
    for dimension, scores in ratings_by_dimension.items():
        if scores:
            average_ratings[dimension] = sum(scores) / len(scores)

    return average_ratings


# Example usage
ratings = analyze_ratings(traces)
for dimension, avg_score in ratings.items():
    print(f"{dimension}: {avg_score:.2f}/1.0")

Production considerations

For production deployments, see our guide on production observability with tracing which covers:

  • Implementing feedback collection endpoints
  • Linking feedback to traces using client request IDs
  • Setting up real-time quality monitoring
  • Best practices for high-volume feedback processing

Next steps

Continue your journey with these recommended actions and tutorials.

Reference guides

Explore detailed documentation for concepts and features mentioned in this guide.