Collect user feedback
Collecting and logging user feedback is essential for understanding the real-world quality of your GenAI application. MLflow provides a structured way to capture feedback as assessments on traces, enabling you to track quality over time, identify areas for improvement, and build evaluation datasets from production data.
Prerequisites
Choose the appropriate installation method based on your environment:
- Production
- Development
For production deployments, install the mlflow-tracing package:
pip install --upgrade mlflow-tracing
The mlflow-tracing package is optimized for production use with minimal dependencies and better performance characteristics.
For development environments, install the full MLflow package with Databricks extras:
pip install --upgrade "mlflow[databricks]>=3.1"
The full mlflow[databricks] package includes all features needed for local development and experimentation on Databricks.
The log_feedback API is available in both packages, so you can collect user feedback regardless of which installation method you choose.
MLflow 3 is required for collecting user feedback. MLflow 2.x is not supported due to performance limitations and missing features essential for production use.
Why collect user feedback?
User feedback provides ground truth about your application's performance:
- Real-world quality signals - Understand how actual users perceive your application's outputs
- Continuous improvement - Identify patterns in negative feedback to guide development
- Training data creation - Use feedback to build high-quality evaluation datasets
- Quality monitoring - Track satisfaction metrics over time and across different user segments
- Model fine-tuning - Leverage feedback data to improve your underlying models
Types of feedback
MLflow supports various types of feedback through its assessment system:
| Feedback Type | Description | Common Use Cases |
| --- | --- | --- |
| Binary feedback | Simple thumbs up/down or correct/incorrect | Quick user satisfaction signals |
| Numeric scores | Ratings on a scale (e.g., 1-5 stars) | Detailed quality assessment |
| Categorical feedback | Multiple choice options | Classifying issues or response types |
| Text feedback | Free-form comments | Detailed user explanations |
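Each of these types maps to the value argument of the log_feedback API. The following sketch is illustrative only; the trace ID, assessment names, and user ID are placeholders, not values from your application:
import mlflow
from mlflow.entities import AssessmentSource

source = AssessmentSource(source_type="HUMAN", source_id="user-123")  # Hypothetical user ID
trace_id = "<your-trace-id>"  # Placeholder: use a real trace ID

# Binary feedback: thumbs up/down
mlflow.log_feedback(trace_id=trace_id, name="thumbs_up", value=True, source=source)

# Numeric score: e.g., 4 out of 5 stars
mlflow.log_feedback(trace_id=trace_id, name="star_rating", value=4, source=source)

# Categorical feedback: a multiple-choice label
mlflow.log_feedback(trace_id=trace_id, name="issue_type", value="incomplete_answer", source=source)

# Text feedback: a free-form comment
mlflow.log_feedback(trace_id=trace_id, name="comment", value="Helpful, but missed the edge case.", source=source)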
Understanding the Feedback data model
In MLflow, user feedback is captured using the Feedback entity, which is a type of Assessment that can be attached to traces or specific spans. The Feedback entity provides a structured way to store:
- Value: The actual feedback (boolean, numeric, text, or structured data)
- Source: Information about who or what provided the feedback (human user, LLM judge, or code)
- Rationale: Optional explanation for the feedback
- Metadata: Additional context like timestamps or custom attributes
Understanding this data model helps you design effective feedback collection systems that integrate seamlessly with MLflow's evaluation and monitoring capabilities. For detailed information about the Feedback entity schema and all available fields, see the Feedback section in the Tracing Data Model.
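To make the data model concrete, here is a minimal sketch that logs one piece of feedback carrying all four fields and then reads it back. The trace ID, user ID, and metadata values are placeholders, and the sketch assumes MLflow 3, where assessments are available on the trace info:
import mlflow
from mlflow.entities import AssessmentSource

trace_id = "<your-trace-id>"  # Placeholder: use a real trace ID from your experiment

# Log feedback covering all four fields of the data model
mlflow.log_feedback(
    trace_id=trace_id,
    name="user_feedback",
    value=True,  # Value: boolean, numeric, text, or structured data
    source=AssessmentSource(source_type="HUMAN", source_id="user-123"),  # Source (hypothetical user ID)
    rationale="The answer cited the correct document.",  # Rationale: optional explanation
    metadata={"app_version": "1.2.0"},  # Metadata: custom attributes (example values)
)

# Read the feedback back from the trace
trace = mlflow.get_trace(trace_id)
for assessment in trace.info.assessments or []:
    print(assessment.name, assessment.value, assessment.rationale)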
End user feedback collection
When implementing feedback collection in production, you need to link user feedback to specific traces. There are two approaches you can use:
- Using client request IDs - Generate your own unique IDs when processing requests and reference them later for feedback
- Using MLflow trace IDs - Use the trace ID automatically generated by MLflow
Understanding the feedback collection flow
Both approaches follow a similar pattern:
1. During the initial request: Your application either generates a unique client request ID or retrieves the MLflow-generated trace ID
2. After receiving the response: The user can provide feedback by referencing either ID
3. Feedback is logged: MLflow's log_feedback API creates an assessment attached to the original trace
4. Analysis and monitoring: You can query and analyze feedback across all traces
Implementing feedback collection
- Approach 1: Using MLflow trace IDs
- Approach 2: Using client request IDs
The simplest approach is to use the trace ID that MLflow automatically generates for each trace. You can retrieve this ID during request processing and return it to the client:
Backend implementation
import mlflow
from fastapi import FastAPI, Query
from mlflow.entities import AssessmentSource
from pydantic import BaseModel
from typing import Optional

app = FastAPI()


class ChatRequest(BaseModel):
    message: str


class ChatResponse(BaseModel):
    response: str
    trace_id: str  # Include the trace ID in the response


@app.post("/chat", response_model=ChatResponse)
@mlflow.trace  # Trace each request so an active span exists below
def chat(request: ChatRequest):
    """
    Process a chat request and return the trace ID for feedback collection.
    """
    # Your GenAI application logic here
    response = process_message(request.message)  # Replace with your actual processing logic

    # Get the current trace ID
    trace_id = mlflow.get_current_active_span().trace_id

    return ChatResponse(response=response, trace_id=trace_id)


class FeedbackRequest(BaseModel):
    is_correct: bool  # True for thumbs up, False for thumbs down
    comment: Optional[str] = None


@app.post("/feedback")
def submit_feedback(
    trace_id: str = Query(..., description="The trace ID from the chat response"),
    feedback: FeedbackRequest = ...,
    user_id: Optional[str] = Query(None, description="User identifier"),
):
    """
    Collect user feedback using the MLflow trace ID.
    """
    # Log the feedback directly using the trace ID
    mlflow.log_feedback(
        trace_id=trace_id,
        name="user_feedback",
        value=feedback.is_correct,
        source=AssessmentSource(source_type="HUMAN", source_id=user_id),
        rationale=feedback.comment,
    )

    return {
        "status": "success",
        "trace_id": trace_id,
    }
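To exercise these endpoints end to end, you could call them with the requests library. This is a smoke-test sketch; the host, port, and user ID are assumptions for illustration:
import requests

BASE = "http://localhost:8000"  # Assumed local address for the FastAPI app

# 1. Send a chat message and capture the trace ID from the response
chat = requests.post(f"{BASE}/chat", json={"message": "What is MLflow?"}).json()

# 2. Submit a thumbs-up for that trace, with an optional comment
requests.post(
    f"{BASE}/feedback",
    params={"trace_id": chat["trace_id"], "user_id": "user-123"},  # User ID is illustrative
    json={"is_correct": True, "comment": "Clear and accurate."},
)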
Frontend implementation example
Below is an example frontend implementation for a React-based application:
// React example for chat with feedback
import React, { useState } from 'react';

function ChatWithFeedback() {
  const [message, setMessage] = useState('');
  const [response, setResponse] = useState('');
  const [traceId, setTraceId] = useState(null);
  const [feedbackSubmitted, setFeedbackSubmitted] = useState(false);

  const sendMessage = async () => {
    try {
      const res = await fetch('/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message }),
      });
      const data = await res.json();
      setResponse(data.response);
      setTraceId(data.trace_id);
      setFeedbackSubmitted(false);
    } catch (error) {
      console.error('Chat error:', error);
    }
  };

  const submitFeedback = async (isCorrect, comment = null) => {
    if (!traceId || feedbackSubmitted) return;
    const userId = getUserId(); // Your user identification method
    try {
      const params = new URLSearchParams({
        trace_id: traceId,
        ...(userId && { user_id: userId }),
      });
      const res = await fetch(`/feedback?${params}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          is_correct: isCorrect,
          comment: comment,
        }),
      });
      if (res.ok) {
        setFeedbackSubmitted(true);
        // Optionally show success message
      }
    } catch (error) {
      console.error('Feedback submission error:', error);
    }
  };

  return (
    <div>
      <input value={message} onChange={(e) => setMessage(e.target.value)} placeholder="Ask a question..." />
      <button onClick={sendMessage}>Send</button>
      {response && (
        <div>
          <p>{response}</p>
          <div className="feedback-buttons">
            <button onClick={() => submitFeedback(true)} disabled={feedbackSubmitted}>
              👍
            </button>
            <button onClick={() => submitFeedback(false)} disabled={feedbackSubmitted}>
              👎
            </button>
          </div>
          {feedbackSubmitted && <span>Thanks for your feedback!</span>}
        </div>
      )}
    </div>
  );
}
For more control over request tracking, you can use your own unique client request IDs. This approach is useful when you need to maintain your own request tracking system or integrate with existing infrastructure:
This approach requires you to implement request tracking where each trace has a client_request_id attribute. For more information on how to attach client request IDs to your traces during the initial request, see Add context to traces.
Backend implementation
import mlflow
import uuid
from fastapi import FastAPI, HTTPException, Query, Request
from mlflow.client import MlflowClient
from mlflow.entities import AssessmentSource
from pydantic import BaseModel
from typing import Optional

app = FastAPI()


class ChatRequest(BaseModel):
    message: str


class ChatResponse(BaseModel):
    response: str
    client_request_id: str  # Include the client request ID in the response


@app.post("/chat", response_model=ChatResponse)
@mlflow.trace  # Trace each request so update_current_trace has an active trace
def chat(request: ChatRequest):
    """
    Process a chat request and set a client request ID for later feedback collection.
    """
    # Sample: Generate a unique client request ID
    # Normally, this ID would be your app's backend existing ID for this interaction
    client_request_id = f"req-{uuid.uuid4().hex[:8]}"

    # Attach the client request ID to the current trace
    mlflow.update_current_trace(client_request_id=client_request_id)

    # Your GenAI application logic here
    response = process_message(request.message)  # Replace with your actual processing logic

    return ChatResponse(response=response, client_request_id=client_request_id)


class FeedbackRequest(BaseModel):
    is_correct: bool  # True for thumbs up, False for thumbs down
    comment: Optional[str] = None


@app.post("/feedback")
def submit_feedback(
    request: Request,
    client_request_id: str = Query(..., description="The request ID from the original interaction"),
    feedback: FeedbackRequest = ...,
):
    """
    Collect user feedback for a specific interaction.

    This endpoint:
    1. Finds the trace using the client request ID
    2. Logs the feedback as an MLflow assessment
    """
    client = MlflowClient()

    # Find the trace using the client request ID
    experiment = client.get_experiment_by_name("/Shared/production-app")
    traces = client.search_traces(
        experiment_ids=[experiment.experiment_id],
        filter_string=f"attributes.client_request_id = '{client_request_id}'",
        max_results=1,
    )
    if not traces:
        raise HTTPException(status_code=404, detail="No trace found for this client request ID")

    # Log the feedback as an assessment
    # Assessments are the structured way to attach feedback to traces
    mlflow.log_feedback(
        trace_id=traces[0].info.trace_id,
        name="user_feedback",
        value=feedback.is_correct,
        source=AssessmentSource(
            source_type="HUMAN",  # Indicates this is human feedback
            source_id=request.headers.get("X-User-ID"),  # Link feedback to the user who provided it
        ),
        rationale=feedback.comment,  # Optional explanation from the user
    )

    return {
        "status": "success",
        "trace_id": traces[0].info.trace_id,
    }
Frontend implementation example
Below is an example frontend implementation for a React-based application. When using client request IDs, your frontend needs to store and manage these IDs:
// React example with session-based request tracking
import React, { useState } from 'react';

function ChatWithRequestTracking() {
  const [message, setMessage] = useState('');
  const [conversations, setConversations] = useState([]);
  const [sessionId] = useState(() => `session-${Date.now()}`);

  const sendMessage = async () => {
    try {
      const res = await fetch('/chat', {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-Session-ID': sessionId,
        },
        body: JSON.stringify({ message }),
      });
      const data = await res.json();
      // Store conversation with request ID
      setConversations((prev) => [
        ...prev,
        {
          id: data.client_request_id,
          message: message,
          response: data.response,
          timestamp: new Date(),
          feedbackSubmitted: false,
        },
      ]);
      setMessage('');
    } catch (error) {
      console.error('Chat error:', error);
    }
  };

  const submitFeedback = async (requestId, isCorrect, comment = null) => {
    try {
      const params = new URLSearchParams({
        client_request_id: requestId,
      });
      const res = await fetch(`/feedback?${params}`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          'X-User-ID': getUserId(), // Your user identification method
        },
        body: JSON.stringify({
          is_correct: isCorrect,
          comment: comment,
        }),
      });
      if (res.ok) {
        // Mark feedback as submitted
        setConversations((prev) =>
          prev.map((conv) => (conv.id === requestId ? { ...conv, feedbackSubmitted: true } : conv)),
        );
      }
    } catch (error) {
      console.error('Feedback submission error:', error);
    }
  };

  return (
    <div>
      <div className="chat-history">
        {conversations.map((conv) => (
          <div key={conv.id} className="conversation">
            <div className="user-message">{conv.message}</div>
            <div className="bot-response">{conv.response}</div>
            <div className="feedback-section">
              <button onClick={() => submitFeedback(conv.id, true)} disabled={conv.feedbackSubmitted}>
                👍
              </button>
              <button onClick={() => submitFeedback(conv.id, false)} disabled={conv.feedbackSubmitted}>
                👎
              </button>
              {conv.feedbackSubmitted && <span>✓ Feedback received</span>}
            </div>
          </div>
        ))}
      </div>
      <div className="chat-input">
        <input
          value={message}
          onChange={(e) => setMessage(e.target.value)}
          onKeyPress={(e) => e.key === 'Enter' && sendMessage()}
          placeholder="Type your message..."
        />
        <button onClick={sendMessage}>Send</button>
      </div>
    </div>
  );
}
Key implementation details
AssessmentSource: The AssessmentSource object identifies who or what provided the feedback:
- source_type: Can be "HUMAN" for user feedback or "LLM_JUDGE" for automated evaluation
- source_id: Identifies the specific user or system providing feedback
Feedback storage: Feedback is stored as assessments on the trace, which means:
- It's permanently associated with the specific interaction
- It can be queried alongside the trace data
- It's visible in the MLflow UI when viewing the trace
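For illustration, the same trace could receive feedback from both kinds of sources. The trace ID, user ID, and judge name below are hypothetical:
import mlflow
from mlflow.entities import AssessmentSource

trace_id = "<your-trace-id>"  # Placeholder

# Human feedback, attributed to the end user who provided it
human_source = AssessmentSource(source_type="HUMAN", source_id="user-123")  # Hypothetical user ID

# Automated feedback, attributed to an LLM judge (the judge name is illustrative)
judge_source = AssessmentSource(source_type="LLM_JUDGE", source_id="relevance-judge")

mlflow.log_feedback(trace_id=trace_id, name="user_feedback", value=True, source=human_source)
mlflow.log_feedback(trace_id=trace_id, name="relevance", value=0.9, source=judge_source)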
Handling different feedback types
You can extend either approach to support more complex feedback. Here's an example using trace IDs:
import mlflow
from fastapi import Query
from mlflow.entities import AssessmentSource
from typing import Optional


@app.post("/detailed-feedback")
def submit_detailed_feedback(
    trace_id: str,
    accuracy: int = Query(..., ge=1, le=5, description="Accuracy rating from 1-5"),
    helpfulness: int = Query(..., ge=1, le=5, description="Helpfulness rating from 1-5"),
    relevance: int = Query(..., ge=1, le=5, description="Relevance rating from 1-5"),
    user_id: str = Query(..., description="User identifier"),
    comment: Optional[str] = None,
):
    """
    Collect multi-dimensional feedback with separate ratings for different aspects.
    Each aspect is logged as a separate assessment for granular analysis.
    """
    # Log each dimension as a separate assessment
    dimensions = {
        "accuracy": accuracy,
        "helpfulness": helpfulness,
        "relevance": relevance,
    }

    for dimension, score in dimensions.items():
        mlflow.log_feedback(
            trace_id=trace_id,
            name=f"user_{dimension}",
            value=score / 5.0,  # Normalize to 0-1 scale
            source=AssessmentSource(source_type="HUMAN", source_id=user_id),
            rationale=comment if dimension == "accuracy" else None,
        )

    return {
        "status": "success",
        "trace_id": trace_id,
        "feedback_recorded": dimensions,
    }
Handling feedback with streaming responses
When using streaming responses (Server-Sent Events or WebSockets), the trace ID isn't available until the stream completes. This presents a unique challenge for feedback collection that requires a different approach.
Why streaming is different
In traditional request-response patterns, you receive the complete response and trace ID together. With streaming:
- Tokens arrive incrementally: The response is built up over time as tokens stream from the LLM
- Trace completion is deferred: The trace ID is only generated after the entire stream finishes
- Feedback UI must wait: Users can't provide feedback until they have both the complete response and the trace ID
Backend implementation with SSE
Here's how to implement streaming with trace ID delivery at the end of the stream:
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from datetime import datetime
from typing import AsyncGenerator
import mlflow
import json
import asyncio


@app.post("/chat/stream")
async def chat_stream(request: ChatRequest):
    """
    Stream chat responses with the trace ID sent at completion.
    """

    async def generate() -> AsyncGenerator[str, None]:
        try:
            # Start MLflow trace
            with mlflow.start_span(name="streaming_chat") as span:
                # Tag the trace with request metadata
                mlflow.update_current_trace(
                    tags={
                        "request_message": request.message,
                        "stream_start_time": datetime.now().isoformat(),
                    }
                )

                # Stream tokens from your LLM
                full_response = ""
                async for token in your_llm_stream_function(request.message):
                    full_response += token
                    yield f"data: {json.dumps({'type': 'token', 'content': token})}\n\n"
                    await asyncio.sleep(0.01)  # Prevent overwhelming the client

                # Log the complete response to the trace
                span.set_attribute("response", full_response)
                span.set_attribute("token_count", len(full_response.split()))

                # Get the trace ID after completion
                trace_id = span.trace_id

                # Send the trace ID as the final event
                yield f"data: {json.dumps({'type': 'done', 'trace_id': trace_id})}\n\n"
        except Exception as e:
            # Log the error to the trace if one is active
            if mlflow.get_current_active_span():
                mlflow.update_current_trace(tags={"error": str(e)})
            yield f"data: {json.dumps({'type': 'error', 'error': str(e)})}\n\n"

    return StreamingResponse(
        generate(),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",  # Disable proxy buffering
        },
    )
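Before wiring up the frontend, you can sanity-check the stream with a small Python client. This is a sketch that assumes the server is running locally on port 8000:
import json
import requests

with requests.post(
    "http://localhost:8000/chat/stream",  # Assumed local address
    json={"message": "What is MLflow?"},
    stream=True,
) as resp:
    for line in resp.iter_lines(decode_unicode=True):
        if line and line.startswith("data: "):
            event = json.loads(line[len("data: "):])
            if event["type"] == "token":
                print(event["content"], end="", flush=True)
            elif event["type"] == "done":
                print(f"\ntrace_id: {event['trace_id']}")
            elif event["type"] == "error":
                print(f"\nstream error: {event['error']}")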
Frontend implementation for streaming
Handle the streaming events and enable feedback only after receiving the trace ID:
// React hook for streaming chat with feedback
import React, { useState, useCallback } from 'react';

function useStreamingChat() {
  const [isStreaming, setIsStreaming] = useState(false);
  const [streamingContent, setStreamingContent] = useState('');
  const [traceId, setTraceId] = useState(null);
  const [error, setError] = useState(null);

  const sendStreamingMessage = useCallback(async (message) => {
    // Reset state
    setIsStreaming(true);
    setStreamingContent('');
    setTraceId(null);
    setError(null);

    try {
      const response = await fetch('/chat/stream', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ message }),
      });

      if (!response.ok) {
        throw new Error(`HTTP error! status: ${response.status}`);
      }

      const reader = response.body.getReader();
      const decoder = new TextDecoder();
      let buffer = '';

      while (true) {
        const { done, value } = await reader.read();
        if (done) break;

        buffer += decoder.decode(value, { stream: true });
        const lines = buffer.split('\n');
        // Keep the last incomplete line in the buffer
        buffer = lines.pop() || '';

        for (const line of lines) {
          if (line.startsWith('data: ')) {
            try {
              const data = JSON.parse(line.slice(6));
              switch (data.type) {
                case 'token':
                  setStreamingContent((prev) => prev + data.content);
                  break;
                case 'done':
                  setTraceId(data.trace_id);
                  setIsStreaming(false);
                  break;
                case 'error':
                  setError(data.error);
                  setIsStreaming(false);
                  break;
              }
            } catch (e) {
              console.error('Failed to parse SSE data:', e);
            }
          }
        }
      }
    } catch (error) {
      setError(error.message);
      setIsStreaming(false);
    }
  }, []);

  return {
    sendStreamingMessage,
    streamingContent,
    isStreaming,
    traceId,
    error,
  };
}

// Component using the streaming hook
function StreamingChatWithFeedback() {
  const [message, setMessage] = useState('');
  const [feedbackSubmitted, setFeedbackSubmitted] = useState(false);
  const { sendStreamingMessage, streamingContent, isStreaming, traceId, error } = useStreamingChat();

  const handleSend = () => {
    if (message.trim()) {
      setFeedbackSubmitted(false);
      sendStreamingMessage(message);
      setMessage('');
    }
  };

  const submitFeedback = async (isPositive) => {
    if (!traceId || feedbackSubmitted) return;
    try {
      const response = await fetch(`/feedback?trace_id=${traceId}`, {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          is_correct: isPositive,
          comment: null,
        }),
      });
      if (response.ok) {
        setFeedbackSubmitted(true);
      }
    } catch (error) {
      console.error('Feedback submission failed:', error);
    }
  };

  return (
    <div className="streaming-chat">
      <div className="chat-messages">
        {streamingContent && (
          <div className="message assistant">
            {streamingContent}
            {isStreaming && <span className="typing-indicator">...</span>}
          </div>
        )}
        {error && <div className="error-message">Error: {error}</div>}
      </div>

      {/* Feedback buttons - only enabled when trace ID is available */}
      {streamingContent && !isStreaming && traceId && (
        <div className="feedback-section">
          <span>Was this response helpful?</span>
          <button onClick={() => submitFeedback(true)} disabled={feedbackSubmitted} className="feedback-btn positive">
            👍 Yes
          </button>
          <button onClick={() => submitFeedback(false)} disabled={feedbackSubmitted} className="feedback-btn negative">
            👎 No
          </button>
          {feedbackSubmitted && <span className="feedback-thanks">Thank you!</span>}
        </div>
      )}

      <div className="chat-input-section">
        <input
          type="text"
          value={message}
          onChange={(e) => setMessage(e.target.value)}
          onKeyPress={(e) => e.key === 'Enter' && !isStreaming && handleSend()}
          placeholder="Type your message..."
          disabled={isStreaming}
        />
        <button onClick={handleSend} disabled={isStreaming || !message.trim()}>
          {isStreaming ? 'Streaming...' : 'Send'}
        </button>
      </div>
    </div>
  );
}
Key considerations for streaming
When implementing feedback collection with streaming responses, keep these points in mind:
1. Trace ID timing: The trace ID is only available after the streaming completes. Design your UI to handle this gracefully by disabling feedback controls until the trace ID is received.
2. Event structure: Use a consistent event format with a type field to distinguish between content tokens, completion events, and errors. This makes parsing and handling events more reliable.
3. State management: Track both the streaming content and trace ID separately. Reset all state at the start of each new interaction to prevent stale data issues.
4. Error handling: Include error events in the stream to gracefully handle failures. Ensure errors are logged to the trace when possible for debugging.
5. Buffer management:
   - Use the X-Accel-Buffering: no header to disable proxy buffering
   - Implement proper line buffering in the frontend to handle partial SSE messages
   - Consider implementing reconnection logic for network interruptions
6. Performance optimization (a batching sketch follows this list):
   - Add small delays between tokens (asyncio.sleep(0.01)) to prevent overwhelming clients
   - Batch multiple tokens if they arrive too quickly
   - Consider implementing backpressure mechanisms for slow clients
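As a sketch of the batching suggestion above, the helper below groups rapidly arriving tokens into a single SSE event per flush interval. It could replace the per-token yield inside generate(); the flush interval and the shape of token_stream are assumptions, not part of MLflow:
import json
import time
from typing import AsyncGenerator


async def batched_sse_events(token_stream, flush_every: float = 0.05) -> AsyncGenerator[str, None]:
    """Group rapidly arriving tokens into one SSE event per flush interval."""
    pending = []
    last_flush = time.monotonic()
    async for token in token_stream:
        pending.append(token)
        now = time.monotonic()
        if now - last_flush >= flush_every:
            yield f"data: {json.dumps({'type': 'token', 'content': ''.join(pending)})}\n\n"
            pending.clear()
            last_flush = now
    if pending:  # Flush whatever remains when the stream ends
        yield f"data: {json.dumps({'type': 'token', 'content': ''.join(pending)})}\n\n"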
Analyzing feedback data
Once you've collected feedback, you can analyze it to gain insights about your application's quality and user satisfaction.
You can view feedback on individual traces directly in the MLflow Trace UI, or retrieve it programmatically via the SDK.
Getting traces with feedback via the SDK
First, retrieve traces from a specific time window:
from mlflow.client import MlflowClient
from datetime import datetime, timedelta


def get_recent_traces(experiment_name: str, hours: int = 24):
    """Get traces from the last N hours."""
    client = MlflowClient()

    # Calculate the cutoff time
    cutoff_time = datetime.now() - timedelta(hours=hours)
    cutoff_timestamp_ms = int(cutoff_time.timestamp() * 1000)

    # Query traces created after the cutoff
    experiment = client.get_experiment_by_name(experiment_name)
    traces = client.search_traces(
        experiment_ids=[experiment.experiment_id],
        filter_string=f"attributes.timestamp_ms > {cutoff_timestamp_ms}",
    )
    return traces
Analyzing feedback patterns via the SDK
Extract and analyze feedback from the traces:
def analyze_user_feedback(traces):
    """Analyze feedback patterns from traces."""
    client = MlflowClient()

    # Initialize counters
    total_traces = len(traces)
    traces_with_feedback = 0
    positive_count = 0
    negative_count = 0

    # Process each trace
    for trace in traces:
        # Get full trace details; in MLflow 3, assessments live on the trace info
        trace_detail = client.get_trace(trace.info.trace_id)
        assessments = trace_detail.info.assessments or []

        if assessments:
            traces_with_feedback += 1
            # Count positive/negative feedback
            for assessment in assessments:
                if assessment.name == "user_feedback":
                    if assessment.value:
                        positive_count += 1
                    else:
                        negative_count += 1

    # Calculate metrics
    if traces_with_feedback > 0:
        feedback_rate = (traces_with_feedback / total_traces) * 100
        positive_rate = (positive_count / traces_with_feedback) * 100
    else:
        feedback_rate = 0
        positive_rate = 0

    return {
        "total_traces": total_traces,
        "traces_with_feedback": traces_with_feedback,
        "feedback_rate": feedback_rate,
        "positive_rate": positive_rate,
        "positive_count": positive_count,
        "negative_count": negative_count,
    }


# Example usage
traces = get_recent_traces("/Shared/production-genai-app", hours=24)
results = analyze_user_feedback(traces)
print(f"Feedback rate: {results['feedback_rate']:.1f}%")
print(f"Positive feedback: {results['positive_rate']:.1f}%")
print(f"Total feedback: {results['traces_with_feedback']} out of {results['total_traces']} traces")
Analyzing multi-dimensional feedback
For more detailed feedback with ratings:
def analyze_ratings(traces):
    """Analyze rating-based feedback."""
    client = MlflowClient()
    ratings_by_dimension = {}

    for trace in traces:
        trace_detail = client.get_trace(trace.info.trace_id)
        # In MLflow 3, assessments live on the trace info
        for assessment in trace_detail.info.assessments or []:
            # Look for rating assessments
            if assessment.name.startswith("user_") and assessment.name != "user_feedback":
                dimension = assessment.name.replace("user_", "")
                if dimension not in ratings_by_dimension:
                    ratings_by_dimension[dimension] = []
                ratings_by_dimension[dimension].append(assessment.value)

    # Calculate averages
    average_ratings = {}
    for dimension, scores in ratings_by_dimension.items():
        if scores:
            average_ratings[dimension] = sum(scores) / len(scores)

    return average_ratings


# Example usage
ratings = analyze_ratings(traces)
for dimension, avg_score in ratings.items():
    print(f"{dimension}: {avg_score:.2f}/1.0")
Production considerations
For production deployments, see our guide on production observability with tracing, which covers:
- Implementing feedback collection endpoints
- Linking feedback to traces using client request IDs
- Setting up real-time quality monitoring
- Best practices for high-volume feedback processing
Next steps
Continue your journey with these recommended actions and tutorials.
- Build evaluation datasets - Use collected feedback to create test datasets
- Use traces to improve quality - Analyze feedback patterns to identify improvements
- Set up production monitoring - Monitor quality metrics based on feedback
Reference guides
Explore detailed documentation for concepts and features mentioned in this guide.
- Logging assessments - Understand how feedback is stored as assessments
- Tracing data model - Learn about assessments and trace structure
- Query traces via SDK - Advanced techniques for analyzing feedback