Labeling during development
As a developer building GenAI applications, you need a way to track your observations about the quality of your application's outputs. MLflow Tracing allows you to add feedback or expectations directly to traces during development, giving you a quick way to record quality issues, mark successful examples, or add notes for future reference.
Prerequisites
- Your application is instrumented with MLflow Tracing (see the sketch after this list)
- You have generated traces by running your application
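If your application is not instrumented yet, the following minimal sketch shows both prerequisites in place. The experiment name and the generate_answer function are hypothetical stand-ins for your own setup; decorating a function with @mlflow.trace and calling it once is enough to produce a trace you can label.

```python
import mlflow

# Hypothetical experiment name; use whatever experiment your project already logs to.
mlflow.set_experiment("labeling-during-development")


@mlflow.trace
def generate_answer(question: str) -> str:
    # Stand-in for your real application logic (e.g., an LLM call).
    return f"Answer to: {question}"


# Calling the instrumented function records a trace in the active experiment.
generate_answer("What is MLflow Tracing?")
```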
Add labels to traces via the UI
MLflow makes it easy to add annotations (labels) directly to traces through the MLflow UI.
If you are using a Databricks Notebook, you can also perform these steps from the Trace UI that renders inline in the notebook.
- Navigate to the Traces tab in the MLflow Experiment UI
- Open an individual trace
- Within the trace UI, click on the specific span you want to label
  - Selecting the root span attaches feedback to the entire trace
- Expand the Assessments tab at the far right
- Fill in the form to add your feedback:
  - Assessment Type: Feedback (a subjective assessment of quality, such as a rating or comment) or Expectation (the expected output or value that should have been produced)
  - Assessment Name: a unique name for what the feedback is about
  - Data Type: Number, Boolean, or String
  - Value: your assessment
  - Rationale: optional notes explaining the value
- Click Create to save your label
- When you return to the Traces tab, your label will appear as a new column
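To confirm the labels outside the UI, you can also read them back with the SDK. This is a hedged sketch that assumes a recent MLflow 3.x client where mlflow.get_trace returns a trace whose info object carries the logged assessments; the trace ID shown is a hypothetical placeholder you would copy from the Traces tab.

```python
import mlflow

# Hypothetical trace ID copied from the Traces tab in the MLflow UI.
trace = mlflow.get_trace("tr-1234567890abcdef")

# Assumption: TraceInfo exposes the logged labels as an `assessments` list (MLflow 3.x).
for assessment in trace.info.assessments:
    print(assessment.name, assessment.value)
```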
Add labels to traces via the SDK
You can programmatically add labels to traces using MLflow's SDK. This is useful for automated labeling based on your application logic or for batch processing of traces (a batch-labeling sketch follows the example below).
For a complete set of examples, see the logging assessments concept page.
import mlflow


@mlflow.trace
def my_app(input: str) -> str:
    return input + "_output"


my_app(input="hello")

trace_id = mlflow.get_last_active_trace_id()

# Log a thumbs up/down rating
mlflow.log_feedback(
    trace_id=trace_id,
    name="quality_rating",
    value=1,  # 1 for thumbs up, 0 for thumbs down
    rationale="The response was accurate and helpful",
    source=mlflow.entities.assessment.AssessmentSource(
        source_type=mlflow.entities.assessment.AssessmentSourceType.HUMAN,
        source_id="bob@example.com",
    ),
)

# Log expected response text
mlflow.log_expectation(
    trace_id=trace_id,
    name="expected_response",
    value="The capital of France is Paris.",
    source=mlflow.entities.assessment.AssessmentSource(
        source_type=mlflow.entities.assessment.AssessmentSourceType.HUMAN,
        source_id="bob@example.com",
    ),
)
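For the batch-processing case mentioned above, the same APIs can be applied in a loop. The sketch below is a hypothetical example rather than a prescribed pattern: it generates a few traces with the toy my_app from the example above and attaches a simple programmatic check to each one, assuming your MLflow version provides the CODE assessment source type for feedback produced by application logic rather than a person.

```python
# Generate a few traces with the toy app defined in the example above.
trace_ids = []
for prompt in ["hello", "bonjour", "hola"]:
    my_app(input=prompt)
    trace_ids.append(mlflow.get_last_active_trace_id())

# Attach an automated check to each trace. The assessment name and the check
# itself are hypothetical; AssessmentSourceType.CODE marks the label as
# produced by code rather than a human (assumed available in recent MLflow).
for tid in trace_ids:
    mlflow.log_feedback(
        trace_id=tid,
        name="has_expected_suffix",
        value=True,  # in a real check, compute this from the trace's recorded output
        rationale="Output ends with the expected '_output' suffix",
        source=mlflow.entities.assessment.AssessmentSource(
            source_type=mlflow.entities.assessment.AssessmentSourceType.CODE,
            source_id="suffix_check_v1",
        ),
    )
```

In a real pipeline you might look up existing traces (for example with mlflow.search_traces) and compute each value from the trace's recorded output instead of hard-coding it.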
Next steps
Continue your journey with these recommended actions and tutorials.
- Collect domain expert feedback - Set up structured labeling sessions
- Build evaluation datasets - Use your labeled traces to create test datasets
- Collect end-user feedback - Capture feedback from deployed applications
Reference guides
Explore detailed documentation for concepts and features mentioned in this guide.
- Logging assessments - Deep dive into assessment types and usage
- Tracing data model - Understand how assessments attach to traces
- Labeling schemas - Learn about structured feedback collection