Labeling during development
As a developer building GenAI applications, you need a way to track your observations about the quality of your application's outputs. MLflow Tracing allows you to add feedback or expectations directly to traces during development, giving you a quick way to record quality issues, mark successful examples, or add notes for future reference.
Prerequisites
- Your application is instrumented with MLflow Tracing (see the sketch after this list)
- You have generated traces by running your application
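If your application is not instrumented yet, the following minimal sketch shows both prerequisites in place. The experiment name and the generate_answer function are hypothetical stand-ins for your own setup; decorating a function with @mlflow.trace and calling it once is enough to produce a trace you can label.

```python
import mlflow

# Hypothetical experiment name; use whatever experiment your project already logs to.
mlflow.set_experiment("labeling-during-development")


@mlflow.trace
def generate_answer(question: str) -> str:
    # Stand-in for your real application logic (e.g., an LLM call).
    return f"Answer to: {question}"


# Calling the instrumented function records a trace in the active experiment.
generate_answer("What is MLflow Tracing?")
```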
Add labels to traces via the UI
MLflow makes it easy to add annotations (labels) directly to traces through the MLflow UI.
If you are using a Databricks Notebook, you can also perform these steps from the Trace UI that renders inline in the notebook.
- Navigate to the Traces tab in the MLflow Experiment UI
- Open an individual trace
- Within the trace UI, click on the specific span you want to label
  - Selecting the root span attaches feedback to the entire trace
- Expand the Assessments tab at the far right
- Fill in the form to add your feedback:
  - Assessment Type: Feedback (a subjective assessment of quality, such as a rating or comment) or Expectation (the expected output or value that should have been produced)
  - Assessment Name: a unique name for what the feedback is about
  - Data Type: Number, Boolean, or String
  - Value: your assessment
  - Rationale: optional notes explaining the value
- Click Create to save your label
- When you return to the Traces tab, your label will appear as a new column
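To confirm the labels outside the UI, you can also read them back with the SDK. This is a hedged sketch that assumes a recent MLflow 3.x client where mlflow.get_trace returns a trace whose info object carries the logged assessments; the trace ID shown is a hypothetical placeholder you would copy from the Traces tab.

```python
import mlflow

# Hypothetical trace ID copied from the Traces tab in the MLflow UI.
trace = mlflow.get_trace("tr-1234567890abcdef")

# Assumption: TraceInfo exposes the logged labels as an `assessments` list (MLflow 3.x).
for assessment in trace.info.assessments:
    print(assessment.name, assessment.value)
```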
Add labels to traces via the SDK
You can programmatically add labels to traces using MLflow's SDK. This is useful for automated labeling based on your application logic or for batch processing of traces (a batch-labeling sketch follows the example below).
For a complete set of examples, see the logging assessments concept page.
import mlflow


@mlflow.trace
def my_app(input: str) -> str:
    return input + "_output"


my_app(input="hello")

trace_id = mlflow.get_last_active_trace_id()

# Log a thumbs up/down rating
mlflow.log_feedback(
    trace_id=trace_id,
    name="quality_rating",
    value=1,  # 1 for thumbs up, 0 for thumbs down
    rationale="The response was accurate and helpful",
    source=mlflow.entities.assessment.AssessmentSource(
        source_type=mlflow.entities.assessment.AssessmentSourceType.HUMAN,
        source_id="bob@example.com",
    ),
)

# Log expected response text
mlflow.log_expectation(
    trace_id=trace_id,
    name="expected_response",
    value="The capital of France is Paris.",
    source=mlflow.entities.assessment.AssessmentSource(
        source_type=mlflow.entities.assessment.AssessmentSourceType.HUMAN,
        source_id="bob@example.com",
    ),
)
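For the batch-processing case mentioned above, the same APIs can be applied in a loop. The sketch below is a hypothetical example rather than a prescribed pattern: it generates a few traces with the toy my_app from the example above and attaches a simple programmatic check to each one, assuming your MLflow version provides the CODE assessment source type for feedback produced by application logic rather than a person.

```python
# Generate a few traces with the toy app defined in the example above.
trace_ids = []
for prompt in ["hello", "bonjour", "hola"]:
    my_app(input=prompt)
    trace_ids.append(mlflow.get_last_active_trace_id())

# Attach an automated check to each trace. The assessment name and the check
# itself are hypothetical; AssessmentSourceType.CODE marks the label as
# produced by code rather than a human (assumed available in recent MLflow).
for tid in trace_ids:
    mlflow.log_feedback(
        trace_id=tid,
        name="has_expected_suffix",
        value=True,  # in a real check, compute this from the trace's recorded output
        rationale="Output ends with the expected '_output' suffix",
        source=mlflow.entities.assessment.AssessmentSource(
            source_type=mlflow.entities.assessment.AssessmentSourceType.CODE,
            source_id="suffix_check_v1",
        ),
    )
```

In a real pipeline you might look up existing traces (for example with mlflow.search_traces) and compute each value from the trace's recorded output instead of hard-coding it.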
Next steps
Continue your journey with these recommended actions and tutorials.
- Collect domain expert feedback - Set up structured labeling sessions
- Build evaluation datasets - Use your labeled traces to create test datasets
- Collect end-user feedback - Capture feedback from deployed applications
Reference guides
Explore detailed documentation for concepts and features mentioned in this guide.
- Logging assessments - Deep dive into assessment types and usage
- Tracing data model - Understand how assessments attach to traces
- Labeling schemas - Learn about structured feedback collection