Test an app version with the Review App's Chat UI

The MLflow Review App includes a built-in chat interface that lets domain experts interactively test your GenAI application and provide immediate feedback. Use the Chat UI as a quick way to vibe check your app.

Review App chat UI

When to use Chat UI testing

Chat UI testing is ideal when you want to:

  • Test conversational flows and multi-turn interactions with domain experts
  • Collect expert feedback on application responses and behavior
  • Validate updates in a safe environment before production deployment

Prerequisites

  • MLflow and required packages must be installed. The features described in this guide require MLflow version 3.1.0 or higher. Run the following command to install or upgrade the MLflow SDK, including extras needed for Databricks integration:

    Bash
    pip install --upgrade "mlflow[databricks]>=3.1.0" openai "databricks-connect>=16.1"
  • Your development environment must be connected to the MLflow Experiment where your GenAI application traces are logged.
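
    If you're working outside a Databricks notebook, the following is a minimal sketch for connecting (the experiment path is a placeholder; replace it with your own):

    Python
    import mlflow

    # Point the MLflow client at your Databricks workspace
    mlflow.set_tracking_uri("databricks")

    # Use the MLflow Experiment where your GenAI app's traces are logged
    mlflow.set_experiment("/Shared/my-genai-app-experiment")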

  • Domain experts need the following permissions to use the Review App's Chat UI:

    • Account access: Domain experts must be provisioned in your Databricks account, but they do not need access to your workspace.

      For users without workspace access, account admins can:

      • Use account-level SCIM provisioning to sync users from your identity provider
      • Manually register users and groups in Databricks

      See User and group management for details.

    • Endpoint access: CAN_QUERY permission to the model serving endpoint.

Set up and collect feedback with the Chat UI

The MLflow Review App's Chat UI connects to a deployed version of your GenAI application, allowing domain experts to chat with your app and provide immediate feedback. Follow these steps to set up the Chat UI and collect feedback:

  1. Package your app with Agent Framework and deploy it as a Model Serving endpoint.
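
    For example, a minimal sketch of logging your agent's code and deploying it with Agent Framework (this assumes the databricks-agents package is installed; the file name and Unity Catalog model name are placeholders):

    Python
    import mlflow
    from databricks import agents

    # Log the agent's code as an MLflow model and register it to Unity Catalog
    # (file path and model name are placeholders)
    logged_agent = mlflow.pyfunc.log_model(
        python_model="agent.py",  # file that defines your agent
        name="agent",
        registered_model_name="main.default.my_agent",
    )

    # Deploy the registered model version as a Model Serving endpoint
    agents.deploy("main.default.my_agent", logged_agent.registered_model_version)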

  2. Add the endpoint to your experiment's review app:

    note

    The example below adds a Databricks-hosted LLM to the review app. Replace the endpoint with your app's endpoint from step 1.

    Python
    from mlflow.genai.labeling import get_review_app

    # Get review app for current MLflow experiment
    review_app = get_review_app()

    # Connect your deployed agent endpoint
    review_app.add_agent(
        agent_name="claude-sonnet",
        model_serving_endpoint="databricks-claude-3-7-sonnet",
    )

    print(f"Share this URL: {review_app.url}/chat")
  3. Once configured, share the Review App URL with your domain experts. They'll be able to:

  • Access the chat interface through their web browser
  • Interact with your application by typing questions
  • Provide feedback after each response using the built-in feedback controls
  • Continue the conversation to test multiple interactions

Review App content rendering

The Chat UI uses domain expert queries as input, live agent endpoint responses as output, and stores results in MLflow traces. You don't need to provide a custom labeling schema, as this approach uses fixed feedback questions.

The Review App automatically renders different content types from your MLflow Trace:

  • Retrieved documents: Documents within a RETRIEVER span are rendered for display
  • OpenAI-format messages: Inputs and outputs of the MLflow Trace that follow the OpenAI chat message format are rendered as a conversation
  • Dictionaries: Inputs and outputs of the MLflow Trace that are dicts are rendered as pretty-printed JSON

Otherwise, the content of the input and output from the root span of each trace is used as the primary content for review.
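
For example, a minimal sketch of instrumenting an app so the Review App can render retrieved documents and chat messages (the function names, span usage, and documents below are illustrative):

    Python
    import mlflow
    from mlflow.entities import SpanType

    # Mark the retrieval step as a RETRIEVER span so its documents can be rendered
    @mlflow.trace(span_type=SpanType.RETRIEVER)
    def retrieve_docs(query: str) -> list[dict]:
        # Illustrative static result; a real app would query a vector index
        return [{"page_content": "MLflow Review App overview ...", "metadata": {"doc_uri": "docs/review_app.md"}}]

    # The root span's inputs and outputs use OpenAI-style chat messages so they render as a conversation
    @mlflow.trace(span_type=SpanType.AGENT)
    def chat(messages: list[dict]) -> dict:
        docs = retrieve_docs(messages[-1]["content"])
        return {"role": "assistant", "content": f"Answer grounded in {len(docs)} retrieved document(s)."}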

View chat feedback

All interactions and feedback collected through the Chat UI are automatically captured as traces in MLflow.

To view the traces from chat interactions:

  1. Navigate to the MLflow UI
  2. Find the experiment associated with your Review App session
  3. Browse the traces to see the full conversation history
  4. Review the feedback attached to each response
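
You can also fetch these traces programmatically. The following is a minimal sketch using MLflow's trace search API (the experiment path is a placeholder):

    Python
    import mlflow

    # Use the experiment associated with your Review App (placeholder path)
    mlflow.set_experiment("/Shared/my-genai-app-experiment")

    # Returns a pandas DataFrame of traces; Chat UI feedback is attached to each trace
    traces = mlflow.search_traces(max_results=10)
    print(traces.head())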
