Review App (Python)

Datasets and Labeling Sessions

Introduction

This notebook describes how you can:

  • Create an evaluation set, backed by a Unity Catalog Delta table
  • Leverage subject matter experts (SMEs) to build an evaluation dataset
  • Leverage SMEs to label traces generated by a version of an Agent to understand quality
  • Give your SMEs a pre-production version of your Agent so they can chat with the bot and give feedback

Please provide

  • The destination UC table name for the evaluation dataset
  • An experiment name to host the labeling sessions
  • A list of SME emails who can write assessments (a configuration sketch follows this list)
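The original configuration cell is not reproduced here; below is a minimal sketch of what it might contain. All values (the UC table name, experiment path, and emails) are placeholders to replace with your own.

```python
import mlflow

# Placeholder values -- replace with your own UC table, experiment, and SME emails.
UC_TABLE_NAME = "main.my_schema.agent_eval_dataset"
EXPERIMENT_NAME = "/Users/me@example.com/review_app_labeling"
SME_EMAILS = ["sme1@example.com", "sme2@example.com"]

# The labeling sessions created below are hosted in this MLflow experiment.
mlflow.set_experiment(EXPERIMENT_NAME)
```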

Create a dataset and collect assessments

We bootstrap the evaluation dataset using synthetic data generation. For more details on synthetic evals, see Synthesize Evaluation sets.

Generate synth evals
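A minimal sketch of synthetic eval generation using `generate_evals_df` from the `databricks-agents` package; the two sample documents below are illustrative placeholders, not part of the original notebook.

```python
import pandas as pd
from databricks.agents.evals import generate_evals_df

# A tiny illustrative corpus; in practice, use your agent's real documents.
docs = pd.DataFrame(
    [
        {
            "content": "Databricks Lakehouse unifies data warehousing and AI workloads.",
            "doc_uri": "docs/lakehouse_overview.md",
        },
        {
            "content": "Unity Catalog provides centralized governance for data and AI assets.",
            "doc_uri": "docs/unity_catalog.md",
        },
    ]
)

# Generate synthetic evaluation records (questions + expected facts) from the docs.
evals = generate_evals_df(
    docs,
    num_evals=10,
    agent_description="A chatbot that answers questions about Databricks.",
    question_guidelines="Ask questions an end user of the documentation would ask.",
)
display(evals)
```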

This cell adds the evals above to the evaluation dataset. The evaluation dataset is backed by a Delta table in Unity Catalog.

Add the evals to a dataset and make a labeling session
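A sketch of the dataset step, assuming the `mlflow.genai.datasets` helpers available in recent MLflow releases (older `databricks-agents` versions expose a similar API under `databricks.agents.datasets`). The labeling session itself is created a few cells below.

```python
from mlflow.genai import datasets

# Create the UC-backed evaluation dataset (use datasets.get_dataset if the
# table already exists) and merge in the synthetic evals generated above.
dataset = datasets.create_dataset(uc_table_name=UC_TABLE_NAME)
dataset.merge_records(evals)
```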

Register an agent with the Review App

The following cell adds an agent to the Review App so the SME can use it in "chat" mode or during labeling. The agent is registered under a name, which is referenced when the labeling session is created.
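A sketch using the `databricks.agents` Review App client; the serving endpoint name below is a placeholder for your pre-production agent deployment.

```python
from databricks.agents import review_app

# Get the Review App tied to the current MLflow experiment and register the
# agent's serving endpoint under a name.
my_app = review_app.get_review_app()
my_app.add_agent(
    agent_name="my-agent",
    model_serving_endpoint="agents_main-my_schema-my_agent",  # placeholder endpoint
)
```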


Create a labeling session from the eval dataset

The following cell creates a labeling session for the SME to review the dataset we created above.

We will configure the labeling session with a set of label schemas, which define the questions asked of the SME.

We will ask the SME:

  • "Please provide a list of facts that you expect to see in a correct response" and collect a set of "expected_facts"

Sync expectations back to the evaluation dataset

After the SME is done labeling, you can sync the expectations back to the dataset.
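A one-line sketch; the `sync(to_dataset=...)` helper on the labeling session is assumed here and may be named differently in your `databricks-agents` / MLflow version.

```python
# Write the SME-provided expectations (e.g. expected_facts) back into the
# UC-backed evaluation dataset. Method name assumed; check your version's docs.
labeling_session.sync(to_dataset=UC_TABLE_NAME)
```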


Now we can run evaluations using the updated dataset.

Run mlflow.evaluate on the dataset
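A sketch of running Agent Evaluation over the updated dataset. The endpoint name is a placeholder, and wrapping the serving endpoint in a callable is just one of several ways to supply the model to `mlflow.evaluate`.

```python
import mlflow
from mlflow.deployments import get_deploy_client

# Read the updated, UC-backed evaluation dataset.
eval_df = spark.table(UC_TABLE_NAME).toPandas()

client = get_deploy_client("databricks")

def call_agent(request):
    # Forward each eval request to the agent's serving endpoint (placeholder name).
    return client.predict(endpoint="agents_main-my_schema-my_agent", inputs=request)

with mlflow.start_run(run_name="eval_on_labeled_dataset"):
    results = mlflow.evaluate(
        data=eval_df,
        model=call_agent,
        model_type="databricks-agent",  # invokes Mosaic AI Agent Evaluation judges
    )
```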

Label traces from an MLflow run

If you already have traces logged into a run, you can add them to the labeling session for your SME to provide assessments. Below, we log example traces to an MLflow run.
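A toy example of producing such traces: `@mlflow.trace` records each call as a trace associated with the active run. The function and questions are illustrative placeholders.

```python
import mlflow

# A toy "agent" whose calls are captured as MLflow traces.
@mlflow.trace
def toy_agent(question: str) -> str:
    return f"You asked: {question}"

questions = [
    "What is Unity Catalog?",
    "How do I create a vector search index?",
]

with mlflow.start_run(run_name="traces_for_labeling") as run:
    for q in questions:
        toy_agent(q)

print(f"Logged traces to run {run.info.run_id}")
```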


Add the traces to a labeling session

Below we select the traces from the run above and add them to a labeling session with a custom feedback label that asks our SME to label the formality of the response.
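A sketch of this step, assuming the same Review App helpers used above; the "formality" schema name and session name are illustrative.

```python
import mlflow
from databricks.agents.review_app import label_schemas

# Retrieve the traces logged to the run above.
traces = mlflow.search_traces(run_id=run.info.run_id)

# A custom feedback question about the formality of each response.
my_app.create_label_schema(
    name="formality",
    type="feedback",
    title="Is the response formal, neutral, or informal?",
    input=label_schemas.InputCategorical(options=["formal", "neutral", "informal"]),
    overwrite=True,
)

trace_session = my_app.create_labeling_session(
    name="label_run_traces",
    assigned_users=SME_EMAILS,
    label_schemas=["formality"],
)
trace_session.add_traces(traces)
```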


After the SME is done labeling, we can see the results via search_traces, like earlier in the notebook.
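A short sketch of inspecting the results; each labeling session is backed by an MLflow run, but the `mlflow_run_id` attribute used below is an assumption and may differ in your version.

```python
# Search the labeling session's run for traces carrying the SME's assessments.
labeled = mlflow.search_traces(run_id=trace_session.mlflow_run_id)  # attribute assumed
display(labeled)
```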


Label traces from an inference table

If you already have traces in an inference table (request logs), you can add them to the labeling session for your SME to provide assessments.
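A sketch of pulling traces from a request-log table; the table name, the `trace` column, and the deserialization step are all assumptions to adapt to your agent endpoint's inference table schema.

```python
from mlflow.entities import Trace

# The request-log table name and its `trace` column are assumptions; adjust
# them to match your agent endpoint's inference (request log) table.
request_logs = spark.table("main.my_schema.my_agent_request_logs")
trace_json = request_logs.select("trace").limit(10).toPandas()["trace"]

# Deserialize the stored traces (some versions of add_traces may also accept
# the serialized form directly).
traces = [Trace.from_json(t) for t in trace_json]

inference_session = my_app.create_labeling_session(
    name="label_inference_traces",
    assigned_users=SME_EMAILS,
    label_schemas=["formality"],
)
inference_session.add_traces(traces)
```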


After the SME is done labeling, we can see the results via search_traces, like earlier in the notebook.
