Datasets and Labeling Sessions
Introduction
This notebook describes how you can:
- Create an evaluation set, backed by a Unity Catalog Delta table
- Leverage subject matter experts (SMEs) to build an evaluation dataset
- Leverage SMEs to label traces generated by a version of an agent to understand its quality
- Give your SMEs a pre-production version of your agent so they can chat with it and give feedback
Please provide the following (a configuration sketch follows the list):
- The destination UC table name for the evaluation dataset
- An experiment name to host the labeling sessions
- A list of SME emails who can write assessments
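A minimal configuration cell might look like the sketch below; the table name, experiment path, and email addresses are placeholders to replace with your own values.

```python
import mlflow

# Placeholders -- replace with your own UC table, experiment path, and SME emails.
EVAL_DATASET_TABLE = "main.my_schema.agent_eval_dataset"    # destination UC table for the eval dataset
EXPERIMENT_NAME = "/Users/you@example.com/agent_labeling"   # MLflow experiment that hosts the labeling sessions
SME_EMAILS = ["sme1@example.com", "sme2@example.com"]       # users who can write assessments

mlflow.set_experiment(EXPERIMENT_NAME)
```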
Create a dataset and collect assessments
We bootstrap the evaluation dataset using synthetic data generation. For more details on synthetic evals, see Synthesize Evaluation sets.
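A sketch of that bootstrap step, assuming the `generate_evals_df` API from `databricks-agents`; the documents, description, and guidelines below are illustrative stand-ins for your own content.

```python
import pandas as pd
from databricks.agents.evals import generate_evals_df

# Illustrative source documents; in practice these come from your knowledge base.
docs = pd.DataFrame([
    {"doc_uri": "docs/spark.md", "content": "Apache Spark is a unified analytics engine for large-scale data processing."},
    {"doc_uri": "docs/delta.md", "content": "Delta Lake is an open-source storage layer that brings ACID transactions to data lakes."},
])

# Generate a small set of synthetic evaluation rows (requests plus expected facts).
evals = generate_evals_df(
    docs,
    num_evals=10,
    agent_description="A chatbot that answers questions about Databricks.",
    question_guidelines="Ask questions a data engineer might ask.",
)
display(evals)
```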
This cell adds the evals above to the evaluation dataset. The evaluation dataset is backed by a Delta table in Unity Catalog.
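A sketch of that step, assuming the `mlflow.genai.datasets` API; exact parameter names may differ slightly by library version.

```python
from mlflow.genai import datasets

# Create the evaluation dataset backed by the UC Delta table (use get_dataset if it already exists).
eval_dataset = datasets.create_dataset(EVAL_DATASET_TABLE)

# Merge the synthetic evals generated above into the dataset.
eval_dataset.merge_records(evals)
```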
Register an agent with the Review App
The following cell adds an agent to the Review App so the SME can use it in "chat" mode or during labeling. The agent is registered under a name, which is referenced when the labeling session is created.
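A sketch of the registration, assuming the `databricks.agents.review_app` API; the agent name and serving endpoint are placeholders.

```python
from databricks.agents import review_app

# Get the Review App associated with the current MLflow experiment.
my_app = review_app.get_review_app()

# Register the agent's serving endpoint under a name (both values are placeholders).
my_app.add_agent(
    agent_name="my-agent",
    model_serving_endpoint="agents_main-my_schema-my_agent",
)
```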
Create a labeling session from the eval dataset
The following cell creates a labeling session for the SME to review the dataset we created above.
We will configure the labeling session with a set of label schemas, which define the questions presented to the SME; a sketch follows the list below.
We will ask the SME:
- "Please provide a list of facts that you expect to see in a correct response" and collect a set of "expected_facts"
Sync expectations back to the evaluation dataset
After the SME is done labeling, you can sync the expectations back to the dataset.
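A one-line sketch of the sync; depending on the library version, the method may be named `sync` or `sync_expectations`.

```python
# Write the SME-provided expected_facts back to the UC-backed evaluation dataset.
my_session.sync(to_dataset=EVAL_DATASET_TABLE)
```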
Now we can run evaluations using the updated dataset.
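For example, with `mlflow.genai.evaluate` and the built-in `Correctness` scorer; the stub `predict_fn` below is a placeholder for your real agent and assumes each record's inputs contain a `messages` field.

```python
import mlflow
from mlflow.genai.scorers import Correctness

# Placeholder for your agent; replace with a call to the real model or serving endpoint.
def predict_fn(messages):
    return {"role": "assistant", "content": "A placeholder answer."}

# Score the agent against the dataset, now enriched with SME-provided expected_facts.
results = mlflow.genai.evaluate(
    data=eval_dataset.to_df(),
    predict_fn=predict_fn,
    scorers=[Correctness()],
)
```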
Label traces from an MLflow run
If you already have traces logged to a run, you can add them to the labeling session for your SME to provide assessments. Below we log example traces to an MLflow run.
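A minimal sketch: a trivial traced function stands in for your agent, and a few calls under one run produce the traces.

```python
import mlflow

# A trivial traced function standing in for your agent; @mlflow.trace records one trace per call.
@mlflow.trace
def answer(question: str) -> str:
    return f"You asked: {question}"

# Log a few example traces under a single MLflow run so we can collect them below.
with mlflow.start_run(run_name="example_traces") as run:
    for q in ["What is MLflow?", "What is a Delta table?"]:
        answer(q)

run_id = run.info.run_id
```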
Add the traces to a labeling session
Below we select the traces from the run above and add them to a labeling session with a custom feedback label that asks our SME to label the formality of the response.
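A sketch of that step; the `formality` schema name and its options are illustrative, and `add_traces` is assumed to accept the DataFrame returned by `mlflow.search_traces`.

```python
import mlflow

# Collect the traces logged under the run above.
traces = mlflow.search_traces(run_id=run_id)

# A custom feedback question about response formality.
my_app.create_label_schema(
    name="formality",
    type="feedback",
    title="Is the response formal or informal?",
    input=review_app.label_schemas.InputCategorical(options=["formal", "informal"]),
    overwrite=True,
)

# A new session that asks only the formality question, with the traces attached.
trace_session = my_app.create_labeling_session(
    name="label_trace_formality",
    assigned_users=SME_EMAILS,
    label_schemas=["formality"],
)
trace_session.add_traces(traces)
```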
After the SME is done labeling, we can see the results via search_traces, as earlier in the notebook.
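For example, assuming the labeling session exposes the MLflow run it logs labeled traces to as `mlflow_run_id`:

```python
# Assessments from the SMEs are attached to the traces in the session's MLflow run.
labeled_traces = mlflow.search_traces(run_id=trace_session.mlflow_run_id)
display(labeled_traces)
```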
Label traces from an inference table
If you already have traces in an inference table (request logs), you can add them to the labeling session for your SME to provide assessments.
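A sketch under the assumption that the endpoint's request logs live in the placeholder table below and that `add_traces` accepts rows read from it; check your endpoint's actual inference table name and schema.

```python
# Placeholder name for the endpoint's payload request logs (inference table).
request_logs = spark.table("main.my_schema.`my-endpoint_payload_request_logs`")

# Create a session and attach a sample of the production traces for SME review.
inference_session = my_app.create_labeling_session(
    name="label_production_traces",
    assigned_users=SME_EMAILS,
    label_schemas=["formality"],
)
inference_session.add_traces(request_logs.limit(10))
```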
After the SME is done labeling, we can see the results via search_traces, as earlier in the notebook.