View logs & assessments

Preview

This feature is in Private Preview. To try it, reach out to your Databricks contact.

Looking for a different RAG Studio doc? Go to the RAG documentation index

This tutorial walks you through the process of viewing logs from your application:

  • 🗂️ Request Log: Detailed traces of the 🔗 Chain executions

  • 👍 Assessment & Evaluation Results Log: Assessments from 👤 End Users & 🧠 Expert Users and 🤖 LLM Judges

We will use the 💬 Review UI deployed in the previous tutorial to generate a few logs and view the data.

It assumes you have followed the steps in Initialize a RAG Application.

Step 1: Collect Assessments from humans using the Review App

  1. Open the 💬 Review UI you deployed in the previous step.

  2. Interact with the application by asking questions.

    We suggest the following:

    • Press New Chat on the left side.

      • Ask what is rag studio? followed by how do i set up the dev environment for it?

    • Press New Chat on the left side.

      • Ask what is mlflow?

  3. After asking a question, you will see the feedback widget appear below the bot’s answer. At minimum, provide a thumbs up or thumbs down signal for “Is this response correct for your question?”.

    Before providing feedback:

    [Screenshot: feedback widget before providing feedback]

    After providing feedback:

    [Screenshot: feedback widget after feedback has been provided]

Step 2: Collect Assessments from an LLM judge

  1. The sample app is configured so that an 🤖 LLM Judge automatically provides Assessments for every interaction with the application.

    Note

    For more information on configuring LLM judges, see 🤖 LLM Judge.

  2. As such, an LLM judge has already provided assessments for the questions you asked in Step 1!

Step 3: Run online evaluation ETL

In the Reviewers and End Users environments, the ETL job that processes logs and assessments runs automatically. In the development environment (where we are working now), you need to run the ETL job manually.

Warning

RAG Studio logging is based on Inference Tables. Logs can take 10 to 30 minutes before they are ready to be processed by the ETL job. If you run the job below and do not see any results, wait 10 minutes and try again.

  1. Run the following command to start the logs ETL process. This step will take approximately 5 minutes.

    ./rag run-online-eval -e dev
    
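    If the job completes but you still do not see results, you can check whether any raw payloads have landed yet. A minimal sketch, assuming the raw payload logging table follows the same naming pattern as the other tables (confirm the exact table name in your schema before running this):

    -- Sketch: count the raw payload rows waiting to be processed by the ETL job.
    -- The table name below is an assumption; use the raw payload logging table that appears in your schema.
    select count(*) as raw_payload_rows
    from catalog.schema.`rag_studio_databricks-docs-bot_dev_payload_log`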

Step 4: View the logs

RAG Studio stores all logs within the Unity Catalog schema that you configured.

Note

The logging schema is designed to enable measurement of metrics. For more information on how these logs are used to compute metrics, see Metrics.

  1. Open the Catalog browser and navigate to your schema.

  2. In the schema, you will see the following tables:

    1. 🗂️ Request Log: Detailed traces of the 🔗 Chain executions; created by the ETL job

    2. 👍 Assessment & Evaluation Results Log: Assessments from 👤 End Users & 🧠 Expert Users and 🤖 LLM Judges; created by the ETL job

    3. Raw payload logging table: Raw payload logs that are used by the ETL job.

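    If you prefer SQL to the Catalog browser, you can also list the tables directly (a sketch; substitute your own catalog and schema names):

    -- Sketch: list the RAG Studio tables in the configured schema.
    show tables in catalog.schema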
  3. Let’s first explore the 🗂️ Request Log.

    select * from catalog.schema.`rag_studio_databricks-docs-bot_dev_request_log`
    
    • request: The user’s input to the bot

    • trace: Step-by-step logs of each step executed by the app’s 🔗 Chain

    • output: The bot’s generated response that was returned to the user

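    As a quick example, the sketch below pulls just the user’s question and the bot’s final answer for each request. The top-level request, output, and request.request_id fields follow the descriptions above; any other nested field names would be assumptions, so inspect the table schema if you want to go deeper:

    -- Sketch: one row per request with its input and final answer.
    select
      request.request_id,
      request,
      output
    from catalog.schema.`rag_studio_databricks-docs-bot_dev_request_log`
    limit 10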
  4. Next, let’s explore the 👍 Assessment & Evaluation Results Log. A single request.request_id can have multiple assessments (for example, one from a human reviewer and one from the 🤖 LLM Judge).

    select * from catalog.schema.`rag_studio_databricks-docs-bot_dev_assessment_log`
    
    • request_id: Maps to request.request_id in the 🗂️ Request Log

    • source: Who provided the feedback - either the user ID of the human or the ID of the 🤖 LLM Judge

    • text_assessment: The source’s assessment of the request

    • output: The bot’s generated response that was returned to the user

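    To see each request next to the assessments it received, you can join the two tables on the request ID. A sketch based on the column descriptions above (adjust field references if your schema differs):

    -- Sketch: join assessments back to the requests they describe.
    select
      r.request,
      r.output,
      a.source,
      a.text_assessment
    from catalog.schema.`rag_studio_databricks-docs-bot_dev_request_log` r
    join catalog.schema.`rag_studio_databricks-docs-bot_dev_assessment_log` a
      on r.request.request_id = a.request_id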

    Note

    There is an additional column called retrieval_assessments - this is used for assessments of the 🔍 Retriever. In this release of RAG Studio, retrieval assessment is only possible using a 📖 Evaluation Set and offline evaluation. Future releases will include support for capturing retrieval assessments from users in the 💬 Review UI and from 🤖 LLM Judges.