Key concepts of RAG Studio


This feature is in Private Preview. To try it, reach out to your Databricks contact.



Core to RAG Studio is always-on 📝 Trace logging. Every time your app is invoked, RAG Studio automatically captures a detailed, step-by-step log of every action taken inside the 🔗 Chain and saves it to the 🗂️ Request Log, which is simply a Delta Table.

This logging is based on Model Serving’s Inference Tables functionality.

View the 🗂️ Request Log schema for more details.
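
As an illustration only, the sketch below shows the kind of record a single 🔗 Chain invocation might produce. The class names and fields (`Trace`, `TraceStep`, `request_id`, `app_version`, `duration_ms`) are hypothetical and are not the actual 🗂️ Request Log schema.

```python
from dataclasses import dataclass, field, asdict
import uuid


@dataclass
class TraceStep:
    # One action taken inside the chain (hypothetical fields).
    name: str
    inputs: dict
    outputs: dict
    duration_ms: float


@dataclass
class Trace:
    # One chain invocation; a row like this would be written to the request log.
    request_id: str
    app_version: str
    steps: list = field(default_factory=list)

    def log_step(self, name, inputs, outputs, duration_ms):
        self.steps.append(TraceStep(name, inputs, outputs, duration_ms))


def example_trace():
    """Build a two-step trace and flatten it into a plain dict (one table row)."""
    trace = Trace(request_id=str(uuid.uuid4()), app_version="1")
    trace.log_step("retriever", {"query": "What is RAG?"}, {"chunks": ["doc-1", "doc-7"]}, 42.0)
    trace.log_step("llm", {"prompt_tokens": 350}, {"answer": "RAG is ..."}, 910.5)
    return asdict(trace)


row = example_trace()
```

Each step keeps its own inputs, outputs, and timing, which is what makes step-level debugging possible later.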


You can associate a 👍 Assessment & Evaluation Results Log with every entry in the 🗂️ Request Log. An assessment represents feedback about that 📝 Trace, for example: were the retrieved documents relevant? Was the answer correct? Each 📝 Trace can have multiple assessments from different sources: your 🧠 Expert Users, 👤 End Users, or a 🤖 LLM Judge.

View the 👍 Assessment & Evaluation Results Log schema for more details.
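
The one-trace-to-many-assessments relationship can be sketched as follows. The `Assessment` fields and the source labels (`expert_user`, `end_user`, `llm_judge`) are assumptions made for illustration, not the real schema.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical labels for the three feedback sources named above.
ALLOWED_SOURCES = {"expert_user", "end_user", "llm_judge"}


@dataclass(frozen=True)
class Assessment:
    request_id: str  # links the assessment back to its trace
    source: str
    question: str    # e.g. "answer_correct", "retrieval_relevant"
    passed: bool


def group_by_trace(assessments):
    """Collect every assessment under the trace it refers to."""
    grouped = defaultdict(list)
    for assessment in assessments:
        if assessment.source not in ALLOWED_SOURCES:
            raise ValueError(f"unknown assessment source: {assessment.source}")
        grouped[assessment.request_id].append(assessment)
    return dict(grouped)


assessments = [
    Assessment("r1", "end_user", "answer_correct", True),
    Assessment("r1", "llm_judge", "retrieval_relevant", False),
    Assessment("r2", "expert_user", "answer_correct", True),
]
by_trace = group_by_trace(assessments)
```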

Online evaluations

🗂️ Request Log and 👍 Assessment & Evaluation Results Log are used to compute metrics that allow you to understand the quality, cost, and latency of your RAG Application based on feedback collected from 👤 End Users and 🧠 Expert Users. The metric computations are added to the 👍 Assessment & Evaluation Results Log table and can be accessed through the 🕵️‍♀️ Exploration & Investigation UI.

View the metrics computed by RAG Studio for more details.
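
A minimal sketch of how quality, cost, and latency metrics could be derived from the two logs is shown below. The column names (`latency_ms`, `total_tokens`, `thumbs_up`) and the metrics themselves are illustrative stand-ins, not the metrics RAG Studio actually computes.

```python
from statistics import mean

# Toy rows standing in for the request log and the assessment log.
request_log = [
    {"request_id": "r1", "app_version": "1", "latency_ms": 800.0, "total_tokens": 420},
    {"request_id": "r2", "app_version": "1", "latency_ms": 1200.0, "total_tokens": 610},
]
assessments = [
    {"request_id": "r1", "source": "end_user", "thumbs_up": True},
    {"request_id": "r2", "source": "end_user", "thumbs_up": False},
    {"request_id": "r2", "source": "expert_user", "thumbs_up": False},
]


def compute_metrics(request_log, assessments):
    """Join assessments to logged requests and aggregate simple metrics."""
    logged_ids = {row["request_id"] for row in request_log}
    linked = [a for a in assessments if a["request_id"] in logged_ids]
    votes = [1.0 if a["thumbs_up"] else 0.0 for a in linked]
    return {
        "thumbs_up_rate": mean(votes) if votes else None,                   # quality
        "mean_total_tokens": mean(r["total_tokens"] for r in request_log),  # cost proxy
        "mean_latency_ms": mean(r["latency_ms"] for r in request_log),      # latency
    }


metrics = compute_metrics(request_log, assessments)
```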

Offline evaluations

Offline evaluations allow you to curate 📖 Evaluation Sets, which are 🗂️ Request Logs (optionally linked with the ground-truth answer from a 👍 Assessment & Evaluation Results Log) that contain representative queries your RAG Application supports. You use a 📖 Evaluation Set to compute the same metrics as in online evaluations. However, offline evaluation is typically done to assess the quality, cost, and latency of a new version before deploying the RAG Application to your users.
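
The idea of scoring a candidate version against an evaluation set before deployment can be sketched as below. Exact-match scoring is a deliberately simple stand-in for a real quality metric, and `chain_v1`, `chain_v2`, and the example rows are hypothetical.

```python
def exact_match(answer, ground_truth):
    """Toy quality metric: case-insensitive exact match."""
    return answer.strip().lower() == ground_truth.strip().lower()


def evaluate_offline(chain, evaluation_set):
    """Run a chain over every example that has a ground-truth answer."""
    scores = [
        exact_match(chain(example["query"]), example["ground_truth"])
        for example in evaluation_set
        if example.get("ground_truth") is not None
    ]
    return sum(scores) / len(scores) if scores else None


evaluation_set = [
    {"query": "capital of France?", "ground_truth": "Paris"},
    {"query": "capital of Japan?", "ground_truth": "Tokyo"},
    {"query": "capital of Peru?", "ground_truth": None},  # no ground truth linked yet
]


def chain_v1(query):
    # Hypothetical current version: always gives the same answer.
    return "Paris"


def chain_v2(query):
    # Hypothetical candidate version.
    answers = {"capital of France?": "paris", "capital of Japan?": "tokyo"}
    return answers.get(query, "unknown")


results = {
    "v1": evaluate_offline(chain_v1, evaluation_set),
    "v2": evaluate_offline(chain_v2, evaluation_set),
}
```

Because both versions run against the same fixed set of queries, the two scores are directly comparable.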


To create RAG Applications that deliver accurate answers, you must be able to quickly create and compare both end-to-end versions of your RAG Application and versions of the individual components (🗃️ Data Processor, 🔗 Chain, etc.) that make up your RAG Application. For example, you might want to see how chunk_size = 500 compares to chunk_size = 1000. RAG Studio supports logging versions: each version represents the code and configuration for the individual components.
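
To make the chunk_size comparison concrete, here is a small sketch in which each configuration is treated as its own version. `DataProcessorConfig`, `version_id`, and the character-based `chunk` function are hypothetical helpers, not RAG Studio APIs.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class DataProcessorConfig:
    chunk_size: int
    chunk_overlap: int = 0


def version_id(config):
    """Derive a stable identifier from the configuration contents."""
    payload = json.dumps(asdict(config), sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]


def chunk(text, config):
    """Split text into fixed-size character chunks with optional overlap."""
    step = config.chunk_size - config.chunk_overlap
    if step <= 0:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    return [text[i:i + config.chunk_size] for i in range(0, len(text), step)]


document = "x" * 2500
v_small = DataProcessorConfig(chunk_size=500)
v_large = DataProcessorConfig(chunk_size=1000)

chunks_small = chunk(document, v_small)  # 5 chunks of 500 characters
chunks_large = chunk(document, v_large)  # 3 chunks: 1000, 1000, 500 characters
```

Hashing the configuration is one possible way to tell versions apart; any identifier that changes when the code or configuration changes would serve the same purpose.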

Unified online and offline schemas

A core concept of RAG Studio is that all infrastructure and data schemas are unified between development and production. This enables you to quickly test a new version with 🧠 Expert Users, then deploy it to production once validated – using the same instrumentation code and measuring the same metrics in both environments.


However, having the same infrastructure and schemas in development and production can create a blurry line between these environments. RAG Studio supports multiple environments, because it is critically important that developers maintain a clean separation between these environments.

View the environments for more details.
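
One way to picture this separation is to give each environment its own storage location while keeping table names identical. The environment names and the naming scheme below are purely hypothetical and do not reflect how RAG Studio actually names its environments or tables.

```python
from dataclasses import dataclass

# Hypothetical environment names, for illustration only.
ENVIRONMENTS = ("dev", "reviewers", "prod")


@dataclass(frozen=True)
class Environment:
    name: str
    uc_schema: str  # hypothetical "catalog.schema" location for this environment

    def table(self, base_name):
        """Same table name in every environment, different schema."""
        return f"{self.uc_schema}.{base_name}"


def make_environments(catalog, app_name):
    return {
        name: Environment(name, f"{catalog}.{app_name}_{name}")
        for name in ENVIRONMENTS
    }


envs = make_environments("main", "docs_bot")
request_log_tables = {name: env.table("request_log") for name, env in envs.items()}
```

The schema of the data is the same everywhere, but development traffic never lands in the production tables.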

Key terminology

Application configuration

  • ⚙️ Global Configuration: the app’s name, Databricks workspace where the app is deployed, the Unity Catalog schema where assets are stored, and (optionally) the MLflow experiment and vector search endpoint.

  • 🤖 LLM Judge configuration: configuration for how 🤖 LLM Judges are run by RAG Studio.

Component code & configuration

  • 📥 Data Ingestor: A data pipeline that ingests raw unstructured documents from a third-party raw data source (such as Confluence or Google Drive) into a UC Volume. Each 📥 Data Ingestor can be associated with any number of 🗃️ Data Processors.

  • 🗃️ Data Processor: A data pipeline that parses, chunks, and embeds unstructured documents from a 📥 Data Ingestor into chunks stored in a Vector Index. A 🗃️ Data Processor is associated with 1+ 📥 Data Ingestors.

  • 🔍 Retriever: Logic that retrieves relevant chunks from a Vector Index. Given the dependencies between processing logic and retrieval logic, a 🔍 Retriever is associated with 1+ 🗃️ Data Processors. A 🔍 Retriever can be a simple call to a Vector Index or a more complex series of steps including a re-ranker.

  • 🔗 Chain: The orchestration code that glues together 🔍 Retrievers and Generative AI Models to turn a user query (question) into a bot response (answer). Each 🔗 Chain is associated with 1+ 🔍 Retrievers.
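
The "1+" associations between the components above can be sketched as a small data model. The class and field names are hypothetical; the only thing taken from the definitions is that each downstream component needs at least one upstream component.

```python
from dataclasses import dataclass


def _require_at_least_one(items, owner, kind):
    if not items:
        raise ValueError(f"{owner} needs at least one {kind}")


@dataclass
class DataIngestor:
    name: str
    source: str  # e.g. "confluence"


@dataclass
class DataProcessor:
    name: str
    ingestors: list

    def __post_init__(self):
        _require_at_least_one(self.ingestors, self.name, "DataIngestor")


@dataclass
class Retriever:
    name: str
    processors: list

    def __post_init__(self):
        _require_at_least_one(self.processors, self.name, "DataProcessor")


@dataclass
class Chain:
    name: str
    retrievers: list

    def __post_init__(self):
        _require_at_least_one(self.retrievers, self.name, "Retriever")


ingestor = DataIngestor("wiki", source="confluence")
processor = DataProcessor("wiki_chunks", ingestors=[ingestor])
retriever = Retriever("wiki_search", processors=[processor])
chain = Chain("qa_chain", retrievers=[retriever])
```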

Data generated by RAG Studio

  • 🗂️ Request Log: The step-by-step 📝 Trace of every 🔗 Chain invocation, that is, every user query and bot response along with detailed traces of the steps taken by the 🔗 Chain to generate that response.

  • 👍 Assessment & Evaluation Results Log: User-provided or 🤖 LLM Judge feedback (thumbs up / down, edited bot responses, etc.) that is linked to a 📝 Trace. The results from RAG Studio computing evaluations (also known as metrics) are added to each row of this table.

Data curated by the 👩‍💻 RAG App Developer

  • 📖 Evaluation Set: 🗂️ Request Logs, optionally with associated 👍 Assessment & Evaluation Results Logs, that contain representative questions/answers used for offline evaluation of the RAG Application.

  • 📋 Review Set: 🗂️ Request Logs that are curated by the developer to collect feedback from 🧠 Expert Users in order to create 📖 Evaluation Sets.
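
The curation flow described by these two definitions (request logs, then a review set, then an evaluation set) might look roughly like this. The function names, the row fields, and the selection rule (review everything that got a thumbs down) are assumptions for illustration.

```python
def build_review_set(request_log, should_review):
    """Pick the logged requests the developer wants experts to look at."""
    return [row for row in request_log if should_review(row)]


def build_evaluation_set(review_set, expert_answers):
    """Attach expert-provided ground truth, where available, to reviewed requests."""
    return [
        {
            "request_id": row["request_id"],
            "query": row["query"],
            "ground_truth": expert_answers.get(row["request_id"]),
        }
        for row in review_set
    ]


request_log = [
    {"request_id": "r1", "query": "How do I reset my password?", "thumbs_up": False},
    {"request_id": "r2", "query": "What is the refund policy?", "thumbs_up": True},
    {"request_id": "r3", "query": "Where is my invoice?", "thumbs_up": False},
]

# Hypothetical rule: send every thumbs-down request to expert review.
review_set = build_review_set(request_log, lambda row: not row["thumbs_up"])

# Hypothetical expert feedback keyed by request id.
expert_answers = {"r1": "Use the 'Forgot password' link on the sign-in page."}
evaluation_set = build_evaluation_set(review_set, expert_answers)
```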

RAG Studio user interfaces

  • 💬 Review UI: A chat-based web app for soliciting feedback from 🧠 Expert Users or for a 👩‍💻 RAG App Developer to test the app.

  • 🕵️‍♀️ Exploration & Investigation UI: A UI, built into Databricks, for viewing computed evaluations (metrics) about a RAG Application version and investigating individual 🗂️ Request Logs and 👍 Assessment & Evaluation Results Logs.