This feature is in Private Preview. To try it, reach out to your Databricks contact.
Looking for a different RAG Studio doc? Go to the RAG documentation index
Core to RAG Studio is always-on 📝 Trace logging. Every time your app is invoked, RAG Studio automatically captures a detailed, step-by-step log of every action taken inside the 🔗 Chain, saving it to the 🗂️ Request Log, which is simply a Delta Table. This logging is based on Model Serving's Inference Tables functionality. View the 🗂️ Request Log schema for more details.
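Because each row of the 🗂️ Request Log pairs one invocation with its step-by-step 📝 Trace, it helps to picture the shape of a single row. The sketch below is illustrative only: the class and field names here are hypothetical, not the actual Request Log schema (see the schema doc for the real columns).

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative shape of one Request Log row: a single Chain invocation
# plus the Trace of steps taken inside it. Field names are hypothetical.
@dataclass
class TraceStep:
    name: str       # e.g. "retriever" or "llm_generation"
    inputs: dict
    outputs: dict

@dataclass
class RequestLogRow:
    request_id: str
    app_version: str
    user_query: str
    bot_response: str
    trace: List[TraceStep] = field(default_factory=list)

row = RequestLogRow(
    request_id="req-001",
    app_version="v1",
    user_query="How do I create a Delta table?",
    bot_response="Use CREATE TABLE ... USING DELTA.",
    trace=[
        TraceStep("retriever", {"query": "create delta table"},
                  {"chunk_ids": ["c1", "c2"]}),
        TraceStep("llm_generation", {"prompt": "..."},
                  {"text": "Use CREATE TABLE ... USING DELTA."}),
    ],
)

# Since the log is a Delta Table, rows like this can be filtered and
# joined with ordinary Spark SQL.
step_names = [s.name for s in row.trace]
```

Because every invocation is captured automatically, no extra instrumentation code is needed to produce rows like this.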
For each 📝 Trace in the 🗂️ Request Log, you can associate a 👍 Assessment & Evaluation Results Log with that log. An assessment represents feedback about that 📝 Trace, e.g., were the retrieved documents relevant? Was the answer correct? Each 📝 Trace can have multiple assessments from different sources: one of your 🧠 Expert Users, 👤 End Users, or a 🤖 LLM Judge. View the 👍 Assessment & Evaluation Results Log schema for more details.
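To make the one-to-many relationship concrete, here is a toy sketch of several assessments attached to the same 📝 Trace. The row shapes and column names are illustrative assumptions, not the real Assessment & Evaluation Results Log schema.

```python
from collections import defaultdict

# Hypothetical assessment rows: multiple sources can assess one Trace,
# each linked back to it by request_id. Column names are illustrative.
assessments = [
    {"request_id": "req-001", "source": "end_user",    "thumbs_up": True},
    {"request_id": "req-001", "source": "llm_judge",   "answer_correct": True},
    {"request_id": "req-001", "source": "expert_user", "retrieval_relevant": False},
]

# Group feedback by the Trace it refers to.
by_request = defaultdict(list)
for a in assessments:
    by_request[a["request_id"]].append(a["source"])
```

In the real table this grouping is a join on the request identifier between the two Delta Tables.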
The 🗂️ Request Log and 👍 Assessment & Evaluation Results Log are used to compute metrics that allow you to understand the quality, cost, and latency of your RAG Application based on feedback collected from 👤 End Users and 🧠 Expert Users. The metric computations are added to the 👍 Assessment & Evaluation Results Log table and can be accessed through the 🕵️♀️ Exploration & Investigation UI. View the metrics computed by RAG Studio for more details.
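The metric computation is conceptually a join of the two logs followed by an aggregation per app version. A minimal pure-Python sketch, with made-up rows and a naive thumbs-up-rate quality metric (RAG Studio's actual metrics are richer):

```python
# Toy join of Request Log rows and per-request feedback to compute
# quality (thumbs-up rate) and latency per app version. All column
# names and values here are illustrative.
requests = [
    {"request_id": "r1", "app_version": "v1", "latency_ms": 820},
    {"request_id": "r2", "app_version": "v1", "latency_ms": 1150},
    {"request_id": "r3", "app_version": "v2", "latency_ms": 640},
]
assessments = {"r1": True, "r2": False, "r3": True}  # thumbs_up per request

def metrics_by_version(requests, assessments):
    acc = {}
    for r in requests:
        v = acc.setdefault(r["app_version"], {"n": 0, "up": 0, "latency": 0})
        v["n"] += 1
        v["up"] += int(assessments.get(r["request_id"], False))
        v["latency"] += r["latency_ms"]
    return {
        ver: {"thumbs_up_rate": v["up"] / v["n"],
              "avg_latency_ms": v["latency"] / v["n"]}
        for ver, v in acc.items()
    }

m = metrics_by_version(requests, assessments)
```

In RAG Studio the analogous results are written back to the 👍 Assessment & Evaluation Results Log table rather than returned in memory.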
Offline evaluations allow you to curate 📖 Evaluation Sets, which are 🗂️ Request Logs (optionally linked with ground-truth answers from a 👍 Assessment & Evaluation Results Log) that contain representative queries your RAG Application supports. You use an 📖 Evaluation Set to compute the same metrics as in online evaluation; however, offline evaluation is typically done to assess the quality, cost, and latency of a new version before deploying the RAG Application to your users.
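The offline loop is: run the candidate version over every query in the 📖 Evaluation Set, then score the responses against the ground truth. A sketch under stated assumptions: the candidate app is a stub, and the metric is naive exact-match accuracy rather than RAG Studio's real metrics.

```python
# Sketch of offline evaluation: replay an Evaluation Set through a
# candidate app version and score against ground-truth answers.
evaluation_set = [
    {"query": "capital of France?", "ground_truth": "Paris"},
    {"query": "2 + 2?",            "ground_truth": "4"},
]

def candidate_app(query):
    # Stand-in for invoking a new Chain version (intentionally wrong
    # on the second query to show a non-perfect score).
    return {"capital of France?": "Paris", "2 + 2?": "5"}.get(query, "")

def exact_match_accuracy(eval_set, app):
    hits = sum(app(ex["query"]) == ex["ground_truth"] for ex in eval_set)
    return hits / len(eval_set)

score = exact_match_accuracy(evaluation_set, candidate_app)
```

Because the queries are fixed, two versions scored this way are directly comparable before either reaches users.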
To create RAG Applications that deliver accurate answers, you must be able to quickly create and compare different versions, BOTH of your end-to-end RAG Application AND of the individual components (🗃️ Data Processor, 🔗 Chain, etc.) that make it up. For example, you might want to see how chunk_size = 500 compares to chunk_size = 1000. RAG Studio supports logging versions: each version represents the code and configuration for the individual components.
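To see why chunk_size is worth versioning, consider what the two settings above do to the same document. The splitter below is a deliberately naive character-based chunker for illustration; it is not the 🗃️ Data Processor's actual implementation, and the config dicts are hypothetical.

```python
# Two hypothetical Data Processor versions differing only in chunk_size,
# mirroring the chunk_size=500 vs chunk_size=1000 comparison above.
def chunk(text, chunk_size):
    # Naive fixed-width character splitter, for illustration only.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

doc = "x" * 1800  # stand-in for a parsed document

v1 = {"chunk_size": 500}
v2 = {"chunk_size": 1000}

chunks_v1 = chunk(doc, v1["chunk_size"])
chunks_v2 = chunk(doc, v2["chunk_size"])
```

The two versions produce different numbers and sizes of chunks, which changes what the 🔍 Retriever can return, which is exactly the kind of difference version-level metrics let you measure.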
A core concept of RAG Studio is that all infrastructure and data schemas are unified between development and production. This enables you to quickly test a new version with 🧠 Expert Users, then deploy it to production once validated, using the same instrumentation code and measuring the same metrics in both environments. However, sharing infrastructure and schemas between development and production can blur the line between the two. Because it is critically important that developers maintain a clean separation, RAG Studio supports multiple environments. View the environments for more details.
⚙️ Global Configuration: the app’s name, Databricks workspace where the app is deployed, the Unity Catalog schema where assets are stored, and (optionally) the MLflow experiment and vector search endpoint.
🤖 LLM Judge configuration: settings for how 🤖 LLM Judges are run by RAG Studio.
📥 Data Ingestor: A data pipeline that ingests raw unstructured documents from a 3rd-party raw data source (such as Confluence, Google Drive, etc.) into a UC Volume. Each 📥 Data Ingestor can be associated with any number of 🗃️ Data Processors.
🗃️ Data Processor: A data pipeline that parses, chunks, and embeds unstructured documents from a 📥 Data Ingestor into chunks stored in a Vector Index. A 🗃️ Data Processor is associated with 1+ 📥 Data Ingestors.
🔍 Retriever: Logic that retrieves relevant chunks from a Vector Index. Given the dependencies between processing logic and retrieval logic, a 🔍 Retriever is associated with 1+ 🗃️ Data Processors. A 🔍 Retriever can be a simple call to a Vector Index or a more complex series of steps, including a re-ranker.
🔗 Chain: The orchestration code that glues together a 🔍 Retriever and Generative AI Models to turn a user query (question) into a bot response (answer). Each 🔗 Chain is associated with 1+ 🔍 Retrievers.
🗂️ Request Log: The step-by-step 📝 Trace of every 🔗 Chain invocation, e.g., every user query and bot response, along with detailed traces of the steps taken by the 🔗 Chain to generate that response.
👍 Assessment & Evaluation Results Log: User-provided or 🤖 LLM Judge feedback (thumbs up / down, edited bot responses, etc.) that is linked to a 📝 Trace. The results of the evaluations (aka metrics) computed by RAG Studio are added to each row of this table.
📖 Evaluation Set: 🗂️ Request Logs, optionally with associated 👍 Assessment & Evaluation Results Logs, that contain representative questions/answers used for offline evaluation of the RAG Application.
📋 Review Set: 🗂️ Request Logs curated by the developer for the purpose of collecting 🧠 Expert Users' feedback in order to create 📖 Evaluation Sets.
💬 Review UI: A chat-based web app for soliciting feedback from 🧠 Expert Users or for a 👩💻 RAG App Developer to test the app.
🕵️♀️ Exploration & Investigation UI: A UI, built into Databricks, for viewing computed evaluations (metrics) about a RAG Application version and investigating individual 🗂️ Request Logs and 👍 Assessment & Evaluation Results Logs.
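How the runtime components above fit together can be sketched in a few lines: a 🔍 Retriever pulls chunks from a vector index and a 🔗 Chain composes the retriever with a generative model to turn a query into a response. Everything here is a mock; the dict-backed index, keyword lookup, and stubbed model stand in for Vector Search and a real Generative AI Model.

```python
# Mock vector index: topic -> chunks produced by a Data Processor.
VECTOR_INDEX = {
    "delta": ["Delta Lake is a storage layer.", "Delta tables support ACID."],
}

def retriever(query):
    # Keyword lookup standing in for a vector similarity search.
    return [c for key, chunks in VECTOR_INDEX.items()
            if key in query.lower() for c in chunks]

def generate(prompt):
    # Stand-in for a Generative AI Model call.
    return f"Answer based on {prompt.count('CONTEXT')} context chunk(s)."

def chain(query):
    # The Chain: retrieve, build a prompt, generate a response.
    chunks = retriever(query)
    prompt = "\n".join(f"CONTEXT: {c}" for c in chunks) + f"\nQUESTION: {query}"
    return generate(prompt)

resp = chain("What is Delta?")
```

In the real system, every call to `chain` would additionally emit a 📝 Trace row into the 🗂️ Request Log.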