Step 5 (retrieval). How to debug retrieval quality
This page describes how to identify the root cause of retrieval problems. Use this page when root cause analysis indicates Improve Retrieval as the root cause.
Retrieval quality is arguably the most important component of a RAG application. If the most relevant chunks are not returned for a given query, the LLM does not have access to the information it needs to generate a high-quality response. Poor retrieval can lead to irrelevant, incomplete, or hallucinated output. Diagnosing it requires manual effort to analyze the underlying data. Mosaic AI Agent Framework makes this troubleshooting much easier through its tight integration between the data platform (including Unity Catalog and Vector Search) and experiment tracking with MLflow (including LLM evaluation and MLflow Tracing).
Instructions
Follow these steps to address retrieval quality issues:
1. Open the B_quality_iteration/01_root_cause_quality_issues notebook.
2. Use the queries to load MLflow traces of the records that had retrieval quality issues.
3. For each record, manually examine the retrieved chunks. If available, compare them to the ground-truth retrieval documents.
4. Look for patterns or common issues among the queries with low retrieval quality. For example:
   - Relevant information is missing from the vector database entirely.
   - Insufficient number of chunks or documents returned for a retrieval query.
   - Chunks are too small and lack sufficient context.
   - Chunks are too large and contain multiple, unrelated topics.
   - The embedding model fails to capture semantic similarity for domain-specific terms.
5. Based on the identified issue, hypothesize potential root causes and corresponding fixes. For guidance, see Common reasons for poor retrieval quality.
6. Follow the steps in implement and evaluate changes to implement and evaluate a potential fix. This might involve modifying the data pipeline (for example, adjusting chunk size or trying a different embedding model) or modifying the RAG chain (for example, implementing hybrid search or retrieving more chunks).
7. If retrieval quality is still not satisfactory, repeat steps 4 and 5 for the next most promising fixes until the desired performance is achieved.
8. Re-run the root cause analysis to determine if the overall chain has any additional root causes that should be addressed.
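The manual comparison against ground-truth documents can be supported with a small script. The following is a minimal sketch in plain Python, not part of the notebook: the `RetrievalRecord` shape, the document IDs, and the label names are all hypothetical, and it assumes you have already extracted the ranked retrieved document IDs from each trace and know which document IDs exist in the index.

```python
from dataclasses import dataclass


@dataclass
class RetrievalRecord:
    """One evaluated query (hypothetical shape, not an MLflow type)."""
    query: str
    retrieved_doc_ids: list[str]      # ranked, best match first
    ground_truth_doc_ids: list[str]   # documents a correct answer needs


def recall_at_k(record: RetrievalRecord, k: int) -> float:
    """Fraction of ground-truth documents found in the top-k retrieved results."""
    expected = set(record.ground_truth_doc_ids)
    if not expected:
        return 1.0  # nothing to find, so nothing was missed
    found = expected & set(record.retrieved_doc_ids[:k])
    return len(found) / len(expected)


def diagnose(record: RetrievalRecord, indexed_doc_ids: set[str], k: int) -> str:
    """Assign a coarse label to one record to help spot patterns."""
    expected = set(record.ground_truth_doc_ids)
    if expected - indexed_doc_ids:
        # At least one needed document was never indexed.
        return "missing_from_index"
    if expected <= set(record.retrieved_doc_ids[:k]):
        return "ok"
    if expected <= set(record.retrieved_doc_ids):
        # Retrieved, but too far down the ranking to reach the LLM.
        return "ranked_below_top_k"
    return "not_retrieved"


# Illustrative data only; real IDs would come from your traces and index.
records = [
    RetrievalRecord("How do I rotate an API key?", ["d1", "d7", "d3"], ["d3"]),
    RetrievalRecord("What is the refund window?", ["d2", "d4", "d5"], ["d9"]),
    RetrievalRecord("Which regions are supported?", ["d6", "d2", "d8"], ["d4"]),
]
indexed_doc_ids = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

labels = [diagnose(r, indexed_doc_ids, k=2) for r in records]
for r, label in zip(records, labels):
    print(f"{label:20s} recall@2={recall_at_k(r, 2):.2f}  {r.query}")
```

Grouping records by label gives a rough first cut at the patterns in step 4: many `missing_from_index` records point at the data pipeline, while many `ranked_below_top_k` records suggest that retrieving more chunks may help. The labels do not replace reading the chunks themselves.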
Common reasons for poor retrieval quality
The following table lists debugging steps and potential fixes for common retrieval issues. Fixes are categorized by component:
Data pipeline
Chain config
Chain code
The component defines which steps you should follow in the implement and evaluate changes step.
Retrieval issue | Debugging steps | Potential fix |
---|---|---|
Chunks are too small | | |
Chunks are too large | | |
Chunks don’t have enough information about the text from which they were taken | | |
Embedding model doesn’t accurately understand the domain or key phrases in user queries | | |
Relevant information missing from the vector database | | |
Retrieval queries are poorly formulated | | |
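For the two chunk-size issues in the table, a quick length summary can show whether chunk size is worth investigating before you change the data pipeline. This is a minimal sketch under stated assumptions: `chunk_length_report` is a hypothetical helper, whitespace-separated words stand in for tokens, and the `min_words` and `max_words` thresholds are arbitrary placeholders that you would tune to your embedding model and content.

```python
from statistics import median


def chunk_length_report(
    chunks: list[str], min_words: int = 50, max_words: int = 400
) -> dict:
    """Summarize chunk lengths, counted in whitespace-separated words.

    The thresholds are illustrative assumptions, not recommended values.
    """
    lengths = [len(c.split()) for c in chunks]
    return {
        "count": len(lengths),
        "median_words": median(lengths) if lengths else 0,
        # Indexes of chunks that may lack sufficient context.
        "too_small": [i for i, n in enumerate(lengths) if n < min_words],
        # Indexes of chunks that may mix multiple unrelated topics.
        "too_large": [i for i, n in enumerate(lengths) if n > max_words],
    }


# Illustrative chunks only; real ones would come from your parsed documents.
chunks = [
    "See the table below.",
    "word " * 120,
    "word " * 900,
]
report = chunk_length_report(chunks)
print(report)
```

A flagged chunk is only a candidate for inspection. Reading the flagged chunks alongside the queries that retrieved them is still what confirms whether size is the actual problem.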
Next step
If you also identified issues with generation quality, continue with Step 5 (generation). How to debug generation quality.
If you think that you have resolved all of the identified issues, continue with Step 6. Make & evaluate quality fixes on the AI agent.