Creating a 🔍 Retriever version

Preview

This feature is in Private Preview. To try it, reach out to your Databricks contact.

Conceptual overview

The 🔍 Retriever is the logic that retrieves relevant chunks from a Vector Index. Because retrieval logic depends on how the data was processed, each 🔍 Retriever is associated with one or more 🗃️ Data Processors. A 🔍 Retriever can be associated with (used by) any number of 🔗 Chains.

A 🔍 Retriever can be a simple call to a Vector Index or a more complex series of steps including a re-ranker.
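The two shapes described above can be sketched as follows. This is a minimal illustration only, not RAG Studio's implementation: the in-memory `INDEX`, the `simple_retrieve`, `rerank`, and `retrieve_with_rerank` functions, and the term-overlap scoring are all hypothetical stand-ins for a real Vector Index and a real re-ranking model.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical in-memory stand-in for a Vector Index: (chunk text, embedding).
INDEX = [
    ("Spark DataFrames are distributed collections", [0.9, 0.1, 0.0]),
    ("Use spark.read to load data", [0.8, 0.2, 0.1]),
    ("Delta Lake supports ACID transactions", [0.1, 0.9, 0.2]),
    ("Cluster sizing guidance", [0.0, 0.2, 0.9]),
]


def simple_retrieve(query_vec, k):
    """Simple retriever: one similarity search, return the top-k chunks."""
    ranked = sorted(INDEX, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def rerank(query_terms, chunks):
    """Toy re-ranker (assumption): order chunks by how many query terms they contain."""
    def overlap(chunk):
        return len(set(chunk.lower().split()) & query_terms)
    return sorted(chunks, key=overlap, reverse=True)


def retrieve_with_rerank(query_vec, query_terms, k, fetch_k):
    """Multi-step retriever: over-fetch fetch_k candidates, re-rank, keep k."""
    candidates = simple_retrieve(query_vec, fetch_k)
    return rerank(query_terms, candidates)[:k]


top_simple = simple_retrieve([1.0, 0.0, 0.0], k=2)
top_reranked = retrieve_with_rerank([1.0, 0.0, 0.0], {"load", "data"}, k=1, fetch_k=3)
```

The re-ranking variant fetches more candidates than it returns (`fetch_k` larger than `k`) so that the second stage has something to reorder.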

Note

In v2024-01-19, the 🔍 Retriever provides only retriever configuration settings. In this release, you must include the code for your 🔍 Retriever within your 🔗 Chain’s code.

Tip

🚧 Roadmap 🚧 Support for managing the 🔍 Retriever code separately from the 🔗 Chain.

Note

In v2024-01-19, to enable 📝 Trace logging, you must use a LangChain Retriever as part of a LangChain-defined chain inside your 🔗 Chain.

Tip

🚧 Roadmap 🚧 Support for non-LangChain retrievers and integrations with other frameworks such as LlamaIndex.

Tip

🚧 Roadmap 🚧 Support for multiple 🔗 Chains per RAG Application. In v2024-01-19, only one 🔗 Chain can be created per RAG Application.

Step-by-step instructions

  1. Open rag-config.yml in your IDE or code editor.

  2. Edit the retrievers configuration.

    retrievers:
      - name: ann-retriever
        description: Basic ANN retriever
        # explicit link to the data processor that this retriever uses.
        data_processors:
          - name: spark-docs-processor
        # these are key-value pairs that can be specified by the end user
        configurations:
          k: 5
          use_mmr: false
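Because the retriever code lives inside the chain in this release, the chain has to turn these key-value pairs into actual search behavior. The sketch below shows one way that mapping might look, assuming the `configurations` values are already available to the chain as a Python dict (how RAG Studio delivers them is not described here). The helper name `to_search_args` is hypothetical; the `search_type` and `search_kwargs` keys mirror the arguments accepted by LangChain's `VectorStore.as_retriever`.

```python
def to_search_args(configurations):
    """Translate retriever configurations into LangChain-style retriever arguments.

    Hypothetical helper: `k` and `use_mmr` are the keys from the example
    rag-config.yml; treating a missing `use_mmr` as False is an assumption.
    """
    k = int(configurations["k"])
    if k < 1:
        raise ValueError("k must be a positive integer")
    # use_mmr selects maximal marginal relevance search instead of plain similarity.
    search_type = "mmr" if configurations.get("use_mmr", False) else "similarity"
    return {"search_type": search_type, "search_kwargs": {"k": k}}


# Values copied from the example configuration above.
args = to_search_args({"k": 5, "use_mmr": False})

# With a LangChain vector store object (not constructed here), these could be
# passed along as: retriever = vector_store.as_retriever(**args)
```

Keeping the translation in one small function means a new retriever version only needs a change to the configuration values, not to the chain code.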