Creating a 🔗 Chain version


This feature is in Private Preview. To try it, reach out to your Databricks contact.

Conceptual overview

The 🔗 Chain is the “heart” of your application and contains the orchestration code that glues together a 🔍 Retriever, Generative AI Models, and often other APIs/services to turn a user query (question) into a bot response (answer). Each 🔗 Chain is associated with 1+ 🔍 Retrievers.

To use RAG Studio, the bare minimum requirement is to configure a 🔗 Chain.

An example 🔗 Chain might accept a user query, perform query processing, query a 🔍 Retriever, and then prompt a Generative AI Model with the query and retriever results to generate a response to the user. However, 🔗 Chain logic can be arbitrarily complex and often includes additional steps.
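The example flow described above can be sketched in plain Python. This is a minimal illustration only, not RAG Studio's implementation: `preprocess_query`, `stub_retriever`, `stub_model`, and `run_chain` are hypothetical names, and the retriever and model are stand-ins that return fixed values instead of calling a Vector Search index or a model endpoint.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def preprocess_query(query: str) -> str:
    """Query processing step: here it only collapses repeated whitespace."""
    return " ".join(query.split())


def stub_retriever(query: str) -> List[str]:
    """Hypothetical stand-in for a Retriever; returns fixed passages."""
    return ["Passage one about the topic.", "Passage two about the topic."]


def stub_model(messages: List[Message]) -> str:
    """Hypothetical stand-in for a Generative AI Model endpoint."""
    return f"(stub answer built from {len(messages)} prompt messages)"


def run_chain(
    messages: List[Message],
    retriever: Callable[[str], List[str]] = stub_retriever,
    model: Callable[[List[Message]], str] = stub_model,
) -> dict:
    """Turn the latest user message into a response in the documented shape."""
    # 1. Accept the user query (the last message) and process it.
    query = preprocess_query(messages[-1]["content"])
    # 2. Query the retriever for supporting passages.
    context = "\n".join(retriever(query))
    # 3. Prompt the model with the query and the retriever results.
    prompt = [
        {"role": "system", "content": "Answer using the provided context."},
        {"role": "user", "content": f"Context: {context}\nQuestion: {query}"},
    ]
    answer = model(prompt)
    # 4. Wrap the answer in the response format this page describes.
    return {
        "choices": [
            {
                "index": 0,
                "message": {"role": "assistant", "content": answer},
                "finish_reason": "stop",
            }
        ],
        "object": "chat.completions",
    }
```

Passing the retriever and model in as arguments keeps the orchestration logic separate from the services it calls, so the same function can be exercised locally with stubs before real endpoints are wired in.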

RAG Studio is compatible with any MLflow logged model that has the following request/response schema. The request schema follows the OpenAI ChatMessages format and the response schema follows the ChatResponse format.

request_signature = {
    # `messages` is an array of [ChatMessages](/machine-learning/foundation-models/
    # To support multi-turn conversation, your front-end application can pass an array of length >1,
    # where the array alternates between role = "user" and role = "assistant".
    # The last message in the array must be of role = "user".
    "messages": [{"role": "user", "content": "This is a question to ask?"}]
}

response_signature = {
    # `choices` is an array of ChatCompletionChoice.
    # There can be 1+ choices, but each choice must have a single [ChatMessages](/machine-learning/foundation-models/ with role = "assistant".
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "This is the correct answer."},
        "finish_reason": "stop"
    }],
    "object": "chat.completions"
    # TODO: add the rest of schema here
}
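The rules stated in the schema comments (roles alternate, the last request message is from the user, every choice carries one assistant message) can be checked before a request is sent or a response is returned. The following is a sketch under that reading of the comments; `validate_request` and `validate_response` are hypothetical helper names, not part of RAG Studio or MLflow, and they check only the fields shown above.

```python
from typing import Any, Dict

ALLOWED_ROLES = ("user", "assistant")


def validate_request(request: Dict[str, Any]) -> None:
    """Raise ValueError if `request` breaks the rules in the schema comments."""
    messages = request.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("`messages` must be a non-empty list")
    for msg in messages:
        if not isinstance(msg, dict) or msg.get("role") not in ALLOWED_ROLES:
            raise ValueError("each message needs a role of 'user' or 'assistant'")
        if not isinstance(msg.get("content"), str):
            raise ValueError("each message needs string `content`")
    if messages[-1]["role"] != "user":
        raise ValueError("the last message must have role 'user'")
    # Adjacent messages must not share a role, i.e. the roles alternate.
    for prev, cur in zip(messages, messages[1:]):
        if prev["role"] == cur["role"]:
            raise ValueError("roles must alternate between 'user' and 'assistant'")


def validate_response(response: Dict[str, Any]) -> None:
    """Raise ValueError if `response` lacks a well-formed `choices` array."""
    choices = response.get("choices")
    if not isinstance(choices, list) or not choices:
        raise ValueError("`choices` must be a non-empty list")
    for choice in choices:
        message = choice.get("message") if isinstance(choice, dict) else None
        if not isinstance(message, dict) or message.get("role") != "assistant":
            raise ValueError("each choice needs a message with role 'assistant'")
        if not isinstance(message.get("content"), str):
            raise ValueError("each choice message needs string `content`")
```

Both functions return `None` on success and raise `ValueError` otherwise, so they can be dropped into a test or called at the top of a chain's entry point.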


In v2024-01-19, you can use any MLflow model, but to enable 📝 Trace logging you must use a LangChain-defined chain inside your 🔗 Chain. Future versions will allow your code to call the RAG Trace Logging API directly.


🚧 Roadmap 🚧 Support for Llama-Index chains

A 🔗 Chain consists of:

  1. Configuration stored in the chains section of rag-config.yml

  2. Code stored in app-directory/src/ that configures the chain's logic and logs it as a Unity Catalog Model.

You can configure a Generative AI Model in rag-config.yml. This model can be any Foundation Model APIs pay-per-token endpoint, Foundation Model APIs provisioned throughput endpoint, or External Model endpoint that supports the `llm/v1/chat` task.


🚧 Roadmap 🚧 Support for multiple 🔗 Chains per RAG Application. In v2024-01-19, only one 🔗 Chain can be created per RAG Application.


🔗 Chains must be deployed to Databricks Model Serving in order to enable 📝 Trace logging and 👍 Assessments collection.

Data flows


Step-by-step instructions

  1. Open the rag-config.yml in your IDE/code editor.

  2. Edit the chains configuration.

      - name: spark-docs-chain # User specified, must be unique, no spaces
        description: Spark docs chain # User specified, any text string
        # explicit link to the retriever that this chain uses.
        # currently, only one retriever per chain is supported, but this schema allows support for adding multiple in the future
          - name: ann-retriever
          - name: llama-2-70b-chat # user specified name to reference this model in the chain & to override per environment. Must be unique.
            type: v1/llm/chat
            endpoint_name: databricks-llama-2-70b-chat
                - role: "system"
                  content: "You are a trustful assistant for Databricks users. You are answering python, coding, SQL, data engineering, spark, data science, AI, ML, Datawarehouse, platform, API or infrastructure, Cloud administration question related to Databricks. If you do not know the answer to a question, you truthfully say you do not know. Read the discussion to get the context of the previous conversation. In the chat discussion, you are referred to as 'system'. The user is referred to as 'user'."
                - role: "user"
                  content: "Discussion: {chat_history}. Here's some context which might or might not help you answer: {context} Answer straight, do not repeat the question, do not start with something like: the answer to the question, do not add 'AI' in front of your answer, do not say: here is the answer, do not mention the context or the question. Based on this history and context, answer this question: {question}"
              temperature: 0.9
              max_tokens: 200
  3. Edit src/my_rag_builder/ to modify the default code or add custom code.

    If you just want to modify the default chain logic, edit full_chain, which defines the chain in LangChain Expression Language (LCEL).


    You can modify this file however you like, provided that when the code finishes running, destination_model_name contains an MLflow model that has the signature defined above and was logged with the provided convenience function chain_model_utils.log_register_chain_model().

  4. To test the chain locally:

    1. Set the DATABRICKS_TOKEN environment variable to a Personal Access Token.

      export DATABRICKS_TOKEN=pat_token_key
    2. Update vector_search_index_name on line 204 to the name of a Vector Search index previously created with ./rag create-rag-version.

    3. Uncomment all or part of lines 244-264 to print the chain output to the console.

    4. Run the file.
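When inspecting the chain output locally, it can help to see what the user prompt configured in step 2 looks like once its `{chat_history}`, `{context}`, and `{question}` placeholders are filled. The sketch below assumes plain Python `str.format` substitution and an abbreviated copy of the template; how RAG Studio itself renders the template is not described here, and `format_history` and `render_user_prompt` are hypothetical helpers.

```python
from typing import Dict, List

# Abbreviated version of the "user" prompt from the chains configuration.
USER_TEMPLATE = (
    "Discussion: {chat_history}. Here's some context which might or might not "
    "help you answer: {context} Based on this history and context, "
    "answer this question: {question}"
)


def format_history(messages: List[Dict[str, str]]) -> str:
    """Flatten earlier turns into 'role: content' lines."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)


def render_user_prompt(
    messages: List[Dict[str, str]],
    context_docs: List[str],
    template: str = USER_TEMPLATE,
) -> str:
    """Fill the template from a `messages` array and retrieved passages."""
    # The last message is the current question; everything before it is history.
    *history, last = messages
    return template.format(
        chat_history=format_history(history),
        context="\n".join(context_docs),
        question=last["content"],
    )


if __name__ == "__main__":
    example_messages = [
        {"role": "user", "content": "What is Spark?"},
        {"role": "assistant", "content": "A distributed compute engine."},
        {"role": "user", "content": "Who maintains it?"},
    ]
    print(render_user_prompt(example_messages, ["Example retrieved passage."]))
```

Printing the rendered prompt alongside the chain output makes it easier to tell whether a poor answer comes from the retrieved context or from the prompt wording.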