Part 1: Prepare data and vector search index for a RAG DSPy program
This notebook shows how to accomplish the following tasks to prepare text data for your retrieval-augmented generation (RAG) DSPy program:
- Set up your environment
- Create a Delta table of chunked data
- Create a Vector Search index
This notebook is part 1 of 2 notebooks for creating a DSPy program for RAG.
Requirement
Install dependencies
Define notebook widgets
Define configurations
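The configuration step derives fully qualified names for the objects this notebook creates. The sketch below is hypothetical: it assumes the widgets collect a Unity Catalog catalog and schema, and the default table, index, and endpoint names are placeholders rather than the notebook's actual values. In a Databricks notebook the catalog and schema would typically be read with `dbutils.widgets.get(...)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagConfig:
    """Hypothetical notebook configuration; all names are illustrative placeholders."""

    catalog: str
    schema: str
    source_table: str = "wikipedia_chunks"         # hypothetical Delta table name
    index_name: str = "wikipedia_chunks_index"     # hypothetical vector index name
    vs_endpoint: str = "dspy_rag_endpoint"         # hypothetical Vector Search endpoint

    @property
    def source_table_fullname(self) -> str:
        # Unity Catalog objects use a three-level catalog.schema.object namespace
        return f"{self.catalog}.{self.schema}.{self.source_table}"

    @property
    def index_fullname(self) -> str:
        return f"{self.catalog}.{self.schema}.{self.index_name}"


config = RagConfig(catalog="main", schema="dspy_rag")
print(config.source_table_fullname)  # main.dspy_rag.wikipedia_chunks
print(config.index_fullname)         # main.dspy_rag.wikipedia_chunks_index
```

Freezing the dataclass keeps later cells from silently changing a name that an earlier cell already used to create a table or index.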
Create Delta table
The following shows how to create a Delta table that contains chunked Wikipedia entries from Databricks sample datasets.
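Chunking splits each article into passages small enough to embed and retrieve individually. The notebook's exact chunking strategy is not described here, so the following is a minimal sketch assuming fixed-size word windows with overlap and `{"title", "text"}` input records; the function names and window sizes are hypothetical. Each row gets an integer `id`, because a Delta Sync vector index requires a primary key column.

```python
from typing import Dict, Iterator, List


def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> Iterator[str]:
    """Yield windows of up to `chunk_size` words, each overlapping the previous by `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        yield " ".join(words[start:start + chunk_size])
        # Stop once a window reaches the end so the tail is not emitted twice
        if start + chunk_size >= len(words):
            break


def chunk_articles(articles: List[Dict], chunk_size: int = 200, overlap: int = 20) -> List[Dict]:
    """Turn article records into chunk rows with a unique integer primary key."""
    rows: List[Dict] = []
    for article in articles:
        for chunk in chunk_words(article["text"], chunk_size, overlap):
            rows.append({"id": len(rows), "title": article["title"], "text": chunk})
    return rows
```

In the notebook, rows like these could be saved with `spark.createDataFrame(rows).write.format("delta").mode("overwrite").saveAsTable(...)`; for a large source such as Wikipedia, the same logic would more likely run inside Spark than on the driver.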
Create a vector search index
The following deletes any previously existing vector search index with the same name. You can skip this command if you don't have a vector search index with that name.
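To make the delete step safe to rerun, you can check whether the index exists before deleting it. This sketch assumes a client with `list_indexes(endpoint_name)` returning a dict shaped like `{"vector_indexes": [{"name": ...}, ...]}` and `delete_index(endpoint_name=..., index_name=...)`, which is how the `databricks-vectorsearch` `VectorSearchClient` is believed to behave; verify both against your installed version. Pagination of the listing is ignored. `delete_index_if_exists` is a hypothetical helper name.

```python
from typing import Any


def delete_index_if_exists(client: Any, endpoint_name: str, index_name: str) -> bool:
    """Delete `index_name` on `endpoint_name` if it is listed; return True if a delete was issued."""
    # Assumed response shape: {"vector_indexes": [{"name": "catalog.schema.index", ...}, ...]}
    listing = client.list_indexes(endpoint_name)
    existing = {idx.get("name") for idx in listing.get("vector_indexes", [])}
    if index_name not in existing:
        return False  # nothing to delete, so the step is a no-op
    client.delete_index(endpoint_name=endpoint_name, index_name=index_name)
    return True
```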
The following creates a vector search index from the Delta table of chunked documents and reports status as it progresses.
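A Delta Sync index keeps the index in step with the source Delta table, and the source table must have Change Data Feed enabled (`ALTER TABLE ... SET TBLPROPERTIES (delta.enableChangeDataFeed = true)`). The sketch below assumes the index embeds the `text` column using a model serving endpoint; it only assembles the keyword arguments so they can be inspected before the call. `delta_sync_index_kwargs` is a hypothetical helper, and the column names are assumptions carried over from a chunk table with `id` and `text` columns.

```python
from typing import Dict


def delta_sync_index_kwargs(
    endpoint_name: str,
    index_name: str,
    source_table_name: str,
    embedding_model_endpoint_name: str,
    primary_key: str = "id",                  # assumed primary key column
    embedding_source_column: str = "text",    # assumed column holding chunk text
    pipeline_type: str = "TRIGGERED",         # TRIGGERED syncs on demand; CONTINUOUS streams changes
) -> Dict[str, str]:
    """Collect arguments for a Delta Sync index with Databricks-computed embeddings."""
    if pipeline_type not in {"TRIGGERED", "CONTINUOUS"}:
        raise ValueError("pipeline_type must be 'TRIGGERED' or 'CONTINUOUS'")
    return {
        "endpoint_name": endpoint_name,
        "index_name": index_name,
        "source_table_name": source_table_name,
        "primary_key": primary_key,
        "pipeline_type": pipeline_type,
        "embedding_source_column": embedding_source_column,
        "embedding_model_endpoint_name": embedding_model_endpoint_name,
    }
```

With the `databricks-vectorsearch` package, these arguments would be passed as `VectorSearchClient().create_delta_sync_index(**kwargs)`. A triggered pipeline is usually enough for a static sample dataset like this one.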
Note: This command blocks until the index is online, which can take several minutes.
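The wait step is a polling loop: read the index state, report it, and stop when the index is online. This is a minimal sketch assuming the state is a string that starts with `ONLINE` when ready and contains `FAILED` on error; with the Vector Search client, `get_state` might be something like `lambda: index.describe()["status"]["detailed_state"]`, but that response shape is an assumption to verify. `wait_until_online` is a hypothetical helper, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def wait_until_online(
    get_state: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `get_state` until it reports an ONLINE state; raise on failure or timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_state()
        print(f"Index state: {state}")  # report progress on every poll
        if state.startswith("ONLINE"):
            return state
        if "FAILED" in state:
            raise RuntimeError(f"Index creation failed: {state}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Index not online after {timeout_s} s (last state: {state})")
        sleep(poll_interval_s)
```

Bounding the loop with a timeout means a stuck provisioning step surfaces as an error instead of a notebook cell that never finishes.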
You are now ready to continue to part 2: Create a DSPy program for RAG.