dspy-data-preparation(Python)

Part 1: Prepare data and vector search index for a RAG DSPy program

This notebook shows how to accomplish the following tasks to prepare text data for your retrieval augmented generation (RAG) DSPy program:

  • Set up your environment
  • Create a Delta table of chunked data
  • Create a Vector Search index (AWS | Azure)

This is the first of two notebooks for creating a DSPy program for RAG.

Requirement

  • Create a vector search endpoint (AWS | Azure)

Install dependencies

Define notebook widgets
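
The widget cell is not shown here, so the following is only a minimal sketch of how text widgets could be defined and read back. It assumes the `dbutils` object that Databricks notebooks provide; the widget names and default values are hypothetical placeholders, not the ones used by the original notebook.

```python
from typing import Dict

# Hypothetical widget names and defaults (assumptions, not the notebook's values).
WIDGET_DEFAULTS: Dict[str, str] = {
    "vs_endpoint": "my_vector_search_endpoint",
    "catalog": "main",
    "schema": "default",
}


def define_and_read_widgets(dbutils, defaults: Dict[str, str]) -> Dict[str, str]:
    """Create one text widget per entry, then read back the current values."""
    for name, default in defaults.items():
        # dbutils.widgets.text(name, defaultValue) creates a text input widget.
        dbutils.widgets.text(name, default)
    # dbutils.widgets.get(name) returns the widget's current value as a string.
    return {name: dbutils.widgets.get(name) for name in defaults}
```

In a Databricks notebook, calling `define_and_read_widgets(dbutils, WIDGET_DEFAULTS)` would return the defaults until someone edits the widget values at the top of the notebook.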

Define configurations
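
One way to keep the configuration in a single place is a small dataclass that derives the three-level Unity Catalog names (`catalog.schema.object`) for the source table and the index. This is a sketch only: the class name `RagDataConfig` and every default value are placeholders chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagDataConfig:
    # All defaults are placeholders, not the notebook's actual settings.
    catalog: str = "main"
    schema: str = "default"
    table: str = "wikipedia_chunks"
    index: str = "wikipedia_chunks_index"
    vs_endpoint: str = "my_vector_search_endpoint"

    @property
    def source_table_fullname(self) -> str:
        """Three-level name of the Delta table that holds the chunks."""
        return f"{self.catalog}.{self.schema}.{self.table}"

    @property
    def index_fullname(self) -> str:
        """Three-level name of the vector search index."""
        return f"{self.catalog}.{self.schema}.{self.index}"


config = RagDataConfig()
print(config.source_table_fullname)
print(config.index_fullname)
```

Freezing the dataclass prevents a later cell from silently changing a name after the table or index has already been created.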

Create Delta table

The following shows how to create a Delta table that contains chunked Wikipedia entries from Databricks sample datasets.
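
The chunking logic itself is not shown here. As a rough illustration, the sketch below splits a document into overlapping word-based chunks and shapes them into rows with an `id` and a `text` column. The chunk size, the overlap, the column names, and the helper names `chunk_text` and `to_rows` are all assumptions; the original notebook may chunk differently.

```python
from typing import Dict, List


def chunk_text(text: str, max_words: int = 100, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most max_words words, sharing `overlap` words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this chunk has reached the end of the document.
        if start + max_words >= len(words):
            break
    return chunks


def to_rows(doc_id: str, text: str, **chunk_kwargs) -> List[Dict[str, str]]:
    """Turn one document into rows with a unique id per chunk."""
    return [
        {"id": f"{doc_id}-{i}", "text": chunk}
        for i, chunk in enumerate(chunk_text(text, **chunk_kwargs))
    ]
```

In a notebook, rows like these could be written out with `spark.createDataFrame(rows).write.format("delta").saveAsTable(...)`. A Delta Sync index typically also requires Change Data Feed to be enabled on the source table.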

Create a vector search index

The following deletes any existing vector search index with the same name. Skip this command if no index with that name exists.
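
A guarded delete avoids an error when no such index exists. The sketch below assumes a client object exposing `get_index` and `delete_index` with `endpoint_name` and `index_name` keyword arguments, as the `databricks-vectorsearch` package's `VectorSearchClient` does, and it assumes that `get_index` raises when the index is missing. The helper name `delete_index_if_exists` is hypothetical.

```python
def delete_index_if_exists(client, endpoint_name: str, index_name: str) -> bool:
    """Delete the index if it can be found; return True only if one was deleted."""
    try:
        # Assumption: the lookup raises when the index does not exist.
        client.get_index(endpoint_name=endpoint_name, index_name=index_name)
    except Exception:
        return False
    client.delete_index(endpoint_name=endpoint_name, index_name=index_name)
    return True
```

Catching the broad `Exception` keeps the sketch short; in practice you may prefer to inspect the error so that permission or network failures are not mistaken for a missing index.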

The following creates a vector search index from the Delta table of chunked documents and reports status as it progresses.

Note: This command blocks until the index is created. Expect it to take several minutes while the index comes online.
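
The blocking behavior described in the note can be approximated with a simple polling loop. The helper below is a generic sketch, not the notebook's code: `wait_until_ready` is a hypothetical name, and the timeout and polling interval are arbitrary defaults. The commented lines show how it might be wired to `databricks-vectorsearch`, assuming the dictionary returned by `index.describe()` reports readiness under `status.ready`.

```python
import time
from typing import Callable


def wait_until_ready(
    is_ready: Callable[[], bool],
    timeout_s: float = 1800.0,
    poll_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> float:
    """Poll is_ready() until it returns True; return the seconds waited."""
    start = clock()
    while True:
        if is_ready():
            return clock() - start
        if clock() - start >= timeout_s:
            raise TimeoutError(f"index not ready after {timeout_s} seconds")
        print(f"Index not ready yet, checking again in {poll_s} seconds")
        sleep(poll_s)


# Possible wiring in a Databricks notebook (argument values are placeholders):
# from databricks.vector_search.client import VectorSearchClient
# index = VectorSearchClient().create_delta_sync_index(
#     endpoint_name="my_vector_search_endpoint",
#     index_name="main.default.wikipedia_chunks_index",
#     source_table_name="main.default.wikipedia_chunks",
#     pipeline_type="TRIGGERED",
#     primary_key="id",
#     embedding_source_column="text",
#     embedding_model_endpoint_name="my_embedding_endpoint",
# )
# wait_until_ready(lambda: index.describe().get("status", {}).get("ready", False))
```

Passing `sleep` and `clock` as parameters makes the loop easy to exercise without actually waiting.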

You are now ready to continue with part 2, Create DSPy program for RAG!