structured-data-for-rag (Python)


Using Databricks online tables and feature serving endpoints for retrieval augmented generation (RAG)

This notebook illustrates how to use Databricks online tables together with feature serving endpoints to power your applications with enterprise data in real time and at production scale.

This notebook creates a dummy data set and uses LangChain as an orchestration layer to augment the response of a chatbot with enterprise data served in real time from an online table.

Install required packages

Create and publish feature tables

The following cells create feature tables with dummy data.

Set up the catalog and schema to use
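The original setup cell is not shown here; the following is a minimal sketch of what it would do, assuming hypothetical catalog and schema names (`main` and `travel_demo`). Inside a Databricks notebook, each statement would be run through `spark.sql(...)`.

```python
catalog_name = "main"        # hypothetical catalog name
schema_name = "travel_demo"  # hypothetical schema name

setup_statements = [
    f"CREATE CATALOG IF NOT EXISTS {catalog_name}",
    f"CREATE SCHEMA IF NOT EXISTS {catalog_name}.{schema_name}",
    f"USE CATALOG {catalog_name}",
    f"USE SCHEMA {schema_name}",
]

for stmt in setup_statements:
    print(stmt)
    # spark.sql(stmt)  # uncomment when running inside a Databricks notebook
```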

Create feature tables and function

The following cell creates feature tables for user preferences and hotel prices, and it creates a function to calculate total hotel price including taxes for the duration of the stay.

Create dummy data for feature tables
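The dummy rows might look like the records below; the column names are illustrative rather than the notebook's exact schema. In the notebook, such records would be turned into Delta feature tables, for example via `spark.createDataFrame(...)` followed by `saveAsTable(...)`.

```python
# Hypothetical dummy rows for the two feature tables.
user_preferences = [
    {"user_id": 1, "preferred_city": "Paris", "max_budget_per_night": 150.0},
    {"user_id": 2, "preferred_city": "Tokyo", "max_budget_per_night": 220.0},
]

hotel_prices = [
    {"hotel_id": 101, "city": "Paris", "price_per_night": 120.0},
    {"hotel_id": 102, "city": "Tokyo", "price_per_night": 200.0},
]

# Inside the notebook these records would become Delta feature tables, e.g.:
# spark.createDataFrame(user_preferences).write.mode("overwrite").saveAsTable(...)
print(len(user_preferences), len(hotel_prices))
```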

Sync the feature tables to a Databricks online table

Databricks online tables are managed tables that provide low-latency lookups of your data for ML serving solutions. The following code uses the Databricks SDK to create an online table.
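As a config-style sketch, the spec passed to the SDK call has roughly the shape below; the field names mirror the SDK's `OnlineTableSpec`, and the catalog, schema, and table names are hypothetical.

```python
# Config-style sketch of an online table spec (hypothetical names).
online_table_spec = {
    "name": "main.travel_demo.user_preferences_online",
    "spec": {
        # Delta source table to sync from, and its primary key.
        "source_table_full_name": "main.travel_demo.user_preferences",
        "primary_key_columns": ["user_id"],
        # Triggered scheduling: refresh the online table on demand.
        "run_triggered": {},
    },
}

# With the Databricks SDK, this shape maps onto something like:
#   from databricks.sdk import WorkspaceClient
#   w = WorkspaceClient()
#   w.online_tables.create(...)  # passing an OnlineTableSpec built from these fields
print(online_table_spec["name"])
```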

You can also use the Databricks UI to create the online table. For more information, see the Databricks online tables documentation (AWS | Azure).

Create a vector index for unstructured data searches

Databricks vector search allows you to ingest and query unstructured data.

Calculate embeddings using a Databricks foundation model
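Databricks foundation model embedding endpoints accept an OpenAI-compatible request body and return the vectors under a `data` key. The helpers below sketch the request/response shapes without calling an endpoint; the endpoint name the notebook would use (for example `databricks-bge-large-en`) is an assumption.

```python
def build_embedding_request(texts):
    """Request body for a foundation model embeddings endpoint
    (OpenAI-compatible shape)."""
    return {"input": texts}

def extract_embeddings(response_json):
    """Pull the embedding vectors out of an OpenAI-style response."""
    return [item["embedding"] for item in response_json["data"]]

# Mocked round trip -- no endpoint is called here.
fake_response = {"data": [{"embedding": [0.1, 0.2]}, {"embedding": [0.3, 0.4]}]}
print(extract_embeddings(fake_response))  # [[0.1, 0.2], [0.3, 0.4]]
```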

Create feature table for hotel characteristics

Set up the vector search index

Create a vector search index based on the embeddings feature table

Wait for the vector search index to be ready
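Index creation is asynchronous, so the notebook polls until the index is ready. A generic sketch of that wait loop is shown below; `get_status` is any callable returning the index state string (with the Vector Search client it would wrap an index-describe call), and the `"ONLINE"` state name is an assumption for illustration.

```python
import time

def wait_for_index_ready(get_status, timeout_s=600, poll_s=5):
    """Poll until the index reports ready, or raise on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if get_status() == "ONLINE":
            return
        time.sleep(poll_s)
    raise TimeoutError("vector search index was not ready in time")

# Demo with a fake status source that becomes ready on the third poll.
states = iter(["PROVISIONING", "PROVISIONING", "ONLINE"])
wait_for_index_ready(lambda: next(states), timeout_s=10, poll_s=0)
print("index ready")
```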

Set up feature serving endpoint

AI Bot powered by Databricks Feature Serving and Databricks online tables

  1. Automatically sync data from a Delta table to an online table.
  2. Look up features in real time with low latency.
  3. Provide context to augment chatbots with enterprise data, as shown in this example.
  4. Implement data management best practices for MLOps and LLMOps.

Define a tool to retrieve customers and revenues

The CustomerRetrievalTool queries the feature serving endpoint to serve data from the Databricks online table, providing the LLM with context based on the user query.
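The request such a tool sends to the endpoint can be sketched as follows. The `dataframe_records` body is the standard input format for Databricks serving-endpoint invocations, while the workspace URL, endpoint name, and lookup key used here are hypothetical.

```python
def build_feature_lookup_request(workspace_url, endpoint_name, user_id):
    """URL and body for a feature serving endpoint lookup
    (names and lookup key are illustrative)."""
    url = f"{workspace_url}/serving-endpoints/{endpoint_name}/invocations"
    body = {"dataframe_records": [{"user_id": user_id}]}
    return url, body

url, body = build_feature_lookup_request(
    "https://my-workspace.cloud.databricks.com", "travel-feature-endpoint", 1
)
print(url)
print(body)

# The tool would then POST this with an auth header, e.g. using `requests`:
# requests.post(url, json=body, headers={"Authorization": f"Bearer {token}"})
```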

Set up an agent that fetches enterprise data from the Databricks Lakehouse using Feature Serving

The following cell uses OpenAI API keys. To learn how to configure secrets with Databricks secret management, see the Databricks secret management documentation (AWS | Azure).

By incorporating context from the Databricks Lakehouse, including online tables and a feature serving endpoint, an AI chatbot built with context retrieval tools performs much better than a generic chatbot.

Cleanup

Uncomment lines 2 - 5 in the following cell to clean up the endpoints created in this notebook.