Build genAI apps using DSPy on Databricks

This article describes DSPy and provides example notebooks demonstrating how to use DSPy on Databricks to build and optimize generative AI agents.

What is DSPy?

DSPy is a framework for programmatically defining and optimizing generative AI agents. DSPy can automate prompt engineering and orchestrate LLM fine-tuning to improve performance.

DSPy consists of several components that simplify agent development and improve agent quality:

  • Modules: Components that handle specific text transformations, such as answering questions or summarizing. They replace traditional hand-written prompts and can learn from examples, which makes them more adaptable.

  • Signatures: Natural language descriptions of a module’s input and output behavior. For example, “question -> answer” specifies that the module should take a question as input and return an answer.

  • Compiler: DSPy’s optimization tool. It improves LM pipelines by adjusting modules to score better on a performance metric, either by generating better prompts or by fine-tuning models.

  • Program (DSPy): A set of modules connected into a pipeline to perform complex tasks. DSPy programs are flexible, allowing you to optimize and adapt them using the compiler.
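
To make the relationship between these components concrete, the following is a minimal plain-Python sketch. It is not the DSPy API: `parse_signature`, `ToyModule`, `toy_compile`, and `fake_lm` are hypothetical names invented for this illustration, and `fake_lm` is a deterministic stand-in for a real language model. The sketch assumes a single output field and an exact-match metric. It only shows the idea that a signature declares inputs and outputs, a module turns that declaration into a prompt, and a compiler picks the prompt that scores best on a few examples.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Illustration only: none of these names come from the DSPy library.


def parse_signature(signature: str) -> Tuple[List[str], List[str]]:
    """Split an 'inputs -> outputs' string into input and output field names."""
    inputs, outputs = signature.split("->")
    return (
        [name.strip() for name in inputs.split(",") if name.strip()],
        [name.strip() for name in outputs.split(",") if name.strip()],
    )


def fake_lm(prompt: str) -> str:
    """Deterministic stand-in for a language model (assumption for this demo)."""
    label = "positive" if "great" in prompt else "negative"
    # Returns a bare label only when the instruction asks for one word.
    return label if "one word" in prompt else f"I think it is {label}."


@dataclass
class ToyModule:
    """Wraps a signature, an LM callable, and an instruction used as the prompt prefix."""
    signature: str
    lm: Callable[[str], str]
    instruction: str = ""

    def __call__(self, **kwargs: str) -> Dict[str, str]:
        inputs, outputs = parse_signature(self.signature)
        missing = [name for name in inputs if name not in kwargs]
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        fields = "\n".join(f"{name}: {kwargs[name]}" for name in inputs)
        # Single output field assumed, to keep the sketch short.
        return {outputs[0]: self.lm(self.instruction + "\n" + fields)}


def toy_compile(module: ToyModule, candidates: List[str],
                trainset: List[Tuple[Dict[str, str], str]]) -> ToyModule:
    """Return a copy of the module using the instruction with the best exact-match score."""
    output_name = parse_signature(module.signature)[1][0]

    def score(instruction: str) -> float:
        trial = ToyModule(module.signature, module.lm, instruction)
        hits = sum(trial(**inputs)[output_name] == expected
                   for inputs, expected in trainset)
        return hits / len(trainset)

    return ToyModule(module.signature, module.lm, max(candidates, key=score))


trainset = [
    ({"text": "The food was great"}, "positive"),
    ({"text": "The service was slow"}, "negative"),
]
candidates = [
    "Classify the sentiment.",
    "Answer with one word: positive or negative.",
]
classifier = ToyModule("text -> sentiment", fake_lm)
optimized = toy_compile(classifier, candidates, trainset)
print(optimized.instruction)
print(optimized(text="The food was great"))
```

In this sketch the second candidate instruction wins because it is the only one whose outputs match the expected labels exactly. A program, in the sense described above, would chain several such modules and compile them together against one metric.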

Create a text classifier DSPy program

The following notebook shows how to create a DSPy program that performs text classification. This example demonstrates how DSPy works and the components it uses.

Create a text classifier DSPy program notebook

Create a DSPy program for RAG

These notebooks show how to create and optimize a basic RAG program using DSPy. They assume you are using serverless compute, and they install packages at the notebook level so that they run independently of the Databricks Runtime version.

Part 1: Prepare data and vector search index for a RAG DSPy program notebook

Part 2: Create and optimize a DSPy program for RAG notebook

Migrate LangChain to DSPy

These notebooks show how to migrate LangChain model code to DSPy and optimize it for better performance. They assume you are using serverless compute, and they install packages at the notebook level so that they run independently of the Databricks Runtime version.

Migrate LangChain model code to DSPy notebook

Optimize your migrated DSPy model notebook
