Introduction to building gen AI apps on Databricks

Mosaic AI provides a comprehensive platform to build, deploy, and manage GenAI applications. This article guides you through the essential components and processes involved in developing GenAI applications on Databricks.

Mosaic AI Agent Framework

Agent Framework comprises a set of tools on Databricks designed to help developers build, deploy, and evaluate production-quality AI agents like Retrieval Augmented Generation (RAG) applications.

Building high-quality agents requires a robust evaluation toolset to test and validate agent systems. Mosaic AI Agent Evaluation provides a platform to capture and implement human feedback, ground truth, response and request logs, LLM judge feedback, chain traces, and more.

Deploy a generative AI agent

Databricks supports deploying a generative AI agent using the deploy() method in the Mosaic AI Agent Framework. This method automatically creates:

  • A CPU endpoint for deployment and testing using Model Serving.

  • A URL to the Agent Evaluation review app where stakeholders can interact with the agent and record feedback.

See Deploy an agent for generative AI application for additional detail about deploying agents.

Deploy a generative AI model

Mosaic AI Model Serving supports serving and querying generative AI models using the following capabilities:

  • Foundation Model APIs. This functionality makes state-of-the-art open models and fine-tuned model variants available to your model serving endpoint. These models are curated foundation model architectures that support optimized inference. Base models, like DBRX Instruct, Llama-2-70B-chat, BGE-Large, and Mistral-7B are available for immediate use with pay-per-token pricing, and workloads that require performance guarantees, like fine-tuned model variants, can be deployed with provisioned throughput.

  • External models. These are generative AI models that are hosted outside of Databricks. Endpoints that serve external models can be centrally governed and customers can establish rate limits and access control for them. Examples include foundation models like OpenAI’s GPT-4, Anthropic’s Claude, and others.

See Create generative AI model serving endpoints.