Concepts: Generative AI on Databricks

A GenAI app is an application that uses generative AI models (such as large language models, image generation models, and text-to-speech models) to create new outputs, automate complex tasks, or engage in intelligent interactions based on user input.

A GenAI app can be powered by simple calls to LLMs or other GenAI models, or by complex AI agents. Read more about levels of complexity.

Agents, tools, evaluation, models, and other aspects of GenAI apps can be customized with your proprietary data. This data-driven customization leads to data intelligence, allowing you to go beyond the general intelligence offered by canned AI models.

GenAI applications

A user-facing GenAI application can take many forms.

Success with GenAI applications often requires two sets of skills: application development and AI evaluation. GenAI app development is much like developing non-AI applications and requires software skills that depend on the type of application. Evaluation, however, requires specialized tools and techniques to handle the complexity of GenAI and its open-ended responses.

GenAI evaluation

GenAI models, agents, and applications often have complex, open-ended behavior. Users can be allowed to enter any query. An AI agent can be allowed to gather text, images, and more during execution. The output can be arbitrary text, images, or other media, and there can be many "good" answers.

These complications make it challenging to evaluate GenAI. Proper evaluation requires:

  • Automation using AI to evaluate AI
  • Human feedback from experts and users to collect ground truth and calibrate automated evaluation
  • Deep-diving into complex agents to understand and debug behavior
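
As a minimal sketch of the first two requirements, the following Python measures how often an automated judge agrees with human labels. The `keyword_judge` function is a hypothetical stand-in for an AI judge, and the records are invented for illustration. None of the names here are Databricks or MLflow APIs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalRecord:
    question: str
    answer: str
    human_label: bool  # ground truth collected from an expert reviewer


def keyword_judge(question: str, answer: str) -> bool:
    """Hypothetical stand-in for an AI judge: passes any non-empty answer
    that is not a refusal. A real judge would call a GenAI model."""
    text = answer.strip().lower()
    return bool(text) and "i don't know" not in text


def judge_agreement(records: List[EvalRecord],
                    judge: Callable[[str, str], bool]) -> float:
    """Fraction of records where the automated judge matches the human label."""
    if not records:
        raise ValueError("need at least one labeled record")
    matches = sum(judge(r.question, r.answer) == r.human_label for r in records)
    return matches / len(records)


# Invented example data; the third record is one the naive judge gets wrong.
records = [
    EvalRecord("What is a tool?", "A function an LLM can invoke.", True),
    EvalRecord("What is an agent?", "I don't know.", False),
    EvalRecord("What is MCP?", "A kind of database index.", False),
    EvalRecord("What is an LLM?", "A model trained on large text data sets.", True),
]

agreement = judge_agreement(records, keyword_judge)
print(f"judge/human agreement: {agreement:.2f}")
```

A low agreement rate suggests the automated judge needs recalibration before its scores can be trusted at scale.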

Databricks-managed MLflow and related tooling provide the foundations for GenAI evaluation.

Agents

An agent or agent system is an AI-driven system that can autonomously perceive, decide, and act in an environment to achieve goals. Unlike a standalone GenAI model that only produces an output when prompted, an agent system possesses a degree of agency. Modern AI agents use a GenAI model as the "brain" of a system that:

  1. Receives user requests or messages from another agent.
  2. Reasons about how to proceed: which data to fetch, which logic to apply, which tools to call, or whether to request more input from the user.
  3. Executes a plan and possibly calls multiple tools or delegates to sub-agents.
  4. Returns an answer or prompts the user for additional clarification.
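
The four steps above can be sketched as a small loop. This is an illustration under stated assumptions, not the Agent Framework API: `scripted_model` is a hypothetical stand-in for the GenAI model, and `lookup_order` is a hypothetical tool that returns canned data.

```python
from typing import Callable, Dict, List, Tuple

# A tool takes a string argument and returns a string result.
Tool = Callable[[str], str]
# A decision is (action, argument); action is a tool name, "ask_user", or "answer".
Decision = Tuple[str, str]


def lookup_order(order_id: str) -> str:
    """Hypothetical tool: returns a canned order status."""
    return f"order {order_id}: shipped"


def scripted_model(history: List[str]) -> Decision:
    """Hypothetical stand-in for the model's reasoning step.
    It calls a tool first, then answers with the latest tool result."""
    tool_results = [h for h in history if h.startswith("tool:")]
    if not tool_results:
        return ("lookup_order", "A17")
    return ("answer", tool_results[-1][len("tool:"):].strip())


def run_agent(request: str,
              model: Callable[[List[str]], Decision],
              tools: Dict[str, Tool],
              max_steps: int = 5) -> str:
    history = [f"user: {request}"]        # step 1: receive the request
    for _ in range(max_steps):
        action, arg = model(history)      # step 2: reason about how to proceed
        if action in ("answer", "ask_user"):
            return arg                    # step 4: answer or ask for clarification
        result = tools[action](arg)       # step 3: execute by calling a tool
        history.append(f"tool: {result}")
    return "Sorry, I could not complete the request."


reply = run_agent("Where is order A17?", scripted_model,
                  {"lookup_order": lookup_order})
print(reply)
```

The `history` list is the state the agent carries between steps, and `max_steps` bounds how long it can iterate.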

By bridging general intelligence (the GenAI model's pretrained capabilities) and data intelligence (the specialized knowledge and APIs specific to your business), agent systems enable high-impact enterprise use cases such as advanced customer service flows, data-rich analytics bots, and multi-agent orchestration for complex operational tasks.

There is a continuum from simple GenAI models to complex agents. To learn more, see Agent system design patterns.

Databricks provides a range of options for building agents, from fully guided to fully custom:

  • AI Playground provides a UI for prototyping tool-calling agents, from which you can export generated agent code.
  • Agent Framework allows you to build and deploy agents using custom code or third-party agent authoring libraries.

Tools

AI agents can call tools to gather information or perform actions. Tools are single-interaction functions that an LLM can invoke to accomplish a well-defined task. The AI model typically generates parameters for each tool call, and the tool provides a straightforward input-output interaction.
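
As a rough sketch of this input-output interaction, a tool can be modeled as a function plus a description of its parameters. The `ToolSpec` class, the `convert_currency` tool, and the JSON argument string are all hypothetical and are not part of any Databricks or MCP interface.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str             # tells the model when the tool is useful
    parameters: Dict[str, type]  # parameter name -> expected Python type
    func: Callable[..., Any]


def convert_currency(amount: float, rate: float) -> float:
    """Hypothetical tool: a stateless, single-step calculation."""
    return round(amount * rate, 2)


CONVERT = ToolSpec(
    name="convert_currency",
    description="Multiply an amount by an exchange rate.",
    parameters={"amount": float, "rate": float},
    func=convert_currency,
)


def call_tool(spec: ToolSpec, raw_arguments: str) -> Any:
    """Validate model-generated JSON arguments, then invoke the tool."""
    args = json.loads(raw_arguments)
    if set(args) != set(spec.parameters):
        raise ValueError(f"expected parameters {sorted(spec.parameters)}")
    typed = {name: spec.parameters[name](value) for name, value in args.items()}
    return spec.func(**typed)


# The JSON string stands in for parameters generated by the model.
result = call_tool(CONVERT, '{"amount": 100, "rate": 0.5}')
print(result)
```

Validating the arguments before the call keeps malformed model output from reaching the tool itself.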

Common tool categories include:

  • Tools that retrieve or analyze data
    • Semantic retrieval: Query a vector index to locate relevant text or other unstructured data.
    • Structured retrieval: Run SQL queries or use APIs to retrieve structured information.
    • Web search: Search the internet or an internal web corpus.
    • Classic ML models: Invoke machine learning models to perform classification, regression, or other predictions.
    • GenAI models: Generate specialized outputs such as code or images.
  • Tools that modify the state of an external system
    • API call: Call CRM endpoints, internal services, or other third-party integrations.
    • Email or messaging app integration: Post a message or send a notification.
  • Tools that run logic or perform a specific task
    • Code execution: Run user-supplied or LLM-generated code in a sandbox.

Tools can be built into agentic logic or accessed using standardized interfaces like MCP.

Tools vs. agents:

  • Tools perform a single, well-defined operation. Agents can perform more open-ended tasks.
  • Tools are generally stateless and do not maintain ongoing context beyond each invocation. Agents maintain state as they iteratively solve tasks.

Tool error handling and safety:

Because each tool call is an external operation such as an API call, the system should handle failures gracefully. Timeouts, malformed responses, or invalid inputs should not cause the agent itself to fail completely. In production, limit the number of allowed tool calls, return a fallback response if tool calls fail, and apply guardrails so that the agent system does not repeatedly attempt the same failing action.
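
One way to combine these three safeguards is sketched below. The `safe_run` helper, the `flaky_search` tool, and the fixed `plan` list (standing in for a model that keeps proposing the same call) are hypothetical names introduced for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple

FALLBACK = "Sorry, I could not complete that request."


def flaky_search(query: str) -> str:
    """Hypothetical tool that always times out."""
    raise TimeoutError("search backend timed out")


def safe_run(plan: List[Tuple[str, str]],
             tools: Dict[str, Callable[[str], str]],
             max_tool_calls: int = 3) -> Tuple[str, int]:
    """Run planned (tool, argument) calls with a call budget and a fallback.

    Returns the final response and the number of tool calls attempted.
    """
    failed: Set[Tuple[str, str]] = set()
    calls = 0
    for name, arg in plan:
        if (name, arg) in failed:
            continue              # guardrail: skip a call that already failed
        if calls >= max_tool_calls:
            break                 # call budget exhausted
        calls += 1
        try:
            return tools[name](arg), calls
        except Exception:         # timeouts, bad inputs, unknown tools, and so on
            failed.add((name, arg))
    return FALLBACK, calls


# The plan repeats one failing call four times before trying another tool.
plan = [("search", "pricing")] * 4 + [("echo", "pricing")]
tools = {"search": flaky_search, "echo": lambda q: f"echo: {q}"}
response, attempts = safe_run(plan, tools)
print(response, attempts)
```

The failing search is attempted once, the repeats are skipped, and the agent still produces a response from the second tool. If every call fails or the budget runs out, the caller receives `FALLBACK` instead of an exception.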

GenAI models and LLMs

Large Language Models (LLMs) are AI models trained on massive text data sets that can understand, generate, and reason about human language. LLMs power applications like chatbots, code assistants, and content generation tools by predicting and producing contextually relevant text based on input prompts.

More generally, GenAI models or foundation models are trained on massive text, image, video, audio, or other data to learn modalities beyond text. Multi-modal models learn to connect human language with images, audio, and other media. LLMs are a type of GenAI or foundation model, though these terms are often used loosely and interchangeably.

GenAI models provide the intelligence behind GenAI agents and apps. Simple apps are often built using a single model customized with prompt engineering.

Prompt engineering

GenAI models generally take prompts, or instructions telling the model how to handle user input. Prompts can be heavily customized with detailed steps, expert knowledge, data, and other information.
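
As a small illustration, a prompt can be assembled from a template that combines fixed instructions, detailed steps, reference data, and the user's input. The template wording and the `build_prompt` helper are hypothetical; the sketch uses only Python's standard `string.Template`.

```python
from string import Template
from typing import List

# Hypothetical prompt template; the wording and field names are illustrative.
SUPPORT_PROMPT = Template(
    "You are a support assistant for $product.\n"
    "Follow these steps:\n"
    "$steps\n"
    "Use only the following reference data:\n"
    "$context\n"
    "User question: $question"
)


def build_prompt(product: str, steps: List[str],
                 context: str, question: str) -> str:
    """Fill the template with numbered steps, reference data, and user input."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(steps, start=1))
    return SUPPORT_PROMPT.substitute(
        product=product, steps=numbered, context=context, question=question
    )


prompt = build_prompt(
    product="Acme Notebook",
    steps=["Greet the user", "Answer from the reference data"],
    context="Plan A costs 10 USD.",
    question="How much is Plan A?",
)
print(prompt)
```

Keeping the instructions in one template makes it easier to version and compare prompt variants during evaluation.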

Databricks provides flexible ways to do prompt engineering.

What is a GenAI platform?

GenAI requires a combined data + AI platform. For both developers and administrators, the key components for GenAI must be connected and governed in a simple, unified platform.

Key components include:

  • AI assets such as models, agents, and apps
  • Data assets such as files, tables, processing pipelines, vector indexes, and feature stores
  • AI deployments such as endpoints for models and agents
  • Tooling for building and deploying AI and data assets

Also see Mosaic AI capabilities for GenAI and Databricks architecture.
