
Deploy an agent for generative AI applications

Deploy your AI agent on Mosaic AI Model Serving using the deploy() function from the Agent Framework Python API. Deployment creates a serving endpoint with built-in scalability, monitoring, and collaboration tools.

Your deployed agent automatically integrates with MLflow 3 evaluation and monitoring capabilities, including real-time tracing, production monitoring, and the Review App for stakeholder feedback.

Requirements

  • Register your agent in Unity Catalog.
  • Install MLflow 3.1.3 or above to deploy agents using the deploy() API from databricks.agents.
  • Deploying agents from outside a Databricks notebook requires databricks-agents SDK version 1.1.0 or above.

Install the prerequisites:

Python
# Install prerequisites (quote the specifiers so the shell does not treat ">" as redirection)
%pip install "mlflow>=3.1.3" "databricks-agents>=1.1.0"

# Restart Python to make sure the new packages are picked up
dbutils.library.restartPython()

Deploy agents using deploy()

Deploy your agent to a model serving endpoint:

Python
from databricks import agents

# uc_model_name and uc_model_info come from registering your agent in Unity Catalog,
# for example the return value of mlflow.register_model()
deployment = agents.deploy(uc_model_name, uc_model_info.version)

# Retrieve the query endpoint URL for making API requests
deployment.query_endpoint

When you call deploy(), Databricks automatically sets up production infrastructure and integrates your agent with MLflow gen AI features.

Warning

If you deploy an agent from a notebook stored in a Databricks Git folder, MLflow 3 real-time tracing does not work by default.

To enable real-time tracing, set the experiment to a non-Git-associated experiment using mlflow.set_experiment() before running agents.deploy().

The deploy() function performs the following actions by default:

  • Creates a model serving endpoint to host your agent with automatic scaling and load balancing
  • Provisions secure authentication for your agent to access underlying resources
  • Enables real-time monitoring through MLflow experiment tracing and automated quality assessment on production traffic
  • Sets up stakeholder collaboration using the Review App for feedback collection

For more information, see Detailed deployment actions.
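Once the endpoint is ready, you can send requests to the URL in deployment.query_endpoint with any HTTP client. A minimal sketch, assuming your agent accepts a ChatCompletion-style messages body; the URL and token below are placeholders:

```python
import json

# Placeholders: use deployment.query_endpoint and a real Databricks token
query_url = "https://<workspace-host>/serving-endpoints/<endpoint-name>/invocations"
token = "<databricks-token>"

# ChatCompletion-style request body (assumes your agent accepts this schema)
payload = {"messages": [{"role": "user", "content": "What can this agent do?"}]}

# Send it with, for example, the requests library:
# import requests
# response = requests.post(
#     query_url,
#     headers={"Authorization": f"Bearer {token}"},
#     json=payload,
# )
# print(response.json())
print(json.dumps(payload))
```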

Customize deployment

Pass additional arguments to deploy() to customize the deployment. For example, you can enable scale to zero for idle endpoints by passing scale_to_zero_enabled=True. This reduces costs but increases the time to serve initial queries.
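A sketch of the same deploy() call with scale to zero enabled; uc_model_name and uc_model_version stand in for your registered model's name and version:

```python
def deploy_with_scale_to_zero(uc_model_name: str, uc_model_version: int):
    """Deploy an agent whose endpoint scales to zero when idle (sketch)."""
    from databricks import agents  # requires databricks-agents>=1.1.0

    return agents.deploy(
        uc_model_name,
        uc_model_version,
        scale_to_zero_enabled=True,  # idle endpoints scale to zero, reducing cost
    )
```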

For more parameters, see Databricks Agents Python API.

Retrieve and delete agent deployments

Retrieve or manage existing agent deployments. See Databricks Agents Python API.

Python
from databricks.agents import list_deployments, get_deployments, delete_deployment

# List all current deployments
deployments = list_deployments()
print(deployments)

# Get the deployments for a specific agent model name and version
agent_model_name = "" # Set to your Unity Catalog model name
agent_model_version = 1 # Set to your agent model version
deployment = get_deployments(model_name=agent_model_name, model_version=agent_model_version)

# Delete an agent deployment
delete_deployment(model_name=agent_model_name, model_version=agent_model_version)

Authentication for dependent resources

Deployed agents often need to authenticate to other resources to complete tasks. For example, an agent may need to access a Vector Search index to query unstructured data.

For information about authentication methods including when to use them and how to set them up, see Authentication for AI agents.

Detailed deployment actions

The following list describes the deployment actions that result from a deploy() call. Deployments can take up to 15 minutes to complete.

Create model serving endpoint: Creates a scalable REST API endpoint that serves your agent to user-facing applications with automatic load balancing.

Provision secure authentication: Automatically provides short-lived credentials that allow your agent to access Databricks-managed resources (Vector Search indexes, Unity Catalog functions, etc.) with the minimum required permissions.

Databricks verifies the endpoint owner has proper permissions before issuing credentials, preventing unauthorized access.

For non-Databricks resources, pass environment variables with secrets to deploy(). See Configure access to resources from model serving endpoints.
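For example, a sketch that passes a Databricks secret to the endpoint as an environment variable; the scope name my_scope, key name openai_api_key, and variable name OPENAI_API_KEY are placeholders:

```python
# Model serving resolves {{secrets/<scope>/<key>}} references at serving time
env_vars = {"OPENAI_API_KEY": "{{secrets/my_scope/openai_api_key}}"}

# Pass the mapping when deploying:
# from databricks import agents
# deployment = agents.deploy(uc_model_name, uc_model_version, environment_vars=env_vars)
print(env_vars)
```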

Enable Review App: Provides a web interface where stakeholders can interact with your agent and provide feedback. See Collect domain expert feedback.

Enable real-time tracing: Logs all agent interactions to an MLflow experiment in real time, providing immediate visibility for monitoring and debugging.

  • Traces from your endpoint write to the currently active MLflow experiment (set with mlflow.set_experiment())
  • All agents in the endpoint share the same experiment for trace storage
  • Traces also write to inference tables for longer-term storage

Enable production monitoring (beta): Configures automated quality evaluation that runs scorers on production traffic. See production monitoring.

Enable inference tables: Creates tables that log request inputs and responses for auditing and analysis.

Warning: Request logs and assessment logs are deprecated and will be removed in a future release. Use MLflow 3 real-time tracing instead. See request logs and assessment logs deprecation for migration guidance.

  • All agents use AI Gateway inference tables for logging
  • Streaming responses only log fields compatible with ResponsesAgent, ChatAgent, and ChatCompletion schemas

Log REST API requests and Review App feedback: Logs API requests and feedback to an inference table.

Warning: The feedback model is deprecated and will be removed in a future release. Upgrade to MLflow 3 and use the log_feedback API instead. See Collect user feedback.

  • Creates a feedback model to accept and log feedback from the Review App.
  • This model is served in the same CPU model serving endpoint as your deployed agent.
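After upgrading, feedback can be attached directly to a trace with MLflow 3's log_feedback API. A hedged sketch; the feedback name, value, and source id are assumptions, and entity names may vary by MLflow version:

```python
def record_user_feedback(trace_id: str) -> None:
    """Sketch: attach end-user feedback to a production trace (assumes MLflow 3)."""
    import mlflow
    from mlflow.entities import AssessmentSource, AssessmentSourceType

    mlflow.log_feedback(
        trace_id=trace_id,
        name="user_sentiment",  # assumed feedback name
        value=True,             # e.g. a thumbs-up from the user
        source=AssessmentSource(
            source_type=AssessmentSourceType.HUMAN,
            source_id="user@example.com",  # placeholder user id
        ),
    )
```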

Next steps