Deploy an agent for generative AI applications
Deploy your AI agent on Mosaic AI Model Serving using the deploy() function from the Agent Framework Python API. Deployment creates a serving endpoint with built-in scalability, monitoring, and collaboration tools.
Your deployed agent automatically integrates with MLflow 3 evaluation and monitoring capabilities, including real-time tracing, the Review App for stakeholder feedback, and production monitoring.
Requirements
MLflow 3
- Register your agent in Unity Catalog.
- Install MLflow 3.1.3 or above to deploy agents using the deploy() API from databricks.agents.
- Deploying agents from outside a Databricks notebook requires databricks-agents SDK version 1.1.0 or above.
Install the prerequisites:
# Install prerequisites
%pip install mlflow>=3.1.3 databricks-agents>=1.1.0
# Restart Python to make sure the new packages are picked up
dbutils.library.restartPython()
MLflow 2.x
Databricks recommends using MLflow 3 to deploy agents, as some MLflow 2 logging functionality will be deprecated. See Detailed deployment actions.
- Register your agent in Unity Catalog.
- Install MLflow 2.13.1 or above to deploy agents using the deploy() API from databricks.agents.
- Deploying agents from outside a Databricks notebook requires databricks-agents SDK version 0.12.0 or above.
Install the prerequisites:
# Install prerequisites
%pip install mlflow>=2.13.1 databricks-agents>=0.12.0
# Restart Python to make sure the new packages are picked up
dbutils.library.restartPython()
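After the Python restart, it can help to confirm that the installed packages meet the minimums listed above. The following is a minimal sketch that uses only the standard library. The helper names parse_version, meets_minimum, and check_package are hypothetical, and the simple parser assumes dotted numeric versions such as 3.1.3 (pre-release suffixes like rc1 are ignored).

```python
import re
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn a dotted version string into a tuple of ints, e.g. "3.1.3" -> (3, 1, 3)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            # Stop at the first non-numeric component such as "dev0".
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that "3.1" compares equal to "3.1.0".
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_package(name: str, minimum: str) -> bool:
    """Look up an installed package and compare it with a minimum version."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        print(f"{name} is not installed")
        return False
    ok = meets_minimum(installed, minimum)
    print(f"{name} {installed} (minimum {minimum}): {'OK' if ok else 'too old'}")
    return ok


# Minimums for the MLflow 3 path; use 2.13.1 and 0.12.0 for MLflow 2.x.
check_package("mlflow", "3.1.3")
check_package("databricks-agents", "1.1.0")
```

If either check reports a version that is too old, rerun the install cell and restart Python again before calling deploy().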
Deploy agents using deploy()
Deploy your agent to a model serving endpoint:
from databricks import agents
deployment = agents.deploy(uc_model_name, uc_model_info.version)
# Retrieve the query endpoint URL for making API requests
deployment.query_endpoint
When you call deploy(), Databricks automatically sets up production infrastructure and integrates your agent with MLflow gen AI features.
Note: If you deploy an agent from a notebook stored in a Databricks Git folder, MLflow 3 real-time tracing does not work by default.
To enable real-time tracing, set the experiment to a non-Git-associated experiment using mlflow.set_experiment() before running agents.deploy().
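The Git folder workaround described above could look like the following sketch. The wrapper name deploy_with_tracing and the example experiment path are hypothetical; mlflow.set_experiment() and agents.deploy() are the calls named in the text. The imports sit inside the function so that MLflow and databricks-agents are only needed when the function is actually called.

```python
def deploy_with_tracing(uc_model_name: str, model_version: int, experiment_path: str):
    """Select a non-Git-associated MLflow experiment, then deploy the agent."""
    import mlflow
    from databricks import agents

    # Set the experiment before calling agents.deploy(), as the note above requires.
    mlflow.set_experiment(experiment_path)
    return agents.deploy(uc_model_name, model_version)


# Example call with hypothetical names (run inside a Databricks notebook):
# deployment = deploy_with_tracing(
#     "main.default.my_agent", 1, "/Users/someone@example.com/my-agent-traces"
# )
```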
The deploy() function performs the following actions by default:
- Creates a model serving endpoint to host your agent with automatic scaling and load balancing
- Provisions secure authentication for your agent to access underlying resources
- Enables real-time monitoring through MLflow experiment tracing and automated quality assessment on production traffic
- Sets up stakeholder collaboration using the Review App for feedback collection
For more information, see Detailed deployment actions.
Customize deployment
Pass additional arguments to deploy() to customize the deployment. For example, you can enable scale to zero for idle endpoints by passing scale_to_zero_enabled=True. This reduces costs but increases the time to serve initial queries.
For more parameters, see Databricks Agents Python API.
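As an illustration of passing a customization argument, the sketch below forwards scale_to_zero_enabled=True to deploy(). The argument name is taken from the text above and should be confirmed against the Databricks Agents Python API reference for your installed version. The wrapper name deploy_scaled_to_zero and the overrides parameter are hypothetical.

```python
def deploy_scaled_to_zero(uc_model_name: str, model_version: int, **overrides):
    """Deploy an agent with scale to zero enabled unless the caller overrides it."""
    from databricks import agents

    # Assumption: scale_to_zero_enabled is accepted by deploy(), as stated above.
    # Caller-supplied overrides win over this default.
    options = {"scale_to_zero_enabled": True, **overrides}
    return agents.deploy(uc_model_name, model_version, **options)


# Example call with a hypothetical model name (run inside a Databricks notebook):
# deployment = deploy_scaled_to_zero("main.default.my_agent", 1)
```

Keeping the default in a dictionary means a caller can still switch the setting off for a latency-sensitive endpoint without editing the wrapper.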
Retrieve and delete agent deployments
Retrieve or manage existing agent deployments. See Databricks Agents Python API.
from databricks.agents import list_deployments, get_deployments, delete_deployment
# Print all current deployments
deployments = list_deployments()
print(deployments)
# Get the deployment for a specific agent model name and version
agent_model_name = "" # Set to your Unity Catalog model name
agent_model_version = 1 # Set to your agent model version
deployment = get_deployments(model_name=agent_model_name, model_version=agent_model_version)
# List all deployments
all_deployments = list_deployments()
# Delete an agent deployment
delete_deployment(model_name=agent_model_name, model_version=agent_model_version)
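One possible follow-up is to remove deployments of older versions of an agent while keeping the newest one. The sketch below only selects the candidates. It assumes that each object returned by list_deployments() exposes model_name and model_version attributes, which should be verified against the Databricks Agents Python API reference. The helper name stale_deployments is hypothetical.

```python
def stale_deployments(deployments, model_name: str):
    """Return deployments of model_name except the one with the highest version."""
    mine = [d for d in deployments if d.model_name == model_name]
    if not mine:
        return []
    # Versions may arrive as ints or numeric strings, so normalize before comparing.
    newest = max(int(d.model_version) for d in mine)
    return [d for d in mine if int(d.model_version) != newest]


# Usage in a Databricks notebook (commented out because it deletes deployments):
# from databricks.agents import list_deployments, delete_deployment
# for d in stale_deployments(list_deployments(), agent_model_name):
#     delete_deployment(model_name=d.model_name, model_version=d.model_version)
```

Separating selection from deletion makes it easy to print and review the list before anything is removed.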
Authentication for dependent resources
Agents often need to authenticate to other resources to complete tasks when they are deployed. For example, an agent may need to access a Vector Search index to query unstructured data.
For information about authentication methods including when to use them and how to set them up, see Authentication for AI agents.
Detailed deployment actions
The following tables list the detailed deployment actions that result from a deploy() call, one table per MLflow version. Deployments can take up to 15 minutes to complete.
MLflow 3
| Deployment action | Description |
|---|---|
| Create model serving endpoint | Creates a scalable REST API endpoint that serves your agent to user-facing applications with automatic load balancing. |
| Provision secure authentication | Automatically provides short-lived credentials that allow your agent to access Databricks-managed resources (Vector Search indexes, Unity Catalog functions, etc.) with minimum required permissions. Databricks verifies the endpoint owner has proper permissions before issuing credentials, preventing unauthorized access. For non-Databricks resources, pass environment variables with secrets to |
| Enable Review App | Provides a web interface where stakeholders can interact with your agent and provide feedback. See Collect domain expert feedback. |
| Enable real-time tracing | Logs all agent interactions to an MLflow experiment in real time, providing immediate visibility for monitoring and debugging. |
| Enable production monitoring (beta) | Configures automated quality evaluation that runs scorers on production traffic. See production monitoring. |
| Enable inference tables | Creates tables that log request inputs and responses for auditing and analysis. Warning: Request logs and assessment logs are deprecated and will be removed in a future release. Use MLflow 3 real-time tracing instead. See request logs and assessment logs deprecation for migration guidance. |
| Log REST API requests and Review App feedback | Logs API requests and feedback to an inference table. Warning: The feedback model is deprecated and will be removed in a future release. Upgrade to MLflow 3 and use the |

MLflow 2
| Deployment action | Description |
|---|---|
| Create model serving endpoint | Creates a scalable REST API endpoint that serves your agent to user-facing applications with automatic load balancing. |
| Provision secure authentication | Automatically provides short-lived credentials that allow your agent to access Databricks-managed resources (Vector Search indexes, Unity Catalog functions, etc.) with minimum required permissions. Databricks verifies the endpoint owner has proper permissions before issuing credentials, preventing unauthorized access. For non-Databricks resources, pass environment variables with secrets to |
| Enable Review App | Provides a web interface where stakeholders can interact with your agent and provide feedback. See Collect domain expert feedback. |
| Enable inference tables | Creates tables that log request inputs and responses for auditing and analysis. Warning: Request logs and assessment logs are deprecated and will be removed in a future release. See request logs and assessment logs deprecation for migration guidance. |
| Log REST API requests and Review App feedback (deprecated) | Logs API requests and feedback to an inference table. Warning: The feedback model is deprecated and will be removed in a future release. Upgrade to MLflow 3 and use the |
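Because a deployment can take up to 15 minutes to complete, a script that calls deploy() may want to wait until the endpoint is ready before sending traffic. The following is a generic polling sketch. The function wait_until_ready and the is_ready callable are hypothetical, and how readiness is actually determined (for example, from the serving endpoint's status) is left to the caller.

```python
import time


def wait_until_ready(is_ready, timeout_s: float = 15 * 60, interval_s: float = 30.0,
                     sleep=time.sleep, clock=time.monotonic) -> bool:
    """Poll is_ready() until it returns True or timeout_s seconds have elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            # Give up once the time budget is spent.
            return False
        sleep(interval_s)


# Example with a stand-in readiness check that succeeds on the third poll.
states = iter([False, False, True])
print(wait_until_ready(lambda: next(states), timeout_s=5, interval_s=0))
```

The sleep and clock parameters are injectable so that the loop can be exercised in tests without real waiting.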