Deploy an agent for a generative AI application

Preview

This feature is in Public Preview.

This article shows how to deploy your agent either by using Model Serving directly or by using the deploy() API from databricks.agents.

Requirements

  • Before you can deploy your agent, you must register it to Unity Catalog. See Create and log AI agents. When you register your agent to Unity Catalog, it is packaged as a model.

  • MLflow 2.13.1 or above to deploy agents using the deploy() API from databricks.agents (see the install sketch below).
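For example, in a Databricks notebook you can install or upgrade the required packages before deploying. This is a minimal sketch; pin versions as appropriate for your environment.

%pip install -U "mlflow>=2.13.1" databricks-agents
dbutils.library.restartPython()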

Deploy an agent using Model Serving

Important

When you deploy an agent using this method, you cannot use the Review App to collect and submit feedback about your agent.

For production workloads, you can deploy your agent to make it available as a REST API that can be integrated into your user-facing application. Use the Model Serving REST API to create a CPU model serving endpoint that serves your production-ready agent.
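The following is a minimal sketch of creating a CPU serving endpoint with the Model Serving REST API. The workspace URL, token, endpoint name, and model name are placeholders; adjust the workload size and scale-to-zero setting to match your workload.

import requests

DATABRICKS_HOST = "https://<your-workspace>.cloud.databricks.com"  # placeholder
DATABRICKS_TOKEN = "<your-personal-access-token>"  # placeholder

# Create a CPU serving endpoint that serves a Unity Catalog-registered agent.
resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/serving-endpoints",
    headers={"Authorization": f"Bearer {DATABRICKS_TOKEN}"},
    json={
        "name": "my-agent-endpoint",  # hypothetical endpoint name
        "config": {
            "served_entities": [
                {
                    "entity_name": "my_catalog.my_schema.my_agent",  # placeholder UC model name
                    "entity_version": "1",
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,
                }
            ]
        },
    },
)
resp.raise_for_status()
print(resp.json())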

Deploy an agent using deploy()

You can use the deploy() API to deploy your agents, both during development and for production-ready agents. Only agents registered in Unity Catalog can be deployed using deploy().

The deploy() API does the following:

  • Creates CPU model serving endpoints for your agent that can be integrated into your user-facing application. These endpoints are created using Model Serving, so you can invoke them to get responses from the agent and collect feedback from the Review App UI.

    • Authentication credentials are automatically passed to all Databricks-managed resources required by the agent.

    • If you have resource dependencies that are not Databricks-managed, for example if you use Pinecone, you can pass environment variables containing secrets to the deploy() API, as shown in the sketch after this list. See Configure access to resources from model serving endpoints.

  • Enables the Review App for your agent. The Review App lets your stakeholders chat with the agent and give feedback through its UI.

  • Logs every request to the Review App or the REST API, including query requests, responses, and intermediate trace data from MLflow Tracing, to an inference table.
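For example, the following sketch passes a Pinecone API key stored in a Databricks secret to the deployed endpoint as an environment variable. The secret scope and key names are placeholders.

from databricks.agents import deploy

# Pass third-party credentials (for example, a Pinecone API key) as
# environment variables backed by Databricks secrets. The secret scope
# and key names below are placeholders.
deployment = deploy(
    model_fqn,
    uc_model_info.version,
    environment_vars={
        "PINECONE_API_KEY": "{{secrets/my_scope/pinecone_api_key}}",
    },
)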

Note

Deployments can take up to 15 minutes to complete. Raw JSON payloads take 10-30 minutes to arrive, and the formatted logs are processed from the raw payloads about every hour.


from databricks.agents import deploy

# model_fqn is the full Unity Catalog model name ({catalog}.{schema}.{model}),
# and uc_model_info is the model version returned when the agent was registered.
deployment = deploy(model_fqn, uc_model_info.version)

# query_endpoint is the URL that can be used to make queries to the agent
deployment.query_endpoint

# Copy deployment.rag_app_url to a browser and start interacting with your RAG application.
deployment.rag_app_url
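After the deployment completes, you can query the endpoint like any other Model Serving endpoint. The following is a minimal sketch using the requests library; the token is a placeholder, and the request body assumes a chat-style agent that accepts OpenAI-style messages, so adjust the payload to your agent's input schema.

import os
import requests

# Query the deployed agent over REST. The token source is a placeholder;
# use a personal access token or service principal token.
token = os.environ["DATABRICKS_TOKEN"]
response = requests.post(
    deployment.query_endpoint,
    headers={"Authorization": f"Bearer {token}"},
    json={"messages": [{"role": "user", "content": "What does the agent do?"}]},
)
response.raise_for_status()
print(response.json())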

Agent-enhanced inference tables

The deploy() API creates three inference tables for each deployment. These tables log requests and responses to and from the agent serving endpoint.

  • Payload: {catalog_name}.{schema_name}.{model_name}_payload. Contains the raw JSON payloads.

  • Payload request logs: {catalog_name}.{schema_name}.{model_name}_payload_request_logs. Contains formatted requests and responses, and MLflow traces.

  • Payload assessment logs: {catalog_name}.{schema_name}.{model_name}_payload_assessment_logs. Contains formatted feedback, as provided in the Review App, for each request.
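Because these are Unity Catalog tables, you can query them like any other table. The following is a minimal sketch for a Databricks notebook; the three-part table name is a placeholder.

# Inspect the formatted request logs for a deployment.
request_logs = spark.table("my_catalog.my_schema.my_agent_payload_request_logs")
display(request_logs.select("timestamp", "request", "response", "status_code"))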

Request log and assessment log tables

Two additional tables are generated automatically from the raw payload table above: request logs and assessment logs. Users can expect the data to land in these tables within an hour of interacting with their deployment.

The following shows the schema for the request logs table.

  • client_request_id (String): Client request ID, usually null.

  • databricks_request_id (String): Databricks request ID.

  • date (Date): Date of the request.

  • timestamp_ms (Long): Timestamp of the request in milliseconds.

  • timestamp (Timestamp): Timestamp of the request.

  • status_code (Integer): Status code returned by the endpoint.

  • execution_time_ms (Long): Total execution time in milliseconds.

  • conversation_id (String): Conversation ID extracted from the request logs.

  • request (String): The last user query from the user's conversation, extracted from the RAG request.

  • response (String): The last response to the user, extracted from the RAG request.

  • request_raw (String): String representation of the request.

  • response_raw (String): String representation of the response.

  • trace (String): String representation of the trace, extracted from the databricks_options of the response struct.

  • sampling_fraction (Double): Sampling fraction.

  • request_metadata (Map[String, String]): A map of metadata related to the model serving endpoint associated with the request, including the endpoint name, model name, and model version used for the endpoint.

  • schema_version (String): Integer for the schema version.
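You can use this schema to monitor endpoint behavior. The following sketch computes daily average latency and error counts from the request logs table; the table name is a placeholder.

from pyspark.sql import functions as F

# Aggregate request logs into a simple daily health summary.
logs = spark.table("my_catalog.my_schema.my_agent_payload_request_logs")
summary = (
    logs.groupBy("date")
    .agg(
        F.avg("execution_time_ms").alias("avg_latency_ms"),
        F.sum(F.when(F.col("status_code") != 200, 1).otherwise(0)).alias("error_count"),
    )
    .orderBy("date")
)
display(summary)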

The following is the schema for the assessment logs table.

  • request_id (String): Databricks request ID.

  • step_id (String): Derived from the retrieval assessment.

  • source (Struct): A struct field containing information on who created the assessment.

  • timestamp (Timestamp): Timestamp of the request.

  • text_assessment (Struct): A struct field containing the data for any feedback on the agent’s responses from the Review App.

  • retrieval_assessment (Struct): A struct field containing the data for any feedback on the documents retrieved for a response.
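To pair feedback with the requests that produced it, you can join the assessment logs to the request logs on the request ID. A minimal sketch, with placeholder table names:

# Attach each piece of feedback to the request and response it refers to.
request_logs = spark.table("my_catalog.my_schema.my_agent_payload_request_logs")
assessments = spark.table("my_catalog.my_schema.my_agent_payload_assessment_logs")

feedback = assessments.join(
    request_logs,
    assessments.request_id == request_logs.databricks_request_id,
).select("request", "response", "text_assessment", "retrieval_assessment")
display(feedback)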

Get deployed applications

The following shows how to get your deployed agents.

from databricks.agents import list_deployments, get_deployments

# Get the deployment for a specific model_fqn and version
deployment = get_deployments(model_name=model_fqn, model_version=model_version.version)

# List and print all current deployments
deployments = list_deployments()
deployments