
Author AI agents in code

This article shows how to author an AI agent in Python using Mosaic AI Agent Framework and popular agent-authoring libraries like LangGraph, PyFunc, and OpenAI.

Requirements

Databricks recommends installing the latest version of the MLflow Python client when developing agents.

To author and deploy agents using the approach in this article, you must have the following minimum package versions:

  • databricks-agents version 0.16.0 and above
  • mlflow version 2.20.2 and above
  • Python 3.10 or above. You can use serverless compute or Databricks Runtime 13.3 LTS and above to meet this requirement.
%pip install -U -qqqq databricks-agents>=0.16.0 mlflow>=2.20.2

Databricks also recommends installing Databricks AI Bridge integration packages when authoring agents. These integration packages (such as databricks-langchain, databricks-openai) provide a shared layer of APIs to interact with Databricks AI features, such as Databricks AI/BI Genie and Vector Search, across agent authoring frameworks and SDKs.

%pip install -U -qqqq databricks-langchain

Use ChatAgent to author agents

Databricks recommends using MLflow's ChatAgent interface to author production-grade agents. This chat schema specification is designed for agent scenarios and is similar to, but not strictly compatible with, the OpenAI ChatCompletion schema. ChatAgent also adds functionality for multi-turn, tool-calling agents.

Authoring your agent using ChatAgent provides the following benefits:

  • Advanced agent capabilities

    • Streaming output: Enable interactive user experiences by streaming output in smaller chunks.
    • Comprehensive tool-calling message history: Return multiple messages, including intermediate tool-calling messages, for improved quality and conversation management.
    • Tool-calling confirmation support
    • Multi-agent system support
  • Streamlined development, deployment, and monitoring

    • Framework agnostic Databricks feature integration: Write your agent in any framework of your choice and get out-of-the-box compatibility with AI Playground, Agent Evaluation, and Agent Monitoring.
    • Typed authoring interfaces: Write agent code using typed Python classes, benefiting from IDE and notebook autocomplete.
    • Automatic signature inference: MLflow automatically infers ChatAgent signatures when logging the agent, simplifying registration and deployment. See Infer Model Signature during logging.
    • AI Gateway-enhanced inference tables: AI Gateway inference tables are automatically enabled for deployed agents, providing access to detailed request log metadata.

To learn how to create a ChatAgent, see the examples in the following section and MLflow documentation - What is the ChatAgent interface.
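To make the contract concrete, here is a minimal, dependency-free sketch of the `predict` method's shape. It uses plain dataclasses as stand-ins for MLflow's `ChatAgentMessage` and `ChatAgentResponse` types; in real agent code, subclass `mlflow.pyfunc.ChatAgent` and import these types from MLflow instead:

```python
from dataclasses import dataclass
from typing import Any, Optional

# Plain-Python stand-ins for mlflow.types.agent.ChatAgentMessage and
# ChatAgentResponse, used here only to illustrate the predict() contract.
@dataclass
class ChatAgentMessage:
    role: str
    content: str

@dataclass
class ChatAgentResponse:
    messages: list

class EchoChatAgent:
    """Illustrates the ChatAgent predict() contract: take the full message
    history plus optional context/custom inputs, return assistant messages."""

    def predict(
        self,
        messages: list,
        context: Optional[Any] = None,
        custom_inputs: Optional[dict] = None,
    ) -> ChatAgentResponse:
        # Echo the most recent user message back as the assistant reply
        last_user = messages[-1].content
        reply = ChatAgentMessage(role="assistant", content=f"Echo: {last_user}")
        return ChatAgentResponse(messages=[reply])

agent = EchoChatAgent()
response = agent.predict([ChatAgentMessage(role="user", content="hello")])
```

A real `ChatAgent` returns typed MLflow objects and can emit multiple messages per turn (for example, intermediate tool-calling messages).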

ChatAgent examples

The following notebooks show you how to author streaming and non-streaming ChatAgents using the popular libraries OpenAI and LangGraph.

LangGraph tool-calling agent

Open notebook in new tab

OpenAI tool-calling agent

Open notebook in new tab

OpenAI Responses API tool-calling agent

Open notebook in new tab

OpenAI chat-only agent

Open notebook in new tab

To learn how to expand the capabilities of these agents by adding tools, see AI agent tools.

Multi-agent system ChatAgent example

To learn how to create a multi-agent system using Genie, see Use Genie in multi-agent systems.

Author streaming output agents

Streaming agents deliver responses in a continuous stream of smaller, incremental chunks. Streaming output allows end users to read agent output as it is generated, reducing perceived latency and improving the overall user experience for conversational agents.

To author a streaming ChatAgent, define a predict_stream method that returns a generator yielding ChatAgentChunk objects, each containing a portion of the response. Read more on ideal ChatAgent streaming behavior in the MLflow docs.

The following code shows an example predict_stream function. For complete examples of streaming agents, see ChatAgent examples:

Python
def predict_stream(
    self,
    messages: list[ChatAgentMessage],
    context: Optional[ChatContext] = None,
    custom_inputs: Optional[dict[str, Any]] = None,
) -> Generator[ChatAgentChunk, None, None]:
    # Convert messages to a format suitable for your agent
    request = {"messages": self._convert_messages_to_dict(messages)}

    # Stream the response from your agent
    for event in self.agent.stream(request, stream_mode="updates"):
        for node_data in event.values():
            # Yield each chunk of the response
            yield from (
                ChatAgentChunk(**{"delta": msg}) for msg in node_data["messages"]
            )
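On the consuming side, a client reassembles the streamed deltas into the full response. The following sketch uses plain dicts as stand-ins for ChatAgentChunk objects, and `fake_predict_stream` is a hypothetical stand-in for an agent's predict_stream generator:

```python
# Sketch of client-side accumulation of streamed chunks. Each chunk carries a
# "delta" (a partial assistant message); the client concatenates the deltas to
# rebuild the complete response text.
def fake_predict_stream():
    # Stand-in for predict_stream: yields dicts shaped like ChatAgentChunk
    for piece in ["Streaming ", "keeps users ", "engaged."]:
        yield {"delta": {"role": "assistant", "content": piece}}

full_text = "".join(chunk["delta"]["content"] for chunk in fake_predict_stream())
```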

Author deployment-ready ChatAgents for Databricks Model Serving

Databricks deploys ChatAgents in a distributed environment on Databricks Model Serving, which means that during a multi-turn conversation, the same serving replica may not handle all requests. Pay attention to the following implications for managing agent state:

  • Avoid local caching: When deploying a ChatAgent, don't assume the same replica will handle all requests in a multi-turn conversation. Reconstruct internal state from the ChatAgentRequest schema on each turn.

  • Thread-safe state: Design agent state to be thread-safe, preventing conflicts in multi-threaded environments.

  • Initialize state in the predict function: Initialize state each time the predict function is called, not during ChatAgent initialization. Storing state at the ChatAgent level could leak information between conversations and cause conflicts because a single ChatAgent replica could handle requests from multiple conversations.
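The difference between replica-level and per-request state can be sketched with plain Python (class names here are illustrative, not part of any API):

```python
class StatefulAgentAntiPattern:
    def __init__(self):
        # BAD: replica-level state. One serving replica can handle requests
        # from many conversations, so this history leaks between users.
        self.history = []

    def predict(self, messages):
        self.history.extend(messages)
        return len(self.history)

class StatelessAgent:
    def predict(self, messages):
        # GOOD: state is rebuilt from the request on every call, so replicas
        # are interchangeable and conversations cannot leak into each other.
        history = list(messages)
        return len(history)

bad = StatefulAgentAntiPattern()
good = StatelessAgent()
# Two independent single-message "conversations" hitting the same replica:
bad_counts = (bad.predict(["hi"]), bad.predict(["hello"]))    # second call sees leaked state
good_counts = (good.predict(["hi"]), good.predict(["hello"]))  # each call is isolated
```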

Custom inputs and outputs

Some scenarios may require additional agent inputs, such as client_type and session_id, or outputs like retrieval source links that should not be included in the chat history for future interactions.

For these scenarios, MLflow ChatAgent natively supports the fields custom_inputs and custom_outputs.
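As a minimal sketch, the function below threads custom_inputs through a predict call and returns custom_outputs, using plain dicts in place of the MLflow types (the client_type and source_links fields are hypothetical examples):

```python
from typing import Any, Optional

# Sketch only: plain dicts stand in for ChatAgentResponse. The point is that
# custom_inputs flow in alongside messages, and custom_outputs carry metadata
# that should not enter the chat history for future turns.
def predict(messages: list, custom_inputs: Optional[dict[str, Any]] = None) -> dict:
    client_type = (custom_inputs or {}).get("client_type", "unknown")
    answer = {"role": "assistant", "content": f"Answer for {client_type} client"}
    return {
        "messages": [answer],
        # Extra metadata the caller needs, kept out of the conversation history:
        "custom_outputs": {"source_links": ["https://example.com/doc"]},
    }

result = predict([{"role": "user", "content": "hi"}], {"client_type": "web"})
```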

warning

The Agent Evaluation review app does not currently support rendering traces for agents with additional input fields.

See the following examples to learn how to set custom inputs and outputs for OpenAI/PyFunc and LangGraph agents.

OpenAI + PyFunc custom schema agent notebook

Open notebook in new tab

LangGraph custom schema agent notebook

Open notebook in new tab

Provide custom_inputs in the AI Playground and agent review app

If your agent accepts additional inputs using the custom_inputs field, you can manually provide these inputs in both the AI Playground and the agent review app.

  1. In either the AI Playground or the Agent Review App, select the gear icon.

  2. Enable custom_inputs.

  3. Provide a JSON object that matches your agent’s defined input schema.


Specify custom retriever schemas

AI agents commonly use retrievers to find and query unstructured data from vector search indices. For examples of retriever tools, see Unstructured retrieval AI agent tools.

Trace these retrievers within your agent with MLflow RETRIEVER spans to enable Databricks product features, including:

  • Automatically displaying links to retrieved source documents in the AI Playground UI
  • Automatically running retrieval groundedness and relevance judges in Agent Evaluation
note

Databricks recommends using retriever tools provided by Databricks AI Bridge packages like databricks_langchain.VectorSearchRetrieverTool and databricks_openai.VectorSearchRetrieverTool because they already conform to the MLflow retriever schema. See Locally develop Vector Search retriever tools with AI Bridge.

If your agent includes retriever spans with a custom schema, call mlflow.models.set_retriever_schema when you define your agent in code. This maps your retriever's output columns to MLflow's expected fields (primary_key, text_column, doc_uri).

Python
import mlflow

# Define the retriever's schema by providing your column names
# For example, the following call specifies the schema of a retriever that returns a list of objects like
# [
#     {
#         'document_id': '9a8292da3a9d4005a988bf0bfdd0024c',
#         'chunk_text': 'MLflow is an open-source platform, purpose-built to assist machine learning practitioners...',
#         'doc_uri': 'https://mlflow.org/docs/latest/index.html',
#         'title': 'MLflow: A Tool for Managing the Machine Learning Lifecycle'
#     },
#     {
#         'document_id': '7537fe93c97f4fdb9867412e9c1f9e5b',
#         'chunk_text': 'A great way to get started with MLflow is to use the autologging feature. Autologging automatically logs your model...',
#         'doc_uri': 'https://mlflow.org/docs/latest/getting-started/',
#         'title': 'Getting Started with MLflow'
#     },
#     ...
# ]
mlflow.models.set_retriever_schema(
    # Specify the name of your retriever span
    name="mlflow_docs_vector_search",
    # Specify the output column name to treat as the primary key (ID) of each retrieved document
    primary_key="document_id",
    # Specify the output column name to treat as the text content (page content) of each retrieved document
    text_column="chunk_text",
    # Specify the output column name to treat as the document URI of each retrieved document
    doc_uri="doc_uri",
    # Specify any other columns returned by the retriever
    other_columns=["title"],
)
note

The doc_uri column is especially important when evaluating the retriever’s performance. doc_uri is the main identifier for documents returned by the retriever, allowing you to compare them against ground truth evaluation sets. See Evaluation sets.

Parametrize agent code for deployment across environments

You can parametrize agent code to reuse the same agent code across different environments.

Parameters are key-value pairs that you define in a Python dictionary or a .yaml file.

To configure the code, create a ModelConfig using either a Python dictionary or a .yaml file. ModelConfig is a set of key-value parameters that allows for flexible configuration management. For example, you can use a dictionary during development and then convert it to a .yaml file for production deployment and CI/CD.

For details about ModelConfig, see the MLflow documentation.

An example ModelConfig is shown below:

YAML
llm_parameters:
  max_tokens: 500
  temperature: 0.01
model_serving_endpoint: databricks-dbrx-instruct
vector_search_index: ml.docs.databricks_docs_index
prompt_template: 'You are a hello world bot. Respond with a reply to the user''s
  question that indicates your prompt template came from a YAML file. Your response
  must use the word "YAML" somewhere. User''s question: {question}'
prompt_template_input_vars:
  - question

In your agent code, you can reference a default (development) configuration from the .yaml file or dictionary:

Python
import mlflow

# Example for loading from a .yml file
config_file = "configs/hello_world_config.yml"
model_config = mlflow.models.ModelConfig(development_config=config_file)

# Example of using a dictionary
config_dict = {
    "prompt_template": "You are a hello world bot. Respond with a reply to the user's question that is fun and interesting to the user. User's question: {question}",
    "prompt_template_input_vars": ["question"],
    "model_serving_endpoint": "databricks-dbrx-instruct",
    "llm_parameters": {"temperature": 0.01, "max_tokens": 500},
}

model_config = mlflow.models.ModelConfig(development_config=config_dict)

# Use model_config.get() to retrieve a parameter value
# You can also use model_config.to_dict() to convert the loaded config object
# into a dictionary
value = model_config.get("model_serving_endpoint")

Then, when logging your agent, pass the model_config parameter to log_model to specify a custom set of parameters to use when loading the logged agent. See MLflow documentation - ModelConfig.
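For illustration, here is how agent code might consume resolved configuration values, such as formatting the prompt template shown above. A plain dict stands in for a loaded ModelConfig; in agent code these values come from model_config.get():

```python
# Sketch: resolve a prompt template from configuration at request time.
# The same agent code works in any environment; only the config changes.
config = {
    "prompt_template": "You are a hello world bot. User's question: {question}",
    "prompt_template_input_vars": ["question"],
    "llm_parameters": {"temperature": 0.01, "max_tokens": 500},
}

def build_prompt(config: dict, question: str) -> str:
    # Fill the template's input variables with request-specific values
    return config["prompt_template"].format(question=question)

prompt = build_prompt(config, "What is MLflow?")
```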

Streaming error propagation

Mosaic AI propagates any errors encountered while streaming in the last token, under databricks_output.error. It is up to the calling client to properly handle and surface this error.

JSON
{
  "delta": …,
  "databricks_output": {
    "trace": {...},
    "error": {
      "error_code": "BAD_REQUEST",
      "message": "TimeoutException: Tool XYZ failed to execute."
    }
  }
}
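A client consuming the stream should check each chunk for this error payload. The following sketch (plain dicts, with a hypothetical consume_stream helper) shows one way to surface the error instead of silently dropping it:

```python
# Sketch: scan each streamed chunk's databricks_output for an error payload
# and raise, so a mid-stream failure is surfaced to the caller.
def consume_stream(chunks):
    text = []
    for chunk in chunks:
        err = chunk.get("databricks_output", {}).get("error")
        if err:
            raise RuntimeError(f"{err['error_code']}: {err['message']}")
        text.append(chunk["delta"]["content"])
    return "".join(text)

chunks = [
    {"delta": {"content": "partial "}, "databricks_output": {}},
    {
        "delta": {"content": ""},
        "databricks_output": {
            "error": {
                "error_code": "BAD_REQUEST",
                "message": "TimeoutException: Tool XYZ failed to execute.",
            },
        },
    },
]
try:
    consume_stream(chunks)
    failed = False
except RuntimeError:
    failed = True
```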

On-behalf-of-user authentication

Beta

This feature is in Beta.

warning

While on-behalf-of-user authentication is a powerful tool for enforcing secure access to sensitive data, it enables workspace users to author agents that act on behalf of other users in Databricks. As such, it is disabled by default during the beta and must be enabled by a workspace admin. Review the security considerations for on-behalf-of-user authentication before enabling this feature.

With on-behalf-of-user authentication, agents deployed via Mosaic AI model serving can access Databricks resources using the identity of the Databricks end user who queried the agent. This enables accessing sensitive information on a per-user basis, with fine-grained enforcement of data access control in Unity Catalog.

On-behalf-of-user authentication further restricts the incoming user token through downscoping, ensuring that the token exposed to the agent code is limited to accessing only the specific APIs defined by the agent author. This improves security by preventing unauthorized actions and reducing the risk of token misuse.

When authoring your agent, you can continue to use existing SDKs to access Databricks resources, like vector search indexes. Accessing resources on behalf of the user involves two steps:

  1. In agent code, update SDK calls to indicate that resources should be accessed on behalf of the agent end user.
  2. At agent logging time (prior to agent deployment), specify the end-user REST API scopes required by your agent. For more details, see On-behalf-of-user authentication.

For an end-to-end example of on-behalf-of-user authentication, see End-to-end example.

The following resources are compatible with on-behalf-of-user authentication with agents.

  • Vector Search Index: databricks_langchain.VectorSearchRetrieverTool, databricks_openai.VectorSearchRetrieverTool, or VectorSearchClient
  • Model Serving Endpoint: databricks.sdk.WorkspaceClient
  • SQL Warehouse: databricks.sdk.WorkspaceClient
  • UC Connections: databricks.sdk.WorkspaceClient
  • UC Tables and UC Functions: Direct clients are not currently supported with on-behalf-of-user authentication. Instead, use Genie to access structured data on behalf of the user.
  • Genie Space: databricks_langchain.GenieAgent or databricks_openai.GenieAgent

When initializing tools with the on-behalf-of-user client, you can either wrap the tool initialization in a try-except block or allow errors to be exposed to users. By handling errors, the agent can still make a best-effort response even if the end user lacks access to some of the tools. If you choose not to handle errors during tool initialization, the agent throws an error whenever the user is missing any required resource.

Configure SDKs

The snippets below demonstrate how to configure on-behalf-of-end-user access to different Databricks resources using various SDKs.

Python
import logging

from databricks.sdk import WorkspaceClient
from databricks.sdk.credentials_provider import ModelServingUserCredentials
from databricks_langchain import VectorSearchRetrieverTool

_logger = logging.getLogger(__name__)

# Configure a Databricks SDK WorkspaceClient to use on-behalf-of-end-user
# authentication
user_client = WorkspaceClient(credentials_strategy=ModelServingUserCredentials())

vector_search_tools = []
# Exclude exception handling if you want the agent to fail
# when users lack access to all required Databricks resources
try:
    tool = VectorSearchRetrieverTool(
        index_name="<index_name>",
        description="...",
        tool_name="...",
        workspace_client=user_client,  # Specify the user-authenticated client
    )
    vector_search_tools.append(tool)
except Exception:
    _logger.debug("Skipping tool: user does not have permissions")

Initializing the Agent

On-behalf-of-user authentication is compatible with the ChatAgent interface. When using on-behalf-of-user authentication, the end user's identity is only known when your deployed agent is queried, that is, within the predict and predict_stream functions of the ChatAgent interface. As a result, you must perform any on-behalf-of-user access to resources (for example, listing the vector search indexes the end user has access to) from within these methods, rather than in the __init__ method of your ChatAgent implementation. This ensures that resources are isolated between invocations.

Python
from typing import Any, Optional

from databricks.sdk import WorkspaceClient
from databricks.sdk.credentials_provider import ModelServingUserCredentials
from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatContext


class LangGraphChatAgent(ChatAgent):
    def initialize_agent(self):
        user_client = WorkspaceClient(
            credentials_strategy=ModelServingUserCredentials()
        )
        system_authorized_client = WorkspaceClient()
        ### Use the clients above to access resources with either system or user
        ### authorization, then build and return the initialized agent
        ...

    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        agent = self.initialize_agent()  # Initialize the agent inside predict
        request = {"messages": self._convert_messages_to_dict(messages)}

        response_messages = []
        for event in agent.stream(request, stream_mode="updates"):
            for node_data in event.values():
                response_messages.extend(
                    ChatAgentMessage(**msg) for msg in node_data.get("messages", [])
                )
        return ChatAgentResponse(messages=response_messages)

Security Considerations

There are some security implications to consider before enabling on-behalf-of-user authentication with agents:

  1. Access to Sensitive Databricks Resources: Enabling on-behalf-of-user authentication allows agents to access sensitive Databricks resources. While we've implemented API scopes to restrict the resources that developers can access and mitigate the risk of token misuse, some risks still remain. For example, the serving.serving-endpoints API scope grants an agent permission to execute a serving endpoint on behalf of the user. However, the serving endpoint itself may have access to additional API scopes that the original agent isn't authorized to use.
  2. No support for end user consent: During the current beta phase, agent users cannot view or consent to the Databricks REST API scopes required by an agent. Users are responsible for ensuring that they trust those with "Can Manage" permissions on the serving endpoint to take actions in Databricks on their behalf.

End-to-end example

The following notebook shows you how to create an agent with vector search using on-behalf-of-user authentication.

On Behalf of User Authentication with Vector Search

Open notebook in new tab

Next steps