Author AI agents in code
This article shows how to author an AI agent in Python using Mosaic AI Agent Framework and popular agent-authoring libraries like LangGraph, PyFunc, and OpenAI.
Requirements
Databricks recommends installing the latest version of the MLflow Python client when developing agents.
To author and deploy agents using the approach in this article, you must meet the following requirements:
- Install databricks-agents version 0.16.0 and above
- Install mlflow version 2.20.2 and above
%pip install -U -qqqq databricks-agents>=0.16.0 mlflow>=2.20.2
Databricks also recommends installing the Databricks AI Bridge library. The AI Bridge library provides a shared layer of APIs to interact with Databricks AI features, such as Databricks AI/BI Genie and Vector Search.
This library also contains the source code for databricks-langchain and databricks-openai. These integration packages provide seamless integration of Databricks AI features to use in AI authoring frameworks.
%pip install -U -qqqq databricks-ai-bridge
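For example, if you also install the databricks-langchain integration package, you can call a Databricks model serving endpoint directly from LangChain. The following is a minimal sketch; the endpoint name is illustrative and should be replaced with an endpoint available in your workspace.

# Minimal sketch, assuming the databricks-langchain integration package is installed.
# The endpoint name below is illustrative.
from databricks_langchain import ChatDatabricks

llm = ChatDatabricks(endpoint="databricks-meta-llama-3-3-70b-instruct")
print(llm.invoke("What is Databricks?").content)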
Use ChatAgent to author agents
To standardize and streamline agent authoring, Databricks recommends using MLflow's ChatAgent interface to create production-ready AI agents. ChatAgent is a chat schema specification designed for agent scenarios.

By using ChatAgent, developers can create agents compatible with Databricks and MLflow tools for agent tracking, evaluation, and lifecycle management, which are essential for deploying production-ready models.

To learn how to create a ChatAgent, see the examples in the following section and the MLflow documentation: What is the ChatAgent interface.
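The following is a minimal sketch of a ChatAgent subclass, shown only to illustrate the interface; the class name and echo behavior are illustrative, not a production agent.

import uuid
from typing import Any, Optional

from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatContext

class EchoChatAgent(ChatAgent):
    """Minimal ChatAgent that echoes the user's last message."""

    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        # Echo the last user message back as the assistant reply
        return ChatAgentResponse(
            messages=[
                ChatAgentMessage(
                    role="assistant",
                    content=messages[-1].content,
                    id=str(uuid.uuid4()),
                )
            ]
        )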
ChatAgent examples

The following notebooks show how to author streaming and non-streaming ChatAgents using popular libraries like OpenAI and LangGraph.
LangGraph tool-calling agent
OpenAI tool-calling agent
OpenAI simple agent
Deploy a ChatAgent to Databricks Model Serving

Databricks deploys ChatAgents in a distributed environment on Databricks Model Serving, which means that during a multi-turn conversation, the same serving replica may not handle all requests. Pay attention to the following implications for managing agent state:
- No local caching: When deploying a ChatAgent, do not assume that the same serving replica will serve all requests during a multi-turn conversation. Do not cache the result of an individual conversation turn and reuse it as context in a future turn. Instead, a stateful agent must reconstruct its internal state from the ChatAgentRequest dictionary it receives.
- Thread-safe state: The agent's state must be thread-safe to prevent conflicts in a multi-threaded environment.
- Instantiate state during the predict function: Instantiate state each time the predict function is called, rather than when the ChatAgent is initialized. Because a single ChatAgent replica could handle requests from multiple conversations, storing state at the ChatAgent level could leak information between conversations and cause conflicts. See the sketch after this list.
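The following sketch illustrates these guidelines with a ChatAgent that reconstructs its conversation state inside predict on every request; the class name and reply logic are illustrative.

import uuid
from typing import Any, Optional

from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatContext

class StatelessReplicaAgent(ChatAgent):
    def __init__(self):
        # Do not store per-conversation state here: one replica can serve
        # requests from many different conversations.
        super().__init__()

    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        # Reconstruct conversation state from the request payload on every call
        history = [f"{m.role}: {m.content}" for m in messages]
        reply = f"This conversation has {len(history)} messages so far."
        return ChatAgentResponse(
            messages=[
                ChatAgentMessage(role="assistant", content=reply, id=str(uuid.uuid4()))
            ]
        )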
Custom inputs and outputs

Some scenarios may require additional agent inputs, such as client_type and session_id, or outputs like retrieval source links that should not be included in the chat history for future interactions.

For these scenarios, MLflow ChatAgent natively supports custom_inputs and custom_outputs values.

The Agent Evaluation review app does not currently support rendering traces for agents with additional input fields.

See the following examples to learn how to set custom inputs and outputs for OpenAI/PyFunc and LangGraph agents.
OpenAI + PyFunc custom schema agent notebook
LangGraph custom schema agent notebook
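The following is a minimal sketch of a ChatAgent that reads custom_inputs and returns custom_outputs; the client_type key and the returned link are illustrative.

import uuid
from typing import Any, Optional

from mlflow.pyfunc import ChatAgent
from mlflow.types.agent import ChatAgentMessage, ChatAgentResponse, ChatContext

class CustomIOAgent(ChatAgent):
    def predict(
        self,
        messages: list[ChatAgentMessage],
        context: Optional[ChatContext] = None,
        custom_inputs: Optional[dict[str, Any]] = None,
    ) -> ChatAgentResponse:
        # Read an optional, caller-supplied field without adding it to the chat history
        client_type = (custom_inputs or {}).get("client_type", "unknown")
        return ChatAgentResponse(
            messages=[
                ChatAgentMessage(
                    role="assistant",
                    content=f"Answering a request from a {client_type} client.",
                    id=str(uuid.uuid4()),
                )
            ],
            # Extra values returned alongside the reply, such as retrieval source links
            custom_outputs={"source_links": ["https://example.com/doc"]},
        )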
Provide custom_inputs in the AI Playground and agent review app

If your agent accepts additional inputs using the custom_inputs field, you can manually provide these inputs in both the AI Playground and the agent review app.
- In either the AI Playground or the Agent Review App, select the gear icon.
- Enable custom_inputs.
- Provide a JSON object that matches your agent's defined input schema.
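For example, a custom_inputs JSON object for an agent that accepts the fields mentioned earlier might look like the following; the keys are illustrative and must match your agent's input schema.

{
  "client_type": "web",
  "session_id": "1234-abcd"
}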
Set retriever schema
AI agents often use retrievers, a type of agent tool that finds and returns relevant documents using a Vector Search index. For more information on retrievers, see Unstructured retrieval AI agent tools.
To ensure that retrievers are traced properly, call mlflow.models.set_retriever_schema when you define your agent in code. Use set_retriever_schema to map the column names in the returned table to MLflow's expected fields such as primary_key, text_column, and doc_uri.
import mlflow

# Define the retriever's schema by providing your column names
# These strings should be read from a config dictionary
mlflow.models.set_retriever_schema(
    name="vector_search",
    primary_key="chunk_id",
    text_column="text_column",
    doc_uri="doc_uri",
    # other_columns=["column1", "column2"],
)
The doc_uri column is especially important when evaluating the retriever's performance. doc_uri is the main identifier for documents returned by the retriever, allowing you to compare them against ground truth evaluation sets. See Evaluation sets.

You can also specify additional columns in your retriever's schema by providing a list of column names with the other_columns field.
If you have multiple retrievers, you can define multiple schemas by using unique names for each retriever schema.
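For example, the following sketch registers two retriever schemas with unique names; the index and column names are illustrative.

import mlflow

# One schema per retriever, each with a unique name
mlflow.models.set_retriever_schema(
    name="docs_vector_search",
    primary_key="chunk_id",
    text_column="chunk_text",
    doc_uri="doc_uri",
)
mlflow.models.set_retriever_schema(
    name="support_tickets_vector_search",
    primary_key="ticket_id",
    text_column="ticket_text",
    doc_uri="ticket_url",
)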
Use parameters to configure the agent
In the Agent Framework, you can use parameters to control how agents are executed. This allows you to quickly iterate by varying characteristics of your agent without changing the code. Parameters are key-value pairs that you define in a Python dictionary or a .yaml file.

To configure the code, create a ModelConfig, a set of key-value parameters. ModelConfig is either a Python dictionary or a .yaml file. For example, you can use a dictionary during development and then convert it to a .yaml file for production deployment and CI/CD. For details about ModelConfig, see the MLflow documentation.

An example ModelConfig is shown below.
llm_parameters:
max_tokens: 500
temperature: 0.01
model_serving_endpoint: databricks-dbrx-instruct
vector_search_index: ml.docs.databricks_docs_index
prompt_template: 'You are a hello world bot. Respond with a reply to the user''s
question that indicates your prompt template came from a YAML file. Your response
must use the word "YAML" somewhere. User''s question: {question}'
prompt_template_input_vars:
- question
To call the configuration from your code, use one of the following:
import mlflow

# Example for loading from a .yml file
config_file = "configs/hello_world_config.yml"
model_config = mlflow.models.ModelConfig(development_config=config_file)

# Example of using a dictionary
config_dict = {
    "prompt_template": "You are a hello world bot. Respond with a reply to the user's question that is fun and interesting to the user. User's question: {question}",
    "prompt_template_input_vars": ["question"],
    "model_serving_endpoint": "databricks-dbrx-instruct",
    "llm_parameters": {"temperature": 0.01, "max_tokens": 500},
}
model_config = mlflow.models.ModelConfig(development_config=config_dict)

# Use model_config.get() to retrieve a parameter value defined in the config
value = model_config.get("prompt_template")
Streaming error propagation
Mosaic AI propagates any errors encountered while streaming with the last token under databricks_output.error. It is up to the calling client to properly handle and surface this error.
{
  "delta": …,
  "databricks_output": {
    "trace": {...},
    "error": {
      "error_code": "BAD_REQUEST",
      "message": "TimeoutException: Tool XYZ failed to execute"
    }
  }
}
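The following is a minimal client-side sketch for surfacing the propagated error, assuming each streamed chunk has already been parsed into a Python dictionary.

def consume_stream(chunks):
    """Yield streamed deltas, raising if a chunk carries a propagated error."""
    for chunk in chunks:
        error = chunk.get("databricks_output", {}).get("error")
        if error:
            # Surface the propagated error instead of silently dropping it
            raise RuntimeError(f"{error.get('error_code')}: {error.get('message')}")
        yield chunk.get("delta")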
Author a ChatModel agent
ChatModel is an MLflow class that is available as an alternative to the ChatAgent interface. ChatModel extends OpenAI's ChatCompletion schema, allowing you to maintain broad compatibility with platforms supporting the ChatCompletion standard while adding custom functionality.

Databricks recommends ChatModel only when strict adherence to the OpenAI ChatCompletion signature is required. For all other scenarios, Databricks recommends ChatAgent.
The following table compares the interface options:
Interface | Recommendations
---|---
ChatAgent | Recommended for basic and advanced agent scenarios.
ChatModel | Recommended when strict adherence to the OpenAI ChatCompletion signature is required.
See MLflow: Getting Started with ChatModel.
You can author your agent as a subclass of mlflow.pyfunc.ChatModel. This method provides the following benefits:
- Enables streaming agent output when invoking a served agent (by passing {stream: true} in the request body).
- Automatically enables AI gateway inference tables when your agent is served, providing access to enhanced request log metadata, such as the requester name.
- Allows you to write agent code compatible with the ChatCompletion schema using typed Python classes.
- MLflow automatically infers a chat completion-compatible signature when logging the agent, even without an input_example. This simplifies the process of registering and deploying the agent. See Infer Model Signature during logging.
The following code is best executed in a Databricks notebook. Notebooks provide a convenient environment for developing, testing, and iterating on your agent.
The MyAgent class extends mlflow.pyfunc.ChatModel and implements the required predict method. This ensures compatibility with Mosaic AI Agent Framework.

The class also includes the optional methods _create_chat_completion_chunk and predict_stream to handle streaming outputs.
import re
from dataclasses import dataclass
from typing import Optional, Dict, List, Generator
from mlflow.pyfunc import ChatModel
from mlflow.types.llm import (
# Non-streaming helper classes
ChatCompletionRequest,
ChatCompletionResponse,
ChatCompletionChunk,
ChatMessage,
ChatChoice,
ChatParams,
# Helper classes for streaming agent output
ChatChoiceDelta,
ChatChunkChoice,
)
class MyAgent(ChatModel):
"""
Defines a custom agent that processes ChatCompletionRequests
and returns ChatCompletionResponses.
"""
def predict(self, context, messages: list[ChatMessage], params: ChatParams) -> ChatCompletionResponse:
last_user_question_text = messages[-1].content
response_message = ChatMessage(
role="assistant",
content=(
f"I will always echo back your last question. Your last question was: {last_user_question_text}. "
)
)
return ChatCompletionResponse(
choices=[ChatChoice(message=response_message)]
)
def _create_chat_completion_chunk(self, content) -> ChatCompletionChunk:
"""Helper for constructing a ChatCompletionChunk instance for wrapping streaming agent output"""
return ChatCompletionChunk(
choices=[ChatChunkChoice(
delta=ChatChoiceDelta(
role="assistant",
content=content
)
)]
)
def predict_stream(
self, context, messages: List[ChatMessage], params: ChatParams
) -> Generator[ChatCompletionChunk, None, None]:
last_user_question_text = messages[-1].content
yield self._create_chat_completion_chunk(f"Echoing back your last question, word by word.")
for word in re.findall(r"\S+\s*", last_user_question_text):
yield self._create_chat_completion_chunk(word)
agent = MyAgent()
model_input = ChatCompletionRequest(
messages=[ChatMessage(role="user", content="What is Databricks?")]
)
response = agent.predict(context=None, messages=model_input.messages, params=ChatParams())
print(response)
While the agent class MyAgent is defined in one notebook, you should create a separate driver notebook. The driver notebook logs the agent to Model Registry and deploys the agent using Model Serving.

This separation follows the workflow recommended by Databricks for logging models using MLflow's Models from Code methodology.
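The following is a minimal driver notebook sketch, assuming the agent code is saved in a file named agent.py that also calls mlflow.models.set_model(MyAgent()); the file name and input example are illustrative.

import mlflow

# Log the agent using Models from Code: python_model points to the agent's source file
with mlflow.start_run():
    logged_agent_info = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model="agent.py",  # illustrative path to the file defining MyAgent
        input_example={
            "messages": [{"role": "user", "content": "What is Databricks?"}]
        },
    )

print(logged_agent_info.model_uri)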
Wrap LangChain in ChatModel
If you have an existing LangChain model and want to integrate it with other Mosaic AI agent features, you can wrap it in an MLflow ChatModel to ensure compatibility.

This code sample uses the following steps to wrap a LangChain runnable as a ChatModel:

- Wrap the final output of the LangChain runnable with mlflow.langchain.output_parsers.ChatCompletionOutputParser to produce a chat completion output signature.
- The LangchainAgent class extends mlflow.pyfunc.ChatModel and implements two key methods:
  - predict: Handles synchronous predictions by invoking the chain and returning a formatted response.
  - predict_stream: Handles streaming predictions by invoking the chain and yielding chunks of responses.
from mlflow.langchain.output_parsers import ChatCompletionOutputParser
from mlflow.pyfunc import ChatModel
from typing import Optional, Dict, List, Generator
from mlflow.types.llm import (
    ChatCompletionResponse,
    ChatCompletionChunk,
    ChatMessage,
    ChatParams,
)

chain = (
    <your chain here>
    | ChatCompletionOutputParser()
)

class LangchainAgent(ChatModel):
    def _prepare_messages(self, messages: List[ChatMessage]):
        return {"messages": [m.to_dict() for m in messages]}

    def predict(
        self, context, messages: List[ChatMessage], params: ChatParams
    ) -> ChatCompletionResponse:
        question = self._prepare_messages(messages)
        response_message = chain.invoke(question)
        return ChatCompletionResponse.from_dict(response_message)

    def predict_stream(
        self, context, messages: List[ChatMessage], params: ChatParams
    ) -> Generator[ChatCompletionChunk, None, None]:
        question = self._prepare_messages(messages)
        for chunk in chain.stream(question):
            yield ChatCompletionChunk.from_dict(chunk)