Log and register AI agents

Preview

This feature is in Public Preview.

Log AI agents using Mosaic AI Agent Framework. Logging an agent is the basis of the development process: it captures a “point in time” of the agent’s code and configuration so you can evaluate the quality of that configuration.

Requirements

Create an AI agent before logging it.

Code-based vs. serialization-based logging

You can use code-based MLflow logging or serialization-based MLflow logging. Databricks recommends that you use code-based logging.

Code-based MLflow logging: The chain’s code is captured as a Python file. The Python environment is captured as a list of packages. When the chain is deployed, the Python environment is restored, and the chain’s code is executed to load the chain into memory so it can be invoked when the endpoint is called.

Serialization-based MLflow logging: The chain’s code and its current state in the Python environment are serialized to disk, often using libraries such as pickle or joblib. When the chain is deployed, the Python environment is restored, and the serialized object is loaded into memory so it can be invoked when the endpoint is called.

The following summarizes the advantages and disadvantages of each method.

Code-based MLflow logging:

  Advantages:

  • Overcomes inherent limitations of serialization, which is not supported by many popular GenAI libraries.

  • Saves a copy of the original code for later reference.

  • No need to restructure your code into a single object that can be serialized.

  Disadvantages:

  • log_model(...) must be called from a different notebook than the chain’s code (called a driver notebook).

Serialization-based MLflow logging:

  Advantages:

  • log_model(...) can be called from the same notebook where the model is defined.

  Disadvantages:

  • Original code is not available.

  • All libraries and objects used in the chain must support serialization.
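
To make the distinction concrete, the following is a minimal sketch of serialization-based logging, in which the model object itself is serialized into the model artifact (MLflow typically uses cloudpickle for custom PyFunc models). The EchoAgent class is an illustrative assumption, not part of the Agent Framework examples below.

import mlflow
from mlflow.pyfunc import PythonModel

class EchoAgent(PythonModel):
    def predict(self, context, model_input):
        # Trivial stand-in for real agent logic
        return {"answer": str(model_input)}

# Serialization-based logging: the EchoAgent instance is pickled into the model artifact,
# so every object it references must support serialization.
with mlflow.start_run():
    logged_info = mlflow.pyfunc.log_model(
        python_model=EchoAgent(),  # an in-memory object, not a path to a code file
        artifact_path="agent",
    )

By contrast, in the code-based examples that follow, lc_model or python_model is a path to a Python file rather than an in-memory object.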

For code-based logging, the code that logs your agent or chain must be in a separate notebook from your chain code. This notebook is called a driver notebook. For an example notebook, see Example notebooks.

Code-based logging with LangChain

  1. Create a notebook or Python file with your code. For purposes of this example, the notebook or file is named chain.py. The notebook or file must contain a LangChain chain, referred to here as lc_chain.

  2. Include mlflow.models.set_model(lc_chain) in the notebook or file, as shown in the chain.py sketch after this list.

  3. Create a new notebook to serve as the driver notebook (called driver.py in this example).

  4. In the driver notebook, use mlflow.langchain.log_model(lc_model="/path/to/chain.py") to run chain.py and log the results to an MLflow model.

  5. Deploy the model. See Deploy an agent for generative AI application. The deployment of your agent might depend on other Databricks resources such as a vector search index and model serving endpoints. For LangChain agents:

    • The MLflow log_model infers the dependencies required by the chain and logs them to the MLmodel file in the logged model artifact. Starting with MLflow version 2.17.0, you can override these inferred dependencies. See Specify resources for PyFunc or LangChain agent.

    • During deployment, databricks.agents.deploy automatically creates the M2M OAuth tokens required to access and communicate with these inferred resource dependencies.

  6. When the serving environment is loaded, chain.py is executed.

  7. When a serving request comes in, lc_chain.invoke(...) is called.
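
For reference, the following is a minimal sketch of what chain.py might contain. The prompt, the ChatDatabricks import, the endpoint configuration key, and the message-extraction logic are illustrative assumptions; your chain can be arbitrarily complex as long as it ends with mlflow.models.set_model.

# chain.py (sketch)
import mlflow
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_community.chat_models import ChatDatabricks

# Read parameters from the config file passed as model_config at logging time
config = mlflow.models.ModelConfig(development_config="config.yml")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("user", "{question}"),
])
llm = ChatDatabricks(endpoint=config.get("llm_endpoint"))  # "llm_endpoint" is an assumed key in config.yml

lc_chain = (
    # Extract the latest user message from the ChatCompletions-style input
    {"question": RunnableLambda(lambda inputs: inputs["messages"][-1]["content"])}
    | prompt
    | llm
    | StrOutputParser()
)

# Tell MLflow which object in this file is the model to log
mlflow.models.set_model(lc_chain)

The driver notebook (driver.py in this example) then logs chain.py and verifies the result: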


import mlflow

code_path = "/Workspace/Users/first.last/chain.py"
config_path = "/Workspace/Users/first.last/config.yml"

input_example = {
    "messages": [
        {
            "role": "user",
            "content": "What is Retrieval-augmented Generation?",
        }
    ]
}

# example using LangChain
with mlflow.start_run():
  logged_chain_info = mlflow.langchain.log_model(
    lc_model=code_path,
    model_config=config_path, # If you specify this parameter, this configuration is used by the chain and overrides the development_config set in chain.py
    artifact_path="chain", # This string is used as the path inside the MLflow model where artifacts are stored
    input_example=input_example, # Must be a valid input to your chain
    example_no_conversion=True, # Required
  )

print(f"MLflow Run: {logged_chain_info.run_id}")
print(f"Model URI: {logged_chain_info.model_uri}")

# To verify that the model has been logged correctly, load the chain and call `invoke`:
model = mlflow.langchain.load_model(logged_chain_info.model_uri)
model.invoke(input_example)

Code-based logging with PyFunc

  1. Create a notebook or Python file with your code. For purposes of this example, the notebook or file is named chain.py. The notebook or file must contain a PyFunc class, referred to here as PyFuncClass.

  2. Include mlflow.models.set_model(PyFuncClass) in the notebook or file, as shown in the chain.py sketch after this list.

  3. Create a new notebook to serve as the driver notebook (called driver.py in this example).

  4. In the driver notebook, use mlflow.pyfunc.log_model(python_model="/path/to/chain.py", resources="/path/to/resources.yaml") to run chain.py and log the results to an MLflow model. The resources parameter declares any resources needed to serve the model, such as a vector search index or a serving endpoint that serves a foundation model. For an example resources file for PyFunc, see Specify resources for PyFunc or LangChain agent.

  5. Deploy the model. See Deploy an agent for generative AI application.

  6. When the serving environment is loaded, chain.py is executed.

  7. When a serving request comes in, PyFuncClass.predict(...) is called.
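
For reference, the following is a minimal sketch of what chain.py might contain for the PyFunc flow. The class body is an illustrative stub; the exact type of model_input depends on your input example and MLflow version.

# chain.py (sketch)
import mlflow
from mlflow.pyfunc import PythonModel

class PyFuncClass(PythonModel):
    def predict(self, context, model_input):
        # model_input mirrors the input_example: a dict with a "messages" list
        question = model_input["messages"][-1]["content"]
        # Replace this stub with your retrieval and generation logic
        return {"answer": f"You asked: {question}"}

# Tell MLflow which model object defined in this file to log
mlflow.models.set_model(PyFuncClass())

The driver notebook then logs chain.py, along with the resources file it depends on: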

import mlflow

code_path = "/Workspace/Users/first.last/chain.py"
config_path = "/Workspace/Users/first.last/config.yml"

input_example = {
    "messages": [
        {
            "role": "user",
            "content": "What is Retrieval-augmented Generation?",
        }
    ]
}

# example using PyFunc model

resources_path = "/Workspace/Users/first.last/resources.yml"

with mlflow.start_run():
  logged_chain_info = mlflow.pyfunc.log_model(
    python_model=code_path,
    artifact_path="chain",
    input_example=input_example,
    resources=resources_path,
    example_no_conversion=True,
  )

print(f"MLflow Run: {logged_chain_info.run_id}")
print(f"Model URI: {logged_chain_info.model_uri}")

# To verify that the model has been logged correctly, load the model and call `predict`:
model = mlflow.pyfunc.load_model(logged_chain_info.model_uri)
model.predict(input_example)

Specify resources for PyFunc or LangChain agent

You can specify resources, such as a vector search index and a serving endpoint, that are required to serve the model.

For LangChain, resources are automatically detected and logged with the model using a best-effort approach. Starting with MLflow version 2.17.0, you can override these automatically inferred resources using code similar to that shown below. This is recommended for production use cases as it allows you to ensure that agents are logged with the necessary dependencies.

When you log a PyFunc-flavored agent, you must manually add any resource dependencies of the agent. During deployment, an M2M OAuth token with access to all the resources specified in the resources parameter is created and provided to the deployed agent.

Note

You can override the resources that your endpoint has permission to access by manually specifying the resources when you log the chain.

The following code specifies dependencies using the resources parameter.

import mlflow
from mlflow.models.resources import (
    DatabricksFunction,
    DatabricksServingEndpoint,
    DatabricksSQLWarehouse,
    DatabricksVectorSearchIndex,
)

with mlflow.start_run():
  logged_chain_info = mlflow.pyfunc.log_model(
    python_model=chain_notebook_path,
    artifact_path="chain",
    input_example=input_example,
    example_no_conversion=True,
    resources=[
      DatabricksServingEndpoint(endpoint_name="databricks-mixtral-8x7b-instruct"),
      DatabricksServingEndpoint(endpoint_name="databricks-bge-large-en"),
      DatabricksVectorSearchIndex(index_name="prod.agents.databricks_docs_index"),
      DatabricksSQLWarehouse(warehouse_id="your_warehouse_id"),
      DatabricksFunction(function_name="ml.tools.python_exec"),
    ]
  )
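
For a LangChain agent, a similar override can be passed to mlflow.langchain.log_model. This is a sketch that assumes MLflow 2.17.0 or above and reuses the illustrative resource names and variables from the example above.

with mlflow.start_run():
  logged_chain_info = mlflow.langchain.log_model(
    lc_model=chain_notebook_path,
    artifact_path="chain",
    input_example=input_example,
    # Overrides the resources MLflow would otherwise infer from the chain
    resources=[
      DatabricksServingEndpoint(endpoint_name="databricks-mixtral-8x7b-instruct"),
      DatabricksVectorSearchIndex(index_name="prod.agents.databricks_docs_index"),
    ],
  )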

You can also add resources by specifying them in a resources.yaml file and referencing that file path in the resources parameter. An M2M OAuth token with access to all the resources specified in resources.yaml is created and provided to the deployed agent.

The following is an example resources.yaml file that defines model serving endpoints and a vector search index.

api_version: "1"
databricks:
  vector_search_index:
    - name: "catalog.schema.my_vs_index"
  serving_endpoint:
    - name: databricks-dbrx-instruct
    - name: databricks-bge-large-en
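
For example, assuming the file above is saved at an accessible workspace path, you can pass that path directly in the resources parameter (a sketch; the path is illustrative):

with mlflow.start_run():
  logged_chain_info = mlflow.pyfunc.log_model(
    python_model=chain_notebook_path,
    artifact_path="chain",
    input_example=input_example,
    resources="/Workspace/Users/first.last/resources.yaml",  # path to the resources.yaml shown above
  )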

Register the chain to Unity Catalog

Before you deploy the chain, you must register the chain to Unity Catalog. When you register the chain, it is packaged as a model in Unity Catalog, and you can use Unity Catalog permissions to authorize access to the resources in the chain.

import mlflow

mlflow.set_registry_uri("databricks-uc")

catalog_name = "test_catalog"
schema_name = "schema"
model_name = "chain_name"

uc_model_name = f"{catalog_name}.{schema_name}.{model_name}"
uc_model_info = mlflow.register_model(model_uri=logged_chain_info.model_uri, name=uc_model_name)
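
After registration, the model version recorded in uc_model_info can be deployed with the databricks-agents SDK. The following is a minimal sketch; see Deploy an agent for generative AI application for the full workflow.

from databricks import agents

# Deploy the registered Unity Catalog model version behind an agent serving endpoint
deployment = agents.deploy(uc_model_name, uc_model_info.version)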