
Agent Monitoring Demo Notebook

This notebook demonstrates how to monitor a deployed agent using Mosaic AI Agent Evaluation. It will:

  1. Create and deploy a simple LLM model wrapper.
  2. Send sample traffic to the deployed endpoint.
  3. Monitor the deployed agent using Mosaic AI Agent Evaluation.

Note: When you deploy agents authored with ChatAgent or ChatModel using agents.deploy, basic monitoring is automatically set up. This includes request volume, latency metrics, and error logging.

Install dependencies
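The original install cell isn't included in this export. A minimal sketch of a typical install for this demo follows; the exact package list and versions are assumptions.

```python
# Assumed install cell; restart Python afterwards so the updated packages are picked up.
%pip install -U -qqqq databricks-agents mlflow
dbutils.library.restartPython()
```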

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain 0.1.20 requires tenacity<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-community 0.0.38 requires tenacity<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-core 0.1.52 requires tenacity<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
ydata-profiling 4.5.1 requires pydantic<2,>=1.8.1, but you have pydantic 2.10.6 which is incompatible.
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.

Configuration

This notebook stores resources in the UC catalog and schema configured below.

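The configuration cell isn't included in this export. A minimal sketch is shown below; the values match the example output later in this notebook, so change them for your workspace.

```python
# Sketch of the configuration cell. Catalog, schema, and model name below match
# the example output later in this notebook; replace them with your own.
catalog = "ml"
schema = "avesh"
model_name = "monitoring_demo_agent_march_3_2"

UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"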

Agent creation and deployment

In this section, we will:

  1. Create a simple agent using Llama 70B
  2. Log the agent using MLflow
  3. Deploy the agent. This automatically sets up basic monitoring that tracks request volume, latency, and errors.

You can skip this step if you already have a deployed agent; otherwise, a sketch of these steps is shown below.
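The original cells aren't included in this export. The sketch below shows one way to implement these steps, assuming the Foundation Model API endpoint name `databricks-meta-llama-3-3-70b-instruct` and MLflow's models-from-code pattern (which matches the `simple_agent.py` file written in the output); adapt it to your workspace.

```python
%%writefile simple_agent.py
# Sketch of a simple agent: a ChatModel wrapper that forwards chat messages to a
# Databricks-hosted Llama 70B endpoint. The endpoint name is an assumption.
import mlflow
from mlflow.deployments import get_deploy_client
from mlflow.types.llm import ChatCompletionResponse

LLM_ENDPOINT = "databricks-meta-llama-3-3-70b-instruct"


class SimpleAgent(mlflow.pyfunc.ChatModel):
    def predict(self, context, messages, params):
        # Forward the conversation to the foundation model endpoint and return
        # its chat-completions response unchanged.
        client = get_deploy_client("databricks")
        response = client.predict(
            endpoint=LLM_ENDPOINT,
            inputs={"messages": [m.to_dict() for m in messages]},
        )
        return ChatCompletionResponse.from_dict(response)


# Register this instance for models-from-code logging.
mlflow.models.set_model(SimpleAgent())
```

Logging and deployment might then look like the following; the `resources` declaration and the hard-coded version number are assumptions.

```python
import mlflow
from mlflow.models.resources import DatabricksServingEndpoint
from databricks import agents

mlflow.set_registry_uri("databricks-uc")

# Log the agent from the file written above and register it to Unity Catalog.
with mlflow.start_run():
    logged_agent = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model="simple_agent.py",  # models-from-code
        input_example={"messages": [{"role": "user", "content": "What is MLflow?"}]},
        # Declare the LLM endpoint so the deployment can authenticate to it.
        resources=[DatabricksServingEndpoint(endpoint_name="databricks-meta-llama-3-3-70b-instruct")],
        registered_model_name=UC_MODEL_NAME,
    )

# Deploying with agents.deploy also sets up basic monitoring automatically.
# Version 1 matches the registration output below; in practice, use the version
# returned by the registration step.
deployment = agents.deploy(UC_MODEL_NAME, model_version=1)
```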


/local_disk0/.ephemeral_nfs/envs/pythonEnv-fedbeb41-d73c-4fb3-9efb-a58712d65084/lib/python3.11/site-packages/mlflow/pyfunc/utils/data_validation.py:168: UserWarning: Add type hints to the `predict` method to enable data validation and automatic signature inference during model logging. Check https://mlflow.org/docs/latest/model/python_model.html#type-hint-usage-in-pythonmodel for more details. color_warning(

Overwriting simple_agent.py

2025/03/03 20:06:39 INFO mlflow.pyfunc: Predicting on input example to validate output

{'choices': [{'message': {'role': 'assistant', 'content': '**MLflow** is an open-source platform for managing the end-to-end machine learning lifecycle. It provides a standardized framework for tracking, reproducing, and deploying machine learning models.\n\n### Key Features of MLflow\n\n1. **Model Management**: MLflow allows you to manage and track different versions of your models, including hyperparameters, metrics, and artifacts.\n2. **Experiment Tracking**: MLflow provides a centralized repository for tracking experiments, including parameters, metrics, and artifacts.\n3. **Model Serving**: MLflow allows you to deploy models as RESTful APIs, making it easy to integrate them into production environments.\n4. **Reproducibility**: MLflow enables reproducibility by tracking all aspects of the machine learning workflow, including data, code, and environment.\n\n### Components of MLflow\n\n1. **MLflow Tracking**: provides APIs for logging and tracking experiments, models, and metrics.\n2. **MLflow Projects**: provides a standardized way to manage and deploy machine learning projects.\n3. **MLflow Models**: provides a standardized way to manage and deploy machine learning models.\n4. **MLflow Registry**: provides a centralized repository for managing and versioning machine learning models.\n\n### Benefits of Using MLflow\n\n1. **Improved Collaboration**: MLflow enables data scientists and engineers to collaborate more effectively by providing a standardized framework for managing machine learning workflows.\n2. **Increased Transparency**: MLflow provides a transparent and reproducible way to manage machine learning workflows, making it easier to track and understand model performance.\n3. **Faster Deployment**: MLflow enables faster deployment of machine learning models by providing a standardized way to manage and deploy models.\n4. **Better Model Management**: MLflow provides a centralized repository for managing machine learning models, making it easier to track and manage different versions of models.'}, 'index': 0, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 15, 'completion_tokens': 365, 'total_tokens': 380}, 'id': 'chatcmpl_bd89ca04-49f0-427f-9cc0-60356b03cc47', 'model': 'meta-llama-3.3-70b-instruct-121024', 'object': 'chat.completion', 'created': 1741032416}
Trace(request_id=tr-973d1efc17114c7da0eb173483a067bd)

Successfully registered model 'ml.avesh.monitoring_demo_agent_march_3_2'.
Created version '1' of model 'ml.avesh.monitoring_demo_agent_march_3_2'.
/local_disk0/.ephemeral_nfs/envs/pythonEnv-fedbeb41-d73c-4fb3-9efb-a58712d65084/lib/python3.11/site-packages/mlflow/pyfunc/utils/data_validation.py:168: UserWarning: Add type hints to the `predict` method to enable data validation and automatic signature inference during model logging. Check https://mlflow.org/docs/latest/model/python_model.html#type-hint-usage-in-pythonmodel for more details. color_warning(
Created monitor for endpoint "agents_ml-avesh-monitoring_demo_agent_march_3_2". View monitoring page: https://e2-dogfood.staging.cloud.databricks.com/ml/experiments/2154539344714143/evaluation-monitoring?endpointName=agents_ml-avesh-monitoring_demo_agent_march_3_2
No computed metrics specified. To override the computed metrics, include `metrics` in the monitoring_config.
Deployment of ml.avesh.monitoring_demo_agent_march_3_2 version 1 initiated. This can take up to 15 minutes and the Review App & Query Endpoint will not work until this deployment finishes.
View status: https://e2-dogfood.staging.cloud.databricks.com/ml/endpoints/agents_ml-avesh-monitoring_demo_agent_march_3_2
Review App: https://e2-dogfood.staging.cloud.databricks.com/ml/review/ml.avesh.monitoring_demo_agent_march_3_2/1?o=6051921418418893
Monitor: https://e2-dogfood.staging.cloud.databricks.com/ml/experiments/2154539344714143/evaluation-monitoring?endpointName=agents_ml-avesh-monitoring_demo_agent_march_3_2

Waiting for endpoint to deploy. This can take 10 - 20 minutes.................

Configuring Additional Monitoring Metrics

Since our agent was deployed using agents.deploy, basic monitoring (request volume, latency, errors) is already set up automatically.

Now we'll add evaluation metrics to our monitoring. The monitoring configuration sketched after this list will:

  • Sample 50% of requests for evaluation
  • Evaluate responses against safety, relevance, and custom guidelines
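The original configuration cell isn't included in this export. A minimal sketch follows, assuming the monitors API shipped with databricks-agents; the import path, config keys, and metric names may differ across versions, so verify them against your installed package.

```python
# Sketch: attach evaluation metrics to the monitor created by agents.deploy.
# Import path, config keys, and metric names are assumptions; verify them
# against your databricks-agents version.
from databricks.agents.evals.monitors import get_monitor, update_monitor

endpoint_name = deployment.endpoint_name  # e.g. "agents_ml-avesh-monitoring_demo_agent_march_3_2"

# Inspect the monitor that agents.deploy created (basic monitoring only).
print(get_monitor(endpoint_name=endpoint_name))

# Sample 50% of requests and judge safety, relevance, and a custom guideline.
monitor = update_monitor(
    endpoint_name=endpoint_name,
    monitoring_config={
        "sample": 0.5,
        "metrics": ["safety", "relevance_to_query", "guideline_adherence"],
        "global_guidelines": {
            "english": ["The response must be written in English."],
        },
    },
)
```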

Monitor URL: https://e2-dogfood.staging.cloud.databricks.com/ml/experiments/2154539344714143/evaluation-monitoring?endpointName=agents_ml-avesh-monitoring_demo_agent_march_3_2
Current monitor configuration: MonitoringConfig(sample=0.1, metrics=[], periodic=None, paused=False, global_guidelines=None)

Generate Sample Traffic

Now that our endpoint is deployed, we'll send some sample questions to generate traffic for monitoring.
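The original traffic-generation cell isn't included in this export. A minimal sketch follows, reusing the `endpoint_name` variable from above and a few of the questions shown in the output below.

```python
# Sketch: query the deployed agent endpoint to generate traffic for the monitor.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

questions = [
    "What is Mosaic AI Agent Evaluation?",
    "How do you use MLflow with Databricks for experiment tracking?",
    "What should I use Databricks Feature Store for?",
    "How does AutoML work in Databricks?",
    "What is Model Serving in Databricks and what are its deployment options?",
]

for i, question in enumerate(questions, start=1):
    print(f"Question {i}: {question}")
    client.predict(
        endpoint=endpoint_name,
        inputs={"messages": [{"role": "user", "content": question}]},
    )
```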


Question 1: What is Mosaic AI Agent Evaluation?
Question 2: How do you use MLflow with Databricks for experiment tracking?
Question 3: What should I use Databricks Feature Store for?
Question 4: How does AutoML work in Databricks?
Question 5: What is Model Serving in Databricks and what are its deployment options?
Question 6: How does Databricks handle distributed deep learning training?
Question 7: Does Unity Catalog support models?
Question 8: What is the Databricks Lakehouse?
Question 9: Which Llama models are supported on Databricks?
Question 10: How does Databricks integrate with popular ML frameworks like PyTorch and TensorFlow?

[Optional] Create an evaluation instance

Traces can be exported to Managed Evaluations in order to build an evaluation set. For more information, see Managed evaluations — subject matter expert (SME) user guide.


Viewing Monitoring Results

The monitoring results are stored in Delta tables and can be accessed in two ways:

  1. Through the MLflow UI (click the link generated above)
  2. Directly querying the Delta table containing evaluated traces

Below, we'll query the Delta table to see the evaluation results, filtering out skipped evaluations.

If you do not see monitoring results, wait until the next run of the monitoring job.
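A sketch of the query is shown below. It assumes the monitor object returned by `get_monitor` exposes the evaluated traces table name and that skipped rows are marked in an `evaluation_status` column; if the attribute or column differs in your version, copy the table name from the monitoring page instead.

```python
# Sketch: read the evaluated traces Delta table produced by the monitor and
# filter out rows the monitoring job skipped. The `evaluated_traces_table`
# attribute and `evaluation_status` column are assumptions.
monitor = get_monitor(endpoint_name=endpoint_name)

display(
    spark.table(monitor.evaluated_traces_table)
    .filter("evaluation_status != 'skipped'")
)
```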


Cleanup

When you're done with the demo, you can delete the monitor using the code below.

Note that this only removes the monitoring configuration; it doesn't affect the deployed model or stored evaluation results.
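A sketch of the cleanup call, assuming the same monitors module used above:

```python
# Sketch: delete the monitor for the agent endpoint. This removes only the
# monitoring configuration, not the deployed model or stored results.
from databricks.agents.evals.monitors import delete_monitor

delete_monitor(endpoint_name=endpoint_name)
```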


Command skipped

Command skipped