
Agent Monitoring Demo Notebook

This notebook demonstrates how to monitor a deployed agent using Mosaic AI Agent Evaluation. It will:

  1. Create and deploy a simple LLM model wrapper.
  2. Send sample traffic to the deployed endpoint.
  3. Monitor the deployed agent using Mosaic AI Agent Evaluation.

Note: When you deploy agents authored with ChatAgent or ChatModel using agents.deploy, basic monitoring is automatically set up. This includes request volume, latency metrics, and error logging.

Install dependencies
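The original install cell isn't included in this export. A minimal sketch of a typical install for this demo follows; the exact package list and versions are assumptions.

```python
# Assumed install cell; restart Python afterwards so the updated packages are picked up.
%pip install -U -qqqq databricks-agents mlflow
dbutils.library.restartPython()
```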

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
langchain 0.1.20 requires tenacity<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-community 0.0.38 requires tenacity<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
langchain-core 0.1.52 requires tenacity<9.0.0,>=8.1.0, but you have tenacity 9.0.0 which is incompatible.
ydata-profiling 4.5.1 requires pydantic<2,>=1.8.1, but you have pydantic 2.10.6 which is incompatible.
Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.

Configuration

This notebook stores resources in the UC catalog and schema configured below.

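The configuration cell isn't included in this export. A minimal sketch is shown below; the values match the example output later in this notebook, so change them for your workspace.

```python
# Sketch of the configuration cell. Catalog, schema, and model name below match
# the example output later in this notebook; replace them with your own.
catalog = "ml"
schema = "avesh"
model_name = "monitoring_demo_agent_march_3_2"

UC_MODEL_NAME = f"{catalog}.{schema}.{model_name}"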

Agent creation and deployment

In this section, we will:

  1. Create a simple agent using Llama 70B
  2. Log the agent using MLflow
  3. Deploy the agent. This automatically sets up basic monitoring that tracks request volume, latency, and errors.

You can skip this step if you already have a deployed agent; otherwise, a sketch of these steps is shown below.
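The original cells aren't included in this export. The sketch below shows one way to implement these steps, assuming the Foundation Model API endpoint name `databricks-meta-llama-3-3-70b-instruct` and MLflow's models-from-code pattern (which matches the `simple_agent.py` file written in the output); adapt it to your workspace.

```python
%%writefile simple_agent.py
# Sketch of a simple agent: a ChatModel wrapper that forwards chat messages to a
# Databricks-hosted Llama 70B endpoint. The endpoint name is an assumption.
import mlflow
from mlflow.deployments import get_deploy_client
from mlflow.types.llm import ChatCompletionResponse

LLM_ENDPOINT = "databricks-meta-llama-3-3-70b-instruct"


class SimpleAgent(mlflow.pyfunc.ChatModel):
    def predict(self, context, messages, params):
        # Forward the conversation to the foundation model endpoint and return
        # its chat-completions response unchanged.
        client = get_deploy_client("databricks")
        response = client.predict(
            endpoint=LLM_ENDPOINT,
            inputs={"messages": [m.to_dict() for m in messages]},
        )
        return ChatCompletionResponse.from_dict(response)


# Register this instance for models-from-code logging.
mlflow.models.set_model(SimpleAgent())
```

Logging and deployment might then look like the following; the `resources` declaration and the hard-coded version number are assumptions.

```python
import mlflow
from mlflow.models.resources import DatabricksServingEndpoint
from databricks import agents

mlflow.set_registry_uri("databricks-uc")

# Log the agent from the file written above and register it to Unity Catalog.
with mlflow.start_run():
    logged_agent = mlflow.pyfunc.log_model(
        artifact_path="agent",
        python_model="simple_agent.py",  # models-from-code
        input_example={"messages": [{"role": "user", "content": "What is MLflow?"}]},
        # Declare the LLM endpoint so the deployment can authenticate to it.
        resources=[DatabricksServingEndpoint(endpoint_name="databricks-meta-llama-3-3-70b-instruct")],
        registered_model_name=UC_MODEL_NAME,
    )

# Deploying with agents.deploy also sets up basic monitoring automatically.
# Version 1 matches the registration output below; in practice, use the version
# returned by the registration step.
deployment = agents.deploy(UC_MODEL_NAME, model_version=1)
```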


/local_disk0/.ephemeral_nfs/envs/pythonEnv-fedbeb41-d73c-4fb3-9efb-a58712d65084/lib/python3.11/site-packages/mlflow/pyfunc/utils/data_validation.py:168: UserWarning: Add type hints to the `predict` method to enable data validation and automatic signature inference during model logging. Check https://mlflow.org/docs/latest/model/python_model.html#type-hint-usage-in-pythonmodel for more details. color_warning(

Overwriting simple_agent.py

2025/03/03 20:06:39 INFO mlflow.pyfunc: Predicting on input example to validate output

{'choices': [{'message': {'role': 'assistant', 'content': '**MLflow** is an open-source platform for managing the end-to-end machine learning lifecycle. It provides a standardized framework for tracking, reproducing, and deploying machine learning models.\n\n### Key Features of MLflow\n\n1. **Model Management**: MLflow allows you to manage and track different versions of your models, including hyperparameters, metrics, and artifacts.\n2. **Experiment Tracking**: MLflow provides a centralized repository for tracking experiments, including parameters, metrics, and artifacts.\n3. **Model Serving**: MLflow allows you to deploy models as RESTful APIs, making it easy to integrate them into production environments.\n4. **Reproducibility**: MLflow enables reproducibility by tracking all aspects of the machine learning workflow, including data, code, and environment.\n\n### Components of MLflow\n\n1. **MLflow Tracking**: provides APIs for logging and tracking experiments, models, and metrics.\n2. **MLflow Projects**: provides a standardized way to manage and deploy machine learning projects.\n3. **MLflow Models**: provides a standardized way to manage and deploy machine learning models.\n4. **MLflow Registry**: provides a centralized repository for managing and versioning machine learning models.\n\n### Benefits of Using MLflow\n\n1. **Improved Collaboration**: MLflow enables data scientists and engineers to collaborate more effectively by providing a standardized framework for managing machine learning workflows.\n2. **Increased Transparency**: MLflow provides a transparent and reproducible way to manage machine learning workflows, making it easier to track and understand model performance.\n3. **Faster Deployment**: MLflow enables faster deployment of machine learning models by providing a standardized way to manage and deploy models.\n4. **Better Model Management**: MLflow provides a centralized repository for managing machine learning models, making it easier to track and manage different versions of models.'}, 'index': 0, 'finish_reason': 'stop'}], 'usage': {'prompt_tokens': 15, 'completion_tokens': 365, 'total_tokens': 380}, 'id': 'chatcmpl_bd89ca04-49f0-427f-9cc0-60356b03cc47', 'model': 'meta-llama-3.3-70b-instruct-121024', 'object': 'chat.completion', 'created': 1741032416}
Trace(request_id=tr-973d1efc17114c7da0eb173483a067bd)

Successfully registered model 'ml.avesh.monitoring_demo_agent_march_3_2'.
Created version '1' of model 'ml.avesh.monitoring_demo_agent_march_3_2'.
/local_disk0/.ephemeral_nfs/envs/pythonEnv-fedbeb41-d73c-4fb3-9efb-a58712d65084/lib/python3.11/site-packages/mlflow/pyfunc/utils/data_validation.py:168: UserWarning: Add type hints to the `predict` method to enable data validation and automatic signature inference during model logging. Check https://mlflow.org/docs/latest/model/python_model.html#type-hint-usage-in-pythonmodel for more details. color_warning(
Created monitor for endpoint "agents_ml-avesh-monitoring_demo_agent_march_3_2". View monitoring page: https://e2-dogfood.staging.cloud.databricks.com/ml/experiments/2154539344714143/evaluation-monitoring?endpointName=agents_ml-avesh-monitoring_demo_agent_march_3_2
No computed metrics specified. To override the computed metrics, include `metrics` in the monitoring_config.
Deployment of ml.avesh.monitoring_demo_agent_march_3_2 version 1 initiated. This can take up to 15 minutes and the Review App & Query Endpoint will not work until this deployment finishes.
View status: https://e2-dogfood.staging.cloud.databricks.com/ml/endpoints/agents_ml-avesh-monitoring_demo_agent_march_3_2
Review App: https://e2-dogfood.staging.cloud.databricks.com/ml/review/ml.avesh.monitoring_demo_agent_march_3_2/1?o=6051921418418893
Monitor: https://e2-dogfood.staging.cloud.databricks.com/ml/experiments/2154539344714143/evaluation-monitoring?endpointName=agents_ml-avesh-monitoring_demo_agent_march_3_2

Waiting for endpoint to deploy. This can take 10 - 20 minutes.................

Configuring Additional Monitoring Metrics

Since our agent was deployed using agents.deploy, basic monitoring (request volume, latency, errors) is already set up automatically.

Now we'll add evaluation metrics to our monitoring. The monitoring configuration sketched after this list will:

  • Sample 50% of requests for evaluation
  • Evaluate responses against safety, relevance, and custom guidelines
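The original configuration cell isn't included in this export. A minimal sketch follows, assuming the monitors API shipped with databricks-agents; the import path, config keys, and metric names may differ across versions, so verify them against your installed package.

```python
# Sketch: attach evaluation metrics to the monitor created by agents.deploy.
# Import path, config keys, and metric names are assumptions; verify them
# against your databricks-agents version.
from databricks.agents.evals.monitors import get_monitor, update_monitor

endpoint_name = deployment.endpoint_name  # e.g. "agents_ml-avesh-monitoring_demo_agent_march_3_2"

# Inspect the monitor that agents.deploy created (basic monitoring only).
print(get_monitor(endpoint_name=endpoint_name))

# Sample 50% of requests and judge safety, relevance, and a custom guideline.
monitor = update_monitor(
    endpoint_name=endpoint_name,
    monitoring_config={
        "sample": 0.5,
        "metrics": ["safety", "relevance_to_query", "guideline_adherence"],
        "global_guidelines": {
            "english": ["The response must be written in English."],
        },
    },
)
```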

Monitor URL: https://e2-dogfood.staging.cloud.databricks.com/ml/experiments/2154539344714143/evaluation-monitoring?endpointName=agents_ml-avesh-monitoring_demo_agent_march_3_2
Current monitor configuration: MonitoringConfig(sample=0.1, metrics=[], periodic=None, paused=False, global_guidelines=None)

Generate Sample Traffic

Now that our endpoint is deployed, we'll send some sample questions to generate traffic for monitoring.
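The original traffic-generation cell isn't included in this export. A minimal sketch follows, reusing the `endpoint_name` variable from above and a few of the questions shown in the output below.

```python
# Sketch: query the deployed agent endpoint to generate traffic for the monitor.
from mlflow.deployments import get_deploy_client

client = get_deploy_client("databricks")

questions = [
    "What is Mosaic AI Agent Evaluation?",
    "How do you use MLflow with Databricks for experiment tracking?",
    "What should I use Databricks Feature Store for?",
    "How does AutoML work in Databricks?",
    "What is Model Serving in Databricks and what are its deployment options?",
]

for i, question in enumerate(questions, start=1):
    print(f"Question {i}: {question}")
    client.predict(
        endpoint=endpoint_name,
        inputs={"messages": [{"role": "user", "content": question}]},
    )
```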


Question 1: What is Mosaic AI Agent Evaluation?
Question 2: How do you use MLflow with Databricks for experiment tracking?
Question 3: What should I use Databricks Feature Store for?
Question 4: How does AutoML work in Databricks?
Question 5: What is Model Serving in Databricks and what are its deployment options?
Question 6: How does Databricks handle distributed deep learning training?
Question 7: Does Unity Catalog support models?
Question 8: What is the Databricks Lakehouse?
Question 9: Which Llama models are supported on Databricks?
Question 10: How does Databricks integrate with popular ML frameworks like PyTorch and TensorFlow?

[Optional] Create an evaluation instance

Traces can be exported to Managed Evaluations in order to build an evaluation set. For more information, see Managed evaluations — subject matter expert (SME) user guide.


Viewing Monitoring Results

The monitoring results are stored in Delta tables and can be accessed in two ways:

  1. Through the MLflow UI (click the link generated above)
  2. Directly querying the Delta table containing evaluated traces

Below, we'll query the Delta table to see the evaluation results, filtering out skipped evaluations.

If you do not see monitoring results, wait until the next run of the monitoring job.
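A sketch of the query is shown below. It assumes the monitor object returned by `get_monitor` exposes the evaluated traces table name and that skipped rows are marked in an `evaluation_status` column; if the attribute or column differs in your version, copy the table name from the monitoring page instead.

```python
# Sketch: read the evaluated traces Delta table produced by the monitor and
# filter out rows the monitoring job skipped. The `evaluated_traces_table`
# attribute and `evaluation_status` column are assumptions.
monitor = get_monitor(endpoint_name=endpoint_name)

display(
    spark.table(monitor.evaluated_traces_table)
    .filter("evaluation_status != 'skipped'")
)
```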


Cleanup

When you're done with the demo, you can delete the monitor using the code below.

Note that this only removes the monitoring configuration; it doesn't affect the deployed model or stored evaluation results.
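A sketch of the cleanup call, assuming the same monitors module used above:

```python
# Sketch: delete the monitor for the agent endpoint. This removes only the
# monitoring configuration, not the deployed model or stored results.
from databricks.agents.evals.monitors import delete_monitor

delete_monitor(endpoint_name=endpoint_name)
```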


Command skipped

Command skipped