Monitor generative AI apps

This feature is in Beta.

This page describes how to use the features of Lakehouse Monitoring for GenAI. To enable monitoring, follow the steps linked from the monitoring overview.

View monitoring results

Before viewing monitoring results, make sure the monitoring prerequisites are in place.

After these prerequisites are met, you can view a page summarizing the results generated by a monitor by following these steps:

  1. Click Experiments in the sidebar under the Machine Learning section.

  2. Click the MLflow experiment associated with your monitor.

    If you are not sure how to find the name of the relevant experiment, follow the instructions in Get monitor metadata to retrieve the experiment ID, then run mlflow.get_experiment(experiment_id=$YOUR_EXPERIMENT_ID) in your notebook to find the experiment name (see the sketch after these steps).

  3. Click the Monitoring tab.

  4. Select your SQL warehouse using the Choose a SQL Warehouse dropdown.

  5. The page updates to show your monitoring results. Results can take a few minutes to load.
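
For example, here is a minimal sketch of looking up the experiment name from a notebook. It assumes a monitor already exists for the endpoint, and the endpoint name below is a placeholder:

Python
import mlflow

from databricks.agents.evals.monitors import get_monitor

# Retrieve the monitor metadata, which includes the MLflow experiment ID
monitor = get_monitor("model-serving-endpoint-name")

# Resolve the experiment ID to the experiment's name
experiment = mlflow.get_experiment(experiment_id=monitor.experiment_id)
print(experiment.name)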

Use the monitoring UI

All data in the monitoring UI, in both the Charts and Logs tabs, is constrained to a window of time. To change the window, use the Time Range dropdown.

Charts tab

The Charts tab is composed of four sections: Requests, Metrics, Latency, and Errors.

Screenshot of page summarizing monitoring results.

The Requests section shows trace volume over time.

Screenshot of Requests section.

The Metrics section shows counts of responses evaluated by LLM judges. Green indicates responses that pass, and red indicates responses that fail. The metrics listed in this section correspond to those you defined when you created the monitor, along with an overall pass/fail quality score.

Screenshot of Metrics section.

The Latency section shows trace execution latency over time, based on the latency reported by MLflow.

Screenshot of Latency section.

The Errors section shows any model errors over time. When no errors have occurred, you will see a “no data” indicator as follows:

Screenshot of Errors section.

Logs tab

Screenshot of Logs tab.

The Logs tab lists the requests sent to the selected model, along with the results of any LLM evaluations. A maximum of 10,000 requests from the selected time period are shown in the UI. If the request count exceeds this threshold, requests are sampled for display, so the rate shown can differ from the sample rate specified in the monitor configuration.

To filter request logs based on text contained in submitted requests, use the search box. You can also use the Filters dropdown menu to filter logs by the outcomes of their associated evaluations.

Screenshot of log filters.

Hover over a request and click the checkbox to select it. You can then click Add to evals to add the selected requests to an evaluation dataset.

Screenshot of add to evals modal.

Click a request to view its details. The modal displays the evaluation results, the input, the response, and any documents retrieved to answer the request. For more details about the request, including timing information, click See detailed trace view at the upper right of the modal.

Screenshot of request detail modal.

Screenshot of detailed trace view of a request.

Add alerts

Use Databricks SQL alerts to notify users when the evaluated traces table does not match expectations, for example when the fraction of requests marked as harmful exceeds a threshold.
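
As a starting point, here is a minimal sketch of the kind of query you might put behind such an alert, prototyped with spark.sql in a notebook. The table name and the column names (harmful_rating, request_time) are placeholders, not part of the monitoring API; inspect your evaluated traces table (see Get monitor metadata) and adjust them to match its schema:

Python
# Fraction of recent requests judged harmful; placeholder table and column names.
harmful_fraction = spark.sql("""
    SELECT
      AVG(CASE WHEN harmful_rating = 'fail' THEN 1 ELSE 0 END) AS harmful_fraction
    FROM catalog.schema.my_endpoint_evaluated_traces
    WHERE request_time >= current_timestamp() - INTERVAL 1 DAY
""").collect()[0]["harmful_fraction"]

print(f"Fraction of requests marked harmful in the last day: {harmful_fraction}")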

Update or pause a monitor

To update the configuration of a monitor, call update_monitor, which takes the following inputs:

  • endpoint_name: str - Name of the endpoint being monitored
  • monitoring_config: dict - Configuration for the monitor. See Set up monitoring for supported parameters.

For example:

Python
from databricks.agents.evals.monitors import update_monitor

monitor = update_monitor(
    endpoint_name="model-serving-endpoint-name",
    monitoring_config={
        "sample": 0.1,  # Change sampling rate to 10%
    },
)

Similarly, to pause a monitor:

Python
from databricks.agents.evals.monitors import update_monitor

monitor = update_monitor(
    endpoint_name="model-serving-endpoint-name",
    monitoring_config={
        "paused": True,
    },
)

Get monitor metadata

Use the get_monitor function to retrieve the current configuration of a monitor for a deployed agent.

Python
from databricks.agents.evals.monitors import get_monitor

get_monitor('model-serving-endpoint-name')

The function returns a Monitor object including the following attributes:

  • endpoint_name - Name of the endpoint being monitored.
  • monitoring_config - Configuration for the monitor. See Set up monitoring for configuration parameters.
  • experiment_id - The MLflow experiment where the monitoring results are displayed. See View monitoring results.
  • evaluated_traces_table - Unity Catalog table containing monitoring evaluation results.
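
For example, a minimal sketch of reading these attributes from the returned object (the endpoint name is a placeholder):

Python
from databricks.agents.evals.monitors import get_monitor

monitor = get_monitor("model-serving-endpoint-name")

# Inspect the monitor's configuration and where its results are stored
print(monitor.monitoring_config)
print(monitor.experiment_id)
print(monitor.evaluated_traces_table)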

Delete a monitor

To remove a monitor from an endpoint, call delete_monitor.

Python
from databricks.agents.evals.monitors import delete_monitor

monitor = delete_monitor(
    endpoint_name="model-serving-endpoint-name",
)

Calls to delete_monitor do not delete the evaluated traces table generated by the monitor.
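
If you also want to remove those results, here is a minimal sketch of dropping the table yourself. It assumes you run it in a Databricks notebook (where spark is available), that you capture the table name before deleting the monitor, and that you have permission to drop the table; the endpoint name is a placeholder:

Python
from databricks.agents.evals.monitors import delete_monitor, get_monitor

# Capture the table name before the monitor is deleted
evaluated_traces_table = get_monitor("model-serving-endpoint-name").evaluated_traces_table

delete_monitor(endpoint_name="model-serving-endpoint-name")

# Explicitly drop the evaluated traces table; delete_monitor does not do this
spark.sql(f"DROP TABLE IF EXISTS {evaluated_traces_table}")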