Monitor apps deployed outside of Databricks
This feature is in Beta.
This page describes how to set up monitoring for generative AI apps deployed outside Mosaic AI Agent Framework. For general information on using monitoring, such as the results schema, viewing results, using the UI, adding alerts, and managing monitors, see Monitor generative AI apps.
Lakehouse Monitoring for gen AI helps you track operational metrics like volume, latency, errors, and cost, as well as quality metrics like correctness and guideline adherence, using Mosaic AI Agent Evaluation AI judges.
Requirements
- To enable and configure monitoring, you must use the databricks-agents SDK.
%pip install "databricks-agents>=0.18.1" "mlflow>=2.21.2"
dbutils.library.restartPython()
You do not need databricks-agents installed in your production application. Your production application only needs mlflow installed, which enables MLflow Tracing instrumentation.
Set up monitoring
If you have an AI app deployed outside Databricks or are using Databricks Apps, use the create_external_monitor method inside a Databricks notebook to set up the monitor.
The monitor is created inside an MLflow experiment. The monitoring UI appears as a tab in the MLflow experiment. Access to log traces is controlled through the MLflow experiment's ACLs.
You will then instrument your deployed code using MLflow Tracing and mlflow.tracing.set_destination as described below. For apps deployed using Agent Framework, see Deploy an agent for generative AI applications.
The steps below assume you are working from a Databricks notebook. To create a monitor from your local development environment (for example, your IDE), download and use this Python script instead.
This notebook includes the steps shown in the rest of this page.
Create an external monitor example notebook
Step 1: Create an MLflow experiment
You can either use an existing experiment or create a new experiment. To create a new experiment, follow the steps below.
import mlflow

mlflow_experiment_path = "/Users/your-user-name@company.com/my_monitor_name"
experiment = mlflow.set_experiment(experiment_name=mlflow_experiment_path)

# Get the experiment ID to use in the next step
experiment_id = experiment.experiment_id
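To reuse an existing experiment instead, you can look up its ID by name. A minimal sketch, assuming the experiment path below is replaced with your own:
import mlflow

# Look up an existing experiment by its workspace path (placeholder path).
existing_experiment = mlflow.get_experiment_by_name("/Users/your-user-name@company.com/my_monitor_name")
if existing_experiment is not None:
    experiment_id = existing_experiment.experiment_id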
Step 2: Create an external monitor
By default, if you run create_external_monitor from a Databricks notebook without explicitly specifying an experiment, your monitor is created in the notebook's MLflow experiment.
create_external_monitor takes the following inputs:

- catalog_name: str - The name of the catalog to write Delta artifacts to.
- schema_name: str - The name of the schema to write the Delta artifacts to. This schema should be part of the catalog above.
- [Optional] experiment_id - The MLflow experiment_id where production traces are stored. One of this or experiment_name should be defined. If not specified, the monitor uses the experiment of the notebook where this command was run.
- [Optional] experiment_name - The MLflow experiment_name where production traces are stored. One of this or experiment_id should be defined. If not specified, the monitor uses the experiment of the notebook where this command was run.
- assessments_config: AssessmentsSuiteConfig | dict - Configuration for assessments computed by the monitor. The following parameters are supported:
  - [Optional] sample: float - The fraction of requests to compute assessments over (between 0 and 1). Defaults to 1.0 (compute assessments for all traffic).
  - [Optional] paused: str - Either PAUSED or UNPAUSED.
  - [Optional] assessments: list[BuiltinJudge | GuidelinesJudge] - A list of assessments, each of which is either a built-in judge or the guidelines judge.

BuiltinJudge takes the following arguments:

- name: str - One of the built-in judges supported in monitoring: "safety", "groundedness", "relevance_to_query", "chunk_relevance". For more details on the built-in judges, see Built-in judges.

GuidelinesJudge takes the following arguments:

- guidelines: dict[str, list[str]] - A dict containing guideline names and plain-text guidelines that are used to assert over the request and response. For more details on guidelines, see Guideline adherence.
For more details, see the Python SDK documentation.
For example:
from databricks.agents.monitoring import create_external_monitor, AssessmentsSuiteConfig, BuiltinJudge, GuidelinesJudge
import mlflow
external_monitor = create_external_monitor(
catalog_name='my_catalog',
schema_name='my_schema',
# experiment_id=..., # Replace this line with your MLflow experiment ID. By default, create_external_monitor uses the notebook's MLflow experiment.
assessments_config=AssessmentsSuiteConfig(
sample=1.0,
assessments=[
# Builtin judges: "safety", "groundedness", "relevance_to_query", "chunk_relevance"
BuiltinJudge(name='safety'), # or {'name': 'safety'}
BuiltinJudge(name='groundedness'), # or {'name': 'groundedness'}
BuiltinJudge(name='relevance_to_query'), # or {'name': 'relevance_to_query'}
BuiltinJudge(name='chunk_relevance'), # or {'name': 'chunk_relevance'}
# Create custom judges with the guidelines judge.
GuidelinesJudge(guidelines={
"pii": ["The response must not contain personal information."],
"english": ["The response must be in English"]
}),
]
)
# AssessmentsSuiteConfig can also be simple objects:
# assessments_config={
# "sample": 1.0,
# "assessments": [
# {'name': 'safety'},
# {'name': 'groundedness'},
# {'name': 'relevance_to_query'},
# {'name': 'chunk_relevance'},
# {
# 'name': 'guideline_adherence',
# 'guidelines': {
# 'pii': ['The response must not contain personal information.'],
# 'english': ['The response must be in English']
# }
# },
# ]
# }
)
print("experiment_id=", external_monitor.experiment_id)
You will see a link to the monitoring UI in the cell output. The evaluation results can be viewed in this UI and are stored in the monitoring_table. To view evaluated rows, run:
display(spark.table("cat.schema.monitor_table"))
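The exact table name appears in the cell output when the monitor is created. As a minimal sketch, assuming the placeholder table name below is replaced with your monitor's table, you can also inspect the table's schema and size before digging into individual rows:
# Placeholder three-level name; replace with your monitor's monitoring table.
monitoring_table = "my_catalog.my_schema.my_monitoring_table"

df = spark.table(monitoring_table)
df.printSchema()                      # see which assessment columns are available
print("Evaluated rows:", df.count())  # number of evaluated requests so far
display(df.limit(10))                 # preview a few evaluated rows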
Step 3: Instrument your gen AI app with MLflow Tracing
Install the following package in your deployed agent to get started:
pip install "mlflow>=2.21.2"
The DATABRICKS_TOKEN must be for a service principal or user who has EDIT access to the MLflow experiment where the monitor is configured.
In your gen AI app, add the following:
- Set the DATABRICKS_HOST and DATABRICKS_TOKEN environment variables.
  - DATABRICKS_HOST is your workspace's URL, for example https://workspace-url.databricks.com
  - DATABRICKS_TOKEN is a PAT token. Follow these steps.
  - If you want to use a service principal's PAT token, make sure to grant the service principal EDIT rights to the MLflow experiment you configured at the top of the notebook. Without this, MLflow Tracing will NOT be able to log traces.
- mlflow.tracing.set_destination to set the trace destination.
- MLflow automatic tracing or MLflow fluent APIs to trace your app. MLflow supports autologging for many popular frameworks, like LangChain, Bedrock, DSPy, OpenAI, and more.
- Databricks authentication tokens so MLflow can log traces to Databricks.
import os

import mlflow
from mlflow.tracing.destination import Databricks

# Set Databricks credentials as environment variables so MLflow Tracing can
# authenticate to your workspace.
os.environ["DATABRICKS_HOST"] = "..."   # your workspace URL
os.environ["DATABRICKS_TOKEN"] = "..."  # PAT token with EDIT access to the experiment

# Set up the trace destination.
mlflow_experiment_id = "..."  # This is the experiment_id that is configured in step 2.
mlflow.tracing.set_destination(Databricks(experiment_id=mlflow_experiment_id))

# Your AI app code, instrumented with MLflow.
# MLflow supports autologging for a variety of frameworks, such as LangChain.

# Option 1: MLflow autologging
mlflow.langchain.autolog()

# Enable other optional logging
# mlflow.langchain.autolog(log_models=True, log_input_examples=True)

# Your LangChain model code here
# ...
# ...

# Option 2: MLflow fluent APIs:
# Initialize OpenAI client
# This example uses the databricks-sdk's convenience method to get an OpenAI client
# In your app, you can use any OpenAI client (or other SDKs)
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
openai_client = w.serving_endpoints.get_open_ai_client()
# These traces will automatically be sent to Databricks.
@mlflow.trace(span_type='AGENT')
def openai_agent(user_input: str):
return openai_client.chat.completions.create(
model="databricks-meta-llama-3-3-70b-instruct",
messages=[
{
"role": "system",
"content": "You are a helpful assistant that always responds in CAPS!",
},
{"role": "user", "content": user_input},
],
)
# Call the app to generate a Trace
openai_agent("What is GenAI observability?")
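If your app does not use a framework covered by autologging, you can also create spans manually with MLflow's fluent tracing APIs. A minimal sketch, where the function and retrieval step are hypothetical:
import mlflow

@mlflow.trace(span_type="CHAIN")
def answer_question(question: str) -> str:
    # Wrap a hypothetical retrieval step in a manually created child span.
    with mlflow.start_span(name="retrieve_context", span_type="RETRIEVER") as span:
        span.set_inputs({"question": question})
        context = "..."  # your retrieval logic here
        span.set_outputs({"context": context})
    # Call your LLM of choice here; this sketch returns a placeholder response.
    return f"Answer based on: {context}"

# Traces from this call are sent to the configured Databricks experiment.
answer_question("What is GenAI observability?")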
Step 4: Visit the monitoring UI to view the logged trace
Go to the MLflow experiment you configured in step 2. Click on the monitoring tab and then on "Logs" at the top left to see the logged trace from step 3.
Monitor execution and scheduling
When you create a monitor, it initiates a job that evaluates a sample of requests to your endpoint from the last 30 days. This initial evaluation may take several minutes to complete, depending on the volume of requests and the sampling rate.
After the initial evaluation, the monitor automatically refreshes every 15 minutes to evaluate new requests. There is a delay between when requests are made to the endpoint and when they are evaluated by the monitor.
Monitors are backed by Databricks Workflows. To manually trigger a refresh of a monitor, find the workflow named [<endpoint_name>] Agent Monitoring Job and click Run now.
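Alternatively, a minimal sketch of triggering the refresh programmatically with the Databricks SDK Jobs API, assuming the placeholder job name below is replaced with your monitor's workflow name:
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Placeholder name; substitute your monitor's workflow name.
job_name = "[my-endpoint] Agent Monitoring Job"

# Find the monitoring job by name and trigger a run.
for job in w.jobs.list(name=job_name):
    w.jobs.run_now(job_id=job.job_id)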
Limitations
The following are limitations of Lakehouse Monitoring for GenAI apps deployed outside of Databricks: