Create a monitor using the API
Preview
This feature is in Public Preview.
This page describes how to create a monitor in Databricks and explains the parameters used in the API call.
You can create a monitor on any managed or external Delta table registered in Unity Catalog. Only a single monitor can be created for any table in a Unity Catalog metastore.
For details about the Lakehouse monitoring API, see the API reference.
Requirements
To use the Lakehouse monitoring API, you must install the Python client at the beginning of your notebook using the following command:
%pip install "https://ml-team-public-read.s3.amazonaws.com/wheels/data-monitoring/a4050ef7-b183-47a1-a145-e614628e3146/databricks_lakehouse_monitoring-0.3.0-py3-none-any.whl"
Profile type parameter
The profile_type parameter determines the class of metrics that monitoring computes for the table. There are three types: Snapshot, TimeSeries, and InferenceLog. This section briefly describes the parameters. For details, see the API reference.
TimeSeries profile
A TimeSeries profile compares data distributions across time windows. For a TimeSeries profile, you must provide the following:
- A timestamp column (timestamp_col). The timestamp column data type must be either TIMESTAMP or a type that can be converted to timestamps using the to_timestamp PySpark function (see the conversion sketch after the example below).
- The set of granularities over which to calculate metrics. Available granularities are “5 minutes”, “30 minutes”, “1 hour”, “1 day”, “n week(s)”, “1 month”, “1 year”.
from databricks import lakehouse_monitoring as lm

lm.create_monitor(
    table_name=f"{catalog}.{schema}.{table_name}",
    profile_type=lm.TimeSeries(
        timestamp_col="ts",
        granularities=["30 minutes"]
    ),
    output_schema_name=f"{catalog}.{schema}"
)
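If the table stores timestamps as strings, one approach is to add a proper timestamp column before creating the monitor. The following is a minimal sketch, assuming a hypothetical string column named ts_str and a hypothetical target table name; it uses the to_timestamp PySpark function mentioned above.
from pyspark.sql import functions as F

# Hypothetical example: convert a string column "ts_str" to a TIMESTAMP column
# "ts" and write the result to a new table that the monitor can use.
source = spark.table(f"{catalog}.{schema}.{table_name}")
converted = source.withColumn("ts", F.to_timestamp("ts_str"))  # a format string can be passed as a second argument
converted.write.format("delta").saveAsTable(f"{catalog}.{schema}.{table_name}_with_ts")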
Snapshot profile
In contrast to TimeSeries, a Snapshot profile monitors how the full contents of the table change over time. Metrics are calculated over all data in the table and reflect the table state each time the monitor is refreshed.
from databricks import lakehouse_monitoring as lm

lm.create_monitor(
    table_name=f"{catalog}.{schema}.{table_name}",
    profile_type=lm.Snapshot(),
    output_schema_name=f"{catalog}.{schema}"
)
InferenceLog profile
An InferenceLog profile is similar to a TimeSeries profile but also includes model quality metrics. For an InferenceLog profile, the following parameters are required:
Parameter | Description
---|---
problem_type | “classification” or “regression”.
prediction_col | Column containing the model’s predicted values.
timestamp_col | Column containing the timestamp of the inference request.
model_id_col | Column containing the id of the model used for prediction.
granularities | Determines how to partition the data in windows across time. Possible values: “5 minutes”, “30 minutes”, “1 hour”, “1 day”, “n week(s)”, “1 month”, “1 year”.
There is also an optional parameter:
Optional parameter | Description
---|---
label_col | Column containing the ground truth for model predictions.
from databricks import lakehouse_monitoring as lm

lm.create_monitor(
    table_name=f"{catalog}.{schema}.{table_name}",
    profile_type=lm.InferenceLog(
        problem_type="classification",
        prediction_col="preds",
        timestamp_col="ts",
        granularities=["30 minutes", "1 day"],
        model_id_col="model_ver",
        label_col="label",  # optional
    ),
    output_schema_name=f"{catalog}.{schema}"
)
For InferenceLog profiles, slices are automatically created based on the distinct values of model_id_col.
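For reference, the monitored table for the example above needs to contain the columns named by these parameters. The following is a minimal, hypothetical sketch of such an inference log table; the column names and values are illustrative only.
from datetime import datetime

# Hypothetical inference log with the columns referenced above:
# ts (request timestamp), preds (predicted value), model_ver (model id), label (ground truth).
rows = [
    (datetime(2023, 1, 1, 10, 0), 1.0, "model_v1", 1.0),
    (datetime(2023, 1, 1, 11, 0), 0.0, "model_v1", 1.0),
    (datetime(2023, 1, 2, 9, 30), 1.0, "model_v2", 0.0),
]
df = spark.createDataFrame(rows, ["ts", "preds", "model_ver", "label"])
df.write.format("delta").saveAsTable(f"{catalog}.{schema}.{table_name}")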
Refresh and view monitor results
To refresh metrics tables, use run_refresh. For example:
from databricks import lakehouse_monitoring as lm

lm.run_refresh(
    table_name=f"{catalog}.{schema}.{table_name}"
)
When you call run_refresh from a notebook, the monitor metric tables are created or updated. This calculation runs on serverless compute, not on the cluster that the notebook is attached to. You can continue to run commands in the notebook while the monitor statistics are updated.
For information about the statistics that are stored in metric tables, see Monitor metric tables. Metric tables are Unity Catalog tables. You can query them in notebooks or in the SQL query explorer, and view them in Catalog Explorer.
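As a sketch, assuming the default naming convention in which the monitor writes its metrics to tables named <table_name>_profile_metrics and <table_name>_drift_metrics in the output schema, you could inspect the results from a notebook like this:
# Assumes the default metric table names in the output schema; adjust the
# names if your monitor was configured differently.
profile_metrics = spark.table(f"{catalog}.{schema}.{table_name}_profile_metrics")
drift_metrics = spark.table(f"{catalog}.{schema}.{table_name}_drift_metrics")

display(profile_metrics.limit(10))
display(drift_metrics.limit(10))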
To display the history of all refreshes associated with a monitor, use list_refreshes.
from databricks import lakehouse_monitoring as lm

lm.list_refreshes(
    table_name=f"{catalog}.{schema}.{table_name}"
)
To get the status of a specific refresh, whether it is queued, running, or finished, use get_refresh.
from databricks import lakehouse_monitoring as lm

run_info = lm.run_refresh(table_name=f"{catalog}.{schema}.{table_name}")

lm.get_refresh(
    table_name=f"{catalog}.{schema}.{table_name}",
    refresh_id=run_info.refresh_id
)
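If you need to block until a refresh finishes, you can poll get_refresh in a loop. This is only a sketch; it assumes the returned refresh info exposes a state field and that PENDING and RUNNING are the non-terminal values. Check the API reference for the authoritative enumeration.
import time

from databricks import lakehouse_monitoring as lm

run_info = lm.run_refresh(table_name=f"{catalog}.{schema}.{table_name}")

# Poll until the refresh reaches a terminal state. The state names used here
# are assumptions; consult the API reference for the exact values.
while True:
    info = lm.get_refresh(
        table_name=f"{catalog}.{schema}.{table_name}",
        refresh_id=run_info.refresh_id
    )
    if str(info.state) not in ("PENDING", "RUNNING"):
        break
    time.sleep(30)

print(info.state)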
View monitor settings
You can review a monitor's settings by using the get_monitor API.
from databricks import lakehouse_monitoring as lm
lm.get_monitor(table_name=TABLE_NAME)
Schedule
To set up a monitor to run on a scheduled basis, use the schedule parameter of create_monitor:
lm.create_monitor(
    table_name=f"{catalog}.{schema}.{table_name}",
    profile_type=lm.TimeSeries(
        timestamp_col="ts",
        granularities=["30 minutes"]
    ),
    schedule=lm.MonitorCronSchedule(
        quartz_cron_expression="0 0 12 * * ?",  # schedules a refresh every day at 12 noon
        timezone_id="PST",
    ),
    output_schema_name=f"{catalog}.{schema}"
)
See cron expressions for more information.
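For illustration, here are a few common Quartz cron expressions that could be passed as quartz_cron_expression. The values below are examples only; the format is seconds, minutes, hours, day-of-month, month, day-of-week.
# Illustrative Quartz cron expressions (seconds minutes hours day-of-month month day-of-week):
hourly = "0 0 * * * ?"                  # at the top of every hour
daily_6am = "0 0 6 * * ?"               # every day at 06:00
weekly_monday_8am = "0 0 8 ? * MON"     # every Monday at 08:00
monthly_first_midnight = "0 0 0 1 * ?"  # first day of every month at midnight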
Control access to metric tables
The metric tables and dashboard created by a monitor are owned by the user who created the monitor. You can use Unity Catalog privileges to control access to metric tables. To share dashboards within a workspace, use the Share button at the upper-right of the dashboard.
Delete a monitor
To delete a monitor:
lm.delete_monitor(table_name=TABLE_NAME)
This command does not delete the profile tables and the dashboard created by the monitor. You must delete those assets in a separate step, or you can save them in a different location.
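As a sketch of that separate cleanup step, assuming the default <table_name>_profile_metrics and <table_name>_drift_metrics table names in the output schema (the dashboard itself is deleted from the workspace UI):
# Assumes the default metric table names; drop them only if you no longer
# need the collected monitoring history.
spark.sql(f"DROP TABLE IF EXISTS {catalog}.{schema}.{table_name}_profile_metrics")
spark.sql(f"DROP TABLE IF EXISTS {catalog}.{schema}.{table_name}_drift_metrics")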
Example notebooks
The following example notebooks illustrate how to create a monitor, refresh the monitor, and examine the metric tables it creates.
Notebook example: Time series profile
This notebook illustrates how to create a TimeSeries type monitor.