Create a monitor using the Databricks UI
This article demonstrates how to create a data monitor using the Databricks UI. You can also use the API.
To access the Databricks UI, do the following:
In the workspace left sidebar, open Catalog Explorer.
Navigate to the table you want to monitor.
Click the Quality tab.
Click the Get started button.
In Create monitor, choose the options you want to set up the monitor.
Profiling
From the Profile type menu, select the type of monitor you want to create. The profile types are shown in the table.
Profile type | Description |
---|---|
Time series profile | A table containing values measured over time. This table includes a timestamp column. |
Inference profile | A table containing predicted values output by a machine learning classification or regression model. This table includes a timestamp, a model id, model inputs (features), a column containing model predictions, and optional columns containing unique observation IDs and ground truth labels. It can also contain metadata, such as demographic information, that is not used as input to the model but might be useful for fairness and bias investigations or other monitoring. |
Snapshot profile | Any Delta managed table, external table, view, materialized view, or streaming table. |
If you select TimeSeries or Inference, additional parameters are required and are described in the following sections.
Note
When you first create a time series or inference profile, the monitor analyzes only data from the 30 days prior to its creation. After the monitor is created, all new data is processed.
Monitors defined on materialized views and streaming tables do not support incremental processing.
Tip
For TimeSeries and Inference profiles, it’s a best practice to enable change data feed (CDF) on your table. When CDF is enabled, only newly appended data is processed, rather than re-processing the entire table every refresh. This makes execution more efficient and reduces costs as you scale monitoring across many tables.
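Change data feed is controlled by the Delta table property delta.enableChangeDataFeed. The following is a minimal sketch of building the statement that enables it on an existing table. The helper name enable_cdf_statement and the table name main.sales.orders are hypothetical; you would run the resulting statement yourself, for example in a notebook or the SQL editor.

```python
def enable_cdf_statement(table_name: str) -> str:
    """Build the SQL that turns on change data feed for an existing Delta table."""
    parts = table_name.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError("expected a three-level name: catalog.schema.table")
    # Backtick-quote each name part so unusual identifiers are handled.
    quoted = ".".join(f"`{part}`" for part in parts)
    return (
        f"ALTER TABLE {quoted} "
        "SET TBLPROPERTIES (delta.enableChangeDataFeed = true)"
    )


# Hypothetical table name, for illustration only.
statement = enable_cdf_statement("main.sales.orders")
print(statement)
```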
TimeSeries profile
For a TimeSeries profile, you must make the following selections:
Specify the Metric granularities that determine how to partition the data in windows across time.
Specify the Timestamp column, the column in the table that contains the timestamp. The timestamp column data type must be either TIMESTAMP or a type that can be converted to timestamps using the to_timestamp PySpark function.
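To make the idea of a metric granularity concrete, here is a rough sketch in plain Python of how timestamps fall into windows. It is not the monitor's implementation. The function name window_start, the two granularity strings it handles ("1 hour" and "1 day"), and the sample timestamps are all assumptions chosen for illustration.

```python
from collections import Counter
from datetime import datetime


def window_start(ts: datetime, granularity: str) -> datetime:
    """Truncate a timestamp to the start of its window (illustrative only)."""
    if granularity == "1 hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "1 day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"granularity not handled in this sketch: {granularity}")


# Hypothetical values from a timestamp column.
timestamps = [
    datetime(2024, 5, 1, 9, 15),
    datetime(2024, 5, 1, 9, 45),
    datetime(2024, 5, 1, 17, 30),
    datetime(2024, 5, 2, 8, 0),
]

# Count how many rows land in each window for two granularities.
rows_per_day = Counter(window_start(ts, "1 day") for ts in timestamps)
rows_per_hour = Counter(window_start(ts, "1 hour") for ts in timestamps)
```

With a coarser granularity, more rows share a window, so each window's metrics summarize more data.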
Inference profile
For an Inference profile, in addition to the granularities and the timestamp, you must make the following selections:
Select the Problem type, either classification or regression.
Specify the Prediction column, the column containing the model’s predicted values.
Optionally specify the Label column, the column containing the ground truth for model predictions.
Specify the Model ID column, the column containing the ID of the model used for prediction.
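The selections above can be summarized as a small settings object. This is a sketch under stated assumptions: the class InferenceProfileSettings, its field names, and the sample column names are hypothetical and do not correspond to a Databricks API. It only encodes which selections are required and which are optional.

```python
from dataclasses import dataclass
from typing import List, Optional

PROBLEM_TYPES = ("classification", "regression")


@dataclass
class InferenceProfileSettings:
    """Hypothetical container mirroring the Inference profile selections."""

    problem_type: str
    timestamp_col: str
    granularities: List[str]
    prediction_col: str
    model_id_col: str
    label_col: Optional[str] = None  # ground truth labels are optional

    def validate(self) -> None:
        """Raise ValueError if a required selection is missing or invalid."""
        if self.problem_type not in PROBLEM_TYPES:
            raise ValueError(f"problem_type must be one of {PROBLEM_TYPES}")
        if not self.granularities:
            raise ValueError("at least one granularity is required")
        required = {
            "timestamp_col": self.timestamp_col,
            "prediction_col": self.prediction_col,
            "model_id_col": self.model_id_col,
        }
        missing = [name for name, value in required.items() if not value]
        if missing:
            raise ValueError(f"missing required columns: {missing}")


# Hypothetical column names, for illustration only.
settings = InferenceProfileSettings(
    problem_type="classification",
    timestamp_col="scored_at",
    granularities=["1 day"],
    prediction_col="prediction",
    model_id_col="model_version",
)
settings.validate()  # passes: label_col may be left unset
```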
Schedule
To set up a monitor to run on a scheduled basis, select Refresh on schedule and select the frequency and time for the monitor to run. If you do not want the monitor to run automatically, select Refresh manually. If you select Refresh manually, you can later refresh the metrics from the Quality tab.
Notifications
To set up email notifications for a monitor, enter the email addresses to be notified and select the notifications to enable. Up to 5 email addresses are supported per notification event type.
General
In the General section, you need to specify one required setting and some additional configuration options:
You must specify the Unity Catalog schema where the metric tables created by the monitor are stored. The location must be in the format {catalog}.{schema}.
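As a quick illustration of the {catalog}.{schema} format, the following sketch checks a location string before you enter it. The helper parse_output_schema and the example value main.monitoring are hypothetical.

```python
from typing import Tuple


def parse_output_schema(location: str) -> Tuple[str, str]:
    """Split a 'catalog.schema' string and reject anything else."""
    parts = location.split(".")
    if len(parts) != 2 or not all(part.strip() for part in parts):
        raise ValueError(f"expected 'catalog.schema', got {location!r}")
    catalog, schema = parts
    return catalog, schema


# Hypothetical location, for illustration only.
catalog, schema = parse_output_schema("main.monitoring")
```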
You can also specify the following settings:
Assets directory. Enter the absolute path to an existing directory in which to store monitoring assets, such as the generated dashboard. By default, assets are stored in “/Users/{user_name}/databricks_lakehouse_monitoring/{table_name}”. If you enter a different location in this field, assets are created under “/{table_name}” in the directory you specify. This directory can be anywhere in the workspace. For monitors intended to be shared within an organization, you can use a path in the “/Shared/” directory.
This field cannot be left blank.
Unity Catalog baseline table name. Name of a table or view that contains baseline data for comparison. For more information about baseline tables, see Primary input table and baseline table.
Metric slicing expressions. Slicing expressions let you define subsets of the table to monitor in addition to the table as a whole. To create a slicing expression, click Add expression and enter the expression definition. For example, the expression "col_2 > 10" generates two slices: one for col_2 > 10 and one for col_2 <= 10. As another example, the expression "col_1" generates one slice for each unique value in col_1. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complements.
Custom metrics. Custom metrics appear in the metric tables like any built-in metric. For details, see Use custom metrics with Databricks Lakehouse Monitoring. To configure a custom metric, click Add custom metric.
- Enter a Name for the custom metric.
- Select the custom metric Type, one of Aggregate, Derived, or Drift. For definitions, see Types of custom metrics.
- From the drop-down list in Input columns, select the columns to apply the metric to.
- In the Output type field, select the Spark data type of the metric.
- In the Definition field, enter SQL code that defines the custom metric.
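The way metric slicing expressions group data can be sketched in plain Python. This only illustrates the grouping behavior described in the slicing setting, not how the monitor computes slices. The helper slice_by and the sample rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for the monitored table.
rows = [
    {"col_1": "a", "col_2": 5},
    {"col_1": "a", "col_2": 12},
    {"col_1": "b", "col_2": 30},
]


def slice_by(rows, key_fn):
    """Group rows by the value that key_fn returns for each row."""
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return dict(groups)


# "col_2 > 10" is a predicate: rows fall into the True slice or its complement.
predicate_slices = slice_by(rows, lambda row: row["col_2"] > 10)

# "col_1" is a column: there is one slice per distinct value.
column_slices = slice_by(rows, lambda row: row["col_1"])
```

Each expression is applied to the full set of rows on its own, so the two groupings above are independent rather than nested.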
Edit monitor settings in the UI
After you have created a monitor, you can make changes to the monitor’s settings by clicking the Edit monitor configuration button on the Quality tab.
Refresh and view monitor results in the UI
To run the monitor manually, click Refresh metrics.
For information about the statistics that are stored in monitor metric tables, see Monitor metric tables. Metric tables are Unity Catalog tables. You can query them in notebooks or in the SQL query explorer, and view them in Catalog Explorer.
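As a sketch of querying a metric table, the snippet below builds the table names and a simple query string. It assumes that the metric tables are created in the output schema and named after the monitored table with the suffixes _profile_metrics and _drift_metrics. That naming is an assumption not stated in this article, so confirm the actual names in Catalog Explorer. The helper metric_table_names and the example names are hypothetical.

```python
def metric_table_names(output_schema: str, monitored_table: str) -> dict:
    """Build the assumed metric table names for a monitored table."""
    base = monitored_table.split(".")[-1]  # table name without catalog and schema
    return {
        # Suffixes are an assumption; verify them in Catalog Explorer.
        "profile": f"{output_schema}.{base}_profile_metrics",
        "drift": f"{output_schema}.{base}_drift_metrics",
    }


# Hypothetical output schema and monitored table.
names = metric_table_names("main.monitoring", "main.sales.orders")

# A query you could run in a notebook or the SQL editor.
query = f"SELECT * FROM {names['profile']} LIMIT 10"
print(query)
```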
Control access to monitor outputs
The metric tables and dashboard created by a monitor are owned by the user who created the monitor. You can use Unity Catalog privileges to control access to metric tables. To share dashboards within a workspace, click the Share button on the upper-right side of the dashboard.
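Granting read access to a metric table uses a standard Unity Catalog GRANT statement. The following is a minimal sketch that builds one. The helper grant_select_statement, the table name, and the group name data-analysts are hypothetical.

```python
def grant_select_statement(table_name: str, principal: str) -> str:
    """Build a GRANT that gives a user or group read access to one table."""
    # The principal is backtick-quoted because names can contain
    # characters such as '-' or '@'.
    return f"GRANT SELECT ON TABLE {table_name} TO `{principal}`"


# Hypothetical metric table and group, for illustration only.
statement = grant_select_statement(
    "main.monitoring.orders_profile_metrics", "data-analysts"
)
print(statement)
```

To read the table, the grantee also needs USE CATALOG on the parent catalog and USE SCHEMA on the parent schema.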