Create a monitor using the Databricks UI

Preview

This feature is in Public Preview.

This article demonstrates how to create a monitor using the Databricks UI. You can also use the API.

To create a monitor in the UI, do the following:

  1. In the workspace left sidebar, click the Catalog icon to open Catalog Explorer.

  2. Navigate to the table you want to monitor.

  3. Click the Quality tab.

  4. Click the Get started button.

  5. In Create monitor, choose the options you want to use to set up the monitor.
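As noted above, monitors can also be created through the API. As a minimal, hypothetical sketch (the table, schema, and field names below are illustrative and mirror the UI labels, not the exact API parameter names), the choices made in the steps above amount to a configuration like this:

```python
# Hypothetical sketch: the UI selections map to a set of monitor settings.
# Names are illustrative, not the exact API schema.
snapshot_monitor = {
    "table_name": "main.sales.orders",        # table selected in Catalog Explorer
    "profile_type": "Snapshot",               # Snapshot, Time series, or Inference
    "output_schema_name": "main.monitoring",  # {catalog}.{schema} for metric tables
}

# A Time series profile additionally needs a timestamp column and one or
# more metric granularities.
time_series_monitor = {
    **snapshot_monitor,
    "profile_type": "Time series",
    "timestamp_col": "event_ts",
    "granularities": ["1 day"],
}
```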

Profiling

From the Profile type menu, select the type of monitor you want to create. The profile types are described below.

  • Snapshot profile. Any Delta managed table, external table, view, materialized view, or streaming table. [1]

  • Time series profile. A table containing values measured over time. This table includes a timestamp column.

  • Inference profile. A table containing predicted values output by a machine learning classification or regression model. This table includes a timestamp, a model ID, model inputs (features), a column containing model predictions, and optional columns containing unique observation IDs and ground truth labels. It can also contain metadata, such as demographic information, that is not used as input to the model but might be useful for fairness and bias investigations or other monitoring.

[1] Monitors defined on materialized views and streaming tables do not support incremental processing.

If you select Time series or Inference, additional parameters are required, as described in the following sections.

Time series profile

For a Time series profile, you must make the following selections:

  • Specify the Metric granularities that determine how to partition the data in windows across time.

  • Specify the Timestamp column, the column in the table that contains the timestamp. The timestamp column data type must be either TIMESTAMP or a type that can be converted to timestamps using the to_timestamp PySpark function.
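For intuition, the granularity-based windowing can be sketched in plain Python: each row's timestamp is truncated to the start of its time window, and metrics are then computed per window. A 1-hour granularity and the sample timestamps are assumed here for illustration.

```python
from datetime import datetime, timedelta

def window_start(ts: datetime, granularity: timedelta) -> datetime:
    """Truncate a timestamp to the start of its time window."""
    epoch = datetime(1970, 1, 1)
    n_windows = (ts - epoch) // granularity  # whole windows since the epoch
    return epoch + n_windows * granularity

timestamps = [
    datetime(2024, 1, 1, 9, 15),
    datetime(2024, 1, 1, 9, 45),
    datetime(2024, 1, 1, 10, 5),
]
hourly = timedelta(hours=1)
windows = [window_start(ts, hourly) for ts in timestamps]
# The first two rows fall into the 09:00 window, the third into 10:00.
```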

Inference profile

For an Inference profile, in addition to the granularities and the timestamp column, you must make the following selections:

  • Select the Problem type, either classification or regression.

  • Specify the Prediction column, the column containing the model’s predicted values.

  • Optionally specify the Label column, the column containing the ground truth for model predictions.

  • Specify the Model ID column, the column containing the ID of the model used for prediction.
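To make the expected table shape concrete, here is a small hypothetical check that an inference table supplies the required columns. All column names are made up for illustration.

```python
# Columns an Inference profile requires vs. optionally accepts, using
# hypothetical names: a timestamp, a model ID, and a prediction column
# are required; ground-truth labels and observation IDs are optional.
REQUIRED_COLUMNS = {"ts", "model_id", "prediction"}
OPTIONAL_COLUMNS = {"label", "observation_id"}

def missing_inference_columns(table_columns):
    """Return the required inference-profile columns absent from a table."""
    return sorted(REQUIRED_COLUMNS - set(table_columns))

ok_table = ["ts", "model_id", "prediction", "feature_1", "label"]
bad_table = ["ts", "feature_1"]
```

For example, `missing_inference_columns(bad_table)` reports that the model ID and prediction columns are missing.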

Schedule

To set up a monitor to run on a scheduled basis, select Refresh on schedule and select the frequency and time for the monitor to run. If you do not want the monitor to run automatically, select Refresh manually. If you select Refresh manually, you can later refresh the metrics from the Quality tab.

General

In the General section, you specify one required setting and, optionally, additional configuration options:

  • You must specify the Unity Catalog schema where the metric tables created by the monitor are stored. The location must be in the format {catalog}.{schema}.

You can also specify the following settings:

  • Assets directory. Enter the absolute path to an existing directory in which to store monitoring assets. By default, assets are stored in “/Users/{user_name}/databricks_lakehouse_monitoring/{table_name}”. If you enter a different location in this field, assets are created under “/{table_name}” in the directory you specify. This directory can be anywhere in the workspace. For monitors intended to be shared within an organization, you can use a path in the “/Shared/” directory.

    This field cannot be left blank.

  • Unity Catalog baseline table name. Name of a table or view that contains baseline data for comparison. For more information about baseline tables, see Primary input table and baseline table.

  • Metric slicing expressions. Slicing expressions let you define subsets of the table to monitor in addition to the table as a whole. To create a slicing expression, click Add expression and enter the expression definition. For example, the expression "col_2 > 10" generates two slices: one for rows where col_2 > 10 and one for rows where col_2 <= 10. As another example, the expression "col_1" generates one slice for each unique value in col_1. The data is grouped by each expression independently, resulting in a separate slice for each predicate and its complement.

  • Custom metrics. Custom metrics appear in the metric tables like any built-in metric. For details, see Use custom metrics with Databricks Lakehouse Monitoring. To configure a custom metric, click Add custom metric, then do the following:

    • Enter a Name for the custom metric.

    • Select the custom metric Type, one of Aggregate, Derived, or Drift. For definitions, see Types of custom metrics.

    • From the drop-down list in Input columns, select the columns to apply the metric to.

    • In the Output type field, select the Spark data type of the metric.

    • In the Definition field, enter SQL code that defines the custom metric.
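The slicing behavior described above can be sketched in plain Python: a predicate expression yields a slice and its complement, while a bare column name yields one slice per distinct value. The sample rows and column names are illustrative.

```python
# Sketch of how slicing expressions partition monitored data.
rows = [
    {"col_1": "US", "col_2": 5},
    {"col_1": "US", "col_2": 12},
    {"col_1": "DE", "col_2": 30},
]

# Predicate expression "col_2 > 10": a slice and its complement.
predicate_slices = {
    "col_2 > 10":  [r for r in rows if r["col_2"] > 10],
    "col_2 <= 10": [r for r in rows if not r["col_2"] > 10],
}

# Column expression "col_1": one slice per distinct value.
column_slices = {}
for r in rows:
    column_slices.setdefault(r["col_1"], []).append(r)
```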
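For intuition about Aggregate custom metrics, the sketch below computes one such metric in plain Python. The definition (mean of log(abs(x) + 1)) is a hypothetical example; in the UI you would express the equivalent logic as SQL in the Definition field.

```python
import math

def example_aggregate_metric(values):
    """Mean of log(abs(x) + 1) over non-null values, mirroring what an
    Aggregate custom metric's SQL definition computes per window and slice.
    The metric itself is a made-up example."""
    present = [v for v in values if v is not None]
    return sum(math.log(abs(v) + 1) for v in present) / len(present)
```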

Edit monitor settings in the UI

After you have created a monitor, you can make changes to the monitor’s settings by clicking the Edit monitor configuration button on the Quality tab.

Refresh and view monitor results in the UI

To run the monitor manually, click Refresh metrics.

For information about the statistics that are stored in monitor metric tables, see Monitor metric tables. Metric tables are Unity Catalog tables. You can query them in notebooks or in the SQL query explorer, and view them in Catalog Explorer.

Control access to monitor outputs

The metric tables and dashboard created by a monitor are owned by the user who created the monitor. You can use Unity Catalog privileges to control access to metric tables. To share dashboards within a workspace, click the Share button on the upper-right side of the dashboard.

Delete a monitor from the UI

To delete a monitor from the UI, click the kebab menu next to the Refresh metrics button and select Delete monitor.