Monitor metric tables

Preview

This feature is in Public Preview.

This page describes the metric tables created by Databricks Lakehouse Monitoring. For information about the dashboard created by a monitor, see Use the generated SQL dashboard.

When a monitor runs on a Databricks table, it creates or updates two metric tables: a profile metrics table and a drift metrics table.

  • The profile metrics table contains summary statistics for each column and for each combination of time window, slice, and grouping columns. For InferenceLog analysis, the analysis table also contains model accuracy metrics.

  • The drift metrics table contains statistics that track changes in distribution for a metric. Drift tables can be used to visualize or alert on changes in the data instead of specific values. The following types of drift are computed:

    • Consecutive drift compares a window to the previous time window. Consecutive drift is only calculated if a consecutive time window exists after aggregation according to the specified granularities.

    • Baseline drift compares a window to the baseline distribution determined by the baseline table. Baseline drift is only calculated if a baseline table is provided.

Where metric tables are located

Monitor metric tables are saved to {output_schema}.{table_name}_profile_metrics and {output_schema}.{table_name}_drift_metrics, where:

  • {output_schema} is the catalog and schema specified by output_schema_name.

  • {table_name} is the name of the table being monitored.

How monitor statistics are computed

Each statistic and metric in the metric tables is computed for a specified time interval (called a “window”). For Snapshot analysis, the time window is a single point in time corresponding to the time the metric was refreshed. For TimeSeries and InferenceLog analysis, the time window is based on the granularities specified in create_monitor and the values in the timestamp_col specified in the profile_type argument.

Metrics are always computed for the entire table. In addition, if you provide a slicing expression, metrics are computed for each data slice defined by a value of the expression.

For example:

slicing_exprs=["col_1", "col_2 > 10"]

generates the following slices: one for col_2 > 10, one for col_2 <= 10, and one for each unique value in col1.

Slices are identified in the metrics tables by the column names slice_key and slice_value. In this example, one slice key would be “col_2 > 10” and the corresponding values would be “true” and “false”. The entire table is equivalent to slice_key = NULL and slice_value = NULL. Slices are defined by a single slice key.

Metrics are computed for all possible groups defined by the time windows and slice keys and values. In addition, for InferenceLog analysis, metrics are computed for each model id. For details, see Column schemas for generated tables.

Additional statistics for model accuracy monitoring (InferenceLog analysis only)

Additional statistics are calculated for InferenceLog analysis.

  • Model quality is calculated if both label_col and prediction_col are provided.

  • Slices are automatically created based on the distinct values of model_id_col.

  • For classification models, fairness and bias statistics are calculated for slices that have a Boolean value.

Query analysis and drift metrics tables

You can query the metrics tables directly. The following example is based on InferenceLog analysis:

SELECT
  window.start, column_name, count, num_nulls, distinct_count, frequent_items
FROM census_monitor_db.adult_census_profile_metrics
WHERE model_id = 1     Constrain to version 1
  AND slice_key IS NULL     look at aggregate metrics over the whole data
  AND column_name = "income_predicted"
ORDER BY window.start

Column schemas for generated tables

For each column in the primary table, the metrics tables contain one row for each combination of grouping columns. The column associated with each row is shown in the column column_name.

For metrics based on more than one column such as model accuracy metrics, column_name is set to :table.

For profile metrics, the following grouping columns are used:

  • time window

  • granularity (TimeSeries and InferenceLog analysis only)

  • log type - input table or baseline table

  • slice key and value

  • model id (InferenceLog analysis only)

For drift metrics, the following additional grouping columns are used:

  • comparison time window

  • drift type (comparison to previous window or comparison to baseline table)

The schemas of the metric tables are shown below, and are also shown in the Databricks Lakehouse Monitoring API reference documentation.

Profile metrics table schema

The following table shows the schema of the profile metrics table. Where a metric is not applicable to a row, the corresponding cell is null.

Column name

Type

Description

Grouping columns

window

Struct. See [1] below.

Time window.

granularity

string

Window duration, set by granularities parameter. [2]

model_id_col

string

Optional. Only used for InferenceLog analysis type.

log_type

string

Table used to calculate metrics. BASELINE or INPUT.

slice_key

string

Slice expression. NULL for default, which is all data.

slice_value

string

Value of the slicing expression.

column_name

string

Name of column in primary table. :table is a special name for metrics that apply to the whole table, such as model accuracy.

data_type

string

Spark data type of column_name.

logging_table_commit_version

int

Ignore.

monitor_version

bigint

Version of the monitor configuration used to calculate the metrics in the row. See [3] below for details.

Metrics columns - summary statistics

count

bigint

Number of non-null values.

num_nulls

bigint

Number of null values in column_name.

avg

double

Arithmetic mean of the column, ingoring nulls.

quantiles

array<double>

Array of 1000 quantiles. See [4] below.

distinct_count

bigint

Number of distinct values in column_name.

min

double

Minimum value in column_name.

max

double

Maximum value in column_name.

stddev

double

Standard deviation of column_name.

num_zeros

bigint

Number of zeros in column_name.

num_nan

bigint

Number of NaN values in column_name.

min_size

double

Minimum size of arrays or structures in column_name.

max_size

double

Maximum size of arrays or structures in column_name.

avg_size

double

Average size of arrays or structures in column_name.

min_len

double

Minimum length of string and binary values in column_name.

max_len

double

Maximum length of string and binary values in column_name.

avg_len

double

Average length of string and binary values in column_name.

frequent_items

Struct. See [1] below.

Top 100 most frequently occurring items.

non_null_columns

array<string>

List of columns with at least one non-null value.

median

double

Median value of column_name.

percent_null

double

Percent of null values in column_name.

percent_zeros

double

Percent of values that are zero in column_name.

percent_distinct

double

Percent of values that are distinct in column_name.

Metrics columns - classification model accuracy [5]

accuracy_score

double

Accuracy of model, calculated as (number of correct predictions / total number of predictions), ignoring null values.

confusion_matrix

Struct. See [1] below.

precision

Struct. See [1] below.

recall

Struct. See [1] below.

f1_score

Struct. See [1] below.

Metrics columns - regression model accuracy [5]

mean_squared_error

double

Mean squared error between prediction_col and label_col.

root_mean_squared_error

double

Root mean squared error between prediction_col and label_col.

mean_average_error

double

Mean average error between prediction_col and label_col.

mean_absolute_percentage_error

double

Mean absolute percentage error between prediction_col and label_col.

r2_score

double

R-squared score between prediction_col and label_col.

Metrics columns - fairness and bias [6]

predictive_parity

double

Measures whether the two groups have equal precision across all predicted classes. label_col is required.

predictive_equality

double

Measures whether the two groups have equal false positive rate across all predicted classes. label_col is required.

equal_opportunity

double

Measures whether the two groups have equal recall across all predicted classes. label_col is required.

statistical_parity

double

Measures whether the two groups have equal acceptance rate. Acceptance rate here is defined as the empirical probability to be predicted as a certain class, across all predicted classes.

[1] Format of struct for confusion_matrix, precision, recall, and f1_score:

Column name

Type

window

struct<start: timestamp, end: timestamp>

frequent_items

array<struct<item: string, count: bigint>>

confusion_matrix

struct<prediction: string, label: string, count: bigint>

precision

struct<one_vs_all: map<string,double>, macro: double, weighted: double>

recall

struct<one_vs_all: map<string,double>, macro: double, weighted: double>

f1_score

struct<one_vs_all: map<string,double>, macro: double, weighted: double>

[2] Depending on the value selected for granularity, the first analysis might not cover a full window. For example, if you create a monitor on December 9th and specify a granularity of “1 month”, only data from the period November 9 - 30 is included in the calculation for November. All windows after the first analysis include data from the full granularity period.

[3] The version shown in this column is the version that was used to calculate the statistics in the row and might not be the current version of the monitor. Each time you refresh the metrics, the monitor attempts to recompute previously calculated metrics using the current monitor configuration. The current monitor version appears in the monitor information returned by the API and Python Client.

[4] Sample code to retrieve the 50th percentile: SELECT element_at(quantiles, int((size(quantiles)+1)/2)) AS p50 ... or SELECT quantiles[500] ... .

[5] Only shown if the monitor has InferenceLog analysis type and both label_col and prediction_col are provided.

[6] Only shown if the monitor has InferenceLog analysis type and problem_type is classification.

Drift metrics table schema

The following table shows the schema of the drift metrics table. The drift table is only generated if a baseline table is provided, or if a consecutive time window exists after aggregation according to the specified granularities.

Column name

Type

Description

Grouping columns

window

struct<start: timestamp, end: timestamp>

Time window.

window_cmp

struct<start: timestamp, end: timestamp>

Comparison window for drift_type CONSECUTIVE.

drift_type

string

BASELINE or CONSECUTIVE. Whether the drift metrics compare to the previous time window or to the baseline table.

granularity

string

Window duration, set by granularities parameter. [7]

model_id_col

string

Optional. Only used for InferenceLog analysis type.

slice_key

string

Slice expression. NULL for default, which is all data.

slice_value

string

Value of the slicing expression.

column_name

string

Name of column in primary table. :table is a special name for metrics that apply to the whole table, such as model accuracy.

data_type

string

Spark data type of column_name.

monitor_version

bigint

Version of the monitor configuration used to calculate the metrics in the row. See [8] below for details.

Metrics columns - drift

Differences are calculated as current window - comparison window.

count_delta

double

Difference in count.

avg_delta

double

Difference in avg.

percent_null_delta

double

Difference in percent_null.

percent_zeros_delta

double

Difference in percent_zeros.

percent_distinct_delta

double

Difference in percent_distinct.

non_null_columns_delta

struct<added: int, missing: int>

Number of columns with any increase or decrease in non-null values.

chi_squared_test

struct<statistic: double, pvalue: double>

Chi-square test for drift in distribution.

ks_test

struct<statistic: double, pvalue: double>

KS test for drift in distribution. Calculated for numeric columns only.

tv_distance

double

Total variation distance for drift in distribution.

l_infinity_distance

double

L-infinity distance for drift in distribution.

js_distance

double

Jensen–Shannon distance for drift in distribution. Calculated for categorical columns only.

wasserstein_distance

double

Drift between two numeric distributions using the Wasserstein distance metric.

population_stability_index

double

Metric for comparing the drift between two numeric distributions using the population stability index metric. See [9] below for details.

[7] Depending on the value selected for granularity, the first analysis might not cover a full window. For example, if you create a monitor on December 9th and specify a granularity of “1 month”, only data from the period November 9 - 30 is included in the calculation for November. All windows after the first analysis include data from the full granularity period.

[8] The version shown in this column is the version that was used to calculate the statistics in the row and might not be the current version of the monitor. Each time you refresh the metrics, the monitor attempts to recompute previously calculated metrics using the current monitor configuration. The current monitor version appears in the monitor information returned by the API and Python Client.

[9] The output of the population stability index is a numeric value that represents how different two distributions are. The range is [0, inf). PSI < 0.1 means no significant population change. PSI < 0.2 indicates moderate population change. PSI >= 0.2 indicates significant population change.