Monitor fairness and bias for classification models

Preview

This feature is in Public Preview.

With Databricks Lakehouse Monitoring, you can monitor the predictions of a classification model to see if the model performs similarly on data associated with different groups. For example, you can investigate whether a loan-default classifier generates the same false-positive rate for applicants from different demographics.

Work with fairness and bias metrics

To monitor for fairness and bias, you create a Boolean slice expression. The group for which the slice expression evaluates to True is considered the protected group (that is, the group you are checking for bias against). For example, if you create slicing_exprs=["age < 25"], the slice identified by slice_key = "age < 25" and slice_value = True is considered the protected group, and the slice identified by slice_key = "age < 25" and slice_value = False is considered the unprotected group.
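The following sketch shows where the slice expression fits when you create an inference monitor. It assumes the databricks.lakehouse_monitoring Python client; the table, schema, and column names (loan_predictions, prediction, label, model_version, ts) are illustrative rather than required names.

    # Sketch only: illustrative table and column names; adjust to your inference table.
    from databricks import lakehouse_monitoring as lm

    lm.create_monitor(
        table_name="main.default.loan_predictions",   # hypothetical inference table
        profile_type=lm.InferenceLog(
            problem_type="classification",            # fairness metrics require classification
            prediction_col="prediction",
            label_col="label",
            model_id_col="model_version",
            timestamp_col="ts",
            granularities=["1 day"],
        ),
        # Rows where the expression evaluates to True form the protected group;
        # rows where it evaluates to False form the unprotected group.
        slicing_exprs=["age < 25"],
        output_schema_name="main.default",
    )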

The monitor automatically computes metrics that compare the performance of the classification model between groups. The following metrics are reported in the profile metrics table:

  • predictive_parity, which compares the model’s precision between groups.

  • predictive_equality, which compares false positive rates between groups.

  • equal_opportunity, which measures whether a label is predicted equally well for both groups.

  • statistical_parity, which measures the difference in predicted outcomes between groups.

These metrics are calculated only if the analysis type is InferenceLog and problem_type is classification.

For definitions of these metrics, see the following references:

Fairness and bias metrics outputs

See the API reference for details about these metrics and how to view them in the metric tables. All fairness and bias metrics share the same data type: fairness scores computed across all predicted classes in a "one-vs-all" manner, reported as key-value pairs.
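As a sketch, assuming the convention in which the monitor writes profile metrics to a table named {monitored_table}_profile_metrics in the output schema, you can inspect the fairness scores for the protected and unprotected slices with a query like the following (the table name continues the illustrative example above):

    # Sketch: read the profile metrics table produced by the monitor.
    # Run in a Databricks notebook where `spark` is available; the table name is an assumption.
    metrics = spark.table("main.default.loan_predictions_profile_metrics")

    (metrics
        .where("slice_key = 'age < 25'")   # protected (True) and unprotected (False) slices
        .select("window", "slice_value", "predictive_parity", "predictive_equality",
                "equal_opportunity", "statistical_parity")
        .show(truncate=False))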

You can create an alert on these metrics. For example, the model owner can set up an alert that fires when a fairness metric exceeds a chosen threshold and routes the alert to an on-call person or team for investigation.
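In practice you would typically back such an alert with a Databricks SQL query, but the following PySpark sketch shows the shape of the check. It assumes the fairness metric is stored as key-value pairs mapping each predicted class to a score; the table name and the 0.8 threshold are illustrative.

    from pyspark.sql import functions as F

    # Sketch: flag time windows where any per-class predictive_parity score
    # exceeds an illustrative threshold. Assumes the metric column holds
    # key-value pairs mapping each predicted class to its score.
    metrics = spark.table("main.default.loan_predictions_profile_metrics")

    flagged = (metrics
        .where("slice_key = 'age < 25'")
        .select("window", "slice_value",
                F.explode("predictive_parity").alias("predicted_class", "score"))
        .where(F.col("score") > 0.8))      # choose a threshold that fits your policy

    flagged.show(truncate=False)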