classification-monitor(Python)

Lakehouse monitoring example notebook: InferenceLog classification analysis

User requirements

  • You must have access to run commands on a cluster with access to Unity Catalog.
  • You must have the USE CATALOG privilege on at least one catalog and the USE SCHEMA privilege on at least one schema. This notebook creates tables in the main.default schema. If you do not have the required privileges on the main.default schema, edit the notebook to change the catalog and schema to ones on which you do have privileges.

System requirements:

  • Your workspace must be enabled for Unity Catalog.
  • Databricks Runtime 12.2 LTS ML or above.
  • A Single user or Assigned cluster.

This notebook illustrates how to train and deploy a classification model and monitor its corresponding batch inference table.

For more information about Lakehouse monitoring, see the documentation (AWS | Azure).

Setup

  • Verify cluster configuration
  • Install Python SDK
  • Define catalog, schema, model and table names

Install Lakehouse Monitoring client wheel
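
A minimal sketch of the install step, assuming the Databricks SDK is used as the monitoring client (the exact wheel/package and version pin are workspace-specific and not reproduced here):

    # Sketch (assumption): install the Databricks SDK, which exposes the quality_monitors API.
    %pip install --upgrade databricks-sdk

    # Restart Python so the newly installed client is importable in this notebook.
    dbutils.library.restartPython()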

Specify catalog and schema to use
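
A minimal sketch of this step; every identifier below (CATALOG, SCHEMA, MODEL_NAME, TABLE_NAME, BASELINE_TABLE) is a placeholder you can change to match your privileges:

    # Placeholder catalog/schema/table/model names used throughout the sketches below (assumptions).
    CATALOG = "main"
    SCHEMA = "default"
    MODEL_NAME = "adult_census_classifier"                        # MLflow Model Registry name (hypothetical)
    TABLE_NAME = f"{CATALOG}.{SCHEMA}.adult_census_predictions"   # monitored inference table (hypothetical)
    BASELINE_TABLE = f"{CATALOG}.{SCHEMA}.adult_census_baseline"  # baseline predictions table (hypothetical)

    spark.sql(f"USE CATALOG {CATALOG}")
    spark.sql(f"USE SCHEMA {SCHEMA}")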

Helper methods

The following functions clean up existing artifacts if the notebook has been run multiple times. You would not typically use these functions in a normal setup.
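
A minimal sketch of such cleanup helpers, assuming the placeholder names above and the Databricks SDK quality_monitors API:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.errors import NotFound
    from mlflow.tracking import MlflowClient

    def cleanup_tables():
        # Drop Delta tables left over from a previous run, if any.
        for name in (TABLE_NAME, BASELINE_TABLE):
            spark.sql(f"DROP TABLE IF EXISTS {name}")

    def cleanup_model():
        # Remove a previously registered model, if present.
        try:
            MlflowClient().delete_registered_model(MODEL_NAME)
        except Exception:
            pass  # nothing registered yet

    def cleanup_monitor():
        # Delete an existing monitor on the inference table, if one exists.
        try:
            WorkspaceClient().quality_monitors.delete(table_name=TABLE_NAME)
        except NotFound:
            pass  # no monitor to delete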

Background

The following are required to create an inference log monitor:

• A Delta table in Unity Catalog that you own.
• The data can be batch scored data or inference logs. The following columns are required:

  • timestamp (Timestamp): Used for windowing and aggregation when calculating metrics.
  • model_id (String): Model version/id used for each prediction.
  • prediction (String): Value predicted by the model.

• The following column is optional:

  • label (String): Ground truth label.

You can also provide an optional baseline table to track performance changes in the model and drift in the statistical characteristics of features.

• To track performance changes in the model, consider using the test or validation set.
• To track drift in feature distributions, consider using the training set or the associated feature tables.
• The baseline table must use the same column names as the monitored table, and must also have a model_id column.

Databricks recommends enabling Delta's Change Data Feed (AWS | Azure) table property on all monitored tables, including the baseline table, for better metric computation performance. This notebook shows how to enable Change Data Feed when you create the Delta table.

User Journey

1. Create Delta table: Read the raw input and feature data and create the training and inference sets.
2. Train a model and register it in the MLflow Model Registry.
3. Generate predictions on the test set and create the baseline table.
4. Generate predictions on scoring_df1. This is the inference table.
5. Create the monitor on the inference table and analyze profile/drift metrics and fairness and bias metrics.
6. Simulate drift in 3 relevant features of scoring_df2 and generate/materialize predictions.
7. Join ground-truth labels into the monitored table and refresh the monitor.
8. [Optional] Calculate custom metrics.
9. [Optional] Delete the monitor.

1. Read dataset and prepare data

Dataset used for this example: UCI's Adult Census dataset.

• Add a dummy identifier.
• Clean and standardize missing values.
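
A minimal sketch of this step, assuming the copy of the dataset that ships under /databricks-datasets and an illustrative column schema (the path and the column names, including gender, are assumptions):

    from pyspark.sql import functions as F

    # Illustrative schema for the UCI Adult Census CSV (no header row).
    schema = """age DOUBLE, workclass STRING, fnlwgt DOUBLE, education STRING,
                education_num DOUBLE, marital_status STRING, occupation STRING,
                relationship STRING, race STRING, gender STRING, capital_gain DOUBLE,
                capital_loss DOUBLE, hours_per_week DOUBLE, native_country STRING,
                income STRING"""

    raw_df = spark.read.csv("/databricks-datasets/adult/adult.data", schema=schema)

    # Add a dummy identifier (used later to join ground-truth labels back in)
    # and standardize the "?" placeholder used for missing values to NULL.
    clean_df = raw_df.withColumn("id", F.monotonically_increasing_id())
    for c, t in raw_df.dtypes:
        if t == "string":
            clean_df = clean_df.withColumn(
                c, F.when(F.trim(F.col(c)) == "?", None).otherwise(F.trim(F.col(c)))
            )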

1.1 Split data

Split the data into a training set, a baseline test table, and an inference table.

• The baseline test data will serve as the table with reference feature distributions.
• The inference table will then be split into two DataFrames, scoring_df1 and scoring_df2, which serve as the new incoming batches for scoring. We later simulate drift on these scoring DataFrames.
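
A minimal sketch of the split; the proportions are arbitrary:

    # Train / baseline test / inference split, then two scoring batches.
    train_df, baseline_test_df, inference_df = clean_df.randomSplit([0.6, 0.2, 0.2], seed=42)
    scoring_df1, scoring_df2 = inference_df.randomSplit([0.5, 0.5], seed=42)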

2. Train a random forest model
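
A minimal sketch, assuming a scikit-learn pipeline and the workspace MLflow Model Registry (MODEL_NAME is the placeholder defined earlier):

    import mlflow
    from sklearn.compose import ColumnTransformer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder

    label = "income"
    train_pdf = train_df.toPandas()
    features = [c for c in train_pdf.columns if c not in (label, "id")]
    categorical = [c for c in features if train_pdf[c].dtype == "object"]
    train_pdf[categorical] = train_pdf[categorical].fillna("missing")

    # One-hot encode categorical features, pass numeric features through, then fit a random forest.
    model = Pipeline([
        ("encode", ColumnTransformer(
            [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical)],
            remainder="passthrough",
        )),
        ("classify", RandomForestClassifier(n_estimators=100, random_state=42)),
    ])

    with mlflow.start_run():
        model.fit(train_pdf[features], train_pdf[label])
        # Log and register the model in the MLflow Model Registry.
        mlflow.sklearn.log_model(model, "model", registered_model_name=MODEL_NAME)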

3. Create baseline table

For information about how to select a baseline table, see the Lakehouse Monitoring documentation (AWS | Azure).

Write table with CDF enabled
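
A minimal sketch: score the baseline test split with the model above and write a Delta table with Change Data Feed enabled (the ground-truth column is renamed to label to match the monitored table):

    # Score the baseline test split.
    baseline_pdf = baseline_test_df.toPandas().rename(columns={label: "label"})
    for c in categorical:
        baseline_pdf[c] = baseline_pdf[c].fillna("missing")
    baseline_pdf["prediction"] = model.predict(baseline_pdf[features])

    baseline_df = (
        spark.createDataFrame(baseline_pdf)
        # Record which model version produced these predictions.
        .withColumn("model_id", F.lit("1"))
    )

    (baseline_df.write.format("delta").mode("overwrite")
        # Setting the delta.enableChangeDataFeed property at write time turns on CDF for the new table.
        .option("delta.enableChangeDataFeed", "true")
        .saveAsTable(BASELINE_TABLE))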

4. Generate predictions on incoming scoring data

Example pre-processing steps:

• Extract ground-truth labels (in practice, labels might arrive later).
• Split into two batches.
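
A minimal sketch: set the ground-truth labels aside (they are merged back in section 7) and define a small scoring helper reused for both batches:

    # Keep ground-truth labels aside, keyed by the dummy identifier.
    label_df = inference_df.select("id", F.col(label).alias("label"))

    def score_batch(batch_df, model, features, label_col=label):
        """Score a Spark DataFrame with the sklearn pipeline; returns a Spark DataFrame with a prediction column."""
        pdf = batch_df.drop(label_col).toPandas()
        for c in pdf.columns:
            if pdf[c].dtype == "object":
                pdf[c] = pdf[c].fillna("missing")
        pdf["prediction"] = model.predict(pdf[features])
        return spark.createDataFrame(pdf)

    pred_df1 = score_batch(scoring_df1, model, features)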

4.1 Write scoring data with predictions out

• Add the model_id column and write to the table that we will attach a monitor to.
• Add the ground-truth label column with empty/NaN values.

Set mergeSchema to true to enable appending DataFrames that do not yet have the label column available.
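
A minimal sketch of the write, assuming a current timestamp for this first batch and model version "1":

    inference_df1 = (
        pred_df1
        .withColumn("timestamp", F.current_timestamp())        # windowing column for the monitor
        .withColumn("model_id", F.lit("1"))                    # model version that produced the predictions
        .withColumn("label", F.lit(None).cast("string"))       # ground truth arrives later
    )

    (inference_df1.write.format("delta").mode("overwrite")
        .option("mergeSchema", "true")                         # later appends may lack the label column
        .option("delta.enableChangeDataFeed", "true")
        .saveAsTable(TABLE_NAME))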

5. Create the monitor

Use InferenceLog type analysis.

Drop any column that you don't want to track or that doesn't make sense from a business or use-case perspective. Alternatively, create a VIEW with only the columns of interest and monitor that view instead.

Create Monitor
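
The notebook's client wheel exposes this as create_monitor; the sketch below uses the equivalent call in the current Databricks SDK (quality_monitors.create), so the exact names here are an assumption rather than the notebook's code:

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.catalog import MonitorInferenceLog, MonitorInferenceLogProblemType

    w = WorkspaceClient()

    monitor_info = w.quality_monitors.create(
        table_name=TABLE_NAME,
        # Workspace folder where monitoring assets (such as the dashboard) are stored (illustrative path).
        assets_dir=f"/Workspace/Users/{w.current_user.me().user_name}/lakehouse_monitoring",
        # Schema in which the profile and drift metrics tables are created.
        output_schema_name=f"{CATALOG}.{SCHEMA}",
        baseline_table_name=BASELINE_TABLE,
        # Boolean slicing expressions; these also drive the fairness and bias metrics.
        slicing_exprs=["gender = 'Female'", "age > 60"],
        inference_log=MonitorInferenceLog(
            problem_type=MonitorInferenceLogProblemType.PROBLEM_TYPE_CLASSIFICATION,
            granularities=["1 day"],
            timestamp_col="timestamp",
            model_id_col="model_id",
            prediction_col="prediction",
            label_col="label",
        ),
    )

The initial refresh runs asynchronously; the metrics tables and dashboard appear once it completes.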

To view the dashboard, click Dashboards in the left nav bar.

You can also navigate to the dashboard from the primary table in the Catalog Explorer UI. On the Quality tab, click the View dashboard button.

For details, see the documentation (AWS | Azure).

5.1 Inspect the metrics tables

By default, the metrics tables are saved in the default database.

The create_monitor call created two new tables: the profile metrics table and the drift metrics table.

These two tables record the outputs of analysis jobs. The tables use the same name as the primary table to be monitored, with the suffixes _profile_metrics and _drift_metrics.

Orientation to the profile metrics table

The profile metrics table has the suffix _profile_metrics. For a list of statistics that are shown in the table, see the documentation (AWS | Azure).

• For every column in the primary table, the profile table shows summary statistics for the baseline table and for the primary table. The column log_type shows INPUT to indicate statistics for the primary table, and BASELINE to indicate statistics for the baseline table. The column from the primary table is identified in the column column_name.
• For TimeSeries type analysis, the granularity column shows the granularity corresponding to the row. For baseline table statistics, the granularity column shows null.
• The table shows statistics for each value of each slice key in each time window, and for the table as a whole. Statistics for the table as a whole are indicated by slice_key = slice_value = null.
• In the primary table, the window column shows the time window corresponding to that row. For baseline table statistics, the window column shows null.
• Some statistics are calculated based on the table as a whole, not on a single column. In the column column_name, these statistics are identified by :table.
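
A sketch of how you might inspect it, assuming the table-name pattern described above:

    # Profile metrics for the monitored table; baseline rows have log_type = 'BASELINE'.
    profile_table = f"{TABLE_NAME}_profile_metrics"
    display(spark.sql(f"SELECT * FROM {profile_table} ORDER BY window.start, column_name"))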

Orientation to the drift metrics table

The drift metrics table has the suffix _drift_metrics. For a list of statistics that are shown in the table, see the documentation (AWS | Azure).

• For every column in the primary table, the drift table shows a set of metrics that compare the current values in the table to the values at the time of the previous analysis run and to the baseline table. The column drift_type shows BASELINE to indicate drift relative to the baseline table, and CONSECUTIVE to indicate drift relative to a previous time window. As in the profile table, the column from the primary table is identified in the column column_name.
  • At this point, because this is the first run of this monitor, there is no previous window to compare to, so there are no rows where drift_type is CONSECUTIVE.
• For TimeSeries type analysis, the granularity column shows the granularity corresponding to that row.
• The table shows statistics for each value of each slice key in each time window, and for the table as a whole. Statistics for the table as a whole are indicated by slice_key = slice_value = null.
• The window column shows the time window corresponding to that row. The window_cmp column shows the comparison window. If the comparison is to the baseline table, window_cmp is null.
• Some statistics are calculated based on the table as a whole, not on a single column. In the column column_name, these statistics are identified by :table.
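
A similar sketch for the drift metrics table:

    # Drift relative to the baseline table; CONSECUTIVE rows appear after later refreshes.
    drift_table = f"{TABLE_NAME}_drift_metrics"
    display(spark.sql(f"SELECT * FROM {drift_table} WHERE drift_type = 'BASELINE'"))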

5.2 Look at fairness and bias metrics

Fairness and bias metrics are calculated for the Boolean-type slices that were defined. The group defined by slice_value=true is considered the protected group (AWS | Azure).
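
A sketch of how these might be queried; the metric column names (predictive_parity, predictive_equality, demographic_parity, statistical_parity) follow the Lakehouse Monitoring documentation and are an assumption here:

    # Fairness and bias metrics are recorded on table-level rows for Boolean slices.
    display(
        spark.table(profile_table)
        .filter("column_name = ':table' AND slice_value = 'true'")
        .select("window", "slice_key", "slice_value",
                "predictive_parity", "predictive_equality",
                "demographic_parity", "statistical_parity")
    )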

6. Create data drift in 3 features

Simulate distribution changes for workclass, gender, and hours_per_week.
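
A minimal sketch of the simulated drift on scoring_df2; the particular skews are arbitrary:

    drifted_df2 = (
        scoring_df2
        # Collapse about half of the workclass values into a single category.
        .withColumn("workclass",
                    F.when(F.rand(seed=42) < 0.5, F.lit("Private")).otherwise(F.col("workclass")))
        # Over-represent one gender value.
        .withColumn("gender",
                    F.when(F.rand(seed=7) < 0.8, F.lit("Male")).otherwise(F.col("gender")))
        # Shift hours_per_week upward.
        .withColumn("hours_per_week", F.col("hours_per_week") + F.lit(15.0))
    )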

6.1 Generate predictions on drifted observations and update the inference table

• Add the model_id column.
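
A minimal sketch, appending the drifted batch to the monitored table in a later time window:

    pred_df2 = (
        score_batch(drifted_df2, model, features)
        # Place the drifted batch in the next daily window so drift also shows up as CONSECUTIVE.
        .withColumn("timestamp", F.current_timestamp() + F.expr("INTERVAL 1 DAY"))
        .withColumn("model_id", F.lit("1"))
    )

    (pred_df2.write.format("delta").mode("append")
        .option("mergeSchema", "true")      # this batch has no label column
        .saveAsTable(TABLE_NAME))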

7. (Ad hoc) Join/update ground-truth labels into the inference table

Note: if the ground-truth value for a given id can change over time, consider also joining/merging on the timestamp column.

Using MERGE INTO (Recommended)
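
A minimal sketch of the merge, assuming labels are joined on the dummy id column:

    # Back-fill ground-truth labels into the monitored table by id.
    label_df.createOrReplaceTempView("ground_truth_labels")

    spark.sql(f"""
        MERGE INTO {TABLE_NAME} AS target
        USING ground_truth_labels AS source
        ON target.id = source.id
        WHEN MATCHED THEN UPDATE SET target.label = source.label
    """)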

8. [Optional] Refresh metrics after adding custom metrics

See the documentation for more details about how to create custom metrics (AWS | Azure).

Update monitor
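
A sketch using the Databricks SDK: define a custom aggregate metric and write the full monitor configuration back with quality_monitors.update. The metric itself is illustrative, not the notebook's own:

    from databricks.sdk.service.catalog import MonitorMetric, MonitorMetricType
    from pyspark.sql import types as T

    # Illustrative custom aggregate metric: average of hours_per_week squared.
    squared_avg_hours = MonitorMetric(
        type=MonitorMetricType.CUSTOM_METRIC_TYPE_AGGREGATE,
        name="squared_avg_hours_per_week",
        input_columns=["hours_per_week"],
        definition="avg(`{{input_column}}` * `{{input_column}}`)",
        output_data_type=T.StructField("output", T.DoubleType()).json(),
    )

    # update replaces the monitor configuration, so re-send the existing settings too.
    existing = w.quality_monitors.get(table_name=TABLE_NAME)
    w.quality_monitors.update(
        table_name=TABLE_NAME,
        output_schema_name=f"{CATALOG}.{SCHEMA}",
        baseline_table_name=existing.baseline_table_name,
        slicing_exprs=existing.slicing_exprs,
        inference_log=existing.inference_log,
        custom_metrics=[squared_avg_hours],
    )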

Refresh metrics and inspect dashboard
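
A sketch of the refresh, polling until the run completes:

    import time
    from databricks.sdk.service.catalog import MonitorRefreshInfoState

    run_info = w.quality_monitors.run_refresh(table_name=TABLE_NAME)
    while w.quality_monitors.get_refresh(
            table_name=TABLE_NAME, refresh_id=run_info.refresh_id
    ).state in (MonitorRefreshInfoState.PENDING, MonitorRefreshInfoState.RUNNING):
        time.sleep(30)  # refreshes typically take several minutes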

9. [Optional] Delete the monitor

Uncomment the following line of code to clean up the monitor (if you wish to run the quickstart on this table again).
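
A sketch using the SDK call:

    # w.quality_monitors.delete(table_name=TABLE_NAME)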