
Anomaly detection

This feature is in Beta.

This page describes what anomaly detection is, what it monitors, and how to use it.

What is anomaly detection?

With Lakehouse Monitoring anomaly detection, you can easily monitor the data quality of all the tables in a schema. Databricks leverages data intelligence to automatically assess data quality, specifically evaluating the freshness and completeness of each table. Quality insights are surfaced in health indicators so consumers can understand health at a glance. Data owners have access to logging tables and dashboards so they can quickly identify anomalies, set alerts on them, and resolve them across an entire schema.

Requirements

  • A workspace enabled for Unity Catalog.
  • Serverless compute enabled. For instructions, see Connect to serverless compute.
  • To enable anomaly detection on a schema, you must have the MANAGE privilege on the schema or its parent catalog.

How does anomaly detection work?

Databricks monitors enabled tables for freshness and completeness.

Freshness refers to how recently a table has been updated. Anomaly detection analyzes the history of commits to a table and builds a per-table model to predict the time of the next commit. If a commit is unusually late, the table is marked as stale. For time series tables, you can specify event time columns. Anomaly detection then detects if the data's ingestion latency, defined as the difference between the commit time and the event time, is unusually high.

Completeness refers to the number of rows expected to be written to the table in the last 24 hours. Anomaly detection analyzes the historical row counts and, based on this data, predicts a range for the expected number of rows. If the number of rows committed over the last 24 hours is less than the lower bound of this range, the table is marked as incomplete.
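
The per-table models are internal to Databricks, but the shape of the completeness check can be illustrated. The following is a rough sketch only, computed over the Delta commit history of a hypothetical table main.sales.orders; it is not the model anomaly detection actually uses:

Python
# A rough sketch of the completeness idea only (not the model Databricks
# actually uses): derive daily row counts from the Delta commit history,
# compute a naive expected lower bound, and compare the last 24 hours.
from pyspark.sql import functions as F

# "main.sales.orders" is a hypothetical table name.
history = spark.sql("DESCRIBE HISTORY main.sales.orders")
rows_per_commit = F.col("operationMetrics").getItem("numOutputRows").cast("long")

daily_rows = (
    history
    .where(rows_per_commit.isNotNull())
    .groupBy(F.to_date("timestamp").alias("day"))
    .agg(F.sum(rows_per_commit).alias("rows_written"))
)

stats = daily_rows.agg(
    F.mean("rows_written").alias("mu"),
    F.stddev("rows_written").alias("sigma"),
).first()
sigma = stats["sigma"] or 0.0
lower_bound = stats["mu"] - 2 * sigma  # naive stand-in for a learned range

last_24h = (
    history
    .where(F.col("timestamp") >= F.expr("current_timestamp() - INTERVAL 24 HOURS"))
    .agg(F.sum(rows_per_commit).alias("rows"))
    .first()["rows"]
) or 0

print("incomplete" if last_24h < lower_bound else "healthy")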

Enable anomaly detection on a schema

To enable anomaly detection on a schema, navigate to the schema in Unity Catalog.

  1. On the schema page, click the Details tab.

    Details tab for the schema page in Catalog Explorer.

  2. Click the Anomaly Detection toggle to enable it.

    Anomaly detection selector enabled.

  3. A Databricks scheduled job is initiated to scan the schema, and a dialog appears. To view the progress of the job, click View results in the dialog. After the job completes, the detected anomalies are logged in the output logging table. You can also access the dialog at any time by clicking Settings next to the anomaly detection toggle.

    Anomaly detection scan dialog.

  4. By default, the job runs every 6 hours. To change this setting, see Set parameters for freshness and completeness evaluation.

Review logged results

By default, the results of an anomaly detection scan are saved in the schema in a table named _quality_monitoring_summary to which only the user who enabled anomaly detection has access. To configure the name or location of the logging table, see Set parameters for freshness and completeness evaluation.
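
For example, you can query the logging table directly. The following minimal sketch assumes anomaly detection was enabled on a hypothetical schema main.sales with the default logging table name:

Python
# A minimal sketch: read the default logging table and list the most
# recent unhealthy results. The schema name "main.sales" is hypothetical.
results = spark.table("main.sales._quality_monitoring_summary")

(
    results
    .where("status = 'Unhealthy'")
    .orderBy("evaluated_at", ascending=False)
    .select("evaluated_at", "table_name", "quality_check_type", "status")
    .show(truncate=False)
)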

The table has the following information:

  • evaluated_at (timestamp): Start time of the anomaly scan run.
  • catalog (string): Catalog containing the schema on which the anomaly scan was run.
  • schema (string): Schema on which the anomaly scan was run.
  • table_name (string): Name of the scanned table.
  • quality_check_type (string): One of Freshness or Completeness.
  • status (string): Result of the quality check. One of Healthy, Unhealthy, or Unknown. If the result is Unknown, see error_message for more details.
  • additional_debug_info (map): The values that were used to determine the table's status. For details, see Debugging information.
  • error_message (string): If status is Unknown, additional information appears here to help with debugging.
  • table_lineage_link (string): Link to the table's lineage tab in Catalog Explorer, to help with investigating the root cause of an Unhealthy table.
  • downstream_impact (struct): Impact of an anomaly on downstream assets. For details, see Downstream impact information.

Debugging information

In the logged results table, the column additional_debug_info provides information in the following format:

JSON
{
<metric_name>:
  actual_value: <value>,
  expectation: "actual_value < <predicted_value>",
  is_violated: true/false,
  error_code: <error_code>
...
}

For example:

JSON
{
commit_staleness:
  actual_value: "31 min",
  expectation: "actual_value < 1 day",
  is_violated: "false",
  error_code: "None"
}
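
To inspect these values programmatically, you can flatten the map, as in the following sketch. The logging table name is the hypothetical default from above, and the exact value type of the map entries may differ:

Python
# Flatten additional_debug_info into one row per metric so the expectation
# and actual value for each check are easy to scan.
from pyspark.sql import functions as F

results = spark.table("main.sales._quality_monitoring_summary")

(
    results
    .select(
        "table_name",
        "quality_check_type",
        F.explode("additional_debug_info").alias("metric", "details"),
    )
    .show(truncate=False)
)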

Downstream impact information

In the logged results table, the column downstream_impact is a struct with the following fields:

  • impact_level (int): Integer value between 1 and 4 indicating the severity of the anomaly. Higher values indicate greater disruption.
  • num_downstream_tables (int): Number of downstream tables that might be affected by the anomaly.
  • num_queries_on_affected_tables (int): Total number of queries that referenced the affected and downstream tables in the past 30 days.
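
For example, to triage anomalies by blast radius, you might sort unhealthy tables by impact_level. A sketch, again assuming the hypothetical default logging table:

Python
# Rank unhealthy tables by downstream impact using the downstream_impact
# struct. The logging table name "main.sales._quality_monitoring_summary"
# is the hypothetical default from above.
from pyspark.sql import functions as F

results = spark.table("main.sales._quality_monitoring_summary")

(
    results
    .where("status = 'Unhealthy'")
    .select(
        "table_name",
        F.col("downstream_impact.impact_level").alias("impact_level"),
        F.col("downstream_impact.num_downstream_tables").alias("downstream_tables"),
        F.col("downstream_impact.num_queries_on_affected_tables").alias("queries_30d"),
    )
    .orderBy(F.desc("impact_level"))
    .show(truncate=False)
)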

View health indicators

Anomaly detection provides data consumers with a quick visual confirmation of the data freshness of the tables they use.

On the schema page, in the Overview tab, tables that passed the most recent freshness scan are marked with a green dot. Tables that failed the scan are shown with an orange dot.

Schema overview page in Catalog Explorer showing tables with quality passed mark.

Click on the dot to see the time and status of the most recent scan.

Popup showing details about health status.

As a data owner, you can easily assess the overall health of your schema by sorting tables based on quality. Use the Sort menu at the upper-right of the table list to sort tables by quality.

On the table page, in the Overview tab, a Quality indicator shows the status of the table and lists any anomalies that were identified in the most recent scan.

Healthy quality indicator on table page in Catalog Explorer.

Set parameters for freshness and completeness evaluation

To edit the parameters that control the anomaly detection job, such as how often the job runs or the name of the logged results table, you must edit the job parameters on the Tasks tab of the job page.

Jobs page showing anomaly detection job.

The following sections describe specific settings. For information about how to set task parameters, see Configure task parameters.

Schedule and notifications

To customize the schedule for the anomaly detection job, or to set up notifications, use the Schedules & Triggers settings on the jobs page. See Automating jobs with schedules and triggers.
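
For example, a Quartz cron expression such as 0 0 */6 * * ? in the job's schedule settings corresponds to the default 6-hour cadence; adjust it to run the scan more or less often.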

Name of logging table

To change the name of the logging table, or save the table in a different schema, edit the job task parameter logging_table_name and specify the desired name. To save the logging table in a different schema, specify the full 3-level name.
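
For example, to save the log to a hypothetical dedicated monitoring schema, you might set logging_table_name to main.monitoring.anomaly_scan_log.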

Customize freshness and completeness evaluations

All of the parameters in this section are optional. By default, anomaly detection determines thresholds based on an analysis of the table's history.

These parameters are fields inside the task parameter metric_configs. The format of metric_configs is a JSON string with the following default values:

JSON
[
  {
    "disable_check": false,
    "tables_to_skip": null,
    "tables_to_scan": null,
    "table_threshold_overrides": null,
    "table_latency_threshold_overrides": null,
    "static_table_threshold_override": null,
    "event_timestamp_col_names": null,
    "metric_type": "FreshnessConfig"
  },
  {
    "disable_check": true,
    "tables_to_skip": null,
    "tables_to_scan": null,
    "table_threshold_overrides": null,
    "metric_type": "CompletenessConfig"
  }
]

The following parameters can be used for both freshness and completeness evaluations.

  • tables_to_scan: Only the specified tables are scanned for anomaly detection. Example: ["table_to_scan", "another_table_to_scan"]
  • tables_to_skip: The specified tables are skipped during the anomaly detection scan. Example: ["table_to_skip"]
  • disable_logging: If set to true, results from the job run are not saved to the logging table. Example: true, false
  • disable_check: If set to true, the anomaly scan is not run. Use this parameter if you want to disable only the freshness scan or only the completeness scan. Example: true, false

The following parameters apply only to the freshness evaluation:

  • event_timestamp_col_names: List of event timestamp column names that tables in your schema might have. If a table has one of these columns, anomaly detection uses it to compute ingestion latency (the difference between the commit time and the event time) and marks the table Unhealthy if the latency is unusually high. Example: ["timestamp", "date"]
  • table_threshold_overrides: A dictionary of table names and thresholds (in seconds) that specify the maximum interval since the last table update before the table is marked Unhealthy. Example: {"table_0": 86400}
  • table_latency_threshold_overrides: A dictionary of table names and latency thresholds (in seconds) that specify the maximum interval since the last event timestamp in the table before the table is marked Unhealthy. Example: {"table_1": 3600}
  • static_table_threshold_override: Amount of time (in seconds) without updates before a table is considered static, that is, no longer updated. Example: 2592000

The following parameter applies only to the completeness evaluation:

  • table_threshold_overrides: A dictionary of table names and row-count thresholds (specified as integers). If the number of rows added to a table over the previous 24 hours is less than the specified threshold, the table is marked Unhealthy. Example: {"table_0": 1000}

Disable anomaly detection

To disable anomaly detection, click the Anomaly Detection toggle. The anomaly detection job is deleted, along with all anomaly detection tables and information.

Anomaly detection selector disabled.

To provide feedback on anomaly detection, email lakehouse-monitoring-feedback@databricks.com.

Limitations

Anomaly detection does not support the following:

  • Views, materialized views, or streaming tables.
  • External tables or foreign tables.
  • Tables with fewer than 2 commits in the last 30 days.

In addition, the completeness evaluation does not take into account metrics such as the fraction of null, zero, or NaN values.