Data quality monitoring results system table reference
This feature is in Beta.
This page outlines the data quality monitoring results system table schema and includes sample queries. The table stores results of freshness and completeness checks, as well as downstream impact and root cause analysis, across all tables enabled for data quality monitoring in your metastore.
Table path: system.data_quality_monitoring.table_results
Only account admins can access this table, and they must grant access to others as needed. The system table uses default storage. Because it contains sample values and downstream usage data, use caution when granting access to others.
Data quality monitoring results table schema
The system.data_quality_monitoring.table_results
table uses the following schema:
Column name | Contents (for | Data type | Description | Example data |
---|---|---|---|---|
| timestamp | Time when the row was generated. |
| |
| string | Name of the catalog. Used to identify the table. |
| |
| string | Name of the schema. Used to identify the table. |
| |
| string | Name of the table. Used to identify the table. |
| |
| string | Stable ID for the catalog. |
| |
| string | Stable ID for the schema. |
| |
| string | Stable ID for the table. |
| |
| string | Consolidated health status at the table level. "Unhealthy" if any check or group is unhealthy. |
| |
| struct | Freshness checks. | ||
| string | Overall freshness status. |
| |
| Commit freshness check results. | |||
| struct | Completeness check results. | ||
| string | Status of completeness check. |
| |
| Total number of rows in the table over time. | |||
| Number of rows added each day. | |||
| struct | Summary of downstream impact based on dependency graph. | ||
| int | Severity indicator (0 = none, 1 = low, 2 = medium, 3 = high, 4 = very high). | 2 | |
| int | Number of downstream tables affected. | 5 | |
| int | Number of queries run on affected downstream tables over the last 30 days. | 120 | |
| struct | Information about upstream jobs contributing to the issue. | ||
| Metadata for each upstream job. |
commit_freshness
array structure
The commit_freshness
struct contains the following:
Item name | Data type | Description | Example data |
---|---|---|---|
| string | Status of commit freshness check. |
|
| string | Error message encountered during check. |
|
| timestamp | Last commit timestamp. |
|
| timestamp | Predicted time by which the table should have been updated. |
|
total_row_count
and daily_row_count
array structure
The total_row_count
and daily_row_count
structs contain the following:
Item name | Data type | Description | Example data |
---|---|---|---|
| string | Status of the check. |
|
| string | Error message encountered during check. |
|
| int | Number of rows observed in the last 24 hours. |
|
| int | Minimum expected number of rows in the last 24 hours. |
|
| int | Maximum expected number of rows in the last 24 hours. |
|
upstream_jobs
array structure
The structure of the array shown in the upstream_jobs
column is shown in the following table:
Item name | Data type | Description | Example data |
---|---|---|---|
| string | Job ID. |
|
| string | Workspace ID. |
|
| string | Job display name. |
|
| string | Status of the most recent run. |
|
| string | URL of Databricks job run page. |
|
Downstream impact information
In the logged results table, the column downstream_impact
is a struct
with the following fields:
Field | Type | Description |
---|---|---|
| int | Integer value between 1 and 4 indicating the severity of the data quality issue. Higher values indicate greater disruption. |
| int | Number of downstream tables that might be affected by the identified issue. |
| int | Total number of queries that have referenced the affected and downstream tables in the past 30 days. |
Example queries
Replace parameter values with your own before running.
Get all current incidents in a schema
WITH latest_rows AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY table_id
ORDER BY event_time DESC
) AS rn
FROM
system.data_quality_monitoring.table_results
WHERE
catalog_name = "c"
AND schema_name = "s"
)
SELECT *
FROM latest_rows
WHERE
rn = 1
AND status = "Unhealthy"
Get all incident tables in a schema that have a high downstream impact
WITH latest_rows AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY table_id
ORDER BY event_time DESC
) AS rn
FROM
system.data_quality_monitoring.table_results
WHERE
catalog_name = "c"
AND schema_name = "s"
)
SELECT *
FROM latest_rows
WHERE rn = 1
AND downstream_impact.impact_level >= 3
Get all tables in a schema that are currently impacted by a freshness issue
WITH latest_rows AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY table_id
ORDER BY event_time DESC
) AS rn
FROM
system.data_quality_monitoring.table_results
WHERE
catalog_name = "c"
AND schema_name = "s"
)
SELECT *
FROM latest_rows
WHERE rn = 1
AND freshness.status = "Unhealthy"
Get all historical records for a table
SELECT *
FROM system.data_quality_monitoring.table_results
WHERE
catalog_name = "c"
AND schema_name = "s"
AND table_name = "t"