Data classification system table reference
This feature is in Beta.
This page outlines the data classification results table schema and includes sample queries. The table stores detections for sensitive data classes at the column level across enabled catalogs in your metastore.
Table path: system.data_classification.results
Data classification results table schema
The data classification results system table uses the following schema:
Column name | Data type | Description | Example |
---|---|---|---|
| timestamp | As-of time when the column was most recently scanned. |
|
| timestamp | Time when the column detection was first recorded. |
|
| string | ID of the catalog. |
|
| string | ID of the table. |
|
| string | Catalog name. |
|
| string | Schema name. |
|
| string | Table name. |
|
| string | Column name. |
|
| string | Data type of the column. Complex types include full struct definitions. |
|
| string | Tag for the detected entity or tag key and optional value. |
|
|
| Up to five sample values that matched the detection. |
|
| string | Confidence of detection. Either |
|
| float | Estimation of the proportion of matching rows in the sample. Between 0 and 1. |
|
Example queries
Replace parameter values with your own before running.
Get all detections for a table
SELECT *
FROM system.data_classification.results
WHERE
catalog_name = "c"
AND schema_name = "s"
AND table_name = "t";
Get all high confidence detections
SELECT *
FROM system.data_classification.results
WHERE
catalog_name = "c"
AND schema_name = "s"
AND table_name = "t"
AND confidence = "HIGH";
Get number of tables impacted by a specific classification
SELECT
class_tag,
COUNT(DISTINCT catalog_name, schema_name, table_name) AS num_tables
FROM
system.data_classification.results
WHERE
class_tag IS NOT NULL
GROUP BY class_tag;
Get number of users who queried tables with sensitive data in last 30 days
WITH table_accesses AS (
SELECT
IFNULL(
request_params.full_name_arg,
CONCAT(request_params.catalog_name, '.', request_params.schema_name, '.', request_params.name)
) AS table_full_name,
COUNT(DISTINCT user_identity.email) AS num_users
FROM
system.access.audit
WHERE
action_name IN ("createTable", "getTable", "updateTable", "deleteTable")
AND (
-- For performance, limit the blast radius of the audit log query to only the current catalog
request_params.catalog_name = :catalog_name OR
request_params.full_name_arg LIKE :catalog_name || '%'
)
AND event_time >= DATE_SUB(current_date(), 30)
GROUP BY table_full_name
),
sensitive_tables AS (
SELECT
DISTINCT CONCAT(catalog_name, '.', schema_name, '.', table_name) AS table_full_name
FROM
system.data_classification.results
WHERE class_tag IS NOT NULL
)
SELECT
st.table_full_name,
ta.num_users
FROM
sensitive_tables st
JOIN table_accesses ta
ON st.table_full_name = ta.table_full_name