Common patterns for row filtering and column masking
This page describes common patterns for implementing ABAC row filter and column mask policies. For overall concepts, see Core concepts for attribute-based access control (ABAC). For policy syntax, see Create and manage ABAC policies.
Cast-compatible masking functions
Databricks automatically casts the masking function output to match the target column's data type. See Automatic type casting for column masks.
The following patterns help you design cast-compatible masking functions.
Return a castable type
When masking a column, return the same data type or a type that is castable to it. Check the data types of the columns your policy targets and verify that every branch of the function returns a compatible value.
-- Succeeds: Masks a DOUBLE column, returns DOUBLE in every branch
CREATE FUNCTION mask_salary(salary DOUBLE, user_role STRING)
RETURNS DOUBLE
RETURN CASE
WHEN user_role IN ('admin', 'hr') THEN salary
WHEN user_role = 'manager' THEN ROUND(salary / 1000) * 1000
ELSE 0.0
END;
-- Fails: 'CONFIDENTIAL' cannot be cast to a DOUBLE column type
CREATE FUNCTION mask_salary_as_text(salary DOUBLE, user_role STRING)
RETURNS STRING
RETURN CASE
WHEN user_role IN ('admin', 'hr') THEN CAST(salary AS STRING)
ELSE 'CONFIDENTIAL'
END;
Avoid numeric overflow
When a mask function accepts and returns a wider numeric type than the target column, the result is automatically cast back to the column's type. If the returned value exceeds the range of the narrower type, the cast overflows and the query fails at runtime.
-- The target column is TINYINT (max 127). The input is upcast to BIGINT
-- for the function. Adding 1000 produces a BIGINT result that overflows
-- when cast back to TINYINT.
CREATE FUNCTION mask_score(score BIGINT)
RETURNS BIGINT
RETURN score + 1000;
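One way to avoid the overflow is to return a masked value that fits even the narrowest numeric column the policy might target. A minimal sketch, with a hypothetical function name:

```sql
-- Hypothetical safe variant: the constant 0 fits every numeric type,
-- so the automatic cast back to TINYINT cannot overflow.
CREATE FUNCTION mask_score_safe(score BIGINT)
RETURNS BIGINT
RETURN 0;
```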
Use VARIANT for multiple column types
See VARIANT-based masking functions for multiple column types.
Test cast compatibility
Test masking functions against each role and data pattern your policy must handle, and confirm that every result casts to the target column type without error.
SELECT CAST(mask_salary(salary, 'admin') AS DOUBLE) FROM employees;
SELECT CAST(mask_salary(salary, 'manager') AS DOUBLE) FROM employees;
SELECT CAST(mask_salary(salary, 'viewer') AS DOUBLE) FROM employees;
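It can also help to probe edge cases such as NULL inputs and unknown roles. The expected results below assume the mask_salary definition shown earlier:

```sql
SELECT mask_salary(NULL, 'admin');    -- NULL: the matched branch returns the NULL salary
SELECT mask_salary(50000, 'viewer');  -- 0.0: unknown roles fall through to ELSE
```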
VARIANT-based masking functions for multiple column types
When you need to mask columns of different data types (for example, INT, DOUBLE, DECIMAL(10,2), DECIMAL(15,5), and so on), you can write a single masking UDF that accepts and returns a VARIANT type. Databricks automatically casts the column mask function output to match the target column's data type following ANSI SQL standards.
This approach reduces the number of UDFs and policies needed. Instead of writing separate masking functions for each column type, one function handles all types.
Mask multiple numeric types with a single function
Rather than creating a separate mask function for each numeric precision, you can use VARIANT to handle them all with a single function:
CREATE FUNCTION mask_numeric(val VARIANT)
RETURNS VARIANT
DETERMINISTIC
RETURN 0::VARIANT;
This function returns 0 as a VARIANT, which Databricks automatically casts to the target column's type. A single ABAC policy using this function can mask INT, DOUBLE, and DECIMAL columns without requiring separate functions for each precision.
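As a sketch, such a policy could look like the following; the policy, catalog, and tag names are hypothetical, and the clause shapes mirror the column mask policy examples later on this page:

```sql
CREATE POLICY mask_numeric_pii
ON CATALOG my_catalog
COLUMN MASK mask_numeric
TO `account users` EXCEPT `data_admins`
FOR TABLES
MATCH COLUMNS has_tag('pii') AS col
ON COLUMN col;
```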
If you prefer to preserve the type explicitly within the function, you can branch on the type and return an appropriate masked value for each using schema_of_variant():
-- Use VARIANT to accommodate different data types
CREATE FUNCTION flexible_mask(data VARIANT)
RETURNS VARIANT
RETURN CASE
WHEN schema_of_variant(data) = 'INT' THEN 0::VARIANT
WHEN schema_of_variant(data) = 'DATE' THEN DATE'1970-01-01'::VARIANT
WHEN schema_of_variant(data) = 'DOUBLE' THEN 0.00::VARIANT
ELSE NULL::VARIANT
END;
Mask struct columns with VARIANT
For Databricks Runtime 18.1 and above, you can also mask struct columns by casting them to VARIANT within an ABAC policy, branching on the struct's shape to selectively redact fields. Casting structs to VARIANT for masking is supported only within ABAC column mask policies.
The following example uses schema_of_variant() to identify two different struct shapes and redact sensitive fields in each:
CREATE FUNCTION mask_struct(data VARIANT)
RETURNS VARIANT
RETURN CASE
WHEN schema_of_variant(data) = 'OBJECT<age: BIGINT, email: STRING>' THEN
to_variant_object(named_struct('age', data:age, 'email', 'redacted'))
WHEN schema_of_variant(data) = 'OBJECT<id: BIGINT, ssn: STRING>' THEN
to_variant_object(named_struct('id', data:id, 'ssn', 'xxx-xx-xxxx'))
ELSE NULL::VARIANT
END;
Prevent access until sensitive columns are tagged
A common governance pattern is to control access based on whether data has been classified. You can implement this with a default restrictive tag and policies that enforce different levels of protection depending on the classification status.
- Apply a tag like classification : unverified to all new objects by default, through automation or through tag inheritance: apply the tag at the catalog or schema level so that any new tables added to the catalog or schema automatically inherit it.
- Create a row filter policy that blocks access to tables tagged classification : unverified.
- Create a column mask policy that masks sensitive columns on tables where the classification : unverified tag is no longer present.
- When a data steward completes classification, they update the tag. The blocking policy no longer matches, and the masking policy takes effect.
-- Block access to unverified tables for all non-admin users
CREATE FUNCTION catalog.schema.block_all() RETURNS BOOLEAN
RETURN FALSE;
CREATE POLICY block_unverified
ON CATALOG my_catalog
ROW FILTER catalog.schema.block_all
TO `account users` EXCEPT `data_admins`
FOR TABLES
WHEN has_tag_value('classification', 'unverified');
To protect sensitive data after it has been classified, define a column mask policy that takes effect when the classification : unverified tag is no longer present:
CREATE FUNCTION catalog.schema.mask_pii(val STRING)
RETURNS STRING
RETURN '***';
CREATE POLICY mask_reviewed_pii
ON CATALOG my_catalog
COLUMN MASK catalog.schema.mask_pii
TO `account users`
EXCEPT `data_admins`
FOR TABLES
WHEN NOT has_tag_value('classification', 'unverified')
MATCH COLUMNS (has_tag_value('pii', 'name') OR has_tag_value('pii', 'address')) AS m
ON COLUMN m;
Partial reveal without regex
Reveal part of a sensitive value using string operations instead of regex. Regex-based masking scans the entire value for every row, which is expensive on large text fields (see Avoid regex masking on large text fields).
CREATE FUNCTION mask_ssn(ssn STRING, show_last INT) RETURNS STRING
DETERMINISTIC
RETURN CONCAT('***-**-', RIGHT(ssn, show_last));
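A quick sanity check of the function, with a hypothetical input value:

```sql
SELECT mask_ssn('123-45-6789', 4);  -- ***-**-6789
```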
Consistent hashing (deterministic pseudonymization)
Consistent hashing (also called deterministic pseudonymization) replaces sensitive data with a hashed value that is the same across multiple tables. Marking a function as DETERMINISTIC tells the engine that the function always returns the same result for the same input, which helps it optimize the query. See Use deterministic, error-safe expressions.
The following function consistently hashes a string value, and uses a version parameter to support key rotation. Increment the version number through the policy's USING COLUMNS clause to generate new hashes without breaking historical data that used the previous version. The function concatenates the original value with the version number before hashing, so the same input with the same version always produces the same hash.
CREATE FUNCTION pseudonymize(val STRING, version INT) RETURNS STRING
DETERMINISTIC
RETURN SHA2(CONCAT(val, CAST(version AS STRING)), 256);
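As a sketch, a column mask policy could pass the current key version as a constant through USING COLUMNS. This assumes the arguments bind in order, as in the row filter policies shown later on this page; the policy, schema, and tag names are hypothetical:

```sql
CREATE POLICY pseudonymize_pii
ON SCHEMA crm.customers
COLUMN MASK pseudonymize
TO `account users` EXCEPT `data_admins`
FOR TABLES
MATCH COLUMNS has_tag('pii') AS p
ON COLUMN p
USING COLUMNS (p, 2);
```

To rotate keys, increment the version constant in the policy; hashes computed with the previous version remain reproducible by re-running with that version.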
Row filtering with column-only predicates
Filter rows using simple boolean logic that references only table columns. Column-only predicates enable predicate pushdown, which allows the engine to skip irrelevant data during scans (see Understand predicate pushdown on protected tables).
CREATE FUNCTION filter_by_region(region STRING, allowed STRING)
RETURNS BOOLEAN
DETERMINISTIC
RETURN array_contains(split(allowed, ','), lower(region));
Use with a policy that passes the allowed regions as a constant:
CREATE POLICY regional_access
ON CATALOG analytics
ROW FILTER filter_by_region
TO `emea_team`
FOR TABLES
MATCH COLUMNS has_tag('region') AS rgn
USING COLUMNS (rgn, 'emea,apac');
Row filtering across multiple related columns
When a table has multiple columns representing related attributes (for example, ship_to_country and bill_to_country), you can match them with separate tag conditions and pass both to a single UDF. This avoids creating separate policies for each column. A policy can include up to three column expressions in the MATCH COLUMNS clause (see Policy quotas).
CREATE FUNCTION filter_by_countries(ship_country STRING, bill_country STRING, allowed STRING)
RETURNS BOOLEAN
DETERMINISTIC
RETURN array_contains(split(allowed, ','), lower(ship_country))
OR array_contains(split(allowed, ','), lower(bill_country));
CREATE POLICY regional_orders
ON SCHEMA prod.orders
ROW FILTER filter_by_countries
TO analysts
FOR TABLES
WHEN has_tag_value('sensitivity', 'high')
MATCH COLUMNS
has_tag('ship_country') AS ship,
has_tag('bill_country') AS bill
USING COLUMNS (ship, bill, 'us,ca,mx');
An analyst sees only orders where either the shipping or billing country is in their allowed list.
Lookup tables in ABAC policy UDFs
When access rules vary per user and cannot be expressed through the policy's TO/EXCEPT clauses alone, you can check access rights against a small lookup table. Use TO/EXCEPT when possible, as it is the preferred approach for targeting principals (see Approach for targeting principals). Keep the lookup table small so the optimizer converts the subquery into a broadcast hash join (see Keep lookup tables small).
CREATE TABLE access_rules (
principal VARCHAR(255),
priority VARCHAR(64)
);
INSERT INTO access_rules VALUES
('alice@company.com', '1-URGENT'),
('alice@company.com', '2-HIGH'),
('bob@company.com', '1-URGENT');
CREATE FUNCTION priority_allowed(o_priority STRING) RETURNS BOOLEAN
RETURN EXISTS (
SELECT 1 FROM access_rules
WHERE principal = session_user() AND priority = o_priority
);
CREATE POLICY priority_filter
ON CATALOG operations
ROW FILTER priority_allowed
TO `account users`
FOR TABLES
MATCH COLUMNS has_tag('priority') AS pri
USING COLUMNS (pri);