UDFs for ABAC policies best practices

Preview

This page shows how to write high-performance user-defined functions (UDFs) for use in ABAC row filter and column mask policies in Unity Catalog.

Why UDF performance matters

ABAC policies run on every query execution for every applicable row or column. Inefficient UDFs can:

Block parallel execution and predicate pushdown
Force expensive joins or lookups per row
Increase query time from milliseconds to seconds (or more)
Break caching and vectorization
Make analytics and reporting unreliable

In production environments, this can make data governance a bottleneck instead of an enabler.

Golden rules for ABAC UDFs

Keep it simple: favor basic CASE statements and clear boolean expressions.
Stay deterministic: the same input should always produce the same output for caching and consistent joins.
Avoid external calls: no API calls or lookups to other databases.
Use only built-in functions inside your UDF: don’t call other UDFs from within a UDF.
Test at scale: validate performance on at least 1 million rows.
Minimize complexity: avoid multi-level nesting and unnecessary function calls.
Column-only references: reference only the target table's columns to enable pushdown.
Prefer SQL to Python UDFs: for better performance. If you must use Python, explicitly mark UDFs as DETERMINISTIC when they don't involve non-deterministic logic and dependencies.

Common anti-patterns to avoid

External API calls inside UDFs: causes network latency, timeouts, and single points of failure.
Complex subqueries or joins: forces nested loop joins and prevents optimization.
Heavy regex on large text: CPU- and memory-intensive per row.
Metadata lookups: avoid per-row checks that hit metadata or identity sources, such as information_schema queries, user profile lookups, is_member(), or is_account_group_member(). Adds extra scans for each record.
Dynamic SQL generation: no query optimization and prevents caching.
Non-deterministic logic: prevents caching and consistent joins, see Impact of non-deterministic logic.

Keep access checks in policies, not UDFs

A common mistake is calling is_account_group_member() or is_member() directly inside a UDF. This makes the function slower and makes the UDF more difficult to reuse.

Instead, follow this pattern:

UDF role: Focus only on how to transform, mask, or filter the data. Use only the columns and parameters passed into it.
Policy role: Define who (principals, groups) and when (tags) the UDF should apply by referencing principals in the ABAC policy.

Impact of non-deterministic logic

Some scenarios (such as randomized masking for research) require a different output each time.
If you must use non-deterministic functions:

Expect slower performance due to no caching.
JOINs might fail or return inconsistent results.
Reports might show different data between runs.
Troubleshooting and validation might be harder.

UDF examples

Below are production-friendly patterns for column masking and row filtering. All examples follow the ABAC performance best practices: simple logic, deterministic behavior, no external calls, and use of only built-in functions.

Column mask: deterministic pseudonymization with versioning

SQL
-- Create a consistent pseudonymized reference ID.
CREATE OR REPLACE FUNCTION mask_id_deterministic(id STRING)
RETURNS STRING
DETERMINISTIC
RETURN CONCAT('REF_', SHA2(CONCAT(id, ':v1'), 256));

Why this works:

Simple CASE statement
No external dependencies
Deterministic results for consistent joins
Keeps principal logic out of the UDF
Preserves joins across masked datasets
Includes a version tag (:v1) to support key rotation without breaking historical data intentionally

Column mask: partial reveal without regex hotspots

SQL
-- Reveal only the last 4 digits of an SSN, masking the rest.
CREATE OR REPLACE FUNCTION mask_ssn_last4(ssn STRING)
RETURNS STRING
DETERMINISTIC
RETURN CASE
  WHEN ssn IS NULL THEN NULL
  WHEN LENGTH(ssn) >= 4 THEN CONCAT('XXX-XX-', RIGHT(REGEXP_REPLACE(ssn, '[^0-9]', ''), 4))
  ELSE 'MASKED'
END;

Why this works:

Uses a single lightweight regex to strip non-digits
Avoids multiple regex passes on large text fields

Row filter: pushdown-friendly by region

SQL
-- Returns TRUE if the row's state is in the allowed set.
CREATE OR REPLACE FUNCTION filter_by_region(state STRING, allowed ARRAY<STRING>)
RETURNS BOOLEAN
DETERMINISTIC
RETURN array_contains(TRANSFORM(allowed, x -> lower(x)), lower(state));

Why this works:

Simple boolean logic
References only table columns
Enables predicate pushdown and vectorization

Row filter: deterministic multi-condition

SQL
-- Returns TRUE if the row's region is in the allowed set.
CREATE OR REPLACE FUNCTION filter_region_in(region STRING, allowed_regions ARRAY<STRING>)
RETURNS BOOLEAN
DETERMINISTIC
RETURN array_contains(TRANSFORM(allowed_regions, x -> lower(x)), lower(region));

Why this works:

Supports multiple roles and geographies in one function
Keeps logic flat for better optimization

Testing UDF performance

Use synthetic scale tests to validate behavior and performance before production. For example:

SQL
WITH test_data AS (
  SELECT
    patient_id,
    your_mask_function(patient_id) as masked_id,
    current_timestamp() as start_time
  FROM (
    SELECT CONCAT('PAT', LPAD(seq, 6, '0')) as patient_id
    FROM range(1000000)  -- 1 million test rows
  )
)
SELECT
  COUNT(*) as rows_processed,
  MAX(start_time) - MIN(start_time) as total_duration,
  COUNT(*) / EXTRACT(EPOCH FROM (MAX(start_time) - MIN(start_time))) as rows_per_second
FROM test_data;

ABAC column mask type casting

Databricks automatically casts the output of column mask functions resolved from ABAC policies to match the target column's data type. This ensures type consistency and reliable query behavior when masking columns.

How automatic casting works

Policy evaluation: The ABAC policy determines if masking should be applied.
Mask function execution: If masking is required, the mask function executes.
Automatic casting: The mask function result is automatically cast to the target column's data type.
Result return: The properly typed result is returned to the query.

All casting behavior follows ANSI SQL standards for CAST operations. For full compatibility details, see Returns.

Best practices for type-safe masking

Follow these patterns to ensure your mask functions work reliably with automatic type casting and avoid runtime failures.

Design type-consistent mask functions

Ensure mask functions return types compatible with target columns:

SQL
-- Successful: Returns same type as target column
CREATE FUNCTION safe_salary_mask(salary DOUBLE, user_role STRING)
RETURNS DOUBLE
RETURN CASE
  WHEN user_role IN ('admin', 'hr') THEN salary
  WHEN user_role = 'manager' THEN ROUND(salary / 1000) * 1000
  ELSE 0.0
END;

-- Unsuccessful: Returns different type that might not cast
CREATE FUNCTION risky_salary_mask(salary DOUBLE, user_role STRING)
RETURNS STRING
RETURN CASE
  WHEN user_role IN ('admin', 'hr') THEN CAST(salary AS STRING)
  ELSE 'CONFIDENTIAL'  -- This will fail casting to DOUBLE
END;

Use VARIANT for flexible masking

For complex masking scenarios with varying output types, use VARIANT to accommodate different data types:

SQL
CREATE FUNCTION flexible_mask(data VARIANT)
RETURNS VARIANT
RETURN CASE
  WHEN schema_of_variant(data) = 'INT' THEN 0::VARIANT
  WHEN schema_of_variant(data) = 'DATE' THEN DATE'1970-01-01'::VARIANT
  WHEN schema_of_variant(data) = 'DOUBLE' THEN 0.00::VARIANT
  ELSE NULL::VARIANT
END;

Test with diverse data scenarios

Test mask functions with different data patterns and user contexts before production deployment:

SQL
-- Test different access levels using different users/groups
SELECT * FROM sensitive_table;

-- Test casting behavior for different regional scenarios
SELECT CAST(mask_transaction(transaction_info, 'US') AS STRUCT<id INT, amount DOUBLE>);
SELECT CAST(mask_transaction(transaction_info, 'JP') AS STRUCT<id INT, amount DOUBLE>);
SELECT CAST(mask_transaction(transaction_info, 'CN') AS STRUCT<id INT, amount DOUBLE>);

Limitations

For limitations on row filers and column masks, see Limitations.

Why UDF performance matters​

Golden rules for ABAC UDFs​

Common anti-patterns to avoid​

Keep access checks in policies, not UDFs​

Impact of non-deterministic logic​

UDF examples​

Column mask: deterministic pseudonymization with versioning​

Column mask: partial reveal without regex hotspots​

Row filter: pushdown-friendly by region​

Row filter: deterministic multi-condition​

Testing UDF performance​

ABAC column mask type casting​

How automatic casting works​

Best practices for type-safe masking​

Design type-consistent mask functions​

Use VARIANT for flexible masking​

Test with diverse data scenarios​

Limitations​

Why UDF performance matters

Golden rules for ABAC UDFs

Common anti-patterns to avoid

Keep access checks in policies, not UDFs

Impact of non-deterministic logic

UDF examples

Column mask: deterministic pseudonymization with versioning

Column mask: partial reveal without regex hotspots

Row filter: pushdown-friendly by region

Row filter: deterministic multi-condition

Testing UDF performance

ABAC column mask type casting

How automatic casting works

Best practices for type-safe masking

Design type-consistent mask functions

Use VARIANT for flexible masking

Test with diverse data scenarios

Limitations