Filter sensitive table data with row filters and column masks
Preview
This feature is in Public Preview.
This article provides guidance and examples for using row filters, column masks, and mapping tables to filter sensitive data in your tables.
What are row filters?
Row filters allow you to apply a filter to a table so that subsequent queries only return rows for which the filter predicate evaluates to true. A row filter is implemented as a SQL user-defined function (UDF).
To create a row filter, first write a SQL UDF to define the filter policy and then apply it to a table with an ALTER TABLE
statement. Alternatively, you can specify a row filter for a table in the initial CREATE TABLE
statement. Each table can have only one row filter. A row filter accepts zero or more input parameters where each input parameter binds to one column of the corresponding table.
What is the difference between these filters and dynamic views?
The dynamic view is an abstracted, read-only view of one or more source tables. The user can access the dynamic view without having access to the source tables directly. Creating a dynamic view defines a new table name that must not match the name of any source tables or any other tables or views present in the same schema.
On the other hand, associating a row filter or column mask to a target table applies the corresponding logic directly to the table itself without introducing any new table names. Subsequent queries may continue referring directly to the target table using its original name.
Both dynamic views and row filters and column masks let you apply complex logic to tables and process their filtering decisions at query runtime.
Use dynamic views if you need to apply transformation logic such as filters and masks to read-only tables, and if it is acceptable for users to refer to the dynamic views using different names. Use row filters and column masks if you want to filter or compute expressions over specific data but still provide users access to the tables using their original names.
Row Filter Syntax
To create a row filter and add it to an existing table, use the following syntax:
Create the row filter:
CREATE FUNCTION <function_name> (<parameter_name> <parameter_type>, ...)
RETURN {filter clause whose output must be a boolean};
Apply the row filter to a table:
ALTER TABLE <table_name> SET ROW FILTER <function_name> ON (<column_name>, ...);
Remove a row filter from a table:
ALTER TABLE <table_name> DROP ROW FILTER;
Modify a row filter:
Run a DROP FUNCTION statement to drop the existing function, or use CREATE OR REPLACE FUNCTION to replace it.
Delete a row filter:
ALTER TABLE <table_name> DROP ROW FILTER;
DROP FUNCTION <function_name>;
Note
You must perform the ALTER TABLE ... DROP ROW FILTER
command before dropping the function or the table will be in an inaccessible state.
If the table becomes inaccessible in this way, alter the table and drop the orphaned row filter reference using ALTER TABLE <table_name> DROP ROW FILTER;
.
Row filter examples
Create a SQL user defined function applied to members of the group admin
in the region US
.
With this function, members of the admin
group can access all records in the table. If the function is called by a non-admin, the RETURN_IF
condition fails and the region='US'
expression is evaluated, filtering the table to only show records in the US
region.
CREATE FUNCTION us_filter(region STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), true, region='US');
Apply the function to a table as a row filter. Subsequent queries from the sales
table then return a subset of rows.
CREATE TABLE sales (region STRING, id INT) USING delta;
ALTER TABLE sales SET ROW FILTER us_filter ON (region);
Disable the row filter. Future user queries from the sales
table then return all of the rows in the table.
ALTER TABLE sales DROP ROW FILTER;
Create a table with the function applied as a row filter as part of the CREATE TABLE statement. Future queries from the sales
table then each return a subset of rows.
CREATE TABLE sales (region STRING, id INT) USING delta
WITH ROW FILTER us_filter ON (region);
What are column masks?
Column masks let you apply a masking function to a table column. The masking function gets evaluated at query runtime, substituting each reference of the target column with the results of the masking function. For most use cases, column masks determine whether to return the original column value or redact it based on the identity of the invoking user. Column masks are expressions written as SQL UDFs.
Each table column can optionally have one masking function applied to it. The masking function takes the unmasked value of the column as input and returns the masked value as its result. The return value of the masking function should be the same type as the column being masked. The masking function can also take additional columns as input parameters and use them in its masking logic.
To apply column masks, create a function and apply it to a table column using an ALTER TABLE
statement. Alternatively, you can apply the masking function when you create the table.
Column mask syntax
Within the MASK
clause, you can use any of the Databricks built-in runtime functions or call other user-defined functions. Common use cases include inspecting the identity of the invoking user running the function using current_user( )
or which groups they are a member of using is_account_group member( )
.
Create a column mask:
CREATE FUNCTION <function_name> (<parameter_name> <parameter_type>, ...)
RETURN {expression with the same type as the first parameter};
Apply column mask to a column in an existing table:
ALTER TABLE <table_name> ALTER COLUMN <col_name> SET MASK <mask_func_name> [USING COLUMNS <additional_columns>];
Remove a column mask from a column in a table:
ALTER TABLE <table_name> ALTER COLUMN <column where mask is applied> DROP MASK;
Modify a column mask:
Either DROP
the existing function, or use CREATE OR REPLACE TABLE
.
Delete a column mask:
ALTER TABLE <table_name> ALTER COLUMN <column where mask is applied> DROP MASK;
DROP FUNCTION <function_name>;
Note
You must perform the ALTER TABLE
command before dropping the function or the table will be in an inaccessible state.
If the table becomes inaccessible in this way, alter the table and drop the orphaned mask reference reference using ALTER TABLE <table_name> ALTER COLUMN <column where mask is applied> DROP MASK;
.
Column mask examples
Create a SQL user defined function. This function masks the ssn
column for members of the admin
group.
CREATE FUNCTION ssn_mask(ssn STRING)
RETURN IF(IS_ACCOUNT_GROUP_MEMBER('admin'), ssn, '****');
Apply the new function to a table as a column mask. Future users queries from the ssn
column then return altered values.
CREATE TABLE users
(region STRING, table_ssn STRING) USING delta;
ALTER TABLE users ALTER COLUMN table_ssn SET MASK ssn_mask;
Disable the column mask. Future users queries from the ssn
column then return original values.
ALTER TABLE users ALTER COLUMN ssn DROP MASK;
Create another table with the new function applied as a column mask in one step, as part of the CREATE TABLE statement. Future users queries from the ssn
column then return altered values.
CREATE TABLE region_to_ssn (
region STRING,
ssn STRING MASK ssn_mask)
USING delta;
Use mapping tables to create an access-control list
To achieve row-level security, consider defining a mapping table (or access-control list). Each mapping table is a comprehensive mapping table that encodes which data rows in the original table are accessible to certain users or groups. Mapping tables are useful because they offer simple integration with your fact tables through direct joins.
This methodology proves beneficial in addressing many use cases with custom requirements. Examples include:
Imposing restrictions based on the logged-in user while accommodating different rules for specific user groups.
Creating intricate hierarchies, such as organizational structures, requiring diverse sets of rules.
Replicating complex security models from external source systems.
By adopting mapping tables in this way, you can effectively tackle these challenging scenarios and ensure robust row-level and column-level security implementations.
Mapping table examples
Use a mapping table to check if the current user is in a list:
USE CATALOG main;
Create a new mapping table:
DROP TABLE IF EXISTS valid_users;
CREATE TABLE valid_users(username string) USING delta;
INSERT INTO valid_users
VALUES
('fred@databricks.com'),
('barney@databricks.com');
Create a new filter:
Note
All filters run with definer’s rights except for functions that check user context (for example, the CURRENT_USER
and IS_MEMBER
functions) which run as the invoker.
In this example the function checks to see if the current user is in the valid_users
table. If the user is found, the function returns true.
DROP FUNCTION IF EXISTS row_filter;
CREATE FUNCTION row_filter()
RETURN EXISTS(
SELECT 1 FROM valid_users v
WHERE v.username = CURRENT_USER()
);
The example below applies the row filter during table creation. You can also add the filter later using an ALTER TABLE
statement. When applying to a whole table use the ON ()
syntax. For a specific row use ON (row);
.
DROP TABLE IF EXISTS data_table;
CREATE TABLE data_table
(x INT, y INT, z INT)
USING delta
WITH ROW FILTER row_filter ON ();
INSERT INTO data_table VALUES
(1, 2, 3),
(4, 5, 6),
(7, 8, 9);
Select data from the table. This should only return data if the user is in the valid_users
table.
SELECT * FROM data_table;
Create a mapping table comprising accounts that should always have access to view all the rows in the table, regardless of the column values:
CREATE TABLE valid_accounts(account string) USING delta;
INSERT INTO valid_accounts
VALUES
('admin'),
('cstaff');
Now create a SQL UDF that returns true
if the values of all columns in the row are less than five, or if the invoking user is a member of the above mapping table.
CREATE FUNCTION row_filter_small_values (x INT, y INT, z INT)
RETURN (x < 5 AND y < 5 AND z < 5)
OR EXISTS(
SELECT 1 FROM valid_accounts v
WHERE IS_ACCOUNT_GROUP_MEMBER(v.account));
Finally, apply the SQL UDF to the table as a row filter:
ALTER TABLE data_table SET ROW FITLER row_filter_small_values ON (x, y, z);
Supportability
Databricks SQL and Databricks notebooks for SQL workloads are supported.
Databricks Machine Learning commands by users with MODIFY privileges are supported. Filters and masks are applied to the data read by UPDATEs and DELETEs, and are not applied to data that is written (including INSERTed data).
Supported formats: Delta and Parquet. Parquet is supported for only managed or external Tables.
Views on tables with column masks or row filters are supported
Delta Lake change data feeds are supported as long as the schema is compatible with the row filters and column masks that may apply to the target table.
Foreign tables are supported.
Limitations
Databricks Runtime versions before 12.2 LTS do not support row filters or column masks. These runtimes fail securely, meaning if you try to access tables from unsupported versions of these runtimes, no data is returned.
Delta Live Tables materialized views and streaming tables don’t support row filters or column masks.
Python or Scala UDFs are not supported as row filter or column mask functions directly. However, it is possible to refer to these in SQL UDFs as long as their definitions are permanently stored in the catalog (in other words, not temporary to the session).
Delta Sharing does not work with row-level security or column masks.
Time travel does not work with row-level security or column masks.
Table sampling does not work with row-level security or column masks.
Path-based access to files in tables with policies are not currently supported.
Row-filter or column-mask policies with circular dependencies back to the original policies are not supported.
MERGE
and shallow clones are not supported.
Single user clusters limitation
Do not add row filters or column masks to any table that you are accessing from single user clusters. This is commonly done in the context of Databricks Jobs. During the public preview, you will be unable to access the table from a single user cluster once a filter or mask has been applied.