mask

Masks the given string value. This can be useful for creating copies of tables with sensitive information removed.

For the corresponding Databricks SQL function, see mask function.

Syntax

Python
from pyspark.databricks.sql import functions as dbf

dbf.mask(col=<col>, upperChar=<upperChar>, lowerChar=<lowerChar>, digitChar=<digitChar>, otherChar=<otherChar>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| col | pyspark.sql.Column or str | target column to compute on. |
| upperChar | pyspark.sql.Column or str, optional | character to replace upper-case characters with. Specify NULL to retain original character. |
| lowerChar | pyspark.sql.Column or str, optional | character to replace lower-case characters with. Specify NULL to retain original character. |
| digitChar | pyspark.sql.Column or str, optional | character to replace digit characters with. Specify NULL to retain original character. |
| otherChar | pyspark.sql.Column or str, optional | character to replace all other characters with. Specify NULL to retain original character. |
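
The replacement rules described by these parameters can be modelled in plain Python. The sketch below is illustrative only and is not the Spark implementation. The helper name `mask_model` is hypothetical, and the defaults (`'X'`, `'x'`, `'n'`, and other characters retained) are assumptions taken from the upstream PySpark `mask` documentation. In this model a Python `None` stands in for SQL NULL ("retain the original character"). Character classification relies on Python's `str` methods, which may differ from Spark's for non-ASCII input.

```python
from typing import Optional


def mask_model(
    value: Optional[str],
    upper_char: Optional[str] = "X",   # assumed default for upper-case characters
    lower_char: Optional[str] = "x",   # assumed default for lower-case characters
    digit_char: Optional[str] = "n",   # assumed default for digit characters
    other_char: Optional[str] = None,  # None models NULL: keep the original character
) -> Optional[str]:
    """Hypothetical pure-Python model of the masking rules."""
    if value is None:
        return None  # assumption: a NULL input yields NULL

    def replace(ch: str) -> str:
        # Pick the replacement for the character's class.
        if ch.isupper():
            repl = upper_char
        elif ch.islower():
            repl = lower_char
        elif ch.isdigit():
            repl = digit_char
        else:
            repl = other_char
        # A None replacement means "retain the original character".
        return ch if repl is None else repl

    return "".join(replace(ch) for ch in value)


print(mask_model("AbCD123-@$#"))                              # XxXXnnn-@$#
print(mask_model("abcd-EFGH-8765-4321", "Y", "y", "d", "*"))  # yyyy*YYYY*dddd*dddd
print(mask_model("AbCD123-@$#", digit_char=None))             # XxXX123-@$#
```

The model only mirrors the per-character decision. It says nothing about how Spark evaluates the expression on a column.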

Returns

pyspark.sql.Column: a column containing the masked string values.

Examples

Python
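from pyspark.databricks.sql import functions as dbf
from pyspark.sql.functions import lit

# `spark` is the active SparkSession (predefined in Databricks notebooks).
df = spark.createDataFrame([("AbCD123-@$#",), ("abcd-EFGH-8765-4321",)], ['data'])
df.select(dbf.mask(df.data).alias('r')).collect()  # [Row(r='XxXXnnn-@$#'), Row(r='xxxx-XXXX-nnnn-nnnn')]
df.select(dbf.mask(df.data, lit('Y')).alias('r')).collect()  # [Row(r='YxYYnnn-@$#'), Row(r='xxxx-YYYY-nnnn-nnnn')]
df.select(dbf.mask(df.data, lit('Y'), lit('y')).alias('r')).collect()  # [Row(r='YyYYnnn-@$#'), Row(r='yyyy-YYYY-nnnn-nnnn')]
df.select(dbf.mask(df.data, lit('Y'), lit('y'), lit('d')).alias('r')).collect()  # [Row(r='YyYYddd-@$#'), Row(r='yyyy-YYYY-dddd-dddd')]
df.select(dbf.mask(df.data, lit('Y'), lit('y'), lit('d'), lit('*')).alias('r')).collect()  # [Row(r='YyYYddd****'), Row(r='yyyy*YYYY*dddd*dddd')]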