Credential redaction
Databricks redacts keys and credentials in audit logs and log4j Apache Spark logs to protect your data from information leaks. Databricks redacts three types of credentials at logging time: AWS access keys, AWS secret access keys, and credentials in URIs. When it detects one of these secrets, Databricks replaces it with a placeholder. For some credential types, Databricks also appends a hash_prefix, which is the first 8 hexadecimal characters of the MD5 checksum of the credential, for verification purposes.
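The hash_prefix described above can be reproduced locally to check which credential a redacted log entry refers to. The following is a minimal sketch, not the Databricks implementation: the function name hash_prefix, the UTF-8 encoding, and the reading of the prefix as the first 8 hexadecimal characters of the digest (as the examples in this section suggest) are assumptions.

```python
import hashlib


def hash_prefix(credential: str, length: int = 8) -> str:
    """Return the first `length` hex characters of the MD5 digest of `credential`.

    Assumption: the credential is encoded as UTF-8 before hashing.
    """
    digest = hashlib.md5(credential.encode("utf-8")).hexdigest()
    return digest[:length]


# Compute the prefix for a credential you already know and compare it
# with the value shown in parentheses in the redacted log line.
prefix = hash_prefix("admin:admin")
print(prefix)
```

Because only a short prefix of the checksum is logged, a match indicates which known credential was probably used, without exposing the credential itself.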
AWS access key redaction
For AWS access keys, Databricks searches for strings starting with AKIA and replaces them with REDACTED_AWS_ACCESS_KEY(hash_prefix). For example, Databricks logs 2017/01/08: Accessing AWS using AKIADEADBEEFDEADBEEF as 2017/01/08: Accessing AWS using REDACTED_AWS_ACCESS_KEY(655f9d2f).
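The access key redaction can be approximated with a regular expression. This sketch is illustrative only: the exact pattern Databricks uses is not documented here, so the pattern below (AKIA followed by 16 uppercase letters or digits) and the names ACCESS_KEY_PATTERN and redact_access_keys are assumptions.

```python
import hashlib
import re

# Assumption: an access key is "AKIA" followed by 16 uppercase letters or digits.
ACCESS_KEY_PATTERN = re.compile(r"AKIA[0-9A-Z]{16}")


def _hash_prefix(secret: str) -> str:
    """First 8 hex characters of the MD5 digest (UTF-8 encoding assumed)."""
    return hashlib.md5(secret.encode("utf-8")).hexdigest()[:8]


def redact_access_keys(line: str) -> str:
    """Replace every access-key-like string with a placeholder and hash prefix."""
    return ACCESS_KEY_PATTERN.sub(
        lambda m: f"REDACTED_AWS_ACCESS_KEY({_hash_prefix(m.group(0))})",
        line,
    )


redacted = redact_access_keys("2017/01/08: Accessing AWS using AKIADEADBEEFDEADBEEF")
print(redacted)
```

Lines without a matching string pass through unchanged, so a filter like this can be applied to every log line.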
AWS secret access key redaction
Databricks replaces an AWS secret access key with REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY without appending its hash. For example, Databricks logs 2017/01/08: Accessing AWS using 99Abcdeuw+zXXAxllliupwqqqzDEUFdAtaBrickX as 2017/01/08: Accessing AWS using REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY.
Because AWS secret access keys have no explicit identifier, Databricks might also redact other 40-character strings that look randomly generated, even if they are not AWS secret access keys.
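The following sketch shows why false positives occur when the only signal is a 40-character string. It is not the Databricks implementation: the character class, the boundary checks, and the names SECRET_KEY_PATTERN and redact_secret_keys are assumptions chosen for illustration.

```python
import re

# Assumption: a candidate secret is exactly 40 characters drawn from the
# base64-style alphabet, not embedded in a longer run of such characters.
SECRET_KEY_PATTERN = re.compile(
    r"(?<![A-Za-z0-9/+=])[A-Za-z0-9/+=]{40}(?![A-Za-z0-9/+=])"
)
PLACEHOLDER = "REDACTED_POSSIBLE_AWS_SECRET_ACCESS_KEY"


def redact_secret_keys(line: str) -> str:
    """Replace every 40-character candidate with the placeholder (no hash)."""
    return SECRET_KEY_PATTERN.sub(PLACEHOLDER, line)


secret_line = redact_secret_keys(
    "2017/01/08: Accessing AWS using 99Abcdeuw+zXXAxllliupwqqqzDEUFdAtaBrickX"
)

# A 40-character hexadecimal string that is not an AWS key (for example,
# a commit ID) matches the same pattern and is redacted as well.
commit_line = redact_secret_keys(
    "Checked out commit 0123456789abcdef0123456789abcdef01234567"
)

print(secret_line)
print(commit_line)
```

Under these assumptions, any unrelated 40-character token is indistinguishable from a secret access key, which is why the placeholder contains the word POSSIBLE.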
Credentials in URI redaction
Databricks detects //username:password@mycompany.com in a URI and replaces username:password with REDACTED_CREDENTIALS(hash_prefix). Databricks computes the hash from username:password (including the :). For example, Databricks logs 2017/01/08: Accessing https://admin:admin@mycompany.com as 2017/01/08: Accessing https://REDACTED_CREDENTIALS(d2abaa37)@mycompany.com.
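URI credential redaction can be sketched in the same way. The pattern and the names URI_CREDENTIALS_PATTERN and redact_uri_credentials below are assumptions for illustration, as is the UTF-8 encoding used before hashing. The hash is computed over the full username:password string, including the colon, as described above.

```python
import hashlib
import re

# Assumption: credentials appear as "//username:password@" with no slashes,
# whitespace, or "@" inside the username or password.
URI_CREDENTIALS_PATTERN = re.compile(r"//([^/\s:@]+:[^/\s@]+)@")


def redact_uri_credentials(line: str) -> str:
    """Replace "username:password" in URIs with a placeholder and hash prefix."""

    def _replace(match: re.Match) -> str:
        credentials = match.group(1)  # "username:password", colon included
        prefix = hashlib.md5(credentials.encode("utf-8")).hexdigest()[:8]
        return f"//REDACTED_CREDENTIALS({prefix})@"

    return URI_CREDENTIALS_PATTERN.sub(_replace, line)


redacted_uri = redact_uri_credentials(
    "2017/01/08: Accessing https://admin:admin@mycompany.com"
)
print(redacted_uri)
```

URIs without embedded credentials, such as https://mycompany.com/path, do not match the pattern and are left unchanged.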