Audit log schemas for security monitoring

Important

This article has been retired and might not be updated. Security monitoring logs are now documented in Additional security monitoring events.

For Databricks compute resources in the classic compute plane, such as VMs for clusters and pro or classic SQL warehouses, the following features enable additional monitoring agents:

For serverless compute resources, the monitoring agents run if the compliance security profile is enabled and the compliance standard supports serverless compute resources. See Which compute resources get enhanced security and Compliance security profile compliance standards with serverless compute availability.

The output of the monitoring tools for these features is available within Databricks audit logs.

To access logs:

  1. As an admin, set up audit log delivery to your own Amazon S3 bucket.

  2. Regularly review the logs for new rows based on the following sections.
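As a rough sketch of step 2, the monitoring rows can be selected by the serviceName values documented in this article. The sketch assumes the delivered files have already been copied to a local directory and that each file holds one JSON object per line. The function name iter_monitoring_events and the directory layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterator

# serviceName values documented in this article for the monitoring agents.
MONITORING_SERVICES = {
    "capsule8-alerts-dataplane",
    "clamAVScanService-dataplane",
    "syslog",
    "monit",
}


def iter_monitoring_events(log_dir: str) -> Iterator[dict]:
    """Yield audit log rows emitted by the monitoring agents.

    Assumption: every *.json file under log_dir holds one JSON object per line.
    """
    for path in sorted(Path(log_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # skip blank lines
                event = json.loads(line)
                if event.get("serviceName") in MONITORING_SERVICES:
                    yield event
```

Filtering on serviceName first keeps the later, schema-specific handling small, because each of the four services uses a different response layout.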

Schema for file integrity monitoring

For the overall schema of audit logs, see Audit log example schema.

Audit log fields that are important for file integrity monitoring:

  • serviceName: Always capsule8-alerts-dataplane.

  • timestamp: Time that the tool created this event.

  • workspaceId: The workspace associated with this event.

  • requestId: The unique UUID for the original event.

  • requestParams: A JSON object that always contains a single field, instanceId, which is the instance ID (the ID of the host) that emitted this audit log entry.

  • response: A JSON object.

    • This always has a statusCode property set to 200.

    • A result field that contains the original JSON value of the event. The JSON can vary based on the detection that triggers it. For the complete schema reference, see the following third-party documentation article on the alert JSON schema. The JSON value is encoded as a single string rather than as a nested JSON object, so expect escaped quotation marks, as in the following example:

      "response": {
        "statusCode": 200,
        "result": "{\"actionName\": \"Wget Program Blacklist\"}"
      }
      
  • accountId: The Databricks account ID for this workspace.

  • auditLevel: Always WORKSPACE_LEVEL.

  • actionName: Action name. One of the following values:

    • Heartbeat: A regular event to confirm the monitor is on. Currently runs every 10 minutes, but this might change in the future.

    • Memory Marked Executable: Memory is often marked executable in order to allow malicious code to execute when an application is being exploited. Alerts when a program sets heap or stack memory permissions to executable. This can cause false positives for certain application servers.

    • File Integrity Monitor: Monitors the integrity of important system files. Alerts on any unauthorized changes to those files. Databricks defines specific sets of system paths on the image, and this set of paths may change over time.

    • Systemd Unit File Modified: Changes to systemd units could result in security controls being relaxed or disabled, or the installation of a malicious service. Alerts whenever a systemd unit file is modified by a program other than systemctl.

    • Repeated Program Crashes: Repeated program crashes could indicate that an attacker is attempting to exploit a memory corruption vulnerability, or that there is a stability issue in the affected application. Alerts when more than 5 instances of an individual program crash via segmentation fault.

    • Userfaultfd Usage: Certain Linux functionality is almost exclusively used when exploiting kernel vulnerabilities, usually with the goal of privilege escalation. Alerts when a binary executes the userfaultfd system call.

    • New File Executed in Container: As containers are typically static workloads, this alert could indicate that an attacker has compromised the container and is attempting to install and run a backdoor. Alerts when a file that has been created or modified within 30 minutes is then executed within a container.

    • Suspicious Interactive Shell: Interactive shells are rare occurrences on modern production infrastructure. Alerts when an interactive shell is started with arguments commonly used for reverse shells.

    • User Command Logging Evasion: Evading command logging is common practice for attackers, but may also indicate that a legitimate user is performing unauthorized actions or trying to evade policy. Alerts when a change to user command history logging is detected, indicating that a user is attempting to evade command logging.

    • BPF Program Executed: Detects some types of kernel backdoors. The loading of a new Berkeley Packet Filter (BPF) program could indicate that an attacker is loading a BPF-based rootkit to gain persistence and avoid detection. Alerts when a process loads a new privileged BPF program, if the process is already part of an ongoing incident.

    • Kernel Module Loaded: Attackers commonly load malicious kernel modules (rootkits) to evade detection and maintain persistence on a compromised node. Alerts when a kernel module is loaded, if the program is already part of an ongoing incident.

    • Suspicious Program Name Executed-Space After File: Attackers may create or rename malicious binaries to include a space at the end of the name in an effort to impersonate a legitimate system program or service. Alerts when a program is executed with a space after the program name.

    • Illegal Elevation Of Privileges: Kernel privilege escalation exploits commonly enable an unprivileged user to gain root privileges without passing standard gates for privilege changes. Alerts when a program attempts to elevate privileges through unusual means. This can issue false positive alerts on nodes with significant workloads.

    • Kernel Exploit: Internal kernel functions are not accessible to regular programs, and if called, are a strong indicator that a kernel exploit has executed and that the attacker has full control of the node. Alerts when a kernel function unexpectedly returns to user space.

    • Processor-Level Protections Disabled: SMEP and SMAP are processor-level protections that increase difficulty for kernel exploits to succeed, and disabling these restrictions is a common early step in kernel exploits. Alerts when a program tampers with the kernel SMEP/SMAP configuration.

    • Container Escape via Kernel Exploitation: Alerts when a program uses kernel functions commonly used in container escape exploits, indicating that an attacker is escalating privileges from container-access to node-access.

    • Privileged Container Launched: Privileged containers have direct access to host resources, leading to a greater impact when compromised. Alerts when a privileged container is launched, if the container isn’t a known privileged image such as kube-proxy. This can issue unwanted alerts for legitimate privileged containers.

    • Userland Container Escape: Many container escapes coerce the host to execute an in-container binary, resulting in the attacker gaining full control of the affected node. Alerts when a container-created file is executed from outside a container.

    • AppArmor Disabled In Kernel: Modification of certain AppArmor attributes can only occur in-kernel, indicating that AppArmor has been disabled by a kernel exploit or rootkit. Alerts when the AppArmor state is changed from the AppArmor configuration detected when the sensor starts.

    • AppArmor Profile Modified: Attackers may attempt to disable enforcement of AppArmor profiles as part of evading detection. Alerts when a command for modifying an AppArmor profile is executed, if it was not executed by a user in an SSH session.

    • Boot Files Modified: If not performed by a trusted source (such as a package manager or configuration management tool), modification of boot files could indicate an attacker modifying the kernel or its options in order to gain persistent access to a host. Alerts when changes are made to files in /boot, indicating installation of a new kernel or boot configuration.

    • Log Files Deleted: Log deletion not performed by a log management tool could indicate that an attacker is trying to remove indicators of compromise. Alerts on deletion of system log files.

    • New File Executed: Newly created files from sources other than system update programs may be backdoors, kernel exploits, or part of an exploitation chain. Alerts when a file that has been created or modified within 30 minutes is then executed, excluding files created by system update programs.

    • Root Certificate Store Modified: Modification of the root certificate store could indicate the installation of a rogue certificate authority, enabling interception of network traffic or bypass of code signature verification. Alerts when a system CA certificate store is changed.

    • Setuid/Setgid Bit Set On File: Setting setuid/setgid bits can be used to provide a persistent method for privilege escalation on a node. Alerts when the setuid or setgid bit is set on a file with the chmod family of system calls.

    • Hidden File Created: Attackers often create hidden files as a means of obscuring tools and payloads on a compromised host. Alerts when a hidden file is created by a process associated with an ongoing incident.

    • Modification Of Common System Utilities: Attackers may modify system utilities in order to execute malicious payloads whenever these utilities are run. Alerts when a common system utility is modified by an unauthorized process.

    • Network Service Scanner Executed: An attacker or rogue user may use or install these programs to survey connected networks for additional nodes to compromise. Alerts when common network scanning program tools are executed.

    • Network Service Created: Attackers may start a new network service to provide easy access to a host after compromise. Alerts when a program starts a new network service, if the program is already part of an ongoing incident.

    • Network Sniffing Program Executed: An attacker or rogue user may execute network sniffing commands to capture credentials, personally-identifiable information (PII), or other sensitive information. Alerts when a program is executed that allows network capture.

    • Remote File Copy Detected: Use of file transfer tools could indicate that an attacker is attempting to move toolsets to additional hosts or exfiltrate data to a remote system. Alerts when a program associated with remote file copying is executed, if the program is already part of an ongoing incident.

    • Unusual Outbound Connection Detected: Command and Control channels and cryptocoin miners often create new outbound network connections on unusual ports. Alerts when a program initiates a new connection on an uncommon port, if the program is already part of an ongoing incident.

    • Data Archived Via Program: After gaining access to a system, an attacker may create a compressed archive of files to reduce the size of data for exfiltration. Alerts when a data compression program is executed, if the program is already part of an ongoing incident.

    • Process Injection: Use of process injection techniques commonly indicates that a user is debugging a program, but may also indicate that an attacker is reading secrets from or injecting code into other processes. Alerts when a program uses ptrace (debugging) mechanisms to interact with another process.

    • Account Enumeration Via Program: Attackers will often use account enumeration programs to determine their level of access and to see if other users are currently logged in to the node. Alerts when a program associated with account enumeration is executed, if the program is already part of an ongoing incident.

    • File and Directory Discovery Via Program: Exploring file systems is common post-exploitation behavior for an attacker looking for credentials and data of interest. Alerts when a program associated with file and directory enumeration is executed, if the program is already part of an ongoing incident.

    • Network Configuration Enumeration Via Program: Attackers can interrogate local network and route information to identify adjacent hosts and networks ahead of lateral movement. Alerts when a program associated with network configuration enumeration is executed, if the program is already part of an ongoing incident.

    • Process Enumeration Via Program: Attackers often list running programs in order to identify the purpose of a node and whether any security or monitoring tools are in place. Alerts when a program associated with process enumeration is executed, if the program is already part of an ongoing incident.

    • System Information Enumeration Via Program: Attackers commonly execute system enumeration commands to determine Linux kernel and distribution versions and features, often to identify if the node is affected by specific vulnerabilities. Alerts when a program associated with system information enumeration is executed, if the program is already part of an ongoing incident.

    • Scheduled Tasks Modified Via Program: Modifying scheduled tasks is a common method for establishing persistence on a compromised node. Alerts when the crontab, at, or batch commands are used to modify scheduled task configurations.

    • Systemctl Usage Detected: Changes to systemd units could result in security controls being relaxed or disabled, or the installation of a malicious service. Alerts when the systemctl command is used to modify systemd units.

    • User Execution Of su Command: Explicit escalation to the root user decreases the ability to correlate privileged activity to a specific user. Alerts when the su command is executed.

    • User Execution Of sudo Command: Alerts when the sudo command is executed.

    • User Command History Cleared: Deleting the history file is unusual, commonly performed by attackers hiding activity, or by legitimate users intending to evade audit controls. Alerts when command line history files are deleted.

    • New System User Added: An attacker may add a new user to a host to provide a reliable method of access. Alerts if a new user entity is added to the local account management file /etc/passwd, if the entity is not added by a system update program.

    • Password Database Modification: Attackers may directly modify identity-related files to add a new user to the system. Alerts when a file related to user passwords is modified by a program unrelated to updating existing user information.

    • SSH Authorized Keys Modification: Adding a new SSH public key is a common method for gaining persistent access to a compromised host. Alerts when an attempt to write to a user’s SSH authorized_keys file is observed, if the program is already part of an ongoing incident.

    • User Account Created Via CLI: Adding a new user is a common step for attackers when establishing persistence on a compromised node. Alerts when an identity management program is executed by a program other than a package manager.

    • User Configuration Changes: User profile and configuration files are often modified as a method of persistence in order to execute a program whenever a user logs in. Alerts when .bash_profile and .bashrc (as well as related files) are modified by a program other than a system update tool.
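Because Heartbeat events arrive at a regular interval, a long gap between two heartbeats from the same host can hint that the monitor stopped reporting for a while. The following is a minimal sketch under two assumptions: timestamp is in epoch milliseconds (as the example values in this article suggest), and a 20-minute threshold is acceptable. The function name find_heartbeat_gaps and the threshold are hypothetical, and the sketch only sees gaps between observed heartbeats, not a host that never reports again.

```python
from collections import defaultdict


def find_heartbeat_gaps(events, max_gap_minutes=20):
    """Return {instanceId: [(previous_ts, next_ts), ...]} for gaps above the threshold."""
    max_gap_ms = max_gap_minutes * 60 * 1000
    beats = defaultdict(list)
    for event in events:
        if (
            event.get("serviceName") == "capsule8-alerts-dataplane"
            and event.get("actionName") == "Heartbeat"
        ):
            instance = event.get("requestParams", {}).get("instanceId")
            beats[instance].append(event["timestamp"])

    gaps = {}
    for instance, stamps in beats.items():
        stamps.sort()
        # Compare each heartbeat with the next one from the same host.
        long_gaps = [(a, b) for a, b in zip(stamps, stamps[1:]) if b - a > max_gap_ms]
        if long_gaps:
            gaps[instance] = long_gaps
    return gaps
```

The threshold is deliberately larger than the current 10-minute interval, since the interval might change and delivery can be delayed.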

The following is an example file integrity monitoring audit log entry:

{
  "version": "2.0",
  "timestamp": 1625959170109,
  "workspaceId": "2417130538620110",
  "serviceName": "capsule8-alerts-dataplane",
  "actionName": "Wget Program Blacklist",
  "requestId": "318a87db-4cfe-4532-9110-09edc262275e",
  "requestParams": {
    "instanceId": "i-0a3c9d63bb295eb4f"
  },
  "response": {
    "statusCode": 200,
    "result": "<original-alert-json>"
  },
  "accountId": "82d65820-b5e4-4ab0-96e6-0cba825a5687",
  "auditLevel": "WORKSPACE_LEVEL"
}
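Because response.result holds the alert as a JSON-encoded string, reading the alert takes a second decoding step after the audit log row itself is parsed. A minimal sketch, in which the helper name decode_alert is hypothetical and placeholder or malformed values yield None:

```python
import json


def decode_alert(event: dict):
    """Return the alert in response.result as a dict, or None if it cannot be parsed."""
    raw = event.get("response", {}).get("result")
    if not isinstance(raw, str):
        return None
    try:
        alert = json.loads(raw)  # second decode: the field is a JSON string
    except json.JSONDecodeError:
        return None
    return alert if isinstance(alert, dict) else None


# Usage with a row shaped like the snippet in the response field description.
row = json.loads(
    r'{"response": {"statusCode": 200, '
    r'"result": "{\"actionName\": \"Wget Program Blacklist\"}"}}'
)
alert = decode_alert(row)
print(alert["actionName"])
```

Returning None instead of raising keeps a batch job running when one row carries an unexpected payload. Whether to skip or quarantine such rows is a design choice left to the reader.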

Schema for antivirus monitoring

For the overall schema of audit logs, see Audit log example schema.

Audit log fields that are important for antivirus monitoring:

  • serviceName: Always clamAVScanService-dataplane.

  • actionName: Always clamAVScanAction.

  • timestamp: The time when the tool generates this log row.

  • workspaceId: The workspace ID associated with this log.

  • requestId: The unique UUID for the original scanning event.

  • requestParams: A JSON object that always contains a single field, instanceId, which is the instance ID (the ID of the host) that emitted this audit log entry.

  • response: A JSON object that always has a statusCode of 200 and a result field that includes one line of the original scan result. Each scan result is typically represented by multiple audit log records, one for each line of the original scan output. For details of what could appear in this file, see the following third-party documentation.

  • accountId: The Databricks account ID associated with this log.

  • auditLevel: Always WORKSPACE_LEVEL.

The following is an example antivirus audit log entry that shows the beginning of a scan in the response.result field:

{
  "version": "2.0",
  "timestamp": 1625959170109,
  "workspaceId": "2417130538620110",
  "serviceName": "clamAVScanService-dataplane",
  "actionName": "clamAVScanAction",
  "requestId": "318a87db-4cfe-4532-9110-09edc262275e",
  "requestParams": {
    "instanceId": "i-0a3c9d63bb295eb4f"
  },
  "response": {
    "statusCode": 200,
    "result": "begin daily clamav scan : Mon Oct 25 06:25:01 UTC 2021\\n"
  },
  "accountId": "82d65820-b5e4-4ab0-96e6-0cba825a5687",
  "auditLevel": "WORKSPACE_LEVEL"
}

An example antivirus log file:

----------- SCAN SUMMARY -----------
Known viruses: 8556227
Engine version: 0.103.2
Scanned directories: 6
Scanned files: 446
Infected files: 0
Data scanned: 74.50 MB
Data read: 164.43 MB (ratio 0.45:1)
Time: 37.874 sec (0 m 37 s)
Start Date: 2021:07:27 21:47:36
End Date:   2021:07:27 21:48:14
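Since each line of scan output arrives as a separate audit log record, it can help to reassemble the lines per host before looking for the summary. The sketch below groups by instanceId and orders by timestamp, which is an assumption about how records of one scan relate to each other. It also treats a trailing literal backslash-n, as seen in the example entry, as an escaped newline. The function names are hypothetical.

```python
import re
from collections import defaultdict

INFECTED_RE = re.compile(r"^Infected files:\s*(\d+)", re.MULTILINE)


def scan_output_by_instance(events):
    """Return {instanceId: scan output text} rebuilt from one-line audit records."""
    rows = defaultdict(list)
    for event in events:
        if event.get("serviceName") != "clamAVScanService-dataplane":
            continue
        instance = event.get("requestParams", {}).get("instanceId")
        text = event.get("response", {}).get("result", "")
        # Assumption: a literal backslash-n in the field stands for a newline.
        text = text.replace("\\n", "\n").rstrip("\n")
        rows[instance].append((event.get("timestamp", 0), text))

    # Sort on timestamp only; the stable sort keeps input order for ties.
    return {
        instance: "\n".join(text for _, text in sorted(items, key=lambda r: r[0]))
        for instance, items in rows.items()
    }


def infected_counts(scan_text):
    """Return every 'Infected files' count found in the reassembled output."""
    return [int(n) for n in INFECTED_RE.findall(scan_text)]
```

A nonzero value from infected_counts is a reasonable trigger for reading the full reassembled output for that host.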

Schema for the system logs

For the overall schema of audit logs, see Audit log example schema.

Audit log fields that are important for the system log:

  • serviceName: Always syslog.

  • actionName: Always processEvent.

  • timestamp: The time when the system log generates this log row.

  • workspaceId: The workspace ID associated with this log.

  • requestId: The unique UUID for the original system log event.

  • requestParams: This request parameters JSON object has the following keys:

    • instanceId: The instance-id (the ID of the host) that emitted this audit log entry.

    • processName: Name of the internal process that generated this event. This field is intended for advanced diagnostics and its contents are subject to change.

  • response: A JSON object with a statusCode of 200 and a result field that includes the original system log content.

  • accountId: The Databricks account ID associated with this log.

  • auditLevel: Always WORKSPACE_LEVEL.

An example event for the system log:

{
    "version":"2.0",
    "timestamp":1633220481000,
    "workspaceId":"2417130538620110",
    "sessionId":"2710",
    "serviceName":"syslog",
    "actionName":"processEvent",
    "requestId":"1054f4c8-741d-3d80-b168-ca2cb891aa7a",
    "requestParams":{
        "instanceId": "i-00edf5b73b4c68221",
        "processName": "<process-name>"
    },
    "response":{
        "statusCode":200,
        "result":"<syslog content>"
    },
    "accountId":"82d65820-b5e4-4ab0-96e6-0cba825a5687",
    "auditLevel":"WORKSPACE_LEVEL"
}
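For review, a system log row can be flattened into a single readable line. This is a sketch only: it assumes timestamp is in epoch milliseconds (as the example values suggest), and the helper name format_syslog_event is hypothetical.

```python
from datetime import datetime, timezone


def format_syslog_event(event: dict):
    """Render one syslog audit row as a single line, or None for other rows."""
    if event.get("serviceName") != "syslog":
        return None
    params = event.get("requestParams", {})
    # Assumption: timestamp is epoch milliseconds.
    when = datetime.fromtimestamp(event["timestamp"] / 1000, tz=timezone.utc)
    content = event.get("response", {}).get("result", "")
    return (
        f"{when.isoformat()} {params.get('instanceId')} "
        f"{params.get('processName')}: {content}"
    )
```

Because the contents of processName are subject to change, the sketch only displays the value and does not branch on it.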

Schema for the process monitor

For the overall schema of audit logs, see Audit log example schema.

Audit log fields that are important for the process monitor log:

  • serviceName: Always monit.

  • actionName: One of the following: processNotRunning (the monitor is not running), processRestarting (the monitor is restarting), processStarted (the monitor started), or processRunning (the monitor is running).

  • timestamp: The time when the monitor log generates this log row.

  • workspaceId: The workspace ID associated with this log.

  • requestId: The unique UUID for the original process monitor event.

  • requestParams: This request parameters JSON object has the following keys:

    • instanceId: The instance-id (the ID of the host) that emitted this audit log entry.

    • processName: Name of the internal process that is being monitored. This field is intended for advanced diagnostics and its contents are subject to change.

  • response: A JSON object with a statusCode of 200.

  • accountId: The Databricks account ID associated with this log.

  • auditLevel: Always WORKSPACE_LEVEL.

An example event for the process monitor log:

{
    "version":"2.0",
    "timestamp":1626857554000,
    "workspaceId":"2417130538620110",
    "serviceName":"monit",
    "actionName":"processRestarting",
    "requestId":"48bb4060-7685-3a19-9dbb-f83d2afaf346",
    "requestParams":{
        "instanceId":"i-0c48619b79d4056f2",
        "processName":"<process-name>"
    },
    "response":{
        "statusCode":200
    },
    "accountId":"82d65820-b5e4-4ab0-96e6-0cba825a5687",
    "auditLevel":"WORKSPACE_LEVEL"
}
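One way to use these events is to keep only the most recent actionName per host and process, then list the pairs whose latest state is processNotRunning. The sketch assumes that timestamp orders events correctly within a host. The function name processes_not_running is hypothetical.

```python
def processes_not_running(events):
    """Return (instanceId, processName) pairs whose latest event is processNotRunning."""
    monit_events = [e for e in events if e.get("serviceName") == "monit"]
    latest = {}
    # Later events overwrite earlier ones, so each key ends with its newest state.
    for event in sorted(monit_events, key=lambda e: e["timestamp"]):
        params = event.get("requestParams", {})
        key = (params.get("instanceId"), params.get("processName"))
        latest[key] = event.get("actionName")
    return [key for key, action in latest.items() if action == "processNotRunning"]
```

Tracking only the latest state avoids flagging a process that stopped briefly and was then restarted.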