Configure audit log delivery

This article explains how to configure low-latency delivery of audit logs in JSON file format to an Amazon S3 storage bucket.

When your audit logs are delivered to an S3 storage bucket, you can make the data available for usage analysis. Databricks delivers a separate JSON file for each workspace in your account and a separate file for account-level events. For more information on the file schema and audit events, see Audit log reference.

You can optionally deliver logs to an AWS account other than the account used for the IAM role that you create for log delivery. This allows flexibility, for example, setting up workspaces from multiple AWS accounts to deliver to the same S3 bucket. This option requires that you configure an S3 bucket policy that references a cross-account IAM role. Instructions and a policy template are provided in Step 3: cross-account support.

In addition to delivering logs for running workspaces, Databricks delivers logs for cancelled workspaces so that the logs covering the final day of the workspace are properly delivered.

Requirements

To configure audit log delivery, you must:

High-level flow

This section describes the high-level flow of audit log delivery.

  • Step 1: Configure storage: In AWS, create a new S3 bucket. Using Databricks APIs, call the Account API to create a storage configuration object that uses the bucket name.

  • Step 2: Configure credentials: In AWS, create the appropriate AWS IAM role. Using Databricks APIs, call the Account API to create a credentials configuration object that uses the IAM role’s ARN.

  • (Optional) Step 3: cross-account support: To deliver logs to an AWS account other than the account of the IAM role that you create for log delivery, add an S3 bucket policy. This policy references IDs for the cross-account IAM role that you created in the previous step.

  • Step 4: Call the log delivery API: Call the Account API to create a log delivery configuration that uses the credential and storage configuration objects from previous steps.
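As a rough illustration of Step 4, the following Python sketch builds the JSON body for a request that creates a log delivery configuration. The field names (`log_delivery_configuration`, `log_type`, `output_format`, `credentials_id`, `storage_configuration_id`, `delivery_path_prefix`, `workspace_ids_filter`) are assumptions based on the Account API log delivery endpoint and are not stated in this article, so confirm them against the API reference before use. The helper name `build_log_delivery_request` is hypothetical.

```python
import json


def build_log_delivery_request(credentials_id, storage_configuration_id,
                               delivery_path_prefix=None, workspace_ids=None):
    """Build the request body for an audit log delivery configuration.

    Field names are assumptions; verify them against the Account API reference.
    """
    config = {
        "log_type": "AUDIT_LOGS",
        "output_format": "JSON",
        # IDs returned when you created the objects in Steps 1 and 2.
        "credentials_id": credentials_id,
        "storage_configuration_id": storage_configuration_id,
    }
    if delivery_path_prefix:
        config["delivery_path_prefix"] = delivery_path_prefix
    if workspace_ids:
        # Without a workspace filter, the configuration applies to the whole account.
        config["workspace_ids_filter"] = list(workspace_ids)
    return {"log_delivery_configuration": config}


body = build_log_delivery_request(
    "<credentials-id>",
    "<storage-configuration-id>",
    delivery_path_prefix="audit-logs",
)
print(json.dumps(body, indent=2))
```

The sketch only constructs the payload. Sending it requires an authenticated call to the Account API, as described in the authentication section.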

After you complete these steps, you can access the JSON files. The delivery location is in the following format:

<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json

Note

If you configure audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.
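To make the path layout concrete, here is a minimal Python sketch that extracts the workspace ID and date from a delivered object key. The function name `parse_delivery_key` and the sample key are hypothetical, and the sketch assumes the internal ID contains no `/` characters.

```python
import re
from datetime import date

# Matches the trailing part of the documented delivery location.
_KEY_PATTERN = re.compile(
    r"workspaceId=(?P<workspace_id>\d+)"
    r"/date=(?P<date>\d{4}-\d{2}-\d{2})"
    r"/auditlogs_(?P<internal_id>[^/]+)\.json$"
)


def parse_delivery_key(key):
    """Return workspace ID, date, and account-level flag, or None if no match."""
    match = _KEY_PATTERN.search(key)
    if match is None:
        return None
    workspace_id = int(match.group("workspace_id"))
    return {
        "workspace_id": workspace_id,
        "date": date.fromisoformat(match.group("date")),
        # Account-level events are delivered to the workspaceId=0 partition.
        "account_level": workspace_id == 0,
    }


info = parse_delivery_key(
    "audit-logs/workspaceId=0/date=2024-01-15/auditlogs_example.json"
)
print(info)
```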

Considerations based on your number of workspaces

Your delivery configuration might vary depending on how many workspaces you have and where they are located:

  • If you have one workspace in your Databricks account: Follow the instructions as outlined in the high-level flow, creating a single configuration object for your workspace.

  • If you have multiple workspaces in the same Databricks account: Do one of the following:

    • Share the same configuration (log delivery S3 bucket and IAM role) for all workspaces in the account. This is the only configuration option that also delivers account-level audit logs. It is the default option.

    • Use separate configurations for each workspace in the account.

    • Use separate configurations for different groups of workspaces, each sharing a configuration.

  • If you have multiple workspaces, each associated with a separate Databricks account: Create unique storage and credential configuration objects for each account. You can reuse an S3 bucket or IAM role between these configuration objects.

Note

You can configure log delivery with the Account API even if the workspace wasn’t created using the Account API.

How to authenticate to the Account API

To authenticate to the Account API, you can use Databricks OAuth for service principals, Databricks OAuth for users, or a Databricks account admin’s username and password. Databricks strongly recommends that you use Databricks OAuth for users or service principals. A service principal is an identity that you create in Databricks for use with automated tools, jobs, and applications. See OAuth machine-to-machine (M2M) authentication.

Use the following examples to authenticate to a Databricks account. You can use OAuth for service principals, OAuth for users, or a user’s username and password (legacy).

For authentication examples, choose from the following:

OAuth for service principals (M2M):

  1. Install Databricks CLI version 0.205 or above. See Install or update the Databricks CLI.

  2. Complete the steps to configure OAuth M2M authentication for service principals in the account. See OAuth machine-to-machine (M2M) authentication.

  3. Identify or manually create a Databricks configuration profile in your .databrickscfg file, with the profile’s fields set correctly for the related host, account_id, and client_id and client_secret mapping to the service principal. See OAuth machine-to-machine (M2M) authentication.

  4. Run your target Databricks CLI command, where <profile-name> represents the name of the configuration profile in your .databrickscfg file:

    databricks account <command-name> <subcommand-name> -p <profile-name>
    

    For example, to list all users in the account:

    databricks account users list -p MY-AWS-ACCOUNT
    
    • For a list of available account commands, run the command databricks account -h.

    • For a list of available subcommands for an account command, run the command databricks account <command-name> -h.
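Before running CLI commands with a service principal profile, it can help to confirm that the profile defines every field named in the steps above: host, account_id, client_id, and client_secret. The following Python sketch does this with the standard library `configparser` module, assuming the .databrickscfg file uses INI syntax. The function name `missing_profile_fields` and the sample profile contents are hypothetical.

```python
import configparser

# Fields that the steps above require for an OAuth M2M profile.
REQUIRED_M2M_FIELDS = ("host", "account_id", "client_id", "client_secret")


def missing_profile_fields(cfg_text, profile_name):
    """Return the required fields that are absent or empty in the profile."""
    # Disable interpolation so that '%' characters in secrets are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if profile_name != "DEFAULT" and not parser.has_section(profile_name):
        return list(REQUIRED_M2M_FIELDS)
    section = parser[profile_name]
    return [
        field for field in REQUIRED_M2M_FIELDS
        if not section.get(field, fallback="").strip()
    ]


SAMPLE_CFG = """
[MY-AWS-ACCOUNT]
host = https://accounts.cloud.databricks.com
account_id = 00000000-0000-0000-0000-000000000000
client_id = <client-id>
"""

print(missing_profile_fields(SAMPLE_CFG, "MY-AWS-ACCOUNT"))
```

In this sample the profile omits client_secret, so that is the only field reported as missing.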

OAuth for users (U2M):

  1. Install Databricks CLI version 0.205 or above. See Install or update the Databricks CLI.

  2. Complete the steps to configure OAuth U2M authentication for users in the account. See OAuth user-to-machine (U2M) authentication.

  3. Start the user authentication process by running the following Databricks CLI command:

    databricks auth login --host <account-console-url> --account-id <account-id>
    

    For example:

    databricks auth login --host https://accounts.cloud.databricks.com --account-id 00000000-0000-0000-0000-000000000000
    

    Note

    If you have an existing Databricks configuration profile with the host and account_id fields already set, you can substitute --host <account-console-url> --account-id <account-id> with --profile <profile-name>.

  4. Follow the on-screen instructions to have the Databricks CLI automatically create the related Databricks configuration profile in your .databrickscfg file.

  5. Continue following the on-screen instructions to sign in to your Databricks account through your web browser.

  6. Run your target Databricks CLI command, where <profile-name> represents the name of the configuration profile in your .databrickscfg file:

    databricks account <command-name> <subcommand-name> -p <profile-name>
    

    For example, to list all users in the account:

    databricks account users list -p ACCOUNT-00000000-0000-0000-0000-000000000000
    
    • For a list of available account commands, run the command databricks account -h.

    • For a list of available subcommands for an account command, run the command databricks account <command-name> -h.

Basic authentication (legacy):

  1. Install Databricks CLI version 0.205 or above. See Install or update the Databricks CLI.

  2. Identify or manually create a Databricks configuration profile in your .databrickscfg file, with the profile’s fields set correctly for the related host, account_id, and username and password mapping to your Databricks user account. See Basic authentication (legacy).

  3. Run your target Databricks CLI command, where <profile-name> represents the name of the configuration profile in your .databrickscfg file:

    databricks account <command-name> <subcommand-name> -p <profile-name>
    

    For example, to list all users in the account:

    databricks account users list -p MY-AWS-ACCOUNT
    
    • For a list of available account commands, run the command databricks account -h.

    • For a list of available subcommands for an account command, run the command databricks account <command-name> -h.

Audit delivery details

After logging is enabled for your account, Databricks automatically sends audit logs in human-readable format to your delivery location on a periodic basis.

  • Latency: After initial setup or other configuration changes, expect some delay before your changes take effect. For initial setup of audit log delivery, it takes up to one hour for log delivery to begin. After log delivery begins, auditable events are typically logged within 15 minutes. Additional configuration changes typically take an hour to take effect.

  • Encryption: Databricks encrypts audit logs using Amazon S3 server-side encryption.

  • Format: Databricks delivers audit logs in JSON format.

  • Location: The delivery location is <bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json. New JSON files are delivered every few minutes, potentially overwriting existing files. The delivery path is defined as part of the configuration. If you configured audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.

    • Databricks can overwrite the delivered log files in your bucket at any time. If a file is overwritten, the existing content remains, but there might be additional lines for more auditable events.

    • Overwriting ensures exactly-once semantics without requiring read or delete access to your account.
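Because an overwritten file keeps its existing content and may gain additional lines, a consumer can avoid reprocessing events by remembering how many lines it has already handled for each file. The following Python sketch illustrates that idea. It assumes each line of a delivered file holds one JSON event and that earlier lines keep their order, neither of which is guaranteed by this article. The function name `new_events` and the sample events are hypothetical.

```python
import json


def new_events(file_text, lines_already_processed):
    """Return events appended since the last read and the new line count."""
    # Ignore blank lines so that trailing newlines do not affect the count.
    lines = [line for line in file_text.splitlines() if line.strip()]
    fresh = [json.loads(line) for line in lines[lines_already_processed:]]
    return fresh, len(lines)


# First delivery of a file, then the same file after an overwrite.
first_version = '{"actionName": "login"}\n'
second_version = '{"actionName": "login"}\n{"actionName": "logout"}\n'

events, seen = new_events(first_version, 0)
more_events, seen = new_events(second_version, seen)
print(events, more_events, seen)
```

A real consumer would persist the per-file line count, keyed by the object path, between runs.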

Use the log delivery APIs

The log delivery APIs have the following additional features:

Log delivery configuration status can be found in the API response’s log_delivery_status object. With log_delivery_status, you can check the status (success or failure) and the last time of an attempt or successful delivery.
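As an illustration, the following Python sketch summarizes the log_delivery_status object from a configuration returned by the API. Only log_delivery_status itself is named in this article. The nested field names (`status`, `message`, `last_attempt_time`, `last_successful_attempt_time`) and the `SUCCEEDED` value are assumptions to verify against the API reference, and the function name `summarize_delivery_status` and sample values are hypothetical.

```python
def summarize_delivery_status(configuration):
    """Summarize delivery health for one log delivery configuration."""
    # Nested field names below are assumptions; check the API reference.
    status = configuration.get("log_delivery_status") or {}
    state = status.get("status", "UNKNOWN")
    return {
        "succeeded": state == "SUCCEEDED",
        "state": state,
        "message": status.get("message", ""),
        "last_attempt_time": status.get("last_attempt_time"),
        "last_successful_attempt_time": status.get("last_successful_attempt_time"),
    }


sample_configuration = {
    "log_type": "AUDIT_LOGS",
    "log_delivery_status": {
        "status": "USER_FAILURE",
        "message": "Example failure message",
        "last_attempt_time": "2024-01-15T10:00:00Z",
    },
}

summary = summarize_delivery_status(sample_configuration)
print(summary)
```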

Audit log delivery limitations

There is a limit on the number of log delivery configurations available per account. Each limit applies separately to each log type, including billable usage and audit logs. You can create a maximum of two enabled account-level delivery configurations (configurations without a workspace filter) per log type. Additionally, you can create and enable two workspace-level delivery configurations per workspace for each log type, meaning the same workspace ID can occur in the workspace filter for no more than two delivery configurations per log type.

You cannot delete a log delivery configuration, but you can disable it. You can re-enable a disabled configuration, but the request fails if it violates the limits previously described.
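The limits can be expressed as a small check that you run before enabling or re-enabling a configuration. This Python sketch is one interpretation of the rules above, not the service's own validation logic. It represents each configuration as a dictionary with hypothetical keys `log_type` and `workspace_ids_filter`, and the function name `can_enable` is also hypothetical.

```python
MAX_ENABLED_PER_SCOPE = 2  # Two per log type, per account or per workspace.


def can_enable(candidate, enabled_configs):
    """Return True if enabling candidate stays within the documented limits."""
    same_type = [
        c for c in enabled_configs if c["log_type"] == candidate["log_type"]
    ]
    workspace_ids = candidate.get("workspace_ids_filter") or []
    if not workspace_ids:
        # Account-level: count enabled configurations without a workspace filter.
        account_level = [
            c for c in same_type if not c.get("workspace_ids_filter")
        ]
        return len(account_level) < MAX_ENABLED_PER_SCOPE
    # Workspace-level: each workspace ID may appear in at most two filters.
    for workspace_id in workspace_ids:
        uses = sum(
            1 for c in same_type
            if workspace_id in (c.get("workspace_ids_filter") or [])
        )
        if uses >= MAX_ENABLED_PER_SCOPE:
            return False
    return True


enabled = [
    {"log_type": "AUDIT_LOGS"},
    {"log_type": "AUDIT_LOGS"},
    {"log_type": "AUDIT_LOGS", "workspace_ids_filter": [111]},
    {"log_type": "BILLABLE_USAGE"},
]

print(can_enable({"log_type": "AUDIT_LOGS"}, enabled))
print(can_enable({"log_type": "AUDIT_LOGS", "workspace_ids_filter": [111]}, enabled))
```

In this example, a third account-level audit log configuration is rejected, while a second configuration for workspace 111 is allowed.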

Enable verbose audit logs

In addition to the default events, you can configure a workspace to generate additional events by enabling verbose audit logs.

To enable or disable verbose audit logs, do the following:

  1. As a workspace admin, go to the Databricks admin settings page.

  2. Click the Advanced tab.

  3. Next to Verbose Audit Logs, enable or disable the feature.

When you enable or disable verbose logging, an auditable event is emitted in the category workspace with action workspaceConfKeys. The workspaceConfKeys request parameter is enableVerboseAuditLogs. The request parameter workspaceConfValues is true (feature enabled) or false (feature disabled).
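To find these events in delivered logs, you can filter on the category, action, and request parameters described above. The following Python sketch assumes that each event is a dictionary whose category is stored under `serviceName`, whose action is stored under `actionName`, and whose request parameters are stored under `requestParams`. These key names come from the audit log schema rather than from this article, so check them against the Audit log reference. The function name `verbose_logging_change` is hypothetical.

```python
def verbose_logging_change(event):
    """Return True or False for a verbose-logging toggle event, else None."""
    if event.get("serviceName") != "workspace":
        return None
    if event.get("actionName") != "workspaceConfKeys":
        return None
    params = event.get("requestParams") or {}
    if params.get("workspaceConfKeys") != "enableVerboseAuditLogs":
        return None
    # The value may arrive as a string or a boolean, so normalize it.
    return str(params.get("workspaceConfValues")).lower() == "true"


sample_event = {
    "serviceName": "workspace",
    "actionName": "workspaceConfKeys",
    "requestParams": {
        "workspaceConfKeys": "enableVerboseAuditLogs",
        "workspaceConfValues": "true",
    },
}

print(verbose_logging_change(sample_event))
```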

Additional verbose audit logs

When you configure verbose audit logs, your logs include the following additional events:

| Service | Action name | Description | Request parameters |
|---|---|---|---|
| notebook | runCommand | Emitted after an interactive user runs a command in a notebook. A command corresponds to a cell in a notebook. | notebookId, executionTime, status, commandId, commandText |
| jobs | runCommand | Emitted after a command in a notebook is executed by a job run. A command corresponds to a cell in a notebook. | jobId, runId, notebookId, executionTime, status, commandId, commandText |
| databrickssql | commandSubmit | Runs when a command is submitted to Databricks SQL. | commandText, warehouseId, commandId |
| databrickssql | commandFinish | Runs when a command completes or a command is cancelled. | warehouseId, commandId |

For commandFinish events, check the response field for additional information related to the command result:

  • statusCode: The HTTP response code. This is 400 for a general error.

  • errorMessage: The error message.

    Note

    For certain long-running commands, the errorMessage field might not be populated on failure.

  • result: This field is empty.
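As an example of using the response field, the following Python sketch collects failed Databricks SQL commands from a list of parsed events. It assumes the event keys `serviceName`, `actionName`, `requestParams`, and `response`, which come from the audit log schema rather than from this article, and it treats any status code of 400 or above as a failure. The function name `failed_sql_commands` and the sample events are hypothetical.

```python
def failed_sql_commands(events):
    """Return details for commandFinish events whose status code is 400 or above."""
    failures = []
    for event in events:
        if event.get("serviceName") != "databrickssql":
            continue
        if event.get("actionName") != "commandFinish":
            continue
        response = event.get("response") or {}
        status_code = response.get("statusCode")
        if status_code is None or int(status_code) < 400:
            continue
        params = event.get("requestParams") or {}
        failures.append({
            "commandId": params.get("commandId"),
            "warehouseId": params.get("warehouseId"),
            "statusCode": int(status_code),
            # errorMessage might be missing for some long-running commands.
            "errorMessage": response.get("errorMessage"),
        })
    return failures


sample_events = [
    {
        "serviceName": "databrickssql",
        "actionName": "commandFinish",
        "requestParams": {"commandId": "cmd-1", "warehouseId": "wh-1"},
        "response": {"statusCode": 200},
    },
    {
        "serviceName": "databrickssql",
        "actionName": "commandFinish",
        "requestParams": {"commandId": "cmd-2", "warehouseId": "wh-1"},
        "response": {"statusCode": 400},
    },
]

failures = failed_sql_commands(sample_events)
print(failures)
```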