Configure audit log delivery

This article explains how to configure low-latency delivery of audit logs in JSON file format to an Amazon S3 storage bucket.

When your audit logs are delivered to an S3 storage bucket, you can make the data available for usage analysis. Databricks delivers a separate JSON file for each workspace in your account and a separate file for account-level events. For information on the file schema and audit events, see Audit log reference.

You can optionally deliver logs to an AWS account other than the account used for the IAM role that you create for log delivery. This adds flexibility; for example, you can set up workspaces from multiple AWS accounts to deliver to the same S3 bucket. This option requires that you configure an S3 bucket policy that references a cross-account IAM role. Instructions and a policy template are provided in Step 3: cross-account support.
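
For orientation only, a cross-account bucket policy has roughly the following shape. The principal, actions, and resources shown here are illustrative placeholders, so use the policy template in Step 3 for the exact statement:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<aws-account-id>:role/<log-delivery-role>"
      },
      "Action": [
        "s3:GetBucketLocation",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::<bucket-name>",
        "arn:aws:s3:::<bucket-name>/*"
      ]
    }
  ]
}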

Requirements

To configure audit log delivery, you must:

  • Be an account admin with an email address and password to authenticate with the APIs. The email address and password are both case sensitive.

  • Authenticate to the APIs so you can set up delivery with the Account API. See How to authenticate to the Account API.

High-level flow

This section describes the high-level flow of audit log delivery.

  • Step 1: Configure storage: In AWS, create a new S3 bucket. Using Databricks APIs, call the Account API to create a storage configuration object that uses the bucket name.

  • Step 2: Configure credentials: In AWS, create the appropriate AWS IAM role. Using Databricks APIs, call the Account API to create a credentials configuration object that uses the IAM role’s ARN.

  • (Optional) Step 3: cross-account support: To deliver logs to an AWS account other than the account of the IAM role that you create for log delivery, add an S3 bucket policy. This policy references IDs for the cross-account IAM role that you created in the previous step.

  • Step 4: Call the log delivery API: Call the Account API to create a log delivery configuration that uses the credential and storage configuration objects from previous steps. A combined sketch of these API calls follows this list.
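
The following sketch shows what these calls can look like end to end. It assumes OAuth authentication (see How to authenticate to the Account API) and uses placeholder names and IDs; check the Databricks REST API reference for the exact request fields. Step 3 is omitted because the bucket policy is applied in AWS, not through the API.

export OAUTH_TOKEN=<oauth-access-token>
export ACCOUNT_ID=<account-id>
export API=https://accounts.cloud.databricks.com/api/2.0/accounts/$ACCOUNT_ID

# Step 1: register the S3 bucket as a storage configuration object
curl -X POST --header "Authorization: Bearer $OAUTH_TOKEN" "$API/storage-configurations" \
  -d '{"storage_configuration_name": "audit-log-storage", "root_bucket_info": {"bucket_name": "<bucket-name>"}}'

# Step 2: register the IAM role as a credentials configuration object
curl -X POST --header "Authorization: Bearer $OAUTH_TOKEN" "$API/credentials" \
  -d '{"credentials_name": "audit-log-role", "aws_credentials": {"sts_role": {"role_arn": "<role-arn>"}}}'

# Step 4: create the log delivery configuration from the IDs returned above
curl -X POST --header "Authorization: Bearer $OAUTH_TOKEN" "$API/log-delivery" \
  -d '{"log_delivery_configuration": {"config_name": "audit-log-delivery", "log_type": "AUDIT_LOGS", "output_format": "JSON", "credentials_id": "<credentials-id>", "storage_configuration_id": "<storage-configuration-id>", "delivery_path_prefix": "<delivery-path-prefix>"}}'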

After you complete these steps, you can access the JSON files. The delivery location is in the following format:

<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json

Note

If you configure audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.
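
For example, with a delivery path prefix of audit-logs, account-level events recorded on June 1, 2023 would be delivered to a path like the following (the prefix and date here are illustrative; the internal ID is assigned by Databricks):

<bucket-name>/audit-logs/workspaceId=0/date=2023-06-01/auditlogs_<internal-id>.json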

Considerations based on your number of workspaces

Your delivery configuration might vary depending on how many workspaces you have and where they are located:

  • If you have one workspace in your Databricks account: Follow the instructions as outlined in the high-level flow, creating a single configuration object for your workspace.

  • If you have multiple workspaces in the same Databricks account: Do one of the following:

    • Share the same configuration (log delivery S3 bucket and IAM role) for all workspaces in the account. This is the only configuration option that also delivers account-level audit logs. It is the default option.

    • Use separate configurations for each workspace in the account.

    • Use separate configurations for different groups of workspaces, each sharing a configuration.

  • If you have multiple workspaces, each associated with a separate Databricks account: Create unique storage and credential configuration objects for each account. You can reuse an S3 bucket or IAM role between these configuration objects.

Note

You can configure log delivery with the Account API even if the workspace wasn’t created using the Account API.

How to authenticate to the Account API

The Account API is published on the accounts.cloud.databricks.com base endpoint for all AWS regional deployments.

Use the following base URL for API requests: https://accounts.cloud.databricks.com/api/2.0/.

Preview

OAuth for service principals is in public preview.

To authenticate to the Account API, you can use Databricks OAuth tokens for service principals or an account admin’s username and password. Databricks strongly recommends that you use OAuth tokens for service principals. A service principal is an identity that you create in Databricks for use with automated tools, jobs, and applications. To create an OAuth token, see Authentication using OAuth tokens for service principals.

Use the following examples to authenticate to the Account API:

Pass the OAuth token in the header using Bearer authentication. For example:

export OAUTH_TOKEN=<oauth-access-token>

curl -X GET --header "Authorization: Bearer $OAUTH_TOKEN" \
'https://accounts.cloud.databricks.com/api/2.0/accounts/<accountId>/<endpoint>'
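
The example above assumes you already have an OAuth access token for the service principal. One way to create it is a client-credentials request; this sketch assumes the account-level token endpoint described in Authentication using OAuth tokens for service principals, with placeholder client ID and secret:

curl -X POST https://accounts.cloud.databricks.com/oidc/accounts/<account-id>/v1/token \
  -u <client-id>:<client-secret> \
  -d 'grant_type=client_credentials&scope=all-apis'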

In this section, username refers to an account admin’s email address. There are several ways to provide your credentials to tools such as curl.

  • Pass your username and account password with each request in <username>:<password> syntax using curl's -u option; curl sends them as a Basic authentication header.

    curl -X GET -u <username>:<password> -H "Content-Type: application/json" \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<accountId>/<endpoint>'
    
  • Apply base64 encoding to your <username>:<password> string and provide it directly in the HTTP header:

    curl -X GET -H "Content-Type: application/json" \
      -H 'Authorization: Basic <base64-username-pw>' \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<accountId>/<endpoint>'
    
  • Create a .netrc file with machine, login, and password properties:

    machine accounts.cloud.databricks.com
    login <username>
    password <password>
    

    To use the .netrc file, pass -n in your curl command:

    curl -n -X GET 'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/workspaces'
    

This article’s examples use OAuth for service principals for authentication. For the complete API reference, see Databricks REST API reference.

Audit delivery details

After logging is enabled for your account, Databricks automatically sends audit logs in human-readable format to your delivery location on a periodic basis.

  • Latency: After initial setup or other configuration changes, expect some delay before your changes take effect. For initial setup of audit log delivery, it takes up to one hour for log delivery to begin. After log delivery begins, auditable events are typically logged within 15 minutes. Additional configuration changes typically take an hour to take effect.

  • Encryption: Databricks encrypts audit logs using Amazon S3 server-side encryption.

  • Format: Databricks delivers audit logs in JSON format.

  • Location: The delivery location is <bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json. New JSON files are delivered every few minutes, potentially overwriting existing files. The delivery path is defined as part of the configuration. If you configured audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition. A listing example follows this list.

    • Databricks can overwrite the delivered log files in your bucket at any time. If a file is overwritten, the existing content remains, but there might be additional lines for more auditable events.

    • Overwriting ensures exactly-once semantics without requiring read or delete access to your account.
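
To confirm that files are arriving, you can list a date partition with the AWS CLI; this assumes your AWS credentials have read access to the bucket:

aws s3 ls s3://<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/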

Use the log delivery APIs

The log delivery APIs also report delivery status. Log delivery configuration status is returned in the API response’s log_delivery_status object. With log_delivery_status, you can check the status (success or failure) and the times of the last attempt and the last successful delivery.
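
For example, you can list your configurations and inspect their status with a request like the following; the status fields shown in the comment are illustrative, so check the Databricks REST API reference for the exact response shape:

curl -X GET --header "Authorization: Bearer $OAUTH_TOKEN" \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/log-delivery'

# Each returned configuration includes a log_delivery_status object, for example:
# "log_delivery_status": {"status": "SUCCEEDED", "message": "...", "last_attempt_time": "...", "last_successful_attempt_time": "..."}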

Audit log delivery limitations

There is a limit on the number of log delivery configurations available per account. Each limit applies separately to each log type, including billable usage and audit logs. You can create a maximum of two enabled account-level delivery configurations (configurations without a workspace filter) per log type. Additionally, you can create and enable two workspace-level delivery configurations per workspace for each log type, meaning the same workspace ID can occur in the workspace filter of no more than two enabled delivery configurations per log type.

You cannot delete a log delivery configuration, but you can disable it. You can re-enable a disabled configuration, but the request fails if it violates the limits previously described.
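
A sketch of disabling and re-enabling a configuration, assuming the PATCH status endpoint described in the Databricks REST API reference (the configuration ID is a placeholder):

# Disable an existing log delivery configuration
curl -X PATCH --header "Authorization: Bearer $OAUTH_TOKEN" \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/log-delivery/<log-delivery-configuration-id>' \
  -d '{"status": "DISABLED"}'

# Re-enable it later; the request fails if it would exceed the limits above
curl -X PATCH --header "Authorization: Bearer $OAUTH_TOKEN" \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/log-delivery/<log-delivery-configuration-id>' \
  -d '{"status": "ENABLED"}'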

Enable verbose audit logs

In addition to the default events, you can configure a workspace to generate additional events by enabling verbose audit logs.

To enable or disable verbose audit logs, do the following:

  1. As a workspace admin, go to the Databricks admin settings page.

  2. Click Workspace settings.

  3. Next to Verbose Audit Logs, enable or disable the feature.

When you enable or disable verbose logging, an auditable event is emitted in the category workspace with action workspaceConfKeys. The workspaceConfKeys request parameter is enableVerboseAuditLogs. The request parameter workspaceConfValues is true (feature enabled) or false (feature disabled).
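
If you prefer to script this setting, one possible route is the workspace-level workspace-conf endpoint. This sketch assumes that enableVerboseAuditLogs is accepted as a workspace configuration key (matching the audit event key above) and uses a workspace personal access token rather than the Account API:

curl -X PATCH --header "Authorization: Bearer $WORKSPACE_TOKEN" \
  'https://<workspace-url>/api/2.0/workspace-conf' \
  -d '{"enableVerboseAuditLogs": "true"}'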

Additional verbose audit logs

When you configure verbose audit logs, your logs include the following additional events:

  • Service: notebook. Action name: runCommand. Emitted after an interactive user runs a command in a notebook. A command corresponds to a cell in a notebook. Request parameters: notebookId, executionTime, status, commandId, commandText.

  • Service: databrickssql. Action name: commandSubmit. Emitted when a command is submitted to Databricks SQL. Request parameters: commandText, warehouseId, commandId.

  • Service: databrickssql. Action name: commandFinish. Emitted when a command completes or is cancelled. Request parameters: warehouseId, commandId.

For commandFinish events, check the response field for additional information related to the command result (an illustrative event record appears after this list):

  • statusCode - The HTTP response code. This is 400 if it is a general error.

  • errorMessage - Error message.

    Note

    In some cases for certain long-running commands, the errorMessage field might not be populated on failure.

  • result - This field is empty.
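
For orientation, a verbose audit event record has roughly the following shape; the values below are hypothetical, and the authoritative field list is in the Audit log reference:

{
  "serviceName": "databrickssql",
  "actionName": "commandFinish",
  "requestParams": {
    "warehouseId": "<warehouse-id>",
    "commandId": "<command-id>"
  },
  "response": {
    "statusCode": 200,
    "errorMessage": "",
    "result": ""
  }
}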