Configure audit log delivery
This article explains how to configure low-latency delivery of audit logs in JSON file format to an Amazon S3 storage bucket.
When your audit logs are delivered to an S3 storage bucket, you can make the data available for usage analysis. Databricks delivers a separate JSON file for each workspace in your account and a separate file for account-level events. For more information on the file schema and audit events, see Audit log reference.
You can optionally deliver logs to an AWS account other than the account used for the IAM role that you create for log delivery. This provides flexibility; for example, you can configure workspaces from multiple AWS accounts to deliver to the same S3 bucket. This option requires that you configure an S3 bucket policy that references a cross-account IAM role. Instructions and a policy template are provided in Step 3: cross-account support.
In addition to delivering logs for running workspaces, Databricks delivers logs for cancelled workspaces so that the logs representing the final day of the workspace are still delivered properly.
Requirements
To configure audit log delivery, you must:
Be an account admin.
Authenticate to the APIs so you can set up delivery with the Account API. See How to authenticate to the Account API.
High-level flow
This section describes the high-level flow of audit log delivery.
Step 1: Configure storage: In AWS, create a new S3 bucket. Using Databricks APIs, call the Account API to create a storage configuration object that uses the bucket name.
Step 2: Configure credentials: In AWS, create the appropriate AWS IAM role. Using Databricks APIs, call the Account API to create a credentials configuration object that uses the IAM role’s ARN.
(Optional) Step 3: cross-account support: To deliver logs to an AWS account other than the account of the IAM role that you create for log delivery, add an S3 bucket policy. This policy references IDs for the cross-account IAM role that you created in the previous step.
Step 4: Call the log delivery API: Call the Account API to create a log delivery configuration that uses the credential and storage configuration objects from previous steps.
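The following is a minimal sketch of these steps using curl against the Account API. The endpoint paths and request fields follow the Account API reference, but verify them against the current documentation before use; the account ID, bucket name, role ARN, configuration names, returned IDs, delivery path prefix, and the OAUTH_TOKEN variable are placeholders for your own values.
# Step 1: Register the S3 bucket as a storage configuration (placeholder values throughout).
curl -X POST \
  "https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/storage-configurations" \
  -H "Authorization: Bearer $OAUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "storage_configuration_name": "audit-log-bucket",
    "root_bucket_info": { "bucket_name": "<bucket-name>" }
  }'

# Step 2: Register the IAM role as a credentials configuration.
curl -X POST \
  "https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/credentials" \
  -H "Authorization: Bearer $OAUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "credentials_name": "audit-log-delivery-role",
    "aws_credentials": { "sts_role": { "role_arn": "<role-arn>" } }
  }'

# (Optional) Step 3: For cross-account delivery, attach a bucket policy that lets the IAM role
# write to the bucket. The action list here is illustrative only; use the policy template in
# Step 3: cross-account support for the authoritative policy.
cat > bucket-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "<cross-account-iam-role-arn>" },
      "Action": ["s3:GetBucketLocation", "s3:PutObject"],
      "Resource": ["arn:aws:s3:::<bucket-name>", "arn:aws:s3:::<bucket-name>/*"]
    }
  ]
}
EOF
aws s3api put-bucket-policy --bucket <bucket-name> --policy file://bucket-policy.json

# Step 4: Create the log delivery configuration, referencing the IDs returned by steps 1 and 2.
curl -X POST \
  "https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/log-delivery" \
  -H "Authorization: Bearer $OAUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "log_delivery_configuration": {
      "config_name": "audit-logs",
      "log_type": "AUDIT_LOGS",
      "output_format": "JSON",
      "credentials_id": "<credentials-id>",
      "storage_configuration_id": "<storage-configuration-id>",
      "delivery_path_prefix": "<delivery-path-prefix>"
    }
  }'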
After you complete these steps, you can access the JSON files. The delivery location is in the following format:
<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json
Note
If you configure audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.
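As an illustration only, assuming your AWS identity has read access to the bucket, you can browse the delivered files with the AWS CLI; the bucket name, prefix, workspace ID, and date below are placeholders:
# List audit log files delivered for one workspace on one day.
aws s3 ls s3://<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/

# Account-level events are delivered under the workspaceId=0 partition.
aws s3 ls s3://<bucket-name>/<delivery-path-prefix>/workspaceId=0/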
Considerations based on your number of workspaces
Your delivery configuration might vary depending on how many workspaces you have and where they are located:
If you have one workspace in your Databricks account: Follow the instructions as outlined in the high-level flow, creating a single configuration object for your workspace.
If you have multiple workspaces in the same Databricks account: Do one of the following:
Share the same configuration (log delivery S3 bucket and IAM role) for all workspaces in the account. This is the only configuration option that also delivers account-level audit logs. It is the default option.
Use separate configurations for each workspace in the account.
Use separate configurations for different groups of workspaces, each sharing a configuration.
If you have multiple workspaces, each associated with a separate Databricks account: Create unique storage and credential configuration objects for each account. You can reuse an S3 bucket or IAM role between these configuration objects.
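For example, to scope a delivery configuration to a particular group of workspaces, include a workspace filter when you create it. The sketch below assumes the same placeholders as the earlier example and the workspace_ids_filter field described in the Account API reference:
# Create an audit log delivery configuration that covers only two workspaces (placeholder IDs).
curl -X POST \
  "https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/log-delivery" \
  -H "Authorization: Bearer $OAUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "log_delivery_configuration": {
      "config_name": "audit-logs-group-a",
      "log_type": "AUDIT_LOGS",
      "output_format": "JSON",
      "credentials_id": "<credentials-id>",
      "storage_configuration_id": "<storage-configuration-id>",
      "workspace_ids_filter": [<workspace-id-1>, <workspace-id-2>]
    }
  }'
Configurations without a workspace filter apply to the whole account and are the only kind that also delivers account-level audit logs.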
Note
You can configure log delivery with the Account API even if the workspace wasn’t created using the Account API.
How to authenticate to the Account API
To authenticate to the Account API, Databricks strongly recommends that you use Databricks OAuth for service principals or Databricks OAuth for users. A service principal is an identity that you create in Databricks for use with automated tools, jobs, and applications. See Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
Use the following examples to authenticate to a Databricks account with either OAuth for service principals or OAuth for users. For background, see:
For OAuth for service principals, see Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
For OAuth for users, see Authenticate access to Databricks with a user account using OAuth (OAuth U2M).
Note
Basic authentication using a Databricks username and password reached end of life on July 10, 2024. See End of life for Databricks-managed passwords.
For authentication examples, choose from one of the following sets of steps: the first uses OAuth for service principals (OAuth M2M), and the second uses OAuth for users (OAuth U2M).
Install Databricks CLI version 0.205 or above. See Install or update the Databricks CLI.
Complete the steps to configure OAuth M2M authentication for service principals in the account. See Authenticate access to Databricks with a service principal using OAuth (OAuth M2M).
Identify or manually create a Databricks configuration profile in your .databrickscfg file, with the profile's fields set correctly for the related host, account_id, client_id, and client_secret mapping to the service principal. See OAuth machine-to-machine (M2M) authentication.
Run your target Databricks CLI command, where <profile-name> represents the name of the configuration profile in your .databrickscfg file:
databricks account <command-name> <subcommand-name> -p <profile-name>
For example, to list all users in the account:
databricks account users list -p MY-AWS-ACCOUNT
For a list of available account commands, run the command databricks account -h.
For a list of available subcommands for an account command, run the command databricks account <command-name> -h.
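As a reference for the configuration profile step above, a .databrickscfg profile for account-level OAuth M2M authentication typically looks like the following sketch; the profile name and all values are placeholders, with client_id and client_secret belonging to the service principal:
[MY-AWS-ACCOUNT]
host          = https://accounts.cloud.databricks.com
account_id    = 00000000-0000-0000-0000-000000000000
client_id     = <service-principal-client-id>
client_secret = <service-principal-oauth-secret>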
Install Databricks CLI version 0.205 or above. See Install or update the Databricks CLI.
Complete the steps to configure OAuth U2M authentication for users in the account. See Authenticate access to Databricks with a user account using OAuth (OAuth U2M).
Start the user authentication process by running the following Databricks CLI command:
databricks auth login --host <account-console-url> --account-id <account-id>
For example:
databricks auth login --host https://accounts.cloud.databricks.com --account-id 00000000-0000-0000-0000-000000000000
Note
If you have an existing Databricks configuration profile with the host and account_id fields already set, you can substitute --host <account-console-url> --account-id <account-id> with --profile <profile-name>.
Follow the on-screen instructions to have the Databricks CLI automatically create the related Databricks configuration profile in your .databrickscfg file.
Continue following the on-screen instructions to sign in to your Databricks account through your web browser.
Run your target Databricks CLI command, where <profile-name> represents the name of the configuration profile in your .databrickscfg file:
databricks account <command-name> <subcommand-name> -p <profile-name>
For example, to list all users in the account:
databricks account users list -p ACCOUNT-00000000-0000-0000-0000-000000000000
For a list of available account commands, run the command databricks account -h.
For a list of available subcommands for an account command, run the command databricks account <command-name> -h.
Audit delivery details
After logging is enabled for your account, Databricks automatically sends audit logs in human-readable format to your delivery location on a periodic basis.
Latency: After initial setup or other configuration changes, expect some delay before your changes take effect. For initial setup of audit log delivery, it takes up to one hour for log delivery to begin. After log delivery begins, auditable events are typically logged within 15 minutes. Additional configuration changes typically take an hour to take effect.
Encryption: Databricks encrypts audit logs using Amazon S3 server-side encryption.
Format: Databricks delivers audit logs in JSON format.
Location: The delivery location is <bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json. New JSON files are delivered every few minutes, potentially overwriting existing files. The delivery path is defined as part of the configuration. If you configured audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.
Databricks can overwrite the delivered log files in your bucket at any time. If a file is overwritten, the existing content remains, but there might be additional lines for more auditable events. Overwriting ensures exactly-once semantics without requiring read or delete access to your account.
Use the log delivery APIs
The log delivery APIs have the following additional features:
Log delivery configuration status can be found in the API response's log_delivery_status object. With log_delivery_status, you can check the status (success or failure) and the last time of an attempt or successful delivery.
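For example, you can list your log delivery configurations and inspect each one's log_delivery_status object, shown here as a curl sketch; the account ID is a placeholder, and you should verify the endpoint against the Account API reference:
# List all log delivery configurations for the account; each entry in the response includes a
# log_delivery_status object with the delivery status and the times of the last attempt and
# last successful delivery.
curl -X GET \
  "https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/log-delivery" \
  -H "Authorization: Bearer $OAUTH_TOKEN"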
Audit log delivery limitations
There is a limit on the number of log delivery configurations available per account, and each limit applies separately to each log type (billable usage and audit logs). You can create a maximum of two enabled account-level delivery configurations (configurations without a workspace filter) per log type. Additionally, you can create and enable two workspace-level delivery configurations per workspace for each log type, meaning the same workspace ID can occur in the workspace filter of no more than two delivery configurations per log type.
You cannot delete a log delivery configuration, but you can disable it. You can re-enable a disabled configuration, but the request fails if it violates the limits previously described.
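For example, disabling and later re-enabling a configuration might look like the following sketch, which uses the log delivery status update endpoint as described in the Account API reference; the account ID and configuration ID are placeholders:
# Disable an existing log delivery configuration (configurations cannot be deleted).
curl -X PATCH \
  "https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/log-delivery/<log-delivery-configuration-id>" \
  -H "Authorization: Bearer $OAUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "status": "DISABLED" }'

# Re-enable it later; the request fails if it would exceed the limits described above.
curl -X PATCH \
  "https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/log-delivery/<log-delivery-configuration-id>" \
  -H "Authorization: Bearer $OAUTH_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "status": "ENABLED" }'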