Configure audit logging

Note

This feature is available on the Premium plan and above.

Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed Databricks usage patterns.

There are two types of logs:

  • Workspace-level audit logs with workspace-level events.

  • Account-level audit logs with account-level events.

For a list of each of these types of events and the associated services, see Audit events.

Configure verbose audit logs

In addition to the default events, you can configure a workspace to generate additional events by enabling verbose audit logs.

To enable verbose audit logs, your account and workspace must be on the E2 version of the platform. To confirm the version of the platform you are using, contact your Databricks representative.

Enable or disable verbose audit logs

  1. As an admin, go to the Databricks admin console.

  2. Click Workspace settings.

  3. Next to Verbose Audit Logs, enable or disable the feature.

When you enable or disable verbose logging, an auditable event is emitted in the category workspace with action workspaceConfEdit. The request parameter workspaceConfKeys has the value enableVerboseAuditLogs, and the request parameter workspaceConfValues is true (feature enabled) or false (feature disabled).
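
If you prefer to toggle the setting programmatically rather than through the admin console steps above, a minimal sketch using the workspace configuration endpoint follows. This assumes the workspace-level workspace-conf API accepts the enableVerboseAuditLogs key (the same key reported in the audit event) and that you authenticate with a personal access token; the workspace URL and token are placeholders.

# Sketch: enable verbose audit logs via the workspace configuration endpoint.
# <workspace-url> and <personal-access-token> are placeholders.
curl -X PATCH "https://<workspace-url>/api/2.0/workspace-conf" \
  -H "Authorization: Bearer <personal-access-token>" \
  -d '{"enableVerboseAuditLogs": "true"}'

Set the value to "false" to disable the feature again.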

Additional verbose notebook action

Additional verbose action in audit log category notebook:

Action: runCommand

Description: Emitted after Databricks runs a command in a notebook. A command corresponds to a cell in a notebook.

Request parameters: ["notebookId", "executionTime", "status", "commandId", "commandText"]

Additional verbose Databricks SQL actions

Additional actions in audit log category databrickssql:

Action: commandSubmit

Description: Runs when a command is submitted to Databricks SQL.

Request parameters: ["commandText", "warehouseId", "commandId"]

Action: commandFinish

Description: Runs when a command completes or is cancelled.

Request parameters: ["warehouseId", "commandId"]

Check the response field for additional information related to the command result (see the example after this list):

  • statusCode - The HTTP response code. This is 400 if it is a general error.

  • errorMessage - The error message. For certain long-running commands, the errorMessage field may not be populated on failure.

  • result - This field is empty.
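
For example, to find Databricks SQL commands that finished with an error, you can filter delivered log records on these fields. This is a minimal sketch that assumes the delivered JSON files have been copied locally and that each line is one audit record, as described later in this article:

# Sketch: list commandFinish events whose response contains an errorMessage.
# Assumes the delivered audit log files are available locally as auditlogs_*.json.
jq -c 'select(.serviceName == "databrickssql"
              and .actionName == "commandFinish"
              and .response.errorMessage != null)' auditlogs_*.json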

Configure audit log delivery

As a Databricks account admin, you can configure low-latency delivery of audit logs in JSON file format to an AWS S3 storage bucket, where you can make the data available for usage analysis. Databricks delivers a separate JSON file for each workspace in your account and a separate file for account-level events.

After initial setup or other log delivery configuration changes, expect a delay of up to one hour until changes take effect. After logging delivery begins, auditable events are typically logged within 15 minutes. For the file naming, delivery rules, and schema, see Audit delivery details and format.

The API to configure low-latency delivery of audit logs is Account API 2.0, which is the same API used to configure billable usage log delivery.

You can optionally deliver logs to an AWS account other than the account used for the IAM role that you create for log delivery. This allows flexibility, for example setting up workspaces from multiple AWS accounts to deliver to the same S3 bucket. This option requires that you configure an S3 bucket policy that references a cross-account IAM role. Instructions and a policy template are provided in this article.

Access to the logs depends on how you set up the S3 bucket. Databricks delivers logs to your S3 bucket with AWS’s built-in BucketOwnerFullControl Canned ACL so that account owners and designees can download the logs directly. To support bucket ownership for newly-created objects, you must set your bucket’s S3 Object Ownership setting to the value Bucket owner preferred.

Important

If instead you set your bucket’s S3 Object Ownership setting to Object writer, new objects such as your logs remain owned by the uploading account, which is by default the IAM role you created and specified to access your bucket. This can make it difficult to access the logs, because you cannot access them from the AWS console or automation tools that you authenticated with as the bucket owner.

Databricks recommends that you review Security Best Practices for S3 for guidance around protecting the data in your bucket from unwanted access.
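
To set the Bucket owner preferred setting described above from the command line instead of the S3 console, you can use the AWS CLI. A minimal sketch, with the bucket name as a placeholder:

# Sketch: set S3 Object Ownership to "Bucket owner preferred" for the log bucket.
aws s3api put-bucket-ownership-controls \
  --bucket <s3-bucket-name> \
  --ownership-controls 'Rules=[{ObjectOwnership=BucketOwnerPreferred}]'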

Configuration options

To configure audit log delivery, you have the following options.

  • If you have one workspace in your Databricks account, follow the instructions in the sections that follow, creating a single configuration object with a common configuration for your workspace.

  • If you have multiple workspaces in the same Databricks account, you can do any of the following:

    • Share the same configuration (log delivery S3 bucket and IAM role) for all workspaces in the account. This is the only configuration option that also delivers account-level audit logs. It is the default option.

    • Use separate configurations for each workspace in the account.

    • Use separate configurations for different groups of workspaces, each sharing a configuration.

  • If you have multiple workspaces, each associated with a separate Databricks account, you must create unique storage and credential configuration objects for each account, but you can reuse an S3 bucket or IAM role between these configuration objects.

Note

Even though you use the Account API to configure log delivery, you can configure log delivery for any workspace, including workspaces that were not created using the Account API.

High-level flow

The high-level flow of audit log delivery:

  1. Configure storage: In AWS, create a new AWS S3 bucket. Using Databricks APIs, call the Account API to create a storage configuration object that uses the bucket name.

    Note

    To deliver logs to an AWS account other than the account used for the IAM role that you create for log delivery, you need to add an S3 bucket policy. You do not add the policy in this step.

  2. Configure credentials: In AWS, create the appropriate AWS IAM role. Using Databricks APIs, call the Account API to create a credentials configuration object that uses the IAM role’s ARN. The role policy can specify a path prefix for log delivery within your S3 bucket. You can choose to define an IAM role to include multiple path prefixes if you want log delivery configurations for different workspaces that share the S3 bucket but use different path prefixes.

  3. Optional cross-account support: To deliver logs to an AWS account other than the account of the IAM role that you create for log delivery, add an S3 bucket policy. This policy references IDs for the cross-account IAM role that you created in the previous step.

  4. Call the log delivery API: Call the Account API to create a log delivery configuration that uses the credential and storage configuration objects from previous steps. This step lets you specify if you want to associate the log delivery configuration for all workspaces in your account (current and future workspaces) or for a specific set of workspaces. For a list of account-level events, see Audit events.

After you complete these steps, you can access the JSON files. The delivery location is:

<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json

If you configure audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.

New JSON files are delivered every few minutes, potentially overwriting existing files for each workspace. When you initially set up audit log delivery, it can take up to one hour for log delivery to begin. After audit log delivery begins, auditable events are typically logged within 15 minutes. Additional configuration changes typically take an hour to take effect.

For more information about accessing these files and analyzing them using Databricks, see Analyze audit logs.
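
To spot-check that files are arriving, you can list the delivery path with the AWS CLI; a minimal sketch with placeholder names:

# Sketch: list delivered audit log files for one workspace and date.
aws s3 ls "s3://<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/"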

Important

There is a limit on the number of log delivery configurations available per account (each limit applies separately to each log type including billable usage and audit logs). You can create a maximum of two enabled account-level delivery configurations (configurations without a workspace filter) per type. Additionally, you can create and enable two workspace level delivery configurations per workspace for each log type, meaning the same workspace ID can occur in the workspace filter for no more than two delivery configurations per log type. You cannot delete a log delivery configuration, but you can disable it. You can re-enable a disabled configuration, but the request fails if it violates the limits previously described.

Requirements

  • Account admin email address and password to authenticate with the APIs. The email address and password are both case sensitive.

  • Account ID. Get your account ID from the account console.

How to authenticate to the APIs

The APIs described in this article are published on the accounts.cloud.databricks.com base endpoint for all AWS regional deployments.

Use the following base URL for API requests: https://accounts.cloud.databricks.com/api/2.0/

This REST API requires HTTP basic authentication, which involves setting the HTTP header Authorization. In this article, username refers to your account admin email address. The email address is case sensitive. There are several ways to provide your credentials to tools such as curl.

  • Pass your username and account password separately in the headers of each request in <username>:<password> syntax.

    For example:

    curl -X GET -u <username>:<password> -H "Content-Type: application/json" \
     'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/<endpoint>'
    
  • Apply base64 encoding to your <username>:<password> string and provide it directly in the HTTP header (an example of generating the encoded value follows this list):

    curl -X GET -H "Content-Type: application/json" \
      -H 'Authorization: Basic <base64-username-pw>' \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/<endpoint>'
    
  • Create a .netrc file with machine, login, and password properties:

    machine accounts.cloud.databricks.com
    login <username>
    password <password>
    

    To invoke the .netrc file, use -n in your curl command:

    curl -n -X GET 'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/workspaces'
    

    This article’s examples use this authentication style.
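
If you use the base64 option above, you can generate the <base64-username-pw> value with a standard base64 utility; for example:

# Generate the base64-encoded value for the Authorization: Basic header.
# The -n flag prevents a trailing newline from being encoded.
echo -n '<username>:<password>' | base64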

For the complete API reference, see Account API 2.0.

Step 1: Configure storage

Databricks delivers the logs to an S3 bucket in your account. You can configure multiple workspaces to use a single S3 bucket, or you can define different workspaces (or groups of workspaces) to use different buckets.

This procedure describes how to set up a single configuration object with a common configuration for one or more workspaces in the account. To use different storage locations for different workspaces, repeat the procedures in this article for each workspace or group of workspaces.

  1. Create the S3 bucket, following the instructions in Configure AWS storage.

    Important

    To deliver logs to an AWS account other than the one used for your Databricks workspace, you must add an S3 bucket policy. You do not add the bucket policy in this step. See Step 3: Optional cross-account support.

  2. Create a Databricks storage configuration record that represents your new S3 bucket. Specify your S3 bucket by calling the create new storage configuration API (POST /accounts/<account-id>/storage-configurations).

    Pass the following:

    • storage_configuration_name: New unique storage configuration name.

    • root_bucket_info: A JSON object that contains a bucket_name field that contains your S3 bucket name.

    Copy the storage_configuration_id value returned in the response body. You will use it to create the log delivery configuration in a later step.

    For example:

    curl -X POST -n \
        'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/storage-configurations' \
      -d '{
        "storage_configuration_name": "databricks-workspace-storageconf-v1",
        "root_bucket_info": {
          "bucket_name": "my-company-example-bucket"
        }
      }'
    

    Response:

    {
      "storage_configuration_id": "<databricks-storage-config-id>",
      "account_id": "<databricks-account-id>",
      "root_bucket_info": {
        "bucket_name": "my-company-example-bucket"
      },
      "storage_configuration_name": "databricks-workspace-storageconf-v1",
      "creation_time": 1579754875555
    }
    

Step 2: Configure credentials

This procedure describes how to set up a single configuration object with a common configuration for one or more workspaces in the account. To use different credentials for different workspaces, repeat the procedures in this article for each workspace or group of workspaces.

Note

To use different S3 bucket names, you need to create separate IAM roles.

  1. Log into your AWS Console as a user with administrator privileges and go to the IAM service.

  2. Click the Roles tab in the sidebar.

  3. Click Create role.

    1. In Select type of trusted entity, click AWS service.

    2. In Common Use Cases, click EC2.

    3. Click the Next: Permissions button.

    4. Click the Next: Tags button.

    5. Click the Next: Review button.

    6. In the Role name field, enter a role name.

    7. Click Create role. The list of roles displays.

  4. In the list of roles, click the role you created.

  5. Add an inline policy.

    1. On the Permissions tab, click Add inline policy.

    2. In the policy editor, click the JSON tab.

    3. Copy this access policy and modify it. Replace the following values in the policy with your own configuration values:

      • <s3-bucket-name>: The bucket name of your AWS S3 bucket.

      • <s3-bucket-path-prefix>: (Optional) The path to the delivery location in the S3 bucket. If unspecified, the logs are delivered to the root of the bucket. This path must match the delivery_path_prefix argument when you call the log delivery API.

      {
        "Version":"2012-10-17",
        "Statement":[
          {
            "Effect":"Allow",
            "Action":[
              "s3:GetBucketLocation"
            ],
            "Resource":[
              "arn:aws:s3:::<s3-bucket-name>"
            ]
          },
          {
            "Effect":"Allow",
            "Action":[
              "s3:PutObject",
              "s3:GetObject",
              "s3:DeleteObject",
              "s3:PutObjectAcl",
              "s3:AbortMultipartUpload"
            ],
            "Resource":[
              "arn:aws:s3:::<s3-bucket-name>/<s3-bucket-path-prefix>/",
              "arn:aws:s3:::<s3-bucket-name>/<s3-bucket-path-prefix>/*"
            ]
          },
          {
            "Effect":"Allow",
            "Action":[
              "s3:ListBucket",
              "s3:ListMultipartUploadParts",
              "s3:ListBucketMultipartUploads"
            ],
            "Resource":"arn:aws:s3:::<s3-bucket-name>",
            "Condition":{
              "StringLike":{
                "s3:prefix":[
                  "<s3-bucket-path-prefix>",
                  "<s3-bucket-path-prefix>/*"
                ]
              }
            }
          }
        ]
      }
      

      You can customize the policy usage of the path prefix:

      • If you do not want to use the bucket path prefix, remove <s3-bucket-path-prefix>/ (including the final slash) from the policy each time it appears.

      • If you want log delivery configurations for different workspaces that share the S3 bucket but use different path prefixes, you can define an IAM role to include multiple path prefixes. There are two separate parts of the policy that reference <s3-bucket-path-prefix>. In each case, duplicate the two adjacent lines that reference the path prefix. Repeat each pair of lines for every new path prefix, for example:

      {
        "Resource":[
          "arn:aws:s3:::<mybucketname>/field-team/",
          "arn:aws:s3:::<mybucketname>/field-team/*",
          "arn:aws:s3:::<mybucketname>/finance-team/",
          "arn:aws:s3:::<mybucketname>/finance-team/*"
        ]
      }
      
    4. Click Review policy.

    5. In the Name field, enter a policy name.

    6. Click Create policy.

    7. If you use service control policies to deny certain actions at the AWS account level, ensure that sts:AssumeRole is allowed so that Databricks can assume the cross-account role.

  6. On the role summary page, click the Trust Relationships tab.

  7. Paste this access policy into the editor and replace the following value in the policy with your own configuration value:

    <databricks-account-id>: Your Databricks account ID.

    {
      "Version":"2012-10-17",
      "Statement":[
        {
          "Effect":"Allow",
          "Principal":{
            "AWS":"arn:aws:iam::414351767826:role/SaasUsageDeliveryRole-prod-IAMRole-3PLHICCRR1TK"
          },
          "Action":"sts:AssumeRole",
          "Condition":{
            "StringEquals":{
              "sts:ExternalId":[
                "<databricks-account-id>"
              ]
            }
          }
        }
      ]
    }
    
  8. In the role summary, copy the Role ARN and save it for a later step.

  9. Create a Databricks credentials configuration ID for your AWS role. Call the Create credential configuration API (POST /accounts/<account-id>/credentials). This request establishes cross-account trust and returns a reference ID to use when you create the log delivery configuration.

    Replace <account-id> with your Databricks account ID. In the request body:

    • Set credentials_name to a name that is unique within your account.

    • Set aws_credentials to an object that contains an sts_role property. That object must specify the role_arn for the role you’ve created.

    The response body includes a credentials_id field, which is the Databricks credentials configuration ID for this role. Copy this field so you can use it to create the log delivery configuration in a later step.

    For example:

     curl -X POST -n \
       'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/credentials' \
       -d '{
       "credentials_name": "databricks-credentials-v1",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role"
         }
       }
     }'
    

    Example response:

     {
       "credentials_id": "<databricks-credentials-id>",
       "account_id": "<databricks-account-id>",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role",
           "external_id": "<databricks-account-id>"
         }
       },
       "credentials_name": "databricks-credentials-v1",
       "creation_time": 1579753556257
     }
    

    Copy the credentials_id field from the response for later use.
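
As an alternative to the console steps 1 through 8 above, you can create the same role with the AWS CLI before calling the credentials API. This is a minimal sketch, assuming you saved the inline access policy and the trust policy shown above as local files (access-policy.json and trust-policy.json are placeholder names) and that the role name is your own choice:

# Sketch: create the log delivery role with the Databricks trust policy directly
# (instead of the temporary EC2 trust used in the console flow).
aws iam create-role \
  --role-name databricks-log-delivery \
  --assume-role-policy-document file://trust-policy.json

# Attach the inline access policy that grants S3 access for log delivery.
aws iam put-role-policy \
  --role-name databricks-log-delivery \
  --policy-name databricks-log-delivery-policy \
  --policy-document file://access-policy.json

# Print the role ARN to pass as role_arn in the credentials configuration call.
aws iam get-role --role-name databricks-log-delivery --query 'Role.Arn' --output text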

Step 3: Optional cross-account support

If your S3 bucket is in the same AWS account as your IAM role used for log delivery, skip this step.

To deliver logs to an AWS account other than the one used for your Databricks workspace, you must add an S3 bucket policy, provided in this step. This policy references IDs for the cross-account IAM role that you created in the previous step.

  1. In the AWS Console, go to the S3 service.

  2. Click the bucket name.

  3. Click the Permissions tab.

  4. Click the Bucket Policy button.

  5. Copy and modify this bucket policy.

    Replace <s3-bucket-name> with the S3 bucket name. Replace <customer-iam-role-id> with the role ID of your newly-created IAM role. Replace <s3-bucket-path-prefix> with the bucket path prefix you want. See the notes after the policy sample for information about customizing the path prefix.

     {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": ["arn:aws:iam::<customer-iam-role-id>"]
               },
               "Action": "s3:GetBucketLocation",
               "Resource": "arn:aws:s3:::<s3-bucket-name>"
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::<customer-iam-role-id>"
               },
               "Action": [
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:DeleteObject",
                   "s3:PutObjectAcl",
                   "s3:AbortMultipartUpload",
                   "s3:ListMultipartUploadParts"
               ],
               "Resource": [
                   "arn:aws:s3:::<s3-bucket-name>/<s3-bucket-path-prefix>/",
                   "arn:aws:s3:::<s3-bucket-name>/<s3-bucket-path-prefix>/*"
               ]
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::<customer-iam-role-id>"
               },
               "Action": "s3:ListBucket",
               "Resource": "arn:aws:s3:::<s3-bucket-name>",
               "Condition": {
                   "StringLike": {
                       "s3:prefix": [
                           "<s3-bucket-path-prefix>",
                           "<s3-bucket-path-prefix>/*"
                       ]
                   }
               }
           }
       ]
     }
    

    You can customize the policy use of the path prefix:

    • If you do not want to use the bucket path prefix, remove <s3-bucket-path-prefix>/ (including the final slash) from the policy each time it appears.

    • If you want log delivery configurations for multiple workspaces that share the same S3 bucket but use different path prefixes, you can define an IAM role to include multiple path prefixes. Two parts of the policy reference <s3-bucket-path-prefix>. In each place, duplicate the two adjacent lines that reference the path prefix. Repeat each pair of lines for each new path prefix. For example:

      {
        "Resource":[
          "arn:aws:s3:::<mybucketname>/field-team/",
          "arn:aws:s3:::<mybucketname>/field-team/*",
          "arn:aws:s3:::<mybucketname>/finance-team/",
          "arn:aws:s3:::<mybucketname>/finance-team/*"
        ]
      }
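
You can also attach the customized bucket policy with the AWS CLI instead of the console; a minimal sketch, assuming the policy is saved locally as bucket-policy.json (a placeholder name):

# Sketch: attach the cross-account bucket policy to the log delivery bucket.
aws s3api put-bucket-policy \
  --bucket <s3-bucket-name> \
  --policy file://bucket-policy.json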
      

Step 4: Call the log delivery API

To configure log delivery, call the Log delivery configuration API (POST /accounts/<account-id>/log-delivery).

You need the following values that you copied in the previous steps:

  • credentials_id: Your Databricks credential configuration ID, which represents your cross-account role credentials.

  • storage_configuration_id: Your Databricks storage configuration ID, which represents your root S3 bucket.

Also set the following fields:

  • log_type: Always set to AUDIT_LOGS.

  • output_format: Always set to JSON. For the schema, see Audit log schema.

  • delivery_path_prefix: (Optional) Set to the path prefix. This must match the path prefix that you used in your role policy. The delivery path is <bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json. If you configure audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.

  • workspace_ids_filter: (Optional) Set to an array of workspace IDs whose logs you want to deliver. By default, the workspace filter field is empty and log delivery applies at the account level, delivering workspace-level logs for all workspaces in your account, plus account-level logs. You can optionally set this field to an array of workspace IDs (each one is an int64) to which log delivery should apply, in which case only workspace-level logs relating to the specified workspaces are delivered. For a list of account-level events, see Audit events.

    If you plan to use different log delivery configurations for different workspaces, set this field explicitly. Be aware that delivery configurations that mention specific workspaces won’t apply to new workspaces created in the future, and delivery won’t include account-level logs.

    For some types of Databricks deployments there is only one workspace per account ID, so this field is unnecessary.

Important

There is a limit on the number of log delivery configurations available per account (each limit applies separately to each log type, including billable usage and audit logs). You can create a maximum of two enabled account-level delivery configurations (configurations without a workspace filter) per type. Additionally, you can create two enabled workspace level delivery configurations per workspace for each log type, meaning the same workspace ID can occur in the workspace filter for no more than two delivery configurations per log type. You cannot delete a log delivery configuration, but you can disable it. You can re-enable a disabled configuration, but the request fails if it violates the limits previously described.

For example:

curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/log-delivery' \
  -d '{
  "log_delivery_configuration": {
    "log_type": "AUDIT_LOGS",
    "config_name": "audit log config",
    "output_format": "JSON",
    "credentials_id": "<databricks-credentials-id>",
    "storage_configuration_id": "<databricks-storage-config-id>",
    "delivery_path_prefix": "auditlogs-data",
    "workspace_ids_filter": [
        6383650456894062,
        4102272838062927
    ]
    }
}'

Example response:

{
    "log_delivery_configuration": {
        "config_id": "<config-id>",
        "config_name": "audit log config",
        "log_type": "AUDIT_LOGS",
        "output_format": "JSON",
        "account_id": "<account-id>",
        "credentials_id": "<databricks-credentials-id>",
        "storage_configuration_id": "<databricks-storage-config-id>",
        "workspace_ids_filter": [
            6383650456894062,
            4102272838062927
        ],
        "delivery_path_prefix": "auditlogs-data",
        "status": "ENABLED",
        "creation_time": 1591638409000,
        "update_time": 1593108904000,
        "log_delivery_status": {
          "status": "CREATED",
          "message": "Log Delivery Configuration is successfully created. Status will be updated after the first delivery attempt."
        }
    }
}

Additional features of the log delivery APIs

The log delivery APIs have additional features, such as getting or listing existing log delivery configurations and enabling or disabling a configuration. See the API reference documentation for details.

Log delivery configuration status is returned in the API response's log_delivery_status object. With log_delivery_status, you can check the status (success or failure) and the last time of an attempt or successful delivery.
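
For example, you can list your log delivery configurations to inspect log_delivery_status, and disable or re-enable a configuration by patching its status. A sketch, using the same authentication style as the earlier examples:

# List log delivery configurations and review log_delivery_status for each one.
curl -n -X GET \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/log-delivery'

# Disable a configuration (configurations cannot be deleted, only disabled or re-enabled).
curl -n -X PATCH \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/log-delivery/<log-delivery-configuration-id>' \
  -d '{"status": "DISABLED"}'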

Important

There is a limit on the number of log delivery configurations available per account (each limit applies separately to each log type including billable usage and audit logs). You can create a maximum of two enabled account-level delivery configurations (configurations without a workspace filter) per type. Additionally, you can create two enabled workspace level delivery configurations per workspace for each log type, meaning the same workspace ID can occur in the workspace filter for no more than two delivery configurations per log type. You cannot delete a log delivery configuration, but you can disable it. You can re-enable a disabled configuration, but the request fails if it violates the limits previously described.

Audit delivery details and format

Once logging is enabled for your account, Databricks automatically starts sending audit logs in human-readable format to your delivery location on a periodic basis.

  • Latency: After initial setup or other configuration changes, expect some delay before your changes take effect. For initial setup of audit log delivery, it takes up to one hour for log delivery to begin. After log delivery begins, auditable events are typically logged within 15 minutes. Additional configuration changes typically take an hour to take effect.

  • Encryption: Databricks encrypts audit logs using Amazon S3 server-side encryption.

  • Format: Databricks delivers audit logs in JSON format.

  • Location: The delivery location is <bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json. New JSON files are delivered every few minutes, potentially overwriting existing files. The delivery path is defined as part of the configuration. If you configured audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.

    • Databricks can overwrite the delivered log files in your bucket at any time. If a file is overwritten, the existing content remains, but there may be additional lines for more auditable events.

    • Overwriting ensures exactly-once semantics without requiring read or delete access to your account.

Audit log schema

The schema of audit log records is as follows.

  • version: The schema version of the audit log format.

  • timestamp: UTC timestamp of the action.

  • workspaceId: ID of the workspace this event relates to. May be set to 0 for account-level events that apply to no workspace.

  • sourceIPAddress: The IP address of the source request.

  • userAgent: The browser or API client used to make the request.

  • sessionId: Session ID of the action.

  • userIdentity: Information about the user that made the request.

    • email: User email address.

  • serviceName: The service that logged the request.

  • actionName: The action, such as login, logout, read, write, and so on.

  • requestId: Unique request ID.

  • requestParams: Parameter key-value pairs used in the audited event.

  • response: Response to the request.

    • errorMessage: The error message if there was an error.

    • result: The result of the request.

    • statusCode: HTTP status code that indicates whether the request succeeded.

  • auditLevel: Specifies if this is a workspace-level event (WORKSPACE_LEVEL) or account-level event (ACCOUNT_LEVEL).

  • accountId: Account ID of this Databricks account.

Audit events

The serviceName and actionName properties identify an audit event in an audit log record. The naming convention follows the Databricks REST API reference.

Workspace-level audit logs are available for these services:

  • accounts

  • clusters

  • clusterPolicies

  • databrickssql

  • dbfs

  • genie

  • gitCredentials

  • globalInitScripts

  • groups

  • iamRole

  • instancePools

  • jobs

  • mlflowExperiment

  • notebook

  • repos

  • secrets

  • serverlessRealTimeInference, which pertains to Model Serving.

  • sqlPermissions, which has all the audit logs for table access when table ACLs are enabled.

  • ssh

  • webTerminal

  • workspace

Account-level audit logs are available for these services:

  • accountBillableUsage: Access to billable usage for the account.

  • accountsManager: Actions performed in the accounts console.

  • logDelivery: Log delivery configuration, such as for billable usage or audit logs.

  • oauth2: Actions related to OAuth SSO authentication to the account console.

  • ssoConfigBackend: Single sign-on settings for the account.

Account-level events have the workspaceId field set to a valid workspace ID if they reference workspace-related events like creating or deleting a workspace. If they are not associated with any workspace, the workspaceId field is set to 0. Account level audit logs are delivered only for account-level delivery configurations with an empty workspace filter field (which deliver audit logs for all events in the account).
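
For example, once files are delivered you can separate account-level records from workspace-level records by filtering on the auditLevel field from the schema above; a minimal sketch, assuming the files have been copied locally:

# Sketch: keep only account-level audit events from locally copied log files.
jq -c 'select(.auditLevel == "ACCOUNT_LEVEL")' auditlogs_*.json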

Audit event considerations

  • If actions take a long time, the request and response are logged separately, but the request and response pair have the same requestId.

  • With the exception of mount-related operations, Databricks audit logs do not include DBFS-related operations. We recommend that you set up server access logging in S3, which can log object-level operations associated with an IAM role. If you map IAM roles to Databricks users, your Databricks users cannot share IAM roles.

  • Automated actions, such as resizing a cluster due to autoscaling or launching a job due to scheduling, are performed by the user System-User.

Deprecated audit log events

Databricks has deprecated the following audit events:

  • createAlertDestination (now createNotificationDestination)

  • deleteAlertDestination (now deleteNotificationDestination)

  • updateAlertDestination (now updateNotificationDestination)

Request parameters

The request parameters in the field requestParams for each supported service and action are listed in the following sections, grouped by workspace-level events and account-level events.

The requestParams field is subject to truncation. If the size of its JSON representation exceeds 100 KB, values are truncated and the string ... truncated is appended to truncated entries. In rare cases where a truncated map is still larger than 100 KB, a single TRUNCATED key with an empty value is present instead.

Workspace-level audit log events

Service

Action

Request Parameters

accounts

add

[“targetUserName”, “endpoint”, “targetUserId”]

addPrincipalToGroup

[“targetGroupId”, “endpoint”, “targetUserId”, “targetGroupName”, “targetUserName”]

removePrincipalFromGroup

[“targetGroupId”, “endpoint”, “targetUserId”, “targetGroupName”, “targetUserName”]

changePassword

[“newPasswordSource”, “targetUserId”, “serviceSource”, “wasPasswordChanged”, “userId”]

createGroup

[“endpoint”, “targetGroupId”, “targetGroupName”]

delete

[“targetUserId”, “targetUserName”, “endpoint”]

garbageCollectDbToken

[“tokenExpirationTime”, “userId”]

generateDbToken

[“userId”, “tokenExpirationTime”]

jwtLogin

[“user”]

login

[“user”]

logout

[“user”]

removeAdmin

[“targetUserName”, “endpoint”, “targetUserId”]

removeGroup

[“targetGroupId”, “targetGroupName”, “endpoint”]

resetPassword

[“serviceSource”, “userId”, “endpoint”, “targetUserId”, “targetUserName”, “wasPasswordChanged”, “newPasswordSource”]

revokeDbToken

[“userId”]

samlLogin

[“user”]

setAdmin

[“endpoint”, “targetUserName”, “targetUserId”]

tokenLogin

[“tokenId”, “user”]

validateEmail

[“endpoint”, “targetUserName”, “targetUserId”]

clusters

changeClusterAcl

[“shardName”, “aclPermissionSet”, “targetUserId”, “resourceId”]

create

[“cluster_log_conf”, “num_workers”, “enable_elastic_disk”, “driver_node_type_id”, “start_cluster”, “docker_image”, “ssh_public_keys”, “aws_attributes”, “acl_path_prefix”, “node_type_id”, “instance_pool_id”, “spark_env_vars”, “init_scripts”, “spark_version”, “cluster_source”, “autotermination_minutes”, “cluster_name”, “autoscale”, “custom_tags”, “cluster_creator”, “enable_local_disk_encryption”, “idempotency_token”, “spark_conf”, “organization_id”, “no_driver_daemon”, “user_id”]

createResult

[“clusterName”, “clusterState”, “clusterId”, “clusterWorkers”, “clusterOwnerUserId”]

delete

[“cluster_id”]

deleteResult

[“clusterWorkers”, “clusterState”, “clusterId”, “clusterOwnerUserId”, “clusterName”]

edit

[“spark_env_vars”, “no_driver_daemon”, “enable_elastic_disk”, “aws_attributes”, “driver_node_type_id”, “custom_tags”, “cluster_name”, “spark_conf”, “ssh_public_keys”, “autotermination_minutes”, “cluster_source”, “docker_image”, “enable_local_disk_encryption”, “cluster_id”, “spark_version”, “autoscale”, “cluster_log_conf”, “instance_pool_id”, “num_workers”, “init_scripts”, “node_type_id”]

permanentDelete

[“cluster_id”]

resize

[“cluster_id”, “num_workers”, “autoscale”]

resizeResult

[“clusterWorkers”, “clusterState”, “clusterId”, “clusterOwnerUserId”, “clusterName”]

restart

[“cluster_id”]

restartResult

[“clusterId”, “clusterState”, “clusterName”, “clusterOwnerUserId”, “clusterWorkers”]

start

[“init_scripts_safe_mode”, “cluster_id”]

startResult

[“clusterName”, “clusterState”, “clusterWorkers”, “clusterOwnerUserId”, “clusterId”]

clusterPolicies

create

[“name”]

edit

[“policy_id”, “name”]

delete

[“policy_id”]

changeClusterPolicyAcl

[“shardName”, “targetUserId”, “resourceId”, “aclPermissionSet”]

dbfs (REST API)

addBlock

[“handle”, “data_length”]

create

[“path”, “bufferSize”, “overwrite”]

delete

[“recursive”, “path”]

mkdirs

[“path”]

move

[“dst”, “source_path”, “src”, “destination_path”]

put

[“path”, “overwrite”]

dbfs (operations)

mount

[“mountPoint”, “owner”]

unmount

[“mountPoint”]

genie

databricksAccess

[“duration”, “approver”, “reason”, “authType”, “user”]

gitCredentials

getGitCredential

[“id”]

listGitCredentials

[]

deleteGitCredential

[“id”]

updateGitCredential

[“id”, “git_provider”, “git_username”]

createGitCredential

[“git_provider”, “git_username”]

globalInitScripts

create

[“name”, “position”, “script-SHA256”, “enabled”]

update

[“script_id”, “name”, “position”, “script-SHA256”, “enabled”]

delete

[“script_id”]

groups

addPrincipalToGroup

[“user_name”, “parent_name”]

createGroup

[“group_name”]

getGroupMembers

[“group_name”]

removeGroup

[“group_name”]

iamRole

changeIamRoleAcl

[“targetUserId”, “shardName”, “resourceId”, “aclPermissionSet”]

instancePools

changeInstancePoolAcl

[“shardName”, “resourceId”, “targetUserId”, “aclPermissionSet”]

create

[“enable_elastic_disk”, “preloaded_spark_versions”, “idle_instance_autotermination_minutes”, “instance_pool_name”, “node_type_id”, “custom_tags”, “max_capacity”, “min_idle_instances”, “aws_attributes”]

delete

[“instance_pool_id”]

edit

[“instance_pool_name”, “idle_instance_autotermination_minutes”, “min_idle_instances”, “preloaded_spark_versions”, “max_capacity”, “enable_elastic_disk”, “node_type_id”, “instance_pool_id”, “aws_attributes”]

jobs

cancel

[“run_id”]

cancelAllRuns

[“job_id”]

changeJobAcl

[“shardName”, “aclPermissionSet”, “resourceId”, “targetUserId”]

create

[“spark_jar_task”, “email_notifications”, “notebook_task”, “spark_submit_task”, “timeout_seconds”, “libraries”, “name”, “spark_python_task”, “job_type”, “new_cluster”, “existing_cluster_id”, “max_retries”, “schedule”]

delete

[“job_id”]

deleteRun

[“run_id”]

reset

[“job_id”, “new_settings”]

resetJobAcl

[“grants”, “job_id”]

runFailed

[“jobClusterType”, “jobTriggerType”, “jobId”, “jobTaskType”, “runId”, “jobTerminalState”, “idInJob”, “orgId”]

runNow

[“notebook_params”, “job_id”, “jar_params”, “workflow_context”]

runSucceeded

[“idInJob”, “jobId”, “jobTriggerType”, “orgId”, “runId”, “jobClusterType”, “jobTaskType”, “jobTerminalState”]

setTaskValue

[“run_id”, “key”]

submitRun

[“shell_command_task”, “run_name”, “spark_python_task”, “existing_cluster_id”, “notebook_task”, “timeout_seconds”, “libraries”, “new_cluster”, “spark_jar_task”]

update

[“fields_to_remove”, “job_id”, “new_settings”]

mlflowExperiment

deleteMlflowExperiment

[“experimentId”, “path”, “experimentName”]

moveMlflowExperiment

[“newPath”, “experimentId”, “oldPath”]

restoreMlflowExperiment

[“experimentId”, “path”, “experimentName”]

mlflowModelRegistry

listModelArtifacts

[“name”, “version”, “path”, “page_token”]

getModelVersionSignedDownloadUri

[“name”, “version”, “path”]

createRegisteredModel

[“name”, “tags”]

deleteRegisteredModel

[“name”]

renameRegisteredModel

[“name”, “new_name”]

setRegisteredModelTag

[“name”, “key”, “value”]

deleteRegisteredModelTag

[“name”, “key”]

createModelVersion

[“name”, “source”, “run_id”, “tags”, “run_link”]

deleteModelVersion

[“name”, “version”]

getModelVersionDownloadUri

[“name”, “version”]

setModelVersionTag

[“name”, “version”, “key”, “value”]

deleteModelVersionTag

[“name”, “version”, “key”]

createTransitionRequest

[“name”, “version”, “stage”]

deleteTransitionRequest

[“name”, “version”, “stage”, “creator”]

approveTransitionRequest

[“name”, “version”, “stage”, “archive_existing_versions”]

rejectTransitionRequest

[“name”, “version”, “stage”]

transitionModelVersionStage

[“name”, “version”, “stage”, “archive_existing_versions”]

transitionModelVersionStageDatabricks

[“name”, “version”, “stage”, “archive_existing_versions”]

createComment

[“name”, “version”]

updateComment

[“id”]

deleteComment

[“id”]

notebook

attachNotebook

[“path”, “clusterId”, “notebookId”]

createNotebook

[“notebookId”, “path”]

deleteFolder

[“path”]

deleteNotebook

[“notebookId”, “notebookName”, “path”]

detachNotebook

[“notebookId”, “clusterId”, “path”]

downloadLargeResults

[“notebookId”, “notebookFullPath”]

downloadPreviewResults

[“notebookId”, “notebookFullPath”]

importNotebook

[“path”]

moveNotebook

[“newPath”, “oldPath”, “notebookId”]

renameNotebook

[“newName”, “oldName”, “parentPath”, “notebookId”]

restoreFolder

[“path”]

restoreNotebook

[“path”, “notebookId”, “notebookName”]

runCommand (only for verbose audit logs)

[“notebookId”, “executionTime”, “status”, “commandId”, “commandText” (see details)]

takeNotebookSnapshot

[“path”]

repos

createRepo

[“url”, “provider”, “path”]

updateRepo

[“id”, “branch”, “tag”, “git_url”, “git_provider”]

getRepo

[“id”]

listRepos

[“path_prefix”, “next_page_token”]

deleteRepo

[“id”]

pull

[“id”]

commitAndPush

[“id”, “message”, “files”, “checkSensitiveToken”]

checkoutBranch

[“id”, “branch”]

discard

[“id”, “file_paths”]

secrets

createScope

[“scope”]

deleteScope

[“scope”]

deleteSecret

[“key”, “scope”]

getSecret

[“scope”, “key”]

listAcls

[“scope”]

listSecrets

[“scope”]

putSecret

[“string_value”, “scope”, “key”]

serverlessRealTimeInference

createServingEndpoint

[“name”, “config”]

updateServingEndpoint

[“name”, “served_models”, “traffic_config”]

deleteServingEndpoint

[“name”]

databrickssql

addDashboardWidget

[“dashboardId”, “widgetId”]

cancelQueryExecution

[“queryExecutionId”]

changeWarehouseAcls

[“aclPermissionSet”, “resourceId”, “shardName”, “targetUserId”]

changePermissions

[“granteeAndPermission”, “objectId”, “objectType”]

cloneDashboard

[“dashboardId”]

commandSubmit (only for verbose audit logs)

[“orgId”, “sourceIpAddress”, “timestamp”, “userAgent”, “userIdentity”, “shardName” (see details)]

commandFinish (only for verbose audit logs)

[“orgId”, “sourceIpAddress”, “timestamp”, “userAgent”, “userIdentity”, “shardName” (see details)]

createNotificationDestination

[“notificationDestinationId”, “notificationDestinationType”]

createDashboard

[“dashboardId”]

createDataPreviewDashboard

[“dashboardId”]

createWarehouse

[“auto_resume”, “auto_stop_mins”, “channel”, “cluster_size”, “conf_pairs”, “custom_cluster_confs”, “enable_databricks_compute”, “enable_photon”, “enable_serverless_compute”, “instance_profile_arn”, “max_num_clusters”, “min_num_clusters”, “name”, “size”, “spot_instance_policy”, “tags”, “test_overrides”]

createQuery

[“queryId”]

createQueryDraft

[“queryId”]

createQuerySnippet

[“querySnippetId”]

createRefreshSchedule

[“alertId”, “dashboardId”, “refreshScheduleId”]

createSampleDashboard

[“sampleDashboardId”]

createSubscription

[“dashboardId”, “refreshScheduleId”, “subscriptionId”]

createVisualization

[“queryId”, “visualizationId”]

deleteAlert

[“alertId”]

deleteNotificationDestination

[“notificationDestinationId”]

deleteDashboard

[“dashboardId”]

deleteDashboardWidget

[“widgetId”]

deleteWarehouse

[“id”]

deleteExternalDatasource

[“dataSourceId”]

deleteQuery

[“queryId”]

deleteQueryDraft

[“queryId”]

deleteQuerySnippet

[“querySnippetId”]

deleteRefreshSchedule

[“alertId”, “dashboardId”, “refreshScheduleId”]

deleteSubscription

[“subscriptionId”]

deleteVisualization

[“visualizationId”]

downloadQueryResult

[“fileType”, “queryId”, “queryResultId”]

editWarehouse

[“auto_stop_mins”, “channel”, “cluster_size”, “confs”, “enable_photon”, “enable_serverless_compute”, “id”, “instance_profile_arn”, “max_num_clusters”, “min_num_clusters”, “name”, “spot_instance_policy”, “tags”]

executeAdhocQuery

[“dataSourceId”]

executeSavedQuery

[“queryId”]

executeWidgetQuery

[“widgetId”]

favoriteDashboard

[“dashboardId”]

favoriteQuery

[“queryId”]

forkQuery

[“originalQueryId”, “queryId”]

listQueries

[“filter_by”, “include_metrics”, “max_results”, “page_token”]

moveDashboardToTrash

[“dashboardId”]

moveQueryToTrash

[“queryId”]

muteAlert

[“alertId”]

publishBatch

[“statuses”]

publishDashboardSnapshot

[“dashboardId”, “hookId”, “subscriptionId”]

restoreDashboard

[“dashboardId”]

restoreQuery

[“queryId”]

setWarehouseConfig

[“data_access_config”, “enable_serverless_compute”, “instance_profile_arn”, “security_policy”, “serverless_agreement”, “sql_configuration_parameters”, “try_create_databricks_managed_starter_warehouse”]

snapshotDashboard

[“dashboardId”]

startWarehouse

[“id”]

stopWarehouse

[“id”]

subscribeAlert

[“alertId”, “destinationId”]

transferObjectOwnership

[“newOwner”, “objectId”, “objectType”]

unfavoriteDashboard

[“dashboardId”]

unfavoriteQuery

[“queryId”]

unmuteAlert

[“alertId”]

unsubscribeAlert

[“alertId”, “subscriberId”]

updateAlert

[“alertId”, “queryId”]

updateNotificationDestination

[“notificationDestinationId”]

updateDashboard

[“dashboardId”]

updateDashboardWidget

[“widgetId”]

updateOrganizationSetting

[“has_configured_data_access”, “has_explored_sql_warehouses”, “has_granted_permissions”]

updateQuery

[“queryId”]

updateQueryDraft

[“queryId”]

updateQuerySnippet

[“querySnippetId”]

updateRefreshSchedule

[“alertId”, “dashboardId”, “refreshScheduleId”]

updateVisualization

[“visualizationId”]

sqlPermissions

createSecurable

[“securable”]

grantPermission

[“permission”]

removeAllPermissions

[“securable”]

requestPermissions

[“requests”]

revokePermission

[“permission”]

showPermissions

[“securable”, “principal”]

ssh

login

[“containerId”, “userName”, “port”, “publicKey”, “instanceId”]

logout

[“userName”, “containerId”, “instanceId”]

webTerminal

startSession

[“socketGUID”, “clusterId”, “serverPort”, “ProxyTargetURI”]

closeSession

[“socketGUID”, “clusterId”, “serverPort”, “ProxyTargetURI”]

workspace

changeWorkspaceAcl

[“shardName”, “targetUserId”, “aclPermissionSet”, “resourceId”]

fileCreate

[“path”]

fileDelete

[“path”]

moveWorkspaceNode

[“destinationPath”, “path”]

purgeWorkspaceNodes

[“treestoreId”]

workspaceConfEdit (workspace-level setting changes)

[“workspaceConfKeys” (for example, verbose audit logs uses value enableVerboseAuditLogs), “workspaceConfValues” (for example, for verbose audit logs this is true or false)]

workspaceExport

[“workspaceExportFormat”, “notebookFullPath”]

Account-level audit log events

Service

Action

Request Parameters

accountBillableUsage

getAggregatedUsage

[“account_id”, “window_size”, “start_time”, “end_time”, “meter_name”, “workspace_ids_filter”]

getDetailedUsage

[“account_id”, “start_month”, “end_month”, “with_pii”]

accounts

login

[“user”]

gcpWorkspaceBrowserLogin

[“user”]

oidcBrowserLogin

[“user”]

logout

[“user”]

accountsManager

updateAccount

[“account_id”, “account”]

changeAccountOwner

[“account_id”, “first_name”, “last_name”, “email”]

consolidateAccounts

[“target_account_id”, “account_ids_to_consolidate”]

updateSubscription

[“account_id”, “subscription_id”, “subscription”]

listSubscriptions

[“account_id”]

createWorkspaceConfiguration

[“workspace”]

getWorkspaceConfiguration

[“account_id”, “workspace_id”]

listWorkspaceConfigurations

[“account_id”]

updateWorkspaceConfiguration

[“account_id”, “workspace_id”]

deleteWorkspaceConfiguration

[“account_id”, “workspace_id”]

acceptTos

[“workspace_id”]

sendTos

[“account_id”, “workspace_id”]

createCredentialsConfiguration

[“credentials”]

getCredentialsConfiguration

[“account_id”, “credentials_id”]

listCredentialsConfigurations

[“account_id”]

deleteCredentialsConfiguration

[“account_id”, “credentials_id”]

createStorageConfiguration

[“storage_configuration”]

getStorageConfiguration

[“account_id”, “storage_configuration_id”]

listStorageConfigurations

[“account_id”]

deleteStorageConfiguration

[“account_id”, “storage_configuration_id”]

createNetworkConfiguration

[“network”]

getNetworkConfiguration

[“account_id”, “network_id”]

listNetworkConfigurations

[“account_id”]

deleteNetworkConfiguration

[“account_id”, “network_id”]

createCustomerManagedKeyConfiguration

[“customer_managed_key”]

getCustomerManagedKeyConfiguration

[“account_id”, “customer_managed_key_id”]

listCustomerManagedKeyConfigurations

[“account_id”]

deleteCustomerManagedKeyConfiguration

[“account_id”, “customer_managed_key_id”]

listWorkspaceEncryptionKeyRecords

[“account_id”, “workspace_id”]

listWorkspaceEncryptionKeyRecordsForAccount

[“account_id”]

createVpcEndpoint

[“vpc_endpoint”]

getVpcEndpoint

[“account_id”, “vpc_endpoint_id”]

listVpcEndpoints

[“account_id”]

deleteVpcEndpoint

[“account_id”, “vpc_endpoint_id”]

createPrivateAccessSettings

[“private_access_settings”]

getPrivateAccessSettings

[“account_id”, “private_access_settings_id”]

listPrivateAccessSettings

[“account_id”]

deletePrivateAccessSettings

[“account_id”, “private_access_settings_id”]

logDelivery

createLogDeliveryConfiguration

[“account_id”, “config_id”]

updateLogDeliveryConfiguration

[“config_id”, “account_id”, “status”]

getLogDeliveryConfiguration

[“log_delivery_configuration”]

listLogDeliveryConfigurations

[“account_id”, “storage_configuration_id”, “credentials_id”, “status”]

ssoConfigBackend

create

[“account_id”, “sso_type”, “config”]

update

[“account_id”, “sso_type”, “config”]

get

[“account_id”, “sso_type”]

oauth2 (Public Preview)

enrollOAuth

[“enable_all_published_apps”]

createCustomAppIntegration

[“redirect_url”, “name”, “token_access_policy”, “confidential”]

deleteCustomAppIntegration

[“integration_id”]

updateCustomAppIntegration

[“redirect_url”, “name”, “token_access_policy”]

createPublishedAppIntegration

[“app_id”]

deletePublishedAppIntegration

[“integration_id”]

updatePublishedAppIntegration

[“token_access_policy”]

Analyze audit logs

You can analyze audit logs using Databricks. The following example uses logs to report on Databricks access and Apache Spark versions.

Load audit logs as a DataFrame and register the DataFrame as a temp table. See Working with data in Amazon S3 for a detailed guide.

val df = spark.read.format("json").load("s3a://bucketName/path/to/auditLogs")
df.createOrReplaceTempView("audit_logs")

List the users who accessed Databricks and from where.

%sql
SELECT DISTINCT userIdentity.email, sourceIPAddress
FROM audit_logs
WHERE serviceName = "accounts" AND actionName LIKE "%login%"

Check the Apache Spark versions used.

%sql
SELECT requestParams.spark_version, COUNT(*)
FROM audit_logs
WHERE serviceName = "clusters" AND actionName = "create"
GROUP BY requestParams.spark_version

Check table data access.

%sql
SELECT *
FROM audit_logs
WHERE serviceName = "sqlPermissions" AND actionName = "requestPermissions"