Configure audit logging

Note

This feature is available on the Premium plan and above.

Databricks provides access to audit logs of activities performed by Databricks users, allowing your enterprise to monitor detailed Databricks usage patterns.

There are two types of logs:

  • Workspace-level audit logs with workspace-level events.
  • Account-level audit logs with account-level events.

For a list of each of these types of events and the associated services, see Audit events.

Configure audit log delivery

As a Databricks account owner (or account admin, if you are on an E2 account), you can configure low-latency delivery of audit logs in JSON file format to an AWS S3 storage bucket, where you can make the data available for usage analysis. Databricks delivers a separate JSON file for each workspace in your account and a separate file for account-level events.

After initial setup or other log delivery configuration changes, expect a delay of up to one hour until changes take effect. After logging delivery begins, auditable events are typically logged within 15 minutes. For the file naming, delivery rules, and schema, see Audit delivery details and format.

The API to configure low-latency delivery of audit logs is Account API 2.0, which is the same API used to configure billable usage log delivery.

You can optionally deliver logs to an AWS account other than the account used for the IAM role that you create for log delivery. This allows flexibility, for example setting up workspaces from multiple AWS accounts to deliver to the same S3 bucket. This option requires that you configure an S3 bucket policy that references a cross-account IAM role. Instructions and a policy template are provided in this article.

Access to the logs depends on how you set up the S3 bucket. Databricks delivers logs to your S3 bucket with AWS’s built-in BucketOwnerFullControl Canned ACL so that account owners and designees can download the logs directly. To support bucket ownership for newly-created objects, you must set your bucket’s S3 Object Ownership setting to the value Bucket owner preferred.

Important

If instead you set your bucket’s S3 Object Ownership setting to Object writer, new objects such as your logs remain owned by the uploading account, which is by default the IAM role you created and specified to access your bucket. This can make it difficult to access the logs, because you cannot access them from the AWS console or automation tools that you authenticated with as the bucket owner.

Databricks recommends that you review Security Best Practices for S3 for guidance around protecting the data in your bucket from unwanted access.

Configuration options

To configure audit log delivery, you have the following options.

  • If you have one workspace in your Databricks account, follow the instructions in the sections that follow, creating a single configuration object with a common configuration for your workspace.
  • If you have multiple workspaces in the same Databricks account, you can do any of the following:
    • Share the same configuration (log delivery S3 bucket and IAM role) for all workspaces in the account. This is the only configuration option that also delivers account-level audit logs. It is the default option.
    • Use separate configurations for each workspace in the account.
    • Use separate configurations for different groups of workspaces, each sharing a configuration.
  • If you have multiple workspaces, each associated with a separate Databricks account, you must create unique storage and credential configuration objects for each account, but you can reuse an S3 bucket or IAM role between these configuration objects.

Note

Even though you use the Account API to configure log delivery, you can configure log delivery for any workspace, including workspaces that were not created using the Account API.

High-level flow

The high-level flow of audit log delivery:

  1. Configure storage: In AWS, create a new AWS S3 bucket. Using Databricks APIs, call the Account API to create a storage configuration object that uses the bucket name.

    Note

    To deliver logs to an AWS account other than the account used for the IAM role that you create for log delivery, you need to add an S3 bucket policy. You do not add the policy in this step.

  2. Configure credentials: In AWS, create the appropriate AWS IAM role. Using Databricks APIs, call the Account API to create a credentials configuration object that uses the IAM role’s ARN. The role policy can specify a path prefix for log delivery within your S3 bucket. You can choose to define an IAM role to include multiple path prefixes if you want log delivery configurations for different workspaces that share the S3 bucket but use different path prefixes.

  3. Optional cross-account support: To deliver logs to an AWS account other than the account of the IAM role that you create for log delivery, add an S3 bucket policy. This policy references IDs for the cross-account IAM role that you created in the previous step.

  4. Call the log delivery API: Call the Account API to create a log delivery configuration that uses the credential and storage configuration objects from previous steps. This step lets you specify if you want to associate the log delivery configuration for all workspaces in your account (current and future workspaces) or for a specific set of workspaces. For a list of account-level events, see Audit events.
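The request bodies used in steps 1, 2, and 4 can be sketched as plain payload builders. This is a minimal Python sketch for illustration; the helper names are hypothetical and not part of any Databricks SDK, but the field names follow the curl examples later in this article.

```python
# Illustrative payload builders for the three Account API calls in the
# high-level flow. Helper names are hypothetical; field names follow
# the examples in this article.

def storage_config_payload(name, bucket_name):
    """Body for POST /accounts/<account-id>/storage-configurations (step 1)."""
    return {
        "storage_configuration_name": name,
        "root_bucket_info": {"bucket_name": bucket_name},
    }

def credentials_payload(name, role_arn):
    """Body for POST /accounts/<account-id>/credentials (step 2)."""
    return {
        "credentials_name": name,
        "aws_credentials": {"sts_role": {"role_arn": role_arn}},
    }

def log_delivery_payload(credentials_id, storage_configuration_id,
                         delivery_path_prefix=None, workspace_ids=None):
    """Body for POST /accounts/<account-id>/log-delivery (step 4)."""
    config = {
        "log_type": "AUDIT_LOGS",
        "output_format": "JSON",
        "credentials_id": credentials_id,
        "storage_configuration_id": storage_configuration_id,
    }
    if delivery_path_prefix:
        config["delivery_path_prefix"] = delivery_path_prefix
    if workspace_ids:
        # Omitting the filter delivers workspace-level logs for all
        # workspaces in the account, plus account-level events.
        config["workspace_ids_filter"] = workspace_ids
    return {"log_delivery_configuration": config}
```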

After you complete these steps, you can access the JSON files. The delivery location is:

<bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json

If you configure audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.
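As a sketch, the delivery location for a single file can be assembled from its parts like this (Python for illustration; delivery_path is a hypothetical helper, not a Databricks API):

```python
from datetime import date

def delivery_path(bucket, prefix, workspace_id, day, internal_id):
    """Assemble the documented delivery location for one JSON file.
    Pass workspace_id=0 for account-level events."""
    return (f"{bucket}/{prefix}/workspaceId={workspace_id}/"
            f"date={day.isoformat()}/auditlogs_{internal_id}.json")

# e.g. delivery_path("my-bucket", "auditlogs-data", 0, date(2023, 1, 5), "abc")
```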

New JSON files are delivered every few minutes, potentially overwriting existing files for each workspace. When you initially set up audit log delivery, it can take up to one hour for log delivery to begin. After audit log delivery begins, auditable events are typically logged within 15 minutes. Additional configuration changes typically take an hour to take effect.

For more information about accessing these files and analyzing them using Databricks, see Analyze audit logs.

Important

There is a limit on the number of log delivery configurations available per account (each limit applies separately to each log type, including billable usage and audit logs). You can create a maximum of two enabled account-level delivery configurations (configurations without a workspace filter) per type. Additionally, you can create and enable two workspace-level delivery configurations per workspace for each log type, meaning the same workspace ID can occur in the workspace filter for no more than two delivery configurations per log type. You cannot delete a log delivery configuration, but you can disable it. You can re-enable a disabled configuration, but the request fails if it violates the limits previously described.

Requirements

  • Account owner (or account admin, if you are on an E2 account) email address and password to authenticate with the APIs. The email address and password are both case sensitive.
  • Account ID. For accounts on the E2 version of the platform, get your account ID from Access the account console (E2). For non-E2 accounts, get your account ID from your Usage Overview tab. Contact your Databricks representative if you cannot find your account ID.

How to authenticate to the APIs

The APIs described in this article are published on the accounts.cloud.databricks.com base endpoint for all AWS regional deployments.

Use the following base URL for API requests: https://accounts.cloud.databricks.com/api/2.0/

This REST API requires HTTP basic authentication, which involves setting the HTTP header Authorization. In this article, username refers to your account owner (or account admin, if you are on an E2 account) email address. The email address is case sensitive. There are several ways to provide your credentials to tools such as curl.

  • Pass your username and password with each request using <username>:<password> syntax.

    For example:

    curl -X GET -u <username>:<password> -H "Content-Type: application/json" \
     'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/<endpoint>'
    
  • Apply base64 encoding to your <username>:<password> string and provide it directly in the HTTP header:

    curl -X GET -H "Content-Type: application/json" \
      -H 'Authorization: Basic <base64-username-pw>' \
      'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/<endpoint>'
    
  • Create a .netrc file with machine, login, and password properties:

    machine accounts.cloud.databricks.com
    login <username>
    password <password>
    

    To invoke the .netrc file, use -n in your curl command:

    curl -n -X GET 'https://accounts.cloud.databricks.com/api/2.0/accounts/<account-id>/workspaces'
    

    This article’s examples use this authentication style.
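For the base64 variant above, the Authorization header value can also be produced programmatically. A minimal Python sketch (the helper name is illustrative):

```python
import base64

def basic_auth_header(username, password):
    """Build the HTTP basic-auth header from <username>:<password>,
    equivalent to what curl's -u flag sends."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}
```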

For the complete API reference, see Account API 2.0.

Step 1: Configure storage

Databricks delivers the log to an S3 bucket in your account. You can configure multiple workspaces to use a single S3 bucket, or you can define different workspaces (or groups of workspaces) to use different buckets.

This procedure describes how to set up a single configuration object with a common configuration for one or more workspaces in the account. To use different storage locations for different workspaces, repeat the procedures in this article for each workspace or group of workspaces.

  1. Create the S3 bucket, following the instructions in Configure AWS storage.

    Important

    To deliver logs to an AWS account other than the one used for your Databricks workspace, you must add an S3 bucket policy. You do not add the bucket policy in this step. See Step 3: Optional cross-account support.

  2. Create a Databricks storage configuration record that represents your new S3 bucket. Specify your S3 bucket by calling the create new storage configuration API (POST /accounts/<account-id>/storage-configurations).

    Pass the following:

    • storage_configuration_name: New unique storage configuration name.
    • root_bucket_info: A JSON object that contains a bucket_name field that contains your S3 bucket name.

    Copy the storage_configuration_id value returned in the response body. You will use it to create the log delivery configuration in a later step.

    For example:

    curl -X POST -n \
        'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/storage-configurations' \
      -d '{
        "storage_configuration_name": "databricks-workspace-storageconf-v1",
        "root_bucket_info": {
          "bucket_name": "my-company-example-bucket"
        }
      }'
    

    Response:

    {
      "storage_configuration_id": "<databricks-storage-config-id>",
      "account_id": "<databricks-account-id>",
      "root_bucket_info": {
        "bucket_name": "my-company-example-bucket"
      },
      "storage_configuration_name": "databricks-workspace-storageconf-v1",
      "creation_time": 1579754875555
    }
    

Step 2: Configure credentials

This procedure describes how to set up a single configuration object with a common configuration for one or more workspaces in the account. To use different credentials for different workspaces, repeat the procedures in this article for each workspace or group of workspaces.

Note

To use different S3 bucket names, you need to create separate IAM roles.

  1. Log into your AWS Console as a user with administrator privileges and go to the IAM service.

  2. Click the Roles tab in the sidebar.

  3. Click Create role.

    1. In Select type of trusted entity, click AWS service.

    2. In Common Use Cases, click EC2.

    3. Click the Next: Permissions button.

    4. Click the Next: Tags button.

    5. Click the Next: Review button.

    6. In the Role name field, enter a role name.

    7. Click Create role. The list of roles displays.

  4. In the list of roles, click the role you created.

  5. Add an inline policy.

    1. On the Permissions tab, click Add inline policy.

    2. In the policy editor, click the JSON tab.

    3. Copy this access policy and modify it. Replace the following values in the policy with your own configuration values:

      • <s3-bucket-name>: The bucket name of your AWS S3 bucket.
      • <s3-bucket-path-prefix>: (Optional) The path to the delivery location in the S3 bucket. If unspecified, the logs are delivered to the root of the bucket. This path must match the delivery_path_prefix argument when you call the log delivery API.
      {
        "Version":"2012-10-17",
        "Statement":[
          {
            "Effect":"Allow",
            "Action":[
              "s3:GetBucketLocation"
            ],
            "Resource":[
              "arn:aws:s3:::<s3-bucket-name>"
            ]
          },
          {
            "Effect":"Allow",
            "Action":[
              "s3:PutObject",
              "s3:GetObject",
              "s3:DeleteObject",
              "s3:PutObjectAcl",
              "s3:AbortMultipartUpload"
            ],
            "Resource":[
              "arn:aws:s3:::<s3-bucket-name>/<s3-bucket-path-prefix>/",
              "arn:aws:s3:::<s3-bucket-name>/<s3-bucket-path-prefix>/*"
            ]
          },
          {
            "Effect":"Allow",
            "Action":[
              "s3:ListBucket",
              "s3:ListMultipartUploadParts",
              "s3:ListBucketMultipartUploads"
            ],
            "Resource":"arn:aws:s3:::<s3-bucket-name>",
            "Condition":{
              "StringLike":{
                "s3:prefix":[
                  "<s3-bucket-path-prefix>",
                  "<s3-bucket-path-prefix>/*"
                ]
              }
            }
          }
        ]
      }
      

      You can customize the policy usage of the path prefix:

      • If you do not want to use the bucket path prefix, remove <s3-bucket-path-prefix>/ (including the final slash) from the policy each time it appears.
      • If you want log delivery configurations for different workspaces that share the S3 bucket but use different path prefixes, you can define an IAM role to include multiple path prefixes. There are two separate parts of the policy that reference <s3-bucket-path-prefix>. In each case, duplicate the two adjacent lines that reference the path prefix. Repeat each pair of lines for every new path prefix, for example:
      {
        "Resource":[
          "arn:aws:s3:::<mybucketname>/field-team/",
          "arn:aws:s3:::<mybucketname>/field-team/*",
          "arn:aws:s3:::<mybucketname>/finance-team/",
          "arn:aws:s3:::<mybucketname>/finance-team/*"
        ]
      }
      
    4. Click Review policy.

    5. In the Name field, enter a policy name.

    6. Click Create policy.

    7. If you use service control policies to deny certain actions at the AWS account level, ensure that sts:AssumeRole is allowed so that Databricks can assume the cross-account role.

  6. On the role summary page, click the Trust Relationships tab.

  7. Paste this access policy into the editor and replace the following values in the policy with your own configuration values:

    <databricks-account-id> — Your Databricks account ID.

    {
      "Version":"2012-10-17",
      "Statement":[
        {
          "Effect":"Allow",
          "Principal":{
            "AWS":"arn:aws:iam::414351767826:role/SaasUsageDeliveryRole-prod-IAMRole-3PLHICCRR1TK"
          },
          "Action":"sts:AssumeRole",
          "Condition":{
            "StringEquals":{
              "sts:ExternalId":[
                "<databricks-account-id>"
              ]
            }
          }
        }
      ]
    }
    
  8. In the role summary, copy the Role ARN and save it for a later step.

  9. Create a Databricks credentials configuration ID for your AWS role. Call the Create credential configuration API (POST /accounts/<account-id>/credentials). This request establishes cross-account trust and returns a reference ID to use when you create the log delivery configuration.

    Replace <account-id> with your Databricks account ID. In the request body:

    • Set credentials_name to a name that is unique within your account.
    • Set aws_credentials to an object that contains an sts_role property. That object must specify the role_arn for the role you’ve created.

    The response body includes a credentials_id field, which is the Databricks credentials configuration ID. Copy this field so you can use it to create the log delivery configuration in a later step.

    For example:

     curl -X POST -n \
       'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/credentials' \
       -d '{
       "credentials_name": "databricks-credentials-v1",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role"
         }
       }
     }'
    

    Example response:

     {
       "credentials_id": "<databricks-credentials-id>",
       "account_id": "<databricks-account-id>",
       "aws_credentials": {
         "sts_role": {
           "role_arn": "arn:aws:iam::<aws-account-id>:role/my-company-example-role",
           "external_id": "<databricks-account-id>"
         }
       },
       "credentials_name": "databricks-credentials-v1",
       "creation_time": 1579753556257
     }
    

    Copy the credentials_id field from the response for later use.
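The "duplicate the two adjacent Resource lines per path prefix" guidance in the role policy above (and in the bucket policy in the next step) amounts to a simple expansion, sketched here in Python with a hypothetical helper:

```python
def prefix_resources(bucket, prefixes):
    """Expand the object-level Resource list: two ARNs per path prefix,
    mirroring the duplicated lines in the policy examples."""
    arns = []
    for prefix in prefixes:
        arns.append(f"arn:aws:s3:::{bucket}/{prefix}/")
        arns.append(f"arn:aws:s3:::{bucket}/{prefix}/*")
    return arns
```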

Step 3: Optional cross-account support

If your S3 bucket is in the same AWS account as your IAM role used for log delivery, skip this step.

To deliver logs to an AWS account other than the one used for your Databricks workspace, you must add an S3 bucket policy, provided in this step. This policy references IDs for the cross-account IAM role that you created in the previous step.

  1. In the AWS Console, go to the S3 service.

  2. Click the bucket name.

  3. Click the Permissions tab.

  4. Click the Bucket Policy button.

  5. Copy and modify this bucket policy.

    Replace <s3-bucket-name> with the S3 bucket name. Replace <customer-iam-role-id> with the role ID of your newly-created IAM role. Replace <s3-bucket-path-prefix> with the bucket path prefix you want. See the notes after the policy sample for information about customizing the path prefix.

     {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": ["arn:aws:iam::<customer-iam-role-id>"]
               },
               "Action": "s3:GetBucketLocation",
               "Resource": "arn:aws:s3:::<s3-bucket-name>"
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::<customer-iam-role-id>"
               },
               "Action": [
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:DeleteObject",
                   "s3:PutObjectAcl",
                   "s3:AbortMultipartUpload",
                   "s3:ListMultipartUploadParts"
               ],
               "Resource": [
                   "arn:aws:s3:::<s3-bucket-name>/<s3-bucket-path-prefix>/",
                   "arn:aws:s3:::<s3-bucket-name>/<s3-bucket-path-prefix>/*"
               ]
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::<customer-iam-role-id>"
               },
               "Action": "s3:ListBucket",
               "Resource": "arn:aws:s3:::<s3-bucket-name>",
               "Condition": {
                   "StringLike": {
                       "s3:prefix": [
                           "<s3-bucket-path-prefix>",
                           "<s3-bucket-path-prefix>/*"
                       ]
                   }
               }
           }
       ]
     }
    

    You can customize the policy use of the path prefix:

    • If you do not want to use the bucket path prefix, remove <s3-bucket-path-prefix>/ (including the final slash) from the policy each time it appears.

    • If you want log delivery configurations for multiple workspaces that share the same S3 bucket but use different path prefixes, you can define an IAM role to include multiple path prefixes. Two parts of the policy reference <s3-bucket-path-prefix>. In each place, duplicate the two adjacent lines that reference the path prefix. Repeat each pair of lines for each new path prefix. For example:

      {
        "Resource":[
          "arn:aws:s3:::<mybucketname>/field-team/",
          "arn:aws:s3:::<mybucketname>/field-team/*",
          "arn:aws:s3:::<mybucketname>/finance-team/",
          "arn:aws:s3:::<mybucketname>/finance-team/*"
        ]
      }
      

Step 4: Call the log delivery API

To configure log delivery, call the Log delivery configuration API (POST /accounts/<account-id>/log-delivery).

You need the following values that you copied in the previous steps:

  • credentials_id: Your Databricks credential configuration ID, which represents your cross-account role credentials.
  • storage_configuration_id: Your Databricks storage configuration ID, which represents your root S3 bucket.

Also set the following fields:

  • log_type: Always set to AUDIT_LOGS.

  • output_format: Always set to JSON. For the schema, see Audit log schema.

  • delivery_path_prefix: (Optional) Set to the path prefix. This must match the path prefix that you used in your role policy. The delivery path is <bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json. If you configure audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.

  • workspace_ids_filter: (Optional) Set to an array of workspace IDs whose logs you want to deliver. By default, the workspace filter field is empty and log delivery applies at the account level, delivering workspace-level logs for all workspaces in your account, plus account-level logs. You can optionally set this field to an array of workspace IDs (each one is an int64) to which log delivery should apply, in which case only workspace-level logs relating to the specified workspaces are delivered. For a list of account-level events, see Audit events.

    If you plan to use different log delivery configurations for different workspaces, set this field explicitly. Be aware that delivery configurations that mention specific workspaces won’t apply to new workspaces created in the future, and delivery won’t include account-level logs.

    For some types of Databricks deployments there is only one workspace per account ID, so this field is unnecessary.

Important

There is a limit on the number of log delivery configurations available per account (each limit applies separately to each log type, including billable usage and audit logs). You can create a maximum of two enabled account-level delivery configurations (configurations without a workspace filter) per type. Additionally, you can create two enabled workspace-level delivery configurations per workspace for each log type, meaning the same workspace ID can occur in the workspace filter for no more than two delivery configurations per log type. You cannot delete a log delivery configuration, but you can disable it. You can re-enable a disabled configuration, but the request fails if it violates the limits previously described.

For example:

curl -X POST -n \
  'https://accounts.cloud.databricks.com/api/2.0/accounts/<databricks-account-id>/log-delivery' \
  -d '{
  "log_delivery_configuration": {
    "log_type": "AUDIT_LOGS",
    "config_name": "audit log config",
    "output_format": "JSON",
    "credentials_id": "<databricks-credentials-id>",
    "storage_configuration_id": "<databricks-storage-config-id>",
    "delivery_path_prefix": "auditlogs-data",
    "workspace_ids_filter": [
        6383650456894062,
        4102272838062927
    ]
    }
}'

Example response:

{
    "log_delivery_configuration": {
        "config_id": "<config-id>",
        "config_name": "audit log config",
        "log_type": "AUDIT_LOGS",
        "output_format": "JSON",
        "account_id": "<account-id>",
        "credentials_id": "<databricks-credentials-id>",
        "storage_configuration_id": "<databricks-storage-config-id>",
        "workspace_ids_filter": [
            6383650456894062,
            4102272838062927
        ],
        "delivery_path_prefix": "auditlogs-data",
        "status": "ENABLED",
        "creation_time": 1591638409000,
        "update_time": 1593108904000,
        "log_delivery_status": {
          "status": "CREATED",
          "message": "Log Delivery Configuration is successfully created. Status will be updated after the first delivery attempt."
        }
    }
}
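A response like the one above can be checked programmatically. A small Python sketch (delivery_status is a hypothetical helper):

```python
import json

def delivery_status(response_text):
    """Extract the configuration status (e.g. ENABLED) and the most
    recent delivery attempt status from a log-delivery API response."""
    cfg = json.loads(response_text)["log_delivery_configuration"]
    return cfg["status"], cfg.get("log_delivery_status", {}).get("status")
```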

Additional features of the log delivery APIs

The log delivery APIs have additional features. See the API reference documentation for details.

For example, you can check log delivery configuration status in the API response’s log_delivery_status object. With log_delivery_status, you can check the status (success or failure) and the time of the last attempted or successful delivery.

Important

There is a limit on the number of log delivery configurations available per account (each limit applies separately to each log type, including billable usage and audit logs). You can create a maximum of two enabled account-level delivery configurations (configurations without a workspace filter) per type. Additionally, you can create two enabled workspace-level delivery configurations per workspace for each log type, meaning the same workspace ID can occur in the workspace filter for no more than two delivery configurations per log type. You cannot delete a log delivery configuration, but you can disable it. You can re-enable a disabled configuration, but the request fails if it violates the limits previously described.

Audit delivery details and format

Once logging is enabled for your account, Databricks automatically starts sending audit logs in human-readable format to your delivery location on a periodic basis.

  • Latency: After initial setup or other configuration changes, expect some delay before your changes take effect. For initial setup of audit log delivery, it takes up to one hour for log delivery to begin. After log delivery begins, auditable events are typically logged within 15 minutes. Additional configuration changes typically take an hour to take effect.
  • Encryption: Databricks encrypts audit logs using Amazon S3 server-side encryption.
  • Format: Databricks delivers audit logs in JSON format.
  • Location: The delivery location is <bucket-name>/<delivery-path-prefix>/workspaceId=<workspaceId>/date=<yyyy-mm-dd>/auditlogs_<internal-id>.json. New JSON files are delivered every few minutes, potentially overwriting existing files. The delivery path is defined as part of the configuration. If you configured audit log delivery for the entire account, account-level audit events that are not associated with any single workspace are delivered to the workspaceId=0 partition.
    • Databricks can overwrite the delivered log files in your bucket at any time. If a file is overwritten, the existing content remains, but there may be additional lines for more auditable events.
    • Overwriting ensures exactly-once semantics without requiring read or delete access to your account.
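When reading delivered files back out of S3, the workspace ID and date can be recovered from the object key. A sketch assuming keys of the documented form (parse_key is a hypothetical helper):

```python
import re

# Matches object keys of the documented delivery-location form.
_KEY_RE = re.compile(
    r"workspaceId=(?P<workspace>\d+)/date=(?P<date>\d{4}-\d{2}-\d{2})/"
    r"auditlogs_.+\.json$"
)

def parse_key(key):
    """Return (workspace_id, date_string) for a delivered log key,
    or None if the key does not match the documented form."""
    m = _KEY_RE.search(key)
    if m is None:
        return None
    return int(m.group("workspace")), m.group("date")
```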

Audit log schema

The schema of audit log records is as follows. This section applies whether you configured audit log delivery using the current framework or the legacy framework (deprecated).

  • version: The schema version of the audit log format.
  • timestamp: UTC timestamp of the action.
  • workspaceId: ID of the workspace this event relates to. May be set to “0” for account-level events that apply to no workspace.
  • sourceIPAddress: The IP address of the source request.
  • userAgent: The browser or API client used to make the request.
  • sessionId: Session ID of the action.
  • userIdentity: Information about the user who made the request.
    • email: User email address.
  • serviceName: The service that logged the request.
  • actionName: The action, such as login, logout, read, write, and so on.
  • requestId: Unique request ID.
  • requestParams: Parameter key-value pairs used in the audited event.
  • response: Response to the request.
    • errorMessage: The error message if there was an error.
    • result: The result of the request.
    • statusCode: HTTP status code that indicates whether the request succeeded.
  • auditLevel: Specifies if this is a workspace-level event (WORKSPACE_LEVEL) or account-level event (ACCOUNT_LEVEL).
  • accountId: Account ID of this Databricks account.
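Delivered files contain one JSON record per line with the schema above; filtering them by service and action can be sketched as (hypothetical helper, Python for illustration):

```python
import json

def filter_events(lines, service_name, action_name=None):
    """Keep audit records whose serviceName (and optionally actionName)
    match; each input line is one JSON audit record."""
    matches = []
    for line in lines:
        record = json.loads(line)
        if record.get("serviceName") != service_name:
            continue
        if action_name and record.get("actionName") != action_name:
            continue
        matches.append(record)
    return matches
```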

Audit events

The serviceName and actionName properties identify an audit event in an audit log record. The naming convention follows the Databricks REST API reference. This section applies to both the current framework and the legacy framework (deprecated).

Workspace-level audit logs are available for these services:

  • accounts
  • clusters
  • dbfs
  • genie
  • globalInitScripts
  • groups
  • iamRole
  • instancePools
  • jobs
  • mlflowExperiment
  • notebook
  • repos
  • secrets
  • databrickssql
  • sqlPermissions, which has all the audit logs for table access when table ACLs are enabled.
  • ssh
  • workspace

Account-level audit logs are available for these services:

  • accountBillableUsage: Access to billable usage for the account.
  • logDelivery: Log delivery configurations, such as for billable usage or audit logs.
  • ssoConfigBackend: Single sign-on settings for the account.
  • accountsManager: Actions performed in the accounts console.

Account-level events have the workspaceId field set to a valid workspace ID if they reference workspace-related events, such as creating or deleting a workspace. If they are not associated with any workspace, the workspaceId field is set to 0. Account-level audit logs are delivered only for account-level delivery configurations with an empty workspace filter field (which deliver audit logs for all events in the account).

Note

  • If actions take a long time, the request and response are logged separately but the request and response pair have the same requestId.
  • With the exception of mount-related operations, Databricks audit logs do not include DBFS-related operations. We recommend that you set up server access logging in S3, which can log object-level operations associated with an IAM role. If you map IAM roles to Databricks users, your Databricks users cannot share IAM roles.
  • Automated actions—such as resizing a cluster due to autoscaling or launching a job due to scheduling—are performed by the user System-User.

Request parameters

The request parameters (field requestParams) for each supported service and action are listed in the following table:

Workspace-level audit log events

Service Action Request Parameters
accounts add [“targetUserName”, “endpoint”, “targetUserId”]
  addPrincipalToGroup [“targetGroupId”, “endpoint”, “targetUserId”, “targetGroupName”, “targetUserName”]
  changePassword [“newPasswordSource”, “targetUserId”, “serviceSource”, “wasPasswordChanged”, “userId”]
  createGroup [“endpoint”, “targetGroupId”, “targetGroupName”]
  delete [“targetUserId”, “targetUserName”, “endpoint”]
  garbageCollectDbToken [“tokenExpirationTime”, “userId”]
  generateDbToken [“userId”, “tokenExpirationTime”]
  jwtLogin [“user”]
  login [“user”]
  logout [“user”]
  removeAdmin [“targetUserName”, “endpoint”, “targetUserId”]
  removeGroup [“targetGroupId”, “targetGroupName”, “endpoint”]
  resetPassword [“serviceSource”, “userId”, “endpoint”, “targetUserId”, “targetUserName”, “wasPasswordChanged”, “newPasswordSource”]
  revokeDbToken [“userId”]
  samlLogin [“user”]
  setAdmin [“endpoint”, “targetUserName”, “targetUserId”]
  tokenLogin [“tokenId”, “user”]
  validateEmail [“endpoint”, “targetUserName”, “targetUserId”]
clusters changeClusterAcl [“shardName”, “aclPermissionSet”, “targetUserId”, “resourceId”]
  create [“cluster_log_conf”, “num_workers”, “enable_elastic_disk”, “driver_node_type_id”, “start_cluster”, “docker_image”, “ssh_public_keys”, “aws_attributes”, “acl_path_prefix”, “node_type_id”, “instance_pool_id”, “spark_env_vars”, “init_scripts”, “spark_version”, “cluster_source”, “autotermination_minutes”, “cluster_name”, “autoscale”, “custom_tags”, “cluster_creator”, “enable_local_disk_encryption”, “idempotency_token”, “spark_conf”, “organization_id”, “no_driver_daemon”, “user_id”]
  createResult [“clusterName”, “clusterState”, “clusterId”, “clusterWorkers”, “clusterOwnerUserId”]
  delete [“cluster_id”]
  deleteResult [“clusterWorkers”, “clusterState”, “clusterId”, “clusterOwnerUserId”, “clusterName”]
  edit [“spark_env_vars”, “no_driver_daemon”, “enable_elastic_disk”, “aws_attributes”, “driver_node_type_id”, “custom_tags”, “cluster_name”, “spark_conf”, “ssh_public_keys”, “autotermination_minutes”, “cluster_source”, “docker_image”, “enable_local_disk_encryption”, “cluster_id”, “spark_version”, “autoscale”, “cluster_log_conf”, “instance_pool_id”, “num_workers”, “init_scripts”, “node_type_id”]
  permanentDelete [“cluster_id”]
  resize [“cluster_id”, “num_workers”, “autoscale”]
  resizeResult [“clusterWorkers”, “clusterState”, “clusterId”, “clusterOwnerUserId”, “clusterName”]
  restart [“cluster_id”]
  restartResult [“clusterId”, “clusterState”, “clusterName”, “clusterOwnerUserId”, “clusterWorkers”]
  start [“init_scripts_safe_mode”, “cluster_id”]
  startResult [“clusterName”, “clusterState”, “clusterWorkers”, “clusterOwnerUserId”, “clusterId”]
dbfs addBlock [“handle”, “data_length”]
  create [“path”, “bufferSize”, “overwrite”]
  delete [“recursive”, “path”]
  getSessionCredentials [“mountPoint”]
  mkdirs [“path”]
  mount [“mountPoint”, “owner”]
  move [“dst”, “source_path”, “src”, “destination_path”]
  put [“path”, “overwrite”]
  unmount [“mountPoint”]
genie databricksAccess [“duration”, “approver”, “reason”, “authType”, “user”]
globalInitScripts create [“name”, “position”, “script-SHA256”, “enabled”]
  update [“script_id”, “name”, “position”, “script-SHA256”, “enabled”]
  delete [“script_id”]
groups addPrincipalToGroup [“user_name”, “parent_name”]
  createGroup [“group_name”]
  getGroupMembers [“group_name”]
  removeGroup [“group_name”]
iamRole changeIamRoleAcl [“targetUserId”, “shardName”, “resourceId”, “aclPermissionSet”]
instancePools changeInstancePoolAcl [“shardName”, “resourceId”, “targetUserId”, “aclPermissionSet”]
  create [“enable_elastic_disk”, “preloaded_spark_versions”, “idle_instance_autotermination_minutes”, “instance_pool_name”, “node_type_id”, “custom_tags”, “max_capacity”, “min_idle_instances”, “aws_attributes”]
  delete [“instance_pool_id”]
  edit [“instance_pool_name”, “idle_instance_autotermination_minutes”, “min_idle_instances”, “preloaded_spark_versions”, “max_capacity”, “enable_elastic_disk”, “node_type_id”, “instance_pool_id”, “aws_attributes”]
jobs cancel [“run_id”]
  cancelAllRuns [“job_id”]
  changeJobAcl [“shardName”, “aclPermissionSet”, “resourceId”, “targetUserId”]
  create [“spark_jar_task”, “email_notifications”, “notebook_task”, “spark_submit_task”, “timeout_seconds”, “libraries”, “name”, “spark_python_task”, “job_type”, “new_cluster”, “existing_cluster_id”, “max_retries”, “schedule”]
  delete [“job_id”]
  deleteRun [“run_id”]
  reset [“job_id”, “new_settings”]
  resetJobAcl [“grants”, “job_id”]
  runFailed [“jobClusterType”, “jobTriggerType”, “jobId”, “jobTaskType”, “runId”, “jobTerminalState”, “idInJob”, “orgId”]
  runNow [“notebook_params”, “job_id”, “jar_params”, “workflow_context”]
  runSucceeded [“idInJob”, “jobId”, “jobTriggerType”, “orgId”, “runId”, “jobClusterType”, “jobTaskType”, “jobTerminalState”]
  submitRun [“shell_command_task”, “run_name”, “spark_python_task”, “existing_cluster_id”, “notebook_task”, “timeout_seconds”, “libraries”, “new_cluster”, “spark_jar_task”]
  update [“fields_to_remove”, “job_id”, “new_settings”]
mlflowExperiment deleteMlflowExperiment [“experimentId”, “path”, “experimentName”]
  moveMlflowExperiment [“newPath”, “experimentId”, “oldPath”]
  restoreMlflowExperiment [“experimentId”, “path”, “experimentName”]
mlflowModelRegistry listModelArtifacts [“name”, “version”, “path”, “page_token”]
  getModelVersionSignedDownloadUri [“name”, “version”, “path”]
  createRegisteredModel [“name”, “tags”]
  deleteRegisteredModel [“name”]
  renameRegisteredModel [“name”, “new_name”]
  setRegisteredModelTag [“name”, “key”, “value”]
  deleteRegisteredModelTag [“name”, “key”]
  createModelVersion [“name”, “source”, “run_id”, “tags”, “run_link”]
  deleteModelVersion [“name”, “version”]
  getModelVersionDownloadUri [“name”, “version”]
  setModelVersionTag [“name”, “version”, “key”, “value”]
  deleteModelVersionTag [“name”, “version”, “key”]
  createTransitionRequest [“name”, “version”, “stage”]
  deleteTransitionRequest [“name”, “version”, “stage”, “creator”]
  approveTransitionRequest [“name”, “version”, “stage”, “archive_existing_versions”]
  rejectTransitionRequest [“name”, “version”, “stage”]
  transitionModelVersionStage [“name”, “version”, “stage”, “archive_existing_versions”]
  transitionModelVersionStageDatabricks [“name”, “version”, “stage”, “archive_existing_versions”]
  createComment [“name”, “version”]
  updateComment [“id”]
  deleteComment [“id”]
notebook attachNotebook [“path”, “clusterId”, “notebookId”]
  createNotebook [“notebookId”, “path”]
  deleteFolder [“path”]
  deleteNotebook [“notebookId”, “notebookName”, “path”]
  detachNotebook [“notebookId”, “clusterId”, “path”]
  downloadLargeResults [“notebookId”, “notebookFullPath”]
  downloadPreviewResults [“notebookId”, “notebookFullPath”]
  importNotebook [“path”]
  moveNotebook [“newPath”, “oldPath”, “notebookId”]
  renameNotebook [“newName”, “oldName”, “parentPath”, “notebookId”]
  restoreFolder [“path”]
  restoreNotebook [“path”, “notebookId”, “notebookName”]
  takeNotebookSnapshot [“path”]
repos createRepo [“url”, “provider”, “path”]
  updateRepo [“id”, “branch”, “tag”, “git_url”, “git_provider”]
  getRepo [“id”]
  listRepos [“path_prefix”, “next_page_token”]
  deleteRepo [“id”]
  pull [“id”]
  commitAndPush [“id”, “message”, “files”, “checkSensitiveToken”]
  checkoutBranch [“id”, “branch”]
  discard [“id”, “file_paths”]
secrets createScope [“scope”]
  deleteScope [“scope”]
  deleteSecret [“key”, “scope”]
  getSecret [“scope”, “key”]
  listAcls [“scope”]
  listSecrets [“scope”]
  putSecret [“string_value”, “scope”, “key”]
databrickssql createEndpoint  
  startEndpoint  
  stopEndpoint  
  deleteEndpoint  
  editEndpoint  
  changeEndpointAcls  
  setEndpointConfig  
  createQuery [“queryId”]
  updateQuery [“queryId”]
  forkQuery [“queryId”, “originalQueryId”]
  moveQueryToTrash [“queryId”]
  deleteQuery [“queryId”]
  restoreQuery [“queryId”]
  createDashboard [“dashboardId”]
  updateDashboard [“dashboardId”]
  moveDashboardToTrash [“dashboardId”]
  deleteDashboard [“dashboardId”]
  restoreDashboard [“dashboardId”]
  createAlert [“alertId”, “queryId”]
  updateAlert [“alertId”, “queryId”]
  deleteAlert [“alertId”]
  createVisualization [“visualizationId”, “queryId”]
  updateVisualization [“visualizationId”]
  deleteVisualization [“visualizationId”]
  changePermissions [“objectType”, “objectId”, “granteeAndPermission”]
  createExternalDatasource [“dataSourceId”, “dataSourceType”]
  updateExternalDatasource [“dataSourceId”]
  deleteExternalDatasource [“dataSourceId”]
  createAlertDestination [“alertDestinationId”, “alertDestinationType”]
  updateAlertDestination [“alertDestinationId”]
  deleteAlertDestination [“alertDestinationId”]
  createQuerySnippet [“querySnippetId”]
  updateQuerySnippet [“querySnippetId”]
  deleteQuerySnippet [“querySnippetId”]
  downloadQueryResult [“queryId”, “queryResultId”, “fileType”]
  changeDatabricksSqlAcl  
sqlPermissions createSecurable [“securable”]
  grantPermission [“permission”]
  removeAllPermissions [“securable”]
  requestPermissions [“requests”]
  revokePermission [“permission”]
  showPermissions [“securable”, “principal”]
ssh login [“containerId”, “userName”, “port”, “publicKey”, “instanceId”]
  logout [“userName”, “containerId”, “instanceId”]
workspace changeWorkspaceAcl [“shardName”, “targetUserId”, “aclPermissionSet”, “resourceId”]
  fileCreate [“path”]
  fileDelete [“path”]
  moveWorkspaceNode [“destinationPath”, “path”]
  purgeWorkspaceNodes [“treestoreId”]
  workspaceConfEdit [“workspaceConfKeys (values: enableResultsDownloading, enableExportNotebook)”, “workspaceConfValues”]
  workspaceExport [“workspaceExportFormat”, “notebookFullPath”]

Account-level audit log events

Service Action Request Parameters
accountBillableUsage getAggregatedUsage [“account_id”, “window_size”, “start_time”, “end_time”, “meter_name”, “workspace_ids_filter”]
  getDetailedUsage [“account_id”, “start_month”, “end_month”, “with_pii”]
accounts login [“user”]
  gcpWorkspaceBrowserLogin [“user”]
  oidcBrowserLogin [“user”]
  logout [“user”]
accountsManager updateAccount [“account_id”, “account”]
  changeAccountOwner [“account_id”, “first_name”, “last_name”, “email”]
  consolidateAccounts [“target_account_id”, “account_ids_to_consolidate”]
  updateSubscription [“account_id”, “subscription_id”, “subscription”]
  listSubscriptions [“account_id”]
  createWorkspaceConfiguration [“workspace”]
  getWorkspaceConfiguration [“account_id”, “workspace_id”]
  listWorkspaceConfigurations [“account_id”]
  updateWorkspaceConfiguration [“account_id”, “workspace_id”]
  deleteWorkspaceConfiguration [“account_id”, “workspace_id”]
  acceptTos [“workspace_id”]
  sendTos [“account_id”, “workspace_id”]
  createCredentialsConfiguration [“credentials”]
  getCredentialsConfiguration [“account_id”, “credentials_id”]
  listCredentialsConfigurations [“account_id”]
  deleteCredentialsConfiguration [“account_id”, “credentials_id”]
  createStorageConfiguration [“storage_configuration”]
  getStorageConfiguration [“account_id”, “storage_configuration_id”]
  listStorageConfigurations [“account_id”]
  deleteStorageConfiguration [“account_id”, “storage_configuration_id”]
  createNetworkConfiguration [“network”]
  getNetworkConfiguration [“account_id”, “network_id”]
  listNetworkConfigurations [“account_id”]
  deleteNetworkConfiguration [“account_id”, “network_id”]
  createCustomerManagedKeyConfiguration [“customer_managed_key”]
  getCustomerManagedKeyConfiguration [“account_id”, “customer_managed_key_id”]
  listCustomerManagedKeyConfigurations [“account_id”]
  deleteCustomerManagedKeyConfiguration [“account_id”, “customer_managed_key_id”]
  listWorkspaceEncryptionKeyRecords [“account_id”, “workspace_id”]
  listWorkspaceEncryptionKeyRecordsForAccount [“account_id”]
  createVpcEndpoint [“vpc_endpoint”]
  getVpcEndpoint [“account_id”, “vpc_endpoint_id”]
  listVpcEndpoints [“account_id”]
  deleteVpcEndpoint [“account_id”, “vpc_endpoint_id”]
  createPrivateAccessSettings [“private_access_settings”]
  getPrivateAccessSettings [“account_id”, “private_access_settings_id”]
  listPrivateAccessSettingss [“account_id”]
  deletePrivateAccessSettings [“account_id”, “private_access_settings_id”]
logDelivery createLogDeliveryConfiguration [“account_id”, “config_id”]
  updateLogDeliveryConfiguration [“config_id”, “account_id”, “status”]
  getLogDeliveryConfiguration [“log_delivery_configuration”]
  listLogDeliveryConfigurations [“account_id”, “storage_configuration_id”, “credentials_id”, “status”]
ssoConfigBackend create [“account_id”, “sso_type”, “config”]
  update [“account_id”, “sso_type”, “config”]
  get [“account_id”, “sso_type”]

Analyze audit logs

You can analyze audit logs using Databricks. The following example uses logs to report on Databricks access and Apache Spark versions. This applies to both the current framework and the legacy framework (deprecated), although the S3 delivery paths are different and the legacy logs are gzipped.

Load audit logs as a DataFrame and register the DataFrame as a temp table. See Amazon S3 for a detailed guide.

val df = spark.read.json("s3a://bucketName/path/to/auditLogs")
df.createOrReplaceTempView("audit_logs")

List the users who accessed Databricks and from where.

%sql
SELECT DISTINCT userIdentity.email, sourceIPAddress
FROM audit_logs
WHERE serviceName = "accounts" AND actionName LIKE "%login%"

Check the Apache Spark versions used.

%sql
SELECT requestParams.spark_version, COUNT(*)
FROM audit_logs
WHERE serviceName = "clusters" AND actionName = "create"
GROUP BY requestParams.spark_version

Check table data access.

%sql
SELECT *
FROM audit_logs
WHERE serviceName = "sqlPermissions" AND actionName = "requestPermissions"

Legacy audit log delivery (deprecated)

Deprecated

The legacy audit log delivery framework is deprecated. For the current audit log delivery framework, see Configure audit log delivery. Databricks strongly recommends that you migrate to the current framework, which supports low-latency delivery of auditable events as well as account-level audit logs. To migrate, enable the current audit log delivery framework, then disable the legacy audit log delivery configuration.

If your account is enabled for audit logging, the Databricks account owner configures where Databricks sends the logs. Admin users cannot configure audit log delivery.

  1. Log in to the account console.

  2. Click the Audit Logs tab.

  3. Configure the S3 bucket and directory:

    • S3 Bucket in <region name>: the S3 bucket where you want to store your audit logs. The bucket must exist.
    • Path: the path to the directory in the S3 bucket where you want to store the audit logs. For example, /databricks/auditlogs. If you want to store the logs at the bucket root, enter /.

    Databricks sends the audit logs to the specified S3 bucket and directory path, partitioned by date. For example, my-bucket/databricks/auditlogs/date=2018-01-15/part-0.json.gz.

Once logging is enabled for your account, Databricks automatically starts sending audit logs in human-readable format to your delivery location on a periodic basis. Logs are available within 72 hours of activation.

  • Encryption: Databricks encrypts audit logs using Amazon S3 server-side encryption.
  • Format: Databricks delivers audit logs as gzip-compressed JSON files with the file extension json.gz.
  • When: Databricks delivers audit logs daily and partitions the logs by date in yyyy-MM-dd format.
  • Other details:
    • Databricks delivers logs within 72 hours after day close.
    • Each audit log record is unique.
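
Since legacy logs arrive as newline-delimited JSON records in gzip archives under a date-partitioned path, they can be read with standard tooling. The following Python sketch writes and reads back a synthetic file; the path layout mirrors the my-bucket/databricks/auditlogs/date=2018-01-15/part-0.json.gz example above, and the record contents are hypothetical:

```python
import gzip
import json
import os
import tempfile

# Build a synthetic date-partitioned path like the delivery layout above.
root = tempfile.mkdtemp()
partition = os.path.join(root, "databricks", "auditlogs", "date=2018-01-15")
os.makedirs(partition)
file_path = os.path.join(partition, "part-0.json.gz")

# Legacy audit logs are newline-delimited JSON, gzip-compressed.
sample = [{"serviceName": "accounts", "actionName": "login"}]
with gzip.open(file_path, "wt") as f:
    for record in sample:
        f.write(json.dumps(record) + "\n")

# Read the records back, one JSON object per line.
with gzip.open(file_path, "rt") as f:
    records = [json.loads(line) for line in f]
```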

Note

  • Databricks can overwrite the delivered log files in your bucket at any time during the three-day period after the log date. After three days, audit files become immutable. In other words, logs for 2018-01-06 are subject to overwrites through 2018-01-09, and you can safely archive them on 2018-01-10.
  • Overwriting ensures exactly-once semantics without requiring read or delete access to your account.
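
The three-day overwrite window reduces to simple date arithmetic. A small sketch (the helper name is ours, not a Databricks API):

```python
from datetime import date, timedelta

def safe_archive_date(log_date: date) -> date:
    """Logs for log_date may be overwritten for three days after the
    log date; they are safe to archive starting on the fourth day."""
    return log_date + timedelta(days=4)

# Matches the example in the note: logs for 2018-01-06 are subject to
# overwrites through 2018-01-09 and can be archived on 2018-01-10.
assert safe_archive_date(date(2018, 1, 6)) == date(2018, 1, 10)
```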

Configure access policy (legacy)

To configure Databricks access to your AWS S3 bucket using an access policy, follow the steps in this section.

Step 1: Generate the access policy (legacy)

In the Databricks account console, on the Audit Logs tab:

  1. Click the Generate Policy button. The generated policy should look like:

     {
       "Version": "2012-10-17",
       "Id": "DatabricksAuditLogs",
       "Statement": [
         {
           "Sid": "PutAuditLogs",
           "Effect": "Allow",
           "Principal": {
             "AWS": "arn:aws:iam::090101015318:role/DatabricksAuditLogs-WriterRole-VV4KJWX4FRIK"
           },
           "Action": [
             "s3:PutObject"
           ],
           "Resource": "arn:aws:s3:::AUDIT_LOG_BUCKET/audit_log_path/*"
         },
         {
           "Sid": "DenyNotContainingFullAccess",
           "Effect": "Deny",
           "Principal": {
             "AWS": "arn:aws:iam::090101015318:role/DatabricksAuditLogs-WriterRole-VV4KJWX4FRIK"
           },
           "Action": [
             "s3:PutObject"
           ],
           "Resource": "arn:aws:s3:::AUDIT_LOG_BUCKET/audit_log_path/*",
           "Condition": {
             "StringNotEquals": {
               "s3:x-amz-acl": "bucket-owner-full-control"
             }
           }
         }
       ]
     }
    

    This policy ensures that the Databricks AWS account has write access to the bucket and directory that you specified. The first statement grants Databricks write permission; Databricks does not have read, list, or delete permission. The second statement ensures that you retain full control over everything that Databricks writes to your bucket.

  2. Copy the generated JSON policy to your clipboard.
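
Optionally, before applying the policy in AWS, you can sanity-check its structure. The following Python sketch (not an official validator; the trimmed policy below omits the role ARN and bucket resource) verifies that a Deny statement enforces the bucket-owner-full-control ACL condition described above:

```python
# Trimmed version of the generated policy for illustration only.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "PutAuditLogs", "Effect": "Allow",
         "Action": ["s3:PutObject"]},
        {"Sid": "DenyNotContainingFullAccess", "Effect": "Deny",
         "Action": ["s3:PutObject"],
         "Condition": {
             "StringNotEquals": {
                 "s3:x-amz-acl": "bucket-owner-full-control"
             }
         }},
    ],
}

def enforces_full_control(p: dict) -> bool:
    """True if some Deny statement rejects PutObject calls that do not
    set the bucket-owner-full-control canned ACL."""
    for stmt in p["Statement"]:
        if stmt.get("Effect") == "Deny":
            cond = stmt.get("Condition", {}).get("StringNotEquals", {})
            if cond.get("s3:x-amz-acl") == "bucket-owner-full-control":
                return True
    return False
```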

Step 2: Apply the policy to the AWS S3 bucket (legacy)

  1. In the AWS console, go to the S3 service.
  2. Click the name of the bucket where you want to store the audit logs.
  3. Click the Permissions tab.
  4. Click the Bucket Policy button.
  5. Paste the policy string from Step 1.
  6. Click Save.

Step 3: Verify that the policy is applied correctly (legacy)

In the Databricks account console, on the Audit Logs tab, click the Verify Access button.


If you see a check mark, audit logs are configured correctly. If verification fails:

  1. Check that you entered the bucket name correctly, and that the AWS region is correct.
  2. Check that you copied the generated policy correctly to AWS.
  3. Contact your AWS account admin.