Analyze billable usage log data

You can view billable usage directly in the account console.

You can also download billable usage logs or schedule automatic delivery of them to an S3 bucket. In all cases, the CSV file schema is the same.

You can use Databricks to parse and analyze these logs.

CSV file schema

| Column | Type | Description | Example |
|---|---|---|---|
| workspaceId | string | ID of the workspace. | 1234567890123456 |
| timestamp | datetime | End of the hour for the provided usage. | 2019-02-22T09:59:59Z |
| clusterId | string | ID of the cluster. | 0405-020048-brawl507 |
| clusterName | string | User-provided name for the cluster. | Shared Autoscaling |
| clusterNodeType | string | Instance type of the cluster. | m4.16xlarge |
| clusterOwnerUserId | string | ID of the user who created the cluster. | 12345678901234 |
| clusterCustomTags | string ("-escaped json) | Custom tags associated with the cluster during this hour. | "{""dept"":""mktg"",""op_phase"":""dev""}" |
| sku | string | Billing SKU. See the Billing SKUs list for the possible values. | STANDARD_ALL_PURPOSE_COMPUTE |
| dbus | double | Number of DBUs used by the user during this hour. | 1.2345 |
| machineHours | double | Total number of machine hours used by all containers in the cluster. | 12.345 |
| clusterOwnerUserName | string | Username (email) of the user who created the cluster. | user@yourcompany.com |
| tags | string ("-escaped json) | Default and custom cluster tags, and default and custom instance pool tags (if applicable) associated with the cluster during this hour. See Cluster tags and Pool tags. This is a superset of the clusterCustomTags column. | "{""dept"":""mktg"",""op_phase"":""dev"", ""Vendor"":""Databricks"", ""ClusterId"":""0405-020048-brawl507"", ""Creator"":""user@yourcompany.com""}" |
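
Because the clusterCustomTags and tags columns store "-escaped JSON, you can parse them into a map column after the file has been loaded into Spark. The following is a minimal sketch, assuming a temporary view named usage like the one created later in this article:

from pyspark.sql import functions as F
from pyspark.sql.types import MapType, StringType

# Parse the escaped-JSON tags column into a map of tag key -> tag value.
tagged = (spark.table("usage")
          .withColumn("tag_map", F.from_json("tags", MapType(StringType(), StringType()))))

tagged.select("clusterId", "tag_map").show(truncate=False)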

Billing SKUs

  • ENTERPRISE_ALL_PURPOSE_COMPUTE
  • ENTERPRISE_JOBS_COMPUTE
  • ENTERPRISE_JOBS_LIGHT_COMPUTE
  • ENTERPRISE_SQL_COMPUTE
  • PREMIUM_ALL_PURPOSE_COMPUTE
  • PREMIUM_JOBS_COMPUTE
  • PREMIUM_JOBS_LIGHT_COMPUTE
  • PREMIUM_SQL_COMPUTE
  • STANDARD_ALL_PURPOSE_COMPUTE
  • STANDARD_JOBS_COMPUTE
  • STANDARD_JOBS_LIGHT_COMPUTE

Usage billed before March 2020 uses the following SKUs:

  • LIGHT_AUTOMATED_NON_OPSEC
  • LIGHT_AUTOMATED_OPSEC
  • STANDARD_AUTOMATED_NON_OPSEC
  • STANDARD_AUTOMATED_OPSEC
  • STANDARD_INTERACTIVE_NON_OPSEC
  • STANDARD_INTERACTIVE_OPSEC

Analyze usage data in Databricks

This section describes how to make the data in the billable usage CSV file available to Databricks for analysis. It describes options for creating a usage table and includes a sample notebook that you can use to run a usage analysis dashboard.

The CSV file uses a format that is standard for commercial spreadsheet applications but requires a modification to be read by Apache Spark. You must use option("escape", "\"") when you create the usage table in Databricks.

Total DBUs are the sum of the dbus column.
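
For example, after you register the usage view as shown in the following sections, you can compute the total with a query like this (a minimal sketch):

# Sum the dbus column across all rows to get total DBUs for the loaded period.
total_dbus = spark.sql("SELECT SUM(dbus) AS total_dbus FROM usage")
total_dbus.show()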

Import the log using the Create Table UI

You can use the Create New Table UI to import the CSV file into Databricks for analysis.

Create a Spark DataFrame

You can also use the following code to create the usage table from a path to the CSV file:

df = (spark.
      read.
      option("header", "true").
      option("inferSchema", "true").
      option("escape", "\"").
      csv("/FileStore/tables/usage_data.csv"))

df.createOrReplaceTempView("usage")
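
To confirm that the CSV loaded as expected, you can inspect the inferred schema and preview a few rows, for example:

# Check the inferred column types and look at a sample of the data.
df.printSchema()
spark.sql("SELECT workspaceId, timestamp, sku, dbus FROM usage LIMIT 5").show()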

If the file is stored in an S3 bucket, for example when you use billable usage log delivery, the code looks like the following. You can specify a single file or a directory; if you pass a directory, all files in it are imported. The following example specifies a single file.

df = (spark.
      read.
      option("header", "true").
      option("inferSchema", "true").
      option("escape", "\"").
      csv("s3://<bucketname>/<pathprefix>/billable-usage/csv/workspaceId=<workspace-id>-usageMonth=<month>.csv"))

df.createOrReplaceTempView("usage")

The following example imports a directory of billable usage files:

df = (spark.
      read.
      option("header", "true").
      option("inferSchema", "true").
      option("escape", "\"").
      csv("s3://<bucketname>/<pathprefix>/billable-usage/csv/"))

df.createOrReplaceTempView("usage")

Create a Delta table

To create a Delta table from the DataFrame (df) in the previous example, use the following code:

(df.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("database_name.table_name")
)

Warning

The saved Delta table is not updated automatically when new CSV files are added or existing files are replaced. If you need the latest data, re-run these commands before you use the Delta table.
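
Once the table is saved, you can query it directly. The following sketch assumes the database_name.table_name placeholder used above:

# Break down total DBUs by workspace and SKU from the saved Delta table.
summary = spark.sql("""
    SELECT workspaceId, sku, SUM(dbus) AS total_dbus
    FROM database_name.table_name
    GROUP BY workspaceId, sku
    ORDER BY total_dbus DESC
""")
summary.show()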

Usage analysis dashboard notebook

If you use billable usage delivery, you can use the following notebook to run a usage analysis dashboard by providing a path to the S3 bucket where your CSV files are stored and entering report parameters in a widget.

The widget that you use to enter report parameters appears above the first notebook cell when you import the notebook to your Databricks workspace. The widget does not appear in the browser-only view of the notebook. Here’s an image of the widget:

usage dashboard widget
