View billable usage

The Usage Overview tab in the Databricks Account Console lets you:

  • View historical account usage in Databricks Units (DBUs), grouped by workload type (All-Purpose Compute, Jobs Compute, Jobs Compute Light).

  • Download a CSV file that contains itemized usage details by cluster. If you want to automate delivery of these files to an S3 bucket, see Configure billable usage log delivery.

    Note

    The downloadable CSV file includes personal data. As always, handle it with care.

  • View your Databricks account ID.

View the usage graph

  1. Log into the Account Console. See Access the Account Console.

  2. Click the Usage Overview tab.

  3. Select a month and year to see historical account usage for that period.


Download usage as a CSV file

To get a CSV file containing detailed usage data, click the Download itemized usage button. The file uses this schema:

Column | Type | Description | Example
workspaceId | string | ID of the workspace. | 1234567890123456
timestamp | datetime | End of the hour for the provided usage. | 2019-02-22T09:59:59Z
clusterId | string | ID of the cluster. | 0405-020048-brawl507
clusterName | string | User-provided name for the cluster. | Shared Autoscaling
clusterNodeType | string | Instance type of the cluster. | m4.16xlarge
clusterOwnerUserId | string | ID of the user who created the cluster. | 12345678901234
clusterCustomTags | string ("-escaped json) | Custom tags associated with the cluster during this hour. | "{""dept"":""mktg"",""op_phase"":""dev""}"
sku | string | Billing SKU. See the Billing SKUs table for a list of values. | STANDARD_ALL_PURPOSE_COMPUTE
dbus | double | Number of DBUs used by the user during this hour. | 1.2345
machineHours | double | Total number of machine hours used by all containers in the cluster. | 12.345
clusterOwnerUserName | string | Username (email) of the user who created the cluster. | user@yourcompany.com
tags | string ("-escaped json) | Default and custom cluster tags, and default and custom instance pool tags (if applicable) associated with the cluster during this hour. See Cluster tags and Pool tags. This is a superset of the clusterCustomTags column. | "{""dept"":""mktg"",""op_phase"":""dev"", ""Vendor"":""Databricks"", ""ClusterId"":""0405-020048-brawl507"", ""Creator"":""user@yourcompany.com""}"
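
If you prefer not to rely on schema inference when you import this file into Spark (see Import usage data for analysis in Databricks below), you can supply an explicit schema instead. The following is a minimal sketch; the column names come from the table above, while the variable name usage_schema and the mapping of the timestamp and tag columns to Spark SQL types are assumptions:

from pyspark.sql.types import (StructType, StructField, StringType,
                               TimestampType, DoubleType)

# Column names follow the schema table above; the tag columns stay as JSON strings.
usage_schema = StructType([
    StructField("workspaceId", StringType()),
    StructField("timestamp", TimestampType()),
    StructField("clusterId", StringType()),
    StructField("clusterName", StringType()),
    StructField("clusterNodeType", StringType()),
    StructField("clusterOwnerUserId", StringType()),
    StructField("clusterCustomTags", StringType()),
    StructField("sku", StringType()),
    StructField("dbus", DoubleType()),
    StructField("machineHours", DoubleType()),
    StructField("clusterOwnerUserName", StringType()),
    StructField("tags", StringType()),
])

Pass it with .schema(usage_schema) in place of option("inferSchema", "true") in the read examples later on this page.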

Billing SKUs

Before March 2020 | Starting March 2020
STANDARD_INTERACTIVE_NON_OPSEC | STANDARD_ALL_PURPOSE_COMPUTE
STANDARD_AUTOMATED_NON_OPSEC | STANDARD_JOBS_COMPUTE
LIGHT_AUTOMATED_NON_OPSEC | STANDARD_JOBS_LIGHT_COMPUTE
STANDARD_INTERACTIVE_OPSEC | PREMIUM_ALL_PURPOSE_COMPUTE
STANDARD_AUTOMATED_OPSEC | PREMIUM_JOBS_COMPUTE
LIGHT_AUTOMATED_OPSEC | PREMIUM_JOBS_LIGHT_COMPUTE
N/A | ENTERPRISE_ALL_PURPOSE_COMPUTE
N/A | ENTERPRISE_JOBS_COMPUTE
N/A | ENTERPRISE_JOBS_LIGHT_COMPUTE
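
If your analysis spans the March 2020 renaming, it can help to normalize the older SKU names to the current ones before grouping by sku. The following is a minimal sketch using the mapping above; the names SKU_RENAMES and normalize_sku are assumptions:

# Pre-March 2020 SKU name -> current SKU name, per the table above.
SKU_RENAMES = {
    "STANDARD_INTERACTIVE_NON_OPSEC": "STANDARD_ALL_PURPOSE_COMPUTE",
    "STANDARD_AUTOMATED_NON_OPSEC": "STANDARD_JOBS_COMPUTE",
    "LIGHT_AUTOMATED_NON_OPSEC": "STANDARD_JOBS_LIGHT_COMPUTE",
    "STANDARD_INTERACTIVE_OPSEC": "PREMIUM_ALL_PURPOSE_COMPUTE",
    "STANDARD_AUTOMATED_OPSEC": "PREMIUM_JOBS_COMPUTE",
    "LIGHT_AUTOMATED_OPSEC": "PREMIUM_JOBS_LIGHT_COMPUTE",
}

def normalize_sku(sku):
    """Return the post-March 2020 name for a SKU; newer names pass through unchanged."""
    return SKU_RENAMES.get(sku, sku)

You can apply this, for example as a user-defined function on the sku column, before aggregating usage that crosses the renaming boundary.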

You can import this file into Databricks for analysis.

Deliver billable usage logs to your own S3 bucket

You can configure automatic delivery of billable usage CSV files into an S3 bucket in your AWS account. See Configure billable usage log delivery.

Automatic log delivery also lets you control who has access to this data. Users who are authorized to read the delivered files do not need to rely on the account owner regularly navigating to the Account Console to download a CSV file and share it.

You can import the files into Databricks for analysis.

Import usage data for analysis in Databricks

You can use the Create New Table UI to import the CSV file into Databricks for analysis.

Total DBUs are the sum of the dbus column; see the example query after the import code below.

The CSV file uses a format that is standard for commercial spreadsheet applications but requires a modification to be read by Apache Spark. You must use option("escape", "\"") when you create the usage table in Databricks.

Create Spark data frames

You can also use the following code to create the usage table from a path to the CSV file:

df = (spark.
      read.
      option("header", "true").
      option("inferSchema", "true").
      option("escape", "\"").
      csv("/FileStore/tables/usage_data.csv"))

df.createOrReplaceTempView("usage")

If the file is stored in an S3 bucket, for example when used with log delivery, the code looks like the following. You can specify either a file path or a directory; if you pass a directory, all files in that directory are imported. The following example reads a single file.

df = (spark.
      read.
      option("header", "true").
      option("inferSchema", "true").
      option("escape", "\"").
      csv("s3://<bucketname>/<pathprefix>/billable-usage/csv/workspaceId=<workspace-id>-usageMonth=<month>.csv"))

df.createOrReplaceTempView("usage")

The following example imports a directory of billable usage files:

df = (spark.
      read.
      option("header", "true").
      option("inferSchema", "true").
      option("escape", "\"").
      csv("s3://<bucketname>/<pathprefix>/billable-usage/csv/"))

df.createOrReplaceTempView("usage")
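
With the usage view registered, total DBUs are simply the sum of the dbus column, as noted above. The following is a minimal sketch; the per-SKU breakdown is just one example of how you might slice the data:

# Total DBUs across all imported usage rows.
spark.sql("SELECT SUM(dbus) AS total_dbus FROM usage").show()

# Total DBUs broken down by billing SKU.
spark.sql("""
  SELECT sku, SUM(dbus) AS total_dbus
  FROM usage
  GROUP BY sku
  ORDER BY total_dbus DESC
""").show()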

Create a Delta Table

To create a Delta Table from the data frame (df) in the previous example, use the following code:

(df.write
  .format("delta")
  .mode("overwrite")
  .saveAsTable("database_name.table_name"))

Warning

The saved Delta Table will not automatically be updated when new CSV files are added or replaced. If you need the latest data, re-run these commands before using the Delta Table.