
Billable usage system table reference

This article provides an overview of the billable usage system table, including the schema and example queries. With system tables, your account's billable usage data is centralized and routed to all regions, so you can view your account's global usage from whichever region your workspace is in.

For information on using this table to monitor costs and sample queries, see Monitor costs using system tables.

Table path: This system table is located at system.billing.usage.
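
For example, to confirm access and preview the most recent records, you can run a minimal query like the following (a sketch only; adjust the columns and limit as needed):

SQL
-- Preview recent billable usage records (sketch)
SELECT record_id, workspace_id, sku_name, usage_date, usage_unit, usage_quantity
FROM system.billing.usage
ORDER BY usage_date DESC
LIMIT 10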

Billable usage table schema

The billable usage system table uses the following schema:

| Column name | Data type | Description | Example |
| --- | --- | --- | --- |
| record_id | string | Unique ID for this usage record | 11e22ba4-87b9-4cc2-9770-d10b894b7118 |
| account_id | string | ID of the account this report was generated for | 23e22ba4-87b9-4cc2-9770-d10b894b7118 |
| workspace_id | string | ID of the workspace this usage was associated with | 1234567890123456 |
| sku_name | string | Name of the SKU | STANDARD_ALL_PURPOSE_COMPUTE |
| cloud | string | Cloud associated with this usage. Possible values are AWS, AZURE, and GCP. | AWS, AZURE, or GCP |
| usage_start_time | timestamp | The start time relevant to this usage record. Timezone information is recorded at the end of the value, with +00:00 representing UTC. | 2023-01-09 10:00:00.000+00:00 |
| usage_end_time | timestamp | The end time relevant to this usage record. Timezone information is recorded at the end of the value, with +00:00 representing UTC. | 2023-01-09 11:00:00.000+00:00 |
| usage_date | date | Date of the usage record; this field can be used for faster aggregation by date | 2023-01-01 |
| custom_tags | map | Custom tags associated with the usage record | { "env": "production" } |
| usage_unit | string | Unit this usage is measured in | DBU |
| usage_quantity | decimal | Number of units consumed for this record | 259.2958 |
| usage_metadata | struct | System-provided metadata about the usage, including IDs for compute resources and jobs (if applicable). See Usage metadata reference. | See Usage metadata reference |
| identity_metadata | struct | System-provided metadata about the identities involved in the usage. See Identity metadata reference. | See Identity metadata reference |
| record_type | string | Whether the record is original, a retraction, or a restatement. The value is ORIGINAL unless the record is related to a correction. See Record type reference. | ORIGINAL |
| ingestion_date | date | Date the record was ingested into the usage table | 2024-01-01 |
| billing_origin_product | string | The product that originated the usage. Some products can be billed as different SKUs. For possible values, see Billing origin product reference. | JOBS |
| product_features | struct | Details about the specific product features used. See Product features reference. | See Product features reference |
| usage_type | string | The type of usage attributed to the product or workload for billing purposes. Possible values are COMPUTE_TIME, STORAGE_SPACE, NETWORK_BYTE, NETWORK_HOUR, API_OPERATION, TOKEN, or GPU_TIME. | STORAGE_SPACE |
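
As an illustration of how the schema fits together, the following sketch aggregates DBU consumption by date and SKU. It assumes you have SELECT access to system.billing.usage; the filter and ordering are illustrative only.

SQL
-- Daily DBU consumption per SKU (sketch)
SELECT
  usage_date,
  sku_name,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
GROUP BY usage_date, sku_name
ORDER BY usage_date DESC, total_dbus DESC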

Usage metadata reference

The values in usage_metadata are all strings that tell you about the workspace objects and resources involved in the usage record.

Only a subset of these values is populated in any given usage record, depending on the compute type and features used. The third column in the table shows which usage types cause each value to be populated.

| Value | Description | Populated for (otherwise null) |
| --- | --- | --- |
| cluster_id | ID of the cluster associated with the usage record | Non-serverless compute usage, including notebooks, jobs, Lakeflow Declarative Pipelines, and legacy model serving |
| job_id | ID of the job associated with the usage record | Serverless jobs and jobs run on job compute (does not populate for jobs run on all-purpose compute) |
| warehouse_id | ID of the SQL warehouse associated with the usage record | Workloads run on a SQL warehouse |
| instance_pool_id | ID of the instance pool associated with the usage record | Non-serverless compute usage from pools, including notebooks, jobs, Lakeflow Declarative Pipelines, and legacy model serving |
| node_type | The instance type of the compute resource | Non-serverless compute usage, including notebooks, jobs, Lakeflow Declarative Pipelines, and all SQL warehouses |
| job_run_id | ID of the job run associated with the usage record | Serverless jobs and jobs run on job compute (does not populate for jobs run on all-purpose compute) |
| notebook_id | ID of the notebook associated with the usage | Serverless notebooks |
| dlt_pipeline_id | ID of the pipeline associated with the usage record | Lakeflow Declarative Pipelines and features that use Lakeflow Declarative Pipelines, such as materialized views, online tables, vector search indexing, and Lakeflow Connect |
| endpoint_name | The name of the model serving endpoint or vector search endpoint associated with the usage record | Model serving and Vector Search |
| endpoint_id | ID of the model serving endpoint or vector search endpoint associated with the usage record | Model serving and Vector Search |
| dlt_update_id | ID of the pipeline's update cluster associated with the usage record | Lakeflow Declarative Pipelines and features that use Lakeflow Declarative Pipelines, such as materialized views, online tables, vector search indexing, and Lakeflow Connect |
| dlt_maintenance_id | ID of the pipeline's maintenance cluster associated with the usage record | Lakeflow Declarative Pipelines and features that use Lakeflow Declarative Pipelines, such as materialized views, online tables, vector search indexing, and Lakeflow Connect |
| metastore_id | This value is not populated in Databricks on GCP | Always null |
| run_name | Unique user-facing name of the Foundation Model Fine-tuning run associated with the usage record | Foundation Model Fine-tuning |
| job_name | User-given name of the job associated with the usage record | Jobs run on serverless compute |
| notebook_path | Workspace storage path of the notebook associated with the usage | Notebooks run on serverless compute |
| central_clean_room_id | ID of the central clean room associated with the usage record | Clean Rooms |
| source_region | This value is not populated in Databricks on GCP | Always null |
| destination_region | This value is not populated in Databricks on GCP | Always null |
| app_id | ID of the app associated with the usage record | Databricks Apps |
| app_name | User-given name of the app associated with the usage record | Databricks Apps |
| private_endpoint_name | This value is not populated in Databricks on GCP | Always null |
| budget_policy_id | ID of the serverless budget policy attached to the workload | Serverless compute usage, including notebooks, jobs, Lakeflow Declarative Pipelines, and model serving endpoints |
| storage_api_type | The type of operation performed on default storage. Possible values are TIER_1 (PUT, COPY, POST, LIST) and TIER_2 (other operations) | Default storage |
| ai_runtime_workload_id | This value is not populated in Databricks on GCP | Always null |
| uc_table_catalog | The Unity Catalog catalog name associated with the usage record | Materialized views |
| uc_table_schema | The Unity Catalog schema name associated with the usage record | Materialized views |
| uc_table_name | The Unity Catalog table name associated with the usage record | Materialized views |
| database_instance_id | ID of the database instance associated with the usage record | Lakebase database instances |
| sharing_materialization_id | ID of the sharing materialization associated with the usage record | View sharing, materialized views, and streaming tables using Delta Sharing |
| usage_policy_id | ID of the usage policy associated with the usage record | Usage policies |
| base_environment_id | ID of the base environment associated with the usage | Usage from building or refreshing a workspace's serverless base environment. Populated when billing_origin_product is BASE_ENVIRONMENTS. |
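
Because these values are only populated for specific compute types, they are useful for attributing costs to individual resources. For example, the following sketch attributes DBU usage to individual jobs using usage_metadata.job_id; the filters are illustrative only.

SQL
-- DBUs attributed to individual jobs (sketch; job_id is null for non-job usage)
SELECT
  workspace_id,
  usage_metadata.job_id,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE usage_metadata.job_id IS NOT NULL
  AND usage_unit = 'DBU'
GROUP BY workspace_id, usage_metadata.job_id
ORDER BY total_dbus DESC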

Identity metadata reference

The identity_metadata column provides more information about the identities involved in the usage.

  • The run_as field logs who ran the workload. This value is only populated for certain workload types, listed in the table below.
  • The owned_by field only applies to SQL warehouse usage and logs the user or service principal who owns the SQL warehouse responsible for the usage.

run_as identities

The identity recorded in identity_metadata.run_as depends on the product associated with the usage. Reference the following table for the identity_metadata.run_as behavior:

| Workload type | Identity of run_as |
| --- | --- |
| Jobs compute | The user or service principal defined in the run_as setting. By default, jobs run as the identity of the job owner, but admins can change this to be another user or service principal. |
| Serverless compute for jobs | The user or service principal defined in the run_as setting. By default, jobs run as the identity of the job owner, but admins can change this to be another user or service principal. |
| Serverless compute for notebooks | The user who ran the notebook commands (specifically, the user who created the notebook session). For shared notebooks, this includes usage by other users sharing the same notebook session. |
| Lakeflow Declarative Pipelines | The user or service principal whose permissions are used to run the pipeline. This can be changed by transferring the pipeline's ownership. |
| Predictive optimization | The Databricks-owned service principal that runs predictive optimization operations. |
| Lakehouse monitoring | The user who created the monitor. |
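
For example, the following sketch aggregates job usage by the identity that ran the workload. It assumes identity_metadata.run_as is populated for the rows you care about; the product filter is illustrative only.

SQL
-- DBUs per run_as identity for job workloads (sketch; run_as is null for some products)
SELECT
  identity_metadata.run_as,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE billing_origin_product = 'JOBS'
  AND identity_metadata.run_as IS NOT NULL
GROUP BY identity_metadata.run_as
ORDER BY total_dbus DESC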

Record type reference

The billing.usage table supports corrections. Corrections occur when any field of the usage record is incorrect and must be fixed.

When a correction happens, Databricks adds two new records to the table: a retraction record that negates the original incorrect record, and a restatement record that contains the corrected information. Correction records are identified using the record_type field:

  • RETRACTION: Used to negate the original incorrect usage. All fields are identical to the ORIGINAL record except usage_quantity, which is a negative value that cancels out the original usage quantity. For example, if the original record's usage quantity was 259.4356, then the retraction record would have a usage quantity of -259.4356.
  • RESTATEMENT: The record that includes the correct fields and usage quantity.

For example, the following query returns the correct hourly usage quantity related to a job_id, even if corrections have been made. By aggregating the usage quantity, the retraction record negates the original record and only the restatement's values are returned.

SQL
SELECT
  usage_metadata.job_id,
  usage_start_time,
  usage_end_time,
  SUM(usage_quantity) AS usage_quantity
FROM system.billing.usage
GROUP BY ALL
HAVING usage_quantity != 0
Note: For corrections where the original usage record should not have been written, a correction may only add a retraction record and no restatement record.
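
If you want to inspect corrections directly, a sketch like the following lists the retraction and restatement records themselves so you can review what was changed:

SQL
-- List correction records (sketch); RETRACTION rows carry a negative usage_quantity
SELECT record_id, record_type, usage_start_time, usage_end_time, usage_quantity
FROM system.billing.usage
WHERE record_type IN ('RETRACTION', 'RESTATEMENT')
ORDER BY usage_start_time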

Billing origin product reference

Some Databricks products are billed under the same shared SKU. For example, data quality monitoring, predictive optimization, and serverless workflows are all billed under the same serverless jobs SKU.

To help you differentiate usage, the billing_origin_product and product_features columns provide more insight into the specific product and features associated with the usage.

The billing_origin_product column shows the Databricks product associated with the usage record. The values include:

| Value | Description |
| --- | --- |
| JOBS | Costs associated with Lakeflow Jobs workloads |
| DLT | Costs associated with Lakeflow Declarative Pipelines workloads |
| SQL | Costs associated with Databricks SQL, including workloads run on SQL warehouses and materialized views |
| ALL_PURPOSE | Costs associated with classic all-purpose compute |
| MODEL_SERVING | Costs associated with Mosaic AI Model Serving |
| INTERACTIVE | Costs associated with serverless interactive workloads |
| DEFAULT_STORAGE | Costs associated with default storage |
| VECTOR_SEARCH | Costs associated with Vector Search |
| LAKEHOUSE_MONITORING | Costs associated with Data Quality Monitoring |
| PREDICTIVE_OPTIMIZATION | Costs associated with predictive optimization |
| AGENT_EVALUATION | Costs associated with agent evaluation |
| FINE_GRAINED_ACCESS_CONTROL | Serverless usage from fine-grained access control on dedicated compute |
| BASE_ENVIRONMENTS | Usage associated with building or refreshing a workspace's serverless base environment |
| DATA_CLASSIFICATION | Costs associated with data classification operations |
| AI_GATEWAY | Costs associated with AI Gateway usage |
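
For example, to see how usage billed under shared SKUs breaks down by originating product, a sketch like the following groups DBU usage by billing_origin_product:

SQL
-- DBU breakdown by originating product (sketch)
SELECT
  billing_origin_product,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE usage_unit = 'DBU'
GROUP BY billing_origin_product
ORDER BY total_dbus DESC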

Product features reference

The product_features column is an object containing information about the specific product features used and includes the following key/value pairs:

| Field | Description |
| --- | --- |
| jobs_tier | Values include LIGHT, CLASSIC, or null |
| sql_tier | Values include CLASSIC, PRO, or null |
| dlt_tier | Values include CORE, PRO, ADVANCED, or null |
| is_serverless | Whether the workload uses serverless compute. Values are true or false when you can choose between serverless and classic compute; otherwise null |
| is_photon | Whether Photon is enabled for the workload. Values include true, false, or null |
| serving_type | Values include MODEL, GPU_MODEL, FOUNDATION_MODEL, FEATURE, or null |
| offering_type | Values include BATCH_INFERENCE or null |
| performance_target | Indicates the performance mode of the serverless job or pipeline. Values include PERFORMANCE_OPTIMIZED, STANDARD, or null. Non-serverless workloads have a null value. |
| model_serving.offering_type | Indicates the offering type for model serving, or null |
| ai_gateway.feature_type | Indicates the feature type for AI Gateway workloads, or null |
| serverless_gpu.workload_type | Not supported on Databricks on GCP |
| ai_runtime.compute_type | Not supported on Databricks on GCP |
| ai_functions.ai_function | Indicates the AI function type, or null |
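
As a closing example, the following sketch splits job usage by two of the product_features fields above. The grouping columns are illustrative and may be null for other products.

SQL
-- Job usage split by jobs tier and Photon flag (sketch)
SELECT
  product_features.jobs_tier,
  product_features.is_photon,
  SUM(usage_quantity) AS total_dbus
FROM system.billing.usage
WHERE billing_origin_product = 'JOBS'
GROUP BY product_features.jobs_tier, product_features.is_photon
ORDER BY total_dbus DESC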