Use tags to attribute and track usage

This article explains how to use tags to attribute compute usage to specific workspaces, teams, projects, or users to support cost tracking and budgeting.

There are two types of tags:

  • Default tags: Automatically applied by Databricks to cloud-deployed resources. These provide basic metadata like vendor, cluster ID, and creator.
  • Custom tags: User-defined tags that you can add to compute resources and serverless workloads. These allow for granular tracking, reporting, and budgeting.
Warning

Tag data may be replicated globally. Do not use tag names or values that could compromise the security of your resources. For example, do not use tag names that contain personal or sensitive information.

Default tags

Databricks automatically adds default tags to compute resources it deploys in your cloud account. These tags attribute the usage to Databricks and provide basic information about the resource, such as its name, ID, and creator.

Default tag keys and values automatically propagate to labels on GCE resources, such as VMs and their persistent disks.

Default tag keys and values

Databricks adds the following default tags to compute resources:

| Tag key | Value |
| --- | --- |
| Vendor | Constant value: Databricks |
| ClusterId | Databricks internal ID of the cluster |
| ClusterName | Name of the cluster |
| Creator | Username (email address) of the user who created the cluster |
| RunName | Job name (only propagates on jobs compute) |
| JobId | Job ID (only propagates on jobs compute) |

For tag keys and values that are propagated to GCE resources, letters are converted to lowercase. Characters are removed if they are not letters, numbers, underscores, or dashes. In the creator's email address, @ is replaced with _at_. For example, X+Y@databricks.com becomes xy_at_databricks.com.

Databricks adds the following default tags to pools and the compute resources created by pools:

| Tag key | Value |
| --- | --- |
| Vendor | Constant value: Databricks |
| DatabricksInstancePoolCreatorId | Databricks internal ID of the user who created the pool |
| DatabricksInstancePoolId | Databricks internal ID of the pool |

Custom tags

Custom tags let you attribute compute usage to specific teams, projects, or cost centers with more granularity than default tags. These tags are applied by users or admins and propagate to both your account's usage logs and applicable cloud resources. These tags are also used to create and monitor budgets in your Databricks account.

Supported resources for custom tags

You can add custom tags for the following objects managed by Databricks:

| Object | Tagging interface (UI) | Tagging interface (API) |
| --- | --- | --- |
| Pool | Pools UI in the Databricks workspace | Instance Pool API |
| All-purpose and job compute | Compute UI in the Databricks workspace | Clusters API |
| SQL warehouse | SQL warehouse UI in the Databricks workspace | Warehouses API |
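As a rough illustration of tagging through the APIs listed above, the sketch below builds only the tag-related fragment of a request body. The field names (`custom_tags` for clusters and pools, `tags.custom_tags` as a list of key/value objects for SQL warehouses) are assumptions based on the public REST API references and should be checked against the API version you use. The tag names and values are hypothetical.

```python
import json

# Hypothetical tags, used for illustration only.
team_tags = {"team": "data-eng", "cost_center": "cc-1234"}


def cluster_tag_fields(tags):
    """Tag fragment for a Clusters API or Instance Pool API request body.

    Assumption: both APIs accept a flat `custom_tags` map.
    """
    return {"custom_tags": dict(tags)}


def warehouse_tag_fields(tags):
    """Tag fragment for a Warehouses API request body.

    Assumption: warehouses take `tags.custom_tags` as a list of
    {"key": ..., "value": ...} objects.
    """
    return {"tags": {"custom_tags": [{"key": k, "value": v} for k, v in tags.items()]}}


print(json.dumps(cluster_tag_fields(team_tags), indent=2))
print(json.dumps(warehouse_tag_fields(team_tags), indent=2))
```

These fragments would be merged into the rest of the create or edit request (cluster name, node type, and so on), which is omitted here.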

Custom tags appear in lowercase in GCE logs. Characters are removed if they are not letters, numbers, underscores or dashes. For example, My Key becomes mykey and My.Val becomes myval.
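The transformation described above can be approximated with a short helper, which is useful for predicting how a custom tag will look in GCE logs. This is a minimal sketch, not the Databricks implementation: the function name `to_gce_label` is hypothetical, and it assumes that "letters" means ASCII letters.

```python
import re

# Everything except lowercase ASCII letters, digits, underscores, and dashes.
_DISALLOWED = re.compile(r"[^a-z0-9_-]")


def to_gce_label(text: str) -> str:
    """Approximate the custom tag to GCE label transformation.

    Lowercases the text, then drops every character that is not a
    letter, number, underscore, or dash (ASCII letters assumed).
    """
    return _DISALLOWED.sub("", text.lower())


print(to_gce_label("My Key"))  # mykey
print(to_gce_label("My.Val"))  # myval
```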

Warning

Do not assign a custom tag with the key Name to a cluster. Every cluster has a tag Name whose value is set by Databricks. If you change the value associated with the key Name, the cluster can no longer be tracked by Databricks. As a consequence, the cluster might not be terminated after becoming idle and will continue to incur usage costs.

Tag serverless compute workloads

Preview

This feature is in Public Preview.

To attribute serverless compute usage to users, groups, or projects, you can use serverless budget policies. When a user is assigned a serverless budget policy, their serverless usage is automatically tagged with their policy's custom tags. Serverless budget policies can be applied to serverless notebooks, jobs, pipelines, and model serving endpoints.

Note

Serverless compute usage is logged in your account's billable usage system table. The legacy DBU usage reports do not include serverless usage or serverless budget policy tags.

See Attribute usage with serverless budget policies.

Tag propagation

You can use cluster and pool tags to aggregate and analyze costs. These tags propagate in the following ways:

How tags propagate for clusters created from pools

Tags propagate to node instances differently depending on whether or not a cluster was created from a pool.

  • If a cluster is not created from a pool, its tags propagate as expected to node instances.
  • If a cluster is created from a pool, its instances inherit both the pool tags and the cluster tags. For VMs that are idle in the pool, only the pool's tags are applied to VM usage data.
  • If there is a tag name conflict, Databricks default tags take precedence over custom tags, and pool tags take precedence over cluster tags.
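The precedence rules above can be expressed as an ordered merge, where each later update overwrites earlier entries with the same key. This is an illustrative sketch only: the function name `resolve_instance_tags` and the sample tags are hypothetical, and the real resolution happens inside Databricks.

```python
def resolve_instance_tags(default_tags, pool_tags, cluster_tags):
    """Merge tags for an instance of a pool-backed cluster.

    Precedence on a key conflict, lowest to highest:
    cluster custom tags < pool custom tags < Databricks default tags.
    """
    resolved = dict(cluster_tags)   # lowest precedence
    resolved.update(pool_tags)      # pool tags override cluster tags
    resolved.update(default_tags)   # default tags override all custom tags
    return resolved


# Hypothetical example values.
defaults = {"Vendor": "Databricks"}
pool = {"team": "platform", "Vendor": "someone-else"}
cluster = {"team": "ml", "project": "churn"}

print(resolve_instance_tags(defaults, pool, cluster))
```

In this example the conflicting `team` key resolves to the pool's value, and the custom `Vendor` value is replaced by the default tag.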

Tag enforcement

To enforce the use of specific custom tags, you can use compute policies. See Custom tag enforcement. To enforce custom tags on serverless compute workloads, use serverless budget policies.

Limitations

  • Tag keys and values can only contain letters, spaces, numbers, or the characters +, -, =, ., _, :, /, @. Tags containing other characters are invalid.
  • If you change tag key names or values, the changes apply only after a cluster restart or pool expansion.
  • The maximum number of custom tags that can propagate to GCE labels is 54.
  • The maximum length for GCE label keys and values is 63 characters.
  • Label propagation can be delayed due to GCE API rate limits for the project. You can resolve this by increasing the GCE API rate limits for the Google Cloud project.
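A pre-flight check along these lines can catch invalid tags before they are submitted. The sketch below is an assumption-laden illustration rather than an official validator: the function name `find_tag_problems` is hypothetical, "letters" is taken to mean ASCII letters, and the 63-character limit is checked against the raw tag text even though the limit formally applies to the resulting GCE label. It also flags the reserved key Name, which must not be assigned to a cluster.

```python
import re

# Letters, spaces, numbers, and the characters + - = . _ : / @
_ALLOWED = re.compile(r"[A-Za-z0-9 +\-=._:/@]*")
MAX_LABEL_LENGTH = 63      # GCE label key/value length limit
MAX_PROPAGATED_TAGS = 54   # custom tags that can propagate to GCE labels


def find_tag_problems(tags):
    """Return a list of human-readable problems found in a custom tag map."""
    problems = []
    if len(tags) > MAX_PROPAGATED_TAGS:
        problems.append(
            f"{len(tags)} tags defined; at most {MAX_PROPAGATED_TAGS} propagate to GCE labels"
        )
    for key, value in tags.items():
        if key == "Name":
            problems.append("key 'Name' is set by Databricks and must not be overridden")
        for kind, text in (("key", key), ("value", value)):
            if not _ALLOWED.fullmatch(text):
                problems.append(f"{kind} {text!r} contains characters that are not allowed")
            # Assumption: raw length is used as a stand-in for label length.
            if len(text) > MAX_LABEL_LENGTH:
                problems.append(f"{kind} {text!r} is longer than {MAX_LABEL_LENGTH} characters")
    return problems


# Hypothetical tags for illustration.
for problem in find_tag_problems({"team": "data-eng", "bad#key": "x", "Name": "my-cluster"}):
    print(problem)
```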

GCE label limits

GCE labels have the following limitations:

  • Keys and values must consist only of lowercase letters, numeric characters, underscores, and dashes.
  • Maximum length for GCE label keys and values is 63 characters.
  • Maximum number of tags that can propagate to GCE labels is 54.

To conform to GCE format rules, tags are transformed before they become GCE label keys and values. If two tags have the same key after transformation, the key-value pair that appears later (lower) in the tag definitions is the one that persists.
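The collision behavior can be sketched as follows. This is a hypothetical model, not the Databricks implementation: it assumes the transformation is lowercasing plus removal of disallowed characters (ASCII letters only), and that the order of the tag definitions matches the iteration order of the input dictionary.

```python
import re


def to_gce_label(text):
    """Lowercase and drop anything that is not a letter, number, underscore, or dash."""
    return re.sub(r"[^a-z0-9_-]", "", text.lower())


def to_gce_labels(tags):
    """Transform tags in definition order; a later duplicate key replaces an earlier one."""
    labels = {}
    for key, value in tags.items():
        labels[to_gce_label(key)] = to_gce_label(value)
    return labels


# Both keys collapse to "mykey"; the later definition persists.
print(to_gce_labels({"My Key": "first", "my.key": "Second"}))
```

To avoid silently losing a tag, choose keys that remain distinct after lowercasing and character removal.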