Use tags to attribute and track usage
This article explains how to use tags to attribute compute usage to specific workspaces, teams, projects, or users to support cost tracking and budgeting.
There are two types of tags:
- Default tags: Automatically applied by Databricks to cloud-deployed resources. These provide basic metadata like vendor, cluster ID, and creator.
- Custom tags: User-defined tags that you can add to compute resources and serverless workloads. These allow for granular tracking, reporting, and budgeting.
Tag data may be replicated globally. Do not use tag names or values that could compromise the security of your resources. For example, do not use tag names that contain personal or sensitive information.
Default tags
Databricks automatically adds default tags to compute resources it deploys in your cloud account. These tags attribute the usage to Databricks and provide basic information about the resource, such as its name, ID, and creator.
Default tags automatically propagate as labels on GCE resources, such as VMs and their persistent disks.
Default tag keys and values
Databricks adds the following default tags to compute resources:
Tag key | Value |
---|---|
| Constant value: |
| Databricks internal ID of the cluster |
| Name of the cluster |
| Username (email address) of the user who created the cluster |
| Job name (only propagates on jobs compute) |
| Job ID (only propagates on jobs compute) |
For tag keys and values that are propagated to GCE resources, letters are converted to lowercase. Characters are removed if they are not letters, numbers, underscores, or dashes. In the creator's email address, `@` is replaced with `_at_`. For example, `X+Y@databricks.com` becomes `xy_at_databricks.com`.
Databricks adds the following default tags to pools and the compute resources created by pools.
Tag key | Value |
---|---|
| Constant value: |
| Databricks internal ID of the user who created the pool |
| Databricks internal ID of the pool |
Custom tags
Custom tags let you attribute compute usage to specific teams, projects, or cost centers with more granularity than default tags. These tags are applied by users or admins and propagate to both your account's usage logs and applicable cloud resources. These tags are also used to create and monitor budgets in your Databricks account.
Supported resources for custom tags
You can add custom tags for the following objects managed by Databricks:
Object | Tagging interface (UI) | Tagging interface (API) |
---|---|---|
Pool | Pools UI in the Databricks workspace | |
All-purpose and job compute | Compute UI in the Databricks workspace | |
SQL warehouse | SQL warehouse UI in the Databricks workspace | |
Custom tags appear in lowercase in GCE logs. Characters are removed if they are not letters, numbers, underscores, or dashes. For example, `My Key` becomes `mykey` and `My.Val` becomes `myval`.
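The transformation described above can be approximated in a few lines of Python. This is a minimal sketch, not the Databricks implementation: the function name `to_gce_label` is hypothetical, and it assumes that "letters" means ASCII letters and that lowercasing happens before characters are removed.

```python
import re


def to_gce_label(text: str) -> str:
    """Approximate how a custom tag key or value appears in GCE logs.

    Assumption: only ASCII letters, digits, underscores, and dashes survive.
    """
    # Lowercase first, then drop every character outside the allowed set.
    return re.sub(r"[^a-z0-9_-]", "", text.lower())


print(to_gce_label("My Key"))  # mykey
print(to_gce_label("My.Val"))  # myval
```

Running a planned tag through a helper like this before applying it can show whether two distinct tags would end up with the same label.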
Do not assign a custom tag with the key `Name` to a cluster. Every cluster has a tag `Name` whose value is set by Databricks. If you change the value associated with the key `Name`, the cluster can no longer be tracked by Databricks. As a consequence, the cluster might not be terminated after becoming idle and will continue to incur usage costs.
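If you generate tags in your own tooling, a small guard can catch the reserved key before the tags reach a cluster definition. The sketch below is illustrative only: `check_custom_tags` and `RESERVED_TAG_KEYS` are hypothetical names, and it assumes an exact, case-sensitive match on `Name`.

```python
# Key that the text above says is set by Databricks on every cluster.
RESERVED_TAG_KEYS = {"Name"}


def check_custom_tags(custom_tags: dict) -> dict:
    """Return the tags unchanged, or raise if a reserved key is present."""
    clashes = custom_tags.keys() & RESERVED_TAG_KEYS
    if clashes:
        raise ValueError(f"Reserved tag key(s) not allowed: {sorted(clashes)}")
    return custom_tags


# Example with made-up tag keys and values.
tags = check_custom_tags({"team": "data-eng", "cost_center": "cc-1234"})
print(tags)
```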
Tag serverless compute workloads
This feature is in Public Preview.
To attribute serverless compute usage to users, groups, or projects, you can use serverless budget policies. When a user is assigned a serverless budget policy, their serverless usage is automatically tagged with their policy's custom tags. Serverless budget policies can be applied to serverless notebooks, jobs, pipelines, and model serving endpoints.
Serverless compute usage is logged in your account's billable usage system table. The legacy DBU usage reports do not include serverless usage or serverless budget policy tags.
See Attribute usage with serverless budget policies.
Tag propagation
You can use cluster and pool tags to aggregate and analyze costs. These tags propagate in the following ways:
- Tags in DBU reports: Custom tags propagate to the billable usage system table logs. Custom and default tags propagate to downloaded DBU usage reports.
- GCE labels for each VM and its persistent disks: Tags propagate to labels on GCE resources, such as VMs and their persistent disks. This allows you to use GCE usage metering to attribute costs. Tag keys and values are transformed to conform with GCE label format limits.
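Once tags are present in usage data, attributing usage is a matter of grouping by tag value. The following sketch uses plain Python on in-memory records. The record shape (a `custom_tags` mapping and a numeric `usage_quantity`) and the function name `usage_by_tag` are assumptions made for illustration, so adapt the field names to the export or table you actually query.

```python
from collections import defaultdict


def usage_by_tag(records, tag_key, untagged="(untagged)"):
    """Sum usage per value of one tag key across usage records."""
    totals = defaultdict(float)
    for record in records:
        # Records without the tag are grouped under a placeholder bucket.
        tag_value = record.get("custom_tags", {}).get(tag_key, untagged)
        totals[tag_value] += record["usage_quantity"]
    return dict(totals)


# Made-up sample records.
records = [
    {"custom_tags": {"team": "data-eng"}, "usage_quantity": 12.5},
    {"custom_tags": {"team": "ml"}, "usage_quantity": 4.0},
    {"custom_tags": {"team": "data-eng"}, "usage_quantity": 7.5},
    {"custom_tags": {}, "usage_quantity": 1.0},
]
print(usage_by_tag(records, "team"))
```

Keeping an explicit bucket for untagged usage makes gaps in tag coverage visible instead of silently dropping them from the totals.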
How tags propagate for clusters created from pools
Tags propagate to node instances differently depending on whether or not a cluster was created from a pool.
- If a cluster is not created from a pool, its tags propagate as expected to node instances.
- If a cluster is created from a pool, its instances inherit both the pool tags and the cluster tags. The pool's tags are used directly for VM usage data only for the idle VMs.
- If there is a tag name conflict, Databricks default tags take precedence over custom tags, and pool tags take precedence over cluster tags.
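The precedence rules above can be modeled as an ordered dictionary merge, where each later update overrides the earlier ones. This is a simplified sketch rather than the actual resolution logic: the function name `effective_tags` and all tag keys in the example are made up.

```python
def effective_tags(cluster_tags, pool_tags, default_tags):
    """Merge tags for a pool-backed cluster using the stated precedence."""
    merged = dict(cluster_tags)   # custom cluster tags: lowest precedence
    merged.update(pool_tags)      # pool tags override cluster tags
    merged.update(default_tags)   # default tags override custom tags
    return merged


# Illustrative keys only; real default tag keys are not modeled here.
print(effective_tags(
    cluster_tags={"team": "ml", "env": "dev", "origin": "custom"},
    pool_tags={"team": "platform"},
    default_tags={"origin": "databricks"},
))
```

In this example, `team` resolves to the pool's value, `origin` resolves to the default value, and `env` passes through from the cluster because nothing conflicts with it.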
Tag enforcement
To enforce the use of specific custom tags, you can use compute policies. See Custom tag enforcement. To enforce custom tags on serverless compute workloads, use serverless budget policies.
Limitations
- Tag keys and values can only contain letters, spaces, numbers, or the characters `+`, `-`, `=`, `.`, `_`, `:`, `/`, `@`. Tags containing other characters are invalid.
- If you change tag key names or values, these changes apply only after cluster restart or pool expansion.
- The maximum number of custom tags that can propagate to GCE labels is 54.
- The maximum length for GCE label keys and values is 63 characters.
- Label propagation can be delayed due to GCE API rate limits for the project. You can resolve this by increasing the GCE API rate limits for the Google Cloud project.
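The character restriction in the first limitation can be checked before a tag is submitted. The sketch below is a hedged example, not the validation Databricks performs: `invalid_chars` is a hypothetical helper, and it assumes "letters" means ASCII letters.

```python
import re

# Anything outside letters, spaces, digits, and + - = . _ : / @ is flagged.
_INVALID = re.compile(r"[^A-Za-z0-9 +\-=._:/@]")


def invalid_chars(text: str) -> list:
    """Return the sorted, de-duplicated characters that are not allowed."""
    return sorted(set(_INVALID.findall(text)))


print(invalid_chars("cost-center:42"))  # no invalid characters
print(invalid_chars("team#1!"))         # reports '#' and '!'
```

Returning the offending characters, rather than a bare true or false, makes it easier to show a useful error message to whoever defined the tag.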
GCE label limits
GCE labels have the following limitations:
- Keys and values must consist only of lowercase letters, numeric characters, underscores, and dashes.
- Maximum length for GCE label keys and values is 63 characters.
- Maximum number of tags that can propagate to GCE labels is 54.
To conform to GCE format rules, tags are transformed before they become GCE label keys and values. If the transformation produces duplicate keys, the key-value pair that appears later (lower) in the tag definitions is the one that persists.
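The duplicate handling and the limits above can be sketched together. This is an approximation under stated assumptions: the helper names are hypothetical, "letters" is taken to mean ASCII letters, and because the text does not say what happens to labels that exceed a limit, the sketch only reports them.

```python
import re

MAX_LABELS = 54        # maximum number of tags that can propagate
MAX_LABEL_LENGTH = 63  # maximum length of a label key or value


def _to_label(text: str) -> str:
    # Assumed transformation: lowercase, then drop disallowed characters.
    return re.sub(r"[^a-z0-9_-]", "", text.lower())


def tags_to_labels(tags):
    """Convert (key, value) pairs, in definition order, to a label dict."""
    labels = {}
    for key, value in tags:
        # A later pair overwrites an earlier one with the same transformed key.
        labels[_to_label(key)] = _to_label(value)
    return labels


def limit_problems(labels):
    """List violations of the count and length limits."""
    problems = []
    if len(labels) > MAX_LABELS:
        problems.append(f"{len(labels)} labels exceeds the limit of {MAX_LABELS}")
    for key, value in labels.items():
        if len(key) > MAX_LABEL_LENGTH or len(value) > MAX_LABEL_LENGTH:
            problems.append(f"label {key[:20]!r} exceeds {MAX_LABEL_LENGTH} characters")
    return problems


labels = tags_to_labels([("My Key", "a"), ("my.key", "B"), ("env", "Prod")])
print(labels)
print(limit_problems(labels))
```

Here `My Key` and `my.key` both transform to `mykey`, so the value from the later definition is the one that remains.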