
Use tags to attribute and track usage

This article explains how to use tags to attribute compute usage to specific workspaces, teams, projects, or users to support cost tracking and budgeting.

There are two types of tags:

  • Default tags: Automatically applied by Databricks to cloud-deployed resources. These provide basic metadata like vendor, cluster ID, and creator.
  • Custom tags: User-defined tags that you can add to compute resources and serverless workloads. These allow for granular tracking, reporting, and budgeting.

Warning: Tag data may be replicated globally. Do not use tag names or values that could compromise the security of your resources. For example, do not use tag names that contain personal or sensitive information.

Default tags

Databricks automatically adds default tags to compute resources it deploys in your cloud account. These tags attribute the usage to Databricks and provide basic information about the resource, such as its name, ID, and creator.

Default tags automatically propagate to AWS EC2 instances and AWS EBS volumes for cost analysis.

Default tag keys and values

Databricks adds the following default tags to compute resources:

| Tag key | Value |
| --- | --- |
| Vendor | Constant value: Databricks |
| ClusterId | Databricks internal ID of the cluster |
| ClusterName | Name of the cluster |
| Creator | Username (email address) of the user who created the cluster |
| RunName | Job name (only propagates on jobs compute) |
| JobId | Job ID (only propagates on jobs compute) |

Compute used by Lakehouse Monitoring includes these additional tags:

| Tag key | Value |
| --- | --- |
| LakehouseMonitoring | true |
| LakehouseMonitoringTableId | ID of the monitored table |
| LakehouseMonitoringWorkspaceId | ID of the workspace where the monitor was created |
| LakehouseMonitoringMetastoreId | ID of the metastore where the monitored table exists |

Databricks adds the following default tags to pools and the compute resources created by pools.

| Tag key | Value |
| --- | --- |
| Vendor | Constant value: Databricks |
| DatabricksInstancePoolCreatorId | Databricks internal ID of the user who created the pool |
| DatabricksInstancePoolId | Databricks internal ID of the pool |

Custom tags

Custom tags let you attribute compute usage to specific teams, projects, or cost centers with more granularity than default tags. These tags are applied by users or admins and propagate to both your account's usage logs and applicable cloud resources. These tags are also used to create and monitor budgets in your Databricks account.
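As a rough illustration of what attribution by custom tag means in practice, the sketch below totals usage per tag value. The record shape (a `dbus` number and a `custom_tags` mapping) and the sample rows are made up for this example; they are not the schema of any Databricks usage report or system table.

```python
from collections import defaultdict


def dbus_by_tag(records, tag_key, untagged_label="(untagged)"):
    """Sum DBUs per value of one custom tag key.

    `records` is a hypothetical list of dicts with a "dbus" number and an
    optional "custom_tags" mapping; adapt the field names to your own export.
    """
    totals = defaultdict(float)
    for record in records:
        value = record.get("custom_tags", {}).get(tag_key, untagged_label)
        totals[value] += record["dbus"]
    return dict(totals)


# Made-up sample rows, for illustration only.
sample_records = [
    {"dbus": 12.5, "custom_tags": {"Team": "data-eng"}},
    {"dbus": 3.0, "custom_tags": {"Team": "ml"}},
    {"dbus": 4.5, "custom_tags": {"Team": "data-eng"}},
    {"dbus": 1.0, "custom_tags": {}},
]

if __name__ == "__main__":
    print(dbus_by_tag(sample_records, "Team"))
```

Grouping untagged usage under its own label makes gaps in tag coverage visible rather than silently dropping them.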

Supported resources for custom tags

You can add custom tags for the following objects managed by Databricks:

| Object | Tagging interface (UI) | Tagging interface (API) |
| --- | --- | --- |
| Workspace | N/A | Account API |
| Pool | Pools UI in the Databricks workspace | Instance Pool API |
| All-purpose and job compute | Compute UI in the Databricks workspace | Clusters API |
| SQL warehouse | SQL warehouse UI in the Databricks workspace | Warehouses API |

Custom tags appear in lowercase in GCE logs. Characters that are not letters, numbers, underscores, or dashes are removed. For example, My Key becomes mykey and My.Val becomes myval.
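To predict how a tag will look in those logs, the rule above can be approximated in a few lines. This is a minimal sketch, not Databricks code: the function name is hypothetical, and it assumes "letters" means ASCII letters.

```python
import re


def normalize_tag_for_logs(text: str) -> str:
    """Lowercase the text, then drop every character that is not an
    ASCII letter, digit, underscore, or dash (ASCII-only is an assumption)."""
    return re.sub(r"[^a-z0-9_-]", "", text.lower())


if __name__ == "__main__":
    print(normalize_tag_for_logs("My Key"))  # mykey
    print(normalize_tag_for_logs("My.Val"))  # myval
```

Because distinct tags can collapse to the same normalized form (for example, My Key and MyKey), it can help to choose tag names that are already lowercase and free of spaces and punctuation.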

Warning: Do not assign a custom tag with the key Name to a cluster. Every cluster has a tag Name whose value is set by Databricks. If you change the value associated with the key Name, the cluster can no longer be tracked by Databricks. As a consequence, the cluster might not be terminated after becoming idle and will continue to incur usage costs.
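A client-side guard can catch the reserved Name key before a request is sent. The sketch below builds a cluster specification with a `custom_tags` map as accepted by the Clusters API; the helper name is hypothetical, the angle-bracket values are placeholders to replace, and the check is an exact, case-sensitive match on Name, which is an assumption about how strict the guard needs to be.

```python
import json

# Tag keys that Databricks sets itself and that must not be overridden.
RESERVED_TAG_KEYS = {"Name"}


def build_cluster_spec(cluster_name, custom_tags):
    """Return a cluster spec dict, refusing reserved custom tag keys.

    Hypothetical helper: only the exact key "Name" is rejected.
    """
    reserved = RESERVED_TAG_KEYS & set(custom_tags)
    if reserved:
        raise ValueError(f"reserved tag key(s) not allowed: {sorted(reserved)}")
    return {
        "cluster_name": cluster_name,
        "spark_version": "<spark-version>",  # placeholder
        "node_type_id": "<node-type-id>",    # placeholder
        "num_workers": 1,
        "custom_tags": dict(custom_tags),
    }


if __name__ == "__main__":
    spec = build_cluster_spec("etl-nightly", {"Team": "data-eng", "Project": "billing"})
    print(json.dumps(spec, indent=2))
```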

Tag serverless compute workloads

Preview: This feature is in Public Preview.

To attribute serverless compute usage to users, groups, or projects, you can use serverless budget policies. When a user is assigned a serverless budget policy, their serverless usage is automatically tagged with their policy’s custom tags. Serverless budget policies can be applied to serverless notebooks, jobs, pipelines, and model serving endpoints.

Note: Serverless compute usage is logged in your account's billable usage system table. The legacy DBU usage reports do not include serverless usage or serverless budget policy tags.

See Attribute usage with serverless budget policies.

Tag propagation

Tags are propagated to AWS EC2 instances differently depending on whether or not the cluster was created from a pool.

If a cluster is created from a pool, its EC2 instances inherit only the workspace tags (custom and default) and the pool tags, not the cluster tags. Therefore, if you create clusters from a pool, assign every custom tag you need on those instances to the workspace or the pool.

Cluster and pool tags both propagate to DBU usage reports, even if the cluster was created from a pool.

If there is a tag name conflict, Databricks default tags take precedence over custom tags, and pool tags take precedence over cluster tags.
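One reading of these propagation rules can be written down as two small merge functions. This is a sketch under stated assumptions, not Databricks' implementation: the function names are hypothetical, the relative order of workspace tags versus pool or cluster tags is not specified in this article (here the pool or cluster tag wins), and the assumption that clusters outside a pool pass their own cluster tags to EC2 is inferred rather than stated.

```python
def ec2_instance_tags(default_tags, workspace_tags, cluster_tags, pool_tags=None):
    """Tags expected on EC2 instances under this reading of the rules.

    Pass `pool_tags` only when the cluster is created from a pool; in that
    case the cluster tags are not inherited by the instances.
    """
    if pool_tags is not None:
        custom = {**workspace_tags, **pool_tags}
    else:
        # Assumption: without a pool, cluster tags reach the instances.
        custom = {**workspace_tags, **cluster_tags}
    # Default tags are merged last, so they win any name conflict.
    return {**custom, **default_tags}


def usage_report_tags(default_tags, cluster_tags, pool_tags):
    """Tags expected in DBU usage reports: pool beats cluster, default beats both."""
    return {**cluster_tags, **pool_tags, **default_tags}


if __name__ == "__main__":
    default = {"Vendor": "Databricks"}
    workspace = {"Env": "prod"}
    cluster = {"Team": "ml", "Vendor": "other"}
    pool = {"Team": "platform"}
    print(ec2_instance_tags(default, workspace, cluster))
    print(ec2_instance_tags(default, workspace, cluster, pool_tags=pool))
    print(usage_report_tags(default, cluster, pool))
```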

Tag enforcement

To enforce the use of specific custom tags, you can use compute policies. See Custom tag enforcement. To enforce custom tags on serverless compute workloads, use serverless budget policies.
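As an illustration of the compute policy approach, the sketch below generates a policy definition fragment. It assumes the compute policy definition format in which an attribute path such as `custom_tags.<key>` maps to a rule with a `type` of `fixed` or `allowlist`; the helper name and the tag names and values are hypothetical, so check the result against the compute policy reference before using it.

```python
import json


def tag_policy_definition(fixed=None, allowed=None):
    """Build a policy definition fragment that constrains custom tags.

    fixed:   {tag_key: value}        -> tag is always set to that value
    allowed: {tag_key: [values...]}  -> user must pick one of the values
    """
    definition = {}
    for key, value in (fixed or {}).items():
        definition[f"custom_tags.{key}"] = {"type": "fixed", "value": value}
    for key, values in (allowed or {}).items():
        definition[f"custom_tags.{key}"] = {"type": "allowlist", "values": list(values)}
    return definition


if __name__ == "__main__":
    definition = tag_policy_definition(
        fixed={"CostCenter": "cc-1234"},                 # hypothetical tag
        allowed={"Department": ["Deptt1", "Deptt2"]},    # hypothetical values
    )
    print(json.dumps(definition, indent=2))
```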

To ensure that certain tags are always populated when compute resources are created across a workspace, you can apply an IAM policy to your workspace’s primary IAM role (the one created during workspace setup; contact your AWS administrator if you need access). The IAM policy should include an explicit Deny statement for each mandatory tag key and, optionally, its allowed values. Cluster creation fails if a required tag is missing or its value is not one of the allowed values.

For example, if you want to enforce Department and Project tags, with only specified values allowed for the former and a free-form non-empty value for the latter, you could apply an IAM policy like this one:

JSON
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "MandateLaunchWithTag1",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "ec2:CreateTags"],
      "Resource": "arn:aws:ec2:region:accountId:instance/*",
      "Condition": {
        "StringNotEqualsIgnoreCase": {
          "aws:RequestTag/Department": ["Deptt1", "Deptt2", "Deptt3"]
        }
      }
    },
    {
      "Sid": "MandateLaunchWithTag2",
      "Effect": "Deny",
      "Action": ["ec2:RunInstances", "ec2:CreateTags"],
      "Resource": "arn:aws:ec2:region:accountId:instance/*",
      "Condition": {
        "StringNotLike": {
          "aws:RequestTag/Project": "?*"
        }
      }
    }
  ]
}

Each tag statement needs both the ec2:RunInstances and ec2:CreateTags actions so that the policy covers clusters that have only on-demand instances, only spot instances, or a mix of both.
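To check a set of tags locally before launching a cluster, the two conditions in the example policy can be approximated in Python. This is a rough simulation, not the IAM evaluation engine: the function name is hypothetical, and it relies on the understanding that a negated operator such as StringNotLike also matches when the tag is absent from the request. `fnmatchcase` stands in for IAM wildcard matching, which is adequate for the `?*` pattern used here but is not equivalent in general.

```python
from fnmatch import fnmatchcase

# Allowed values copied from the example policy.
ALLOWED_DEPARTMENTS = ["Deptt1", "Deptt2", "Deptt3"]


def denied_statements(request_tags):
    """Return the Sids of the example Deny statements that would apply."""
    denied = []

    # StringNotEqualsIgnoreCase: deny unless Department is an allowed value.
    department = request_tags.get("Department")
    allowed = {value.lower() for value in ALLOWED_DEPARTMENTS}
    if department is None or department.lower() not in allowed:
        denied.append("MandateLaunchWithTag1")

    # StringNotLike "?*": deny unless Project has at least one character.
    project = request_tags.get("Project")
    if project is None or not fnmatchcase(project, "?*"):
        denied.append("MandateLaunchWithTag2")

    return denied


if __name__ == "__main__":
    print(denied_statements({"Department": "deptt2", "Project": "billing"}))
    print(denied_statements({"Department": "Sales"}))
```

An empty list means neither statement would deny the request under this approximation.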

Tip: Databricks recommends that you add a separate policy statement for each tag. The overall policy might become long, but it is easier to debug. See the IAM Policy Condition Operators Reference for a list of operators that can be used in a policy.
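When the list of mandatory tags grows, generating one statement per tag keeps the policy consistent. The sketch below is one way to do that; the helper names are hypothetical, and the resource ARN keeps the `region` and `accountId` placeholders from the example policy, which you would replace with real values.

```python
import json

EXAMPLE_RESOURCE = "arn:aws:ec2:region:accountId:instance/*"  # placeholders


def deny_unless_tag(sid, tag_key, operator, allowed, resource=EXAMPLE_RESOURCE):
    """Build one Deny statement that mandates a single request tag."""
    return {
        "Sid": sid,
        "Effect": "Deny",
        "Action": ["ec2:RunInstances", "ec2:CreateTags"],
        "Resource": resource,
        "Condition": {operator: {f"aws:RequestTag/{tag_key}": allowed}},
    }


def build_tag_policy(rules, resource=EXAMPLE_RESOURCE):
    """rules: list of (tag_key, operator, allowed) tuples, one per tag."""
    statements = [
        deny_unless_tag(f"MandateLaunchWithTag{index}", key, operator, allowed, resource)
        for index, (key, operator, allowed) in enumerate(rules, start=1)
    ]
    return {"Version": "2012-10-17", "Statement": statements}


if __name__ == "__main__":
    policy = build_tag_policy([
        ("Department", "StringNotEqualsIgnoreCase", ["Deptt1", "Deptt2", "Deptt3"]),
        ("Project", "StringNotLike", "?*"),
    ])
    print(json.dumps(policy, indent=2))
```

With the two rules shown, the output has the same structure as the example policy above.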

Cluster creation errors due to an IAM policy show an encoded error message, starting with:

Console
Cloud Provider Launch Failure: A cloud provider error was encountered while setting up the cluster.

The message is encoded because the details of the authorization status can constitute privileged information that the user who requested the action should not see. See DecodeAuthorizationMessage API (or CLI) for information about how to decode such messages.
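For reference, decoding such a message with the AWS SDK for Python might look like the sketch below. It assumes a boto3 STS client whose caller has the `sts:DecodeAuthorizationMessage` permission; the helper name is hypothetical, and the client is passed in as a parameter so the function itself has no dependency beyond the standard library.

```python
import json


def decode_launch_failure(sts_client, encoded_message):
    """Decode an encoded authorization failure message into a dict.

    `sts_client` is expected to behave like boto3.client("sts"); the decoded
    message is returned by AWS as a JSON string, which is parsed here.
    """
    response = sts_client.decode_authorization_message(EncodedMessage=encoded_message)
    return json.loads(response["DecodedMessage"])


# Usage sketch (requires boto3 and AWS credentials; not run here):
#   import boto3
#   details = decode_launch_failure(boto3.client("sts"), "<encoded-message>")
#   print(json.dumps(details, indent=2))
```

The decoded document typically shows whether the request hit an explicit deny, which helps identify the tag statement that blocked the launch.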

Limitations

  • Tag keys and values can only contain letters, spaces, numbers, or the characters +, -, =, ., _, :, /, @. Tags containing other characters are invalid.
  • If you change tag key names or values, these changes apply only after cluster restart or pool expansion.
  • If the cluster’s custom tags conflict with a pool’s custom tags, the cluster can’t be created.
  • It can take up to one hour for custom workspace tags to propagate after any change.
  • No more than 20 tags can be assigned to a workspace resource.
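
The character and count limits above are easy to check before tags are submitted. The following is a minimal sketch with a hypothetical function name; it assumes "letters" means ASCII letters and does not address empty keys or values, which this list does not mention.

```python
import re

# Letters, spaces, digits, and the characters + - = . _ : / @
_ALLOWED_TEXT = re.compile(r"[A-Za-z0-9 +\-=._:/@]*")
MAX_TAGS_PER_RESOURCE = 20


def validate_tags(tags):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(tags) > MAX_TAGS_PER_RESOURCE:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS_PER_RESOURCE}")
    for key, value in tags.items():
        for label, text in (("key", key), ("value", value)):
            if not _ALLOWED_TEXT.fullmatch(text):
                problems.append(f"invalid characters in tag {label}: {text!r}")
    return problems


if __name__ == "__main__":
    print(validate_tags({"Team": "data-eng", "cost:center": "cc/42@eu"}))
    print(validate_tags({"Team!": "data#eng"}))
```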