Metrics help you monitor the performance of Databricks clusters.
Ganglia metrics are always available on Databricks. You can also install Datadog agents on cluster nodes to send Datadog metrics to your Datadog account. This topic describes how to access and configure Ganglia metrics and includes notebooks that install Datadog agents on Databricks clusters.
To access the Ganglia UI, navigate to the Metrics tab on the cluster details page. GPU metrics are available in the Ganglia UI for GPU-enabled clusters running Databricks Runtime 4.1 and above.
To view live metrics, click the Ganglia UI link.
To view historical metrics, click a snapshot file. The snapshot contains aggregated metrics for the hour preceding the selected time.
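As a rough illustration of the period a snapshot covers, the sketch below computes the one-hour range that ends at the selected time. The function name `snapshot_window` is hypothetical and is not part of Databricks or Ganglia. It only restates the rule above as date arithmetic.

```python
from datetime import datetime, timedelta
from typing import Tuple


def snapshot_window(selected: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) of the hour preceding the selected snapshot time.

    Hypothetical helper for illustration only.
    """
    return selected - timedelta(hours=1), selected


# Example: a snapshot selected at 15:00 covers 14:00 through 15:00.
start, end = snapshot_window(datetime(2020, 1, 1, 15, 0))
print(start.isoformat(), "to", end.isoformat())
```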
You can install Datadog agents on cluster nodes to send Datadog metrics to your Datadog account. The following notebook demonstrates how to install a Datadog agent on a cluster using a cluster-scoped init script.
To install the Datadog agent on all clusters, first test the cluster-scoped init script on a single cluster, then deploy it as a global init script.
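The general shape of such an init script can be sketched as follows. This is a minimal sketch, not the notebook referenced above: the function names `render_init_script` and `write_init_script` are hypothetical, the installer URL is passed in by the caller and should be taken from Datadog's current installation instructions, and the script assumes the API key is supplied to the cluster as an environment variable named `DD_API_KEY`.

```python
import tempfile
from pathlib import Path


def render_init_script(installer_url: str, api_key_var: str = "DD_API_KEY") -> str:
    """Return the text of a bash init script that runs an agent installer.

    Hypothetical helper: installer_url is an assumption supplied by the
    caller, and api_key_var names the environment variable assumed to
    hold the Datadog API key on each cluster node.
    """
    lines = [
        "#!/bin/bash",
        "set -euo pipefail",
        "# Stop early if the API key was not provided to the node.",
        f': "${{{api_key_var}:?{api_key_var} must be set}}"',
        "# Download the installer and run it on this node.",
        f'DD_API_KEY="${{{api_key_var}}}" bash -c "$(curl -fsSL {installer_url})"',
        "",
    ]
    return "\n".join(lines)


def write_init_script(path: Path, script: str) -> Path:
    """Write the script to a local path and return that path."""
    path.write_text(script)
    return path


# Example with a placeholder URL; replace it with the installer URL
# from Datadog's documentation before using the script on a cluster.
script = render_init_script("https://example.com/install_script.sh")
target = Path(tempfile.gettempdir()) / "datadog-install.sh"
write_init_script(target, script)
print(script)
```

In a Databricks notebook, the same text could instead be stored where the cluster can read it, for example with `dbutils.fs.put(path, script, True)`, and then referenced as a cluster-scoped init script in the cluster configuration. The storage location is an assumption left to the reader. Keeping the API key in an environment variable rather than in the script body avoids writing the secret to shared storage.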