Prometheus metrics and alerts
This page gives an overview of the charms in Charmed HPC that provide metrics and alerts to Prometheus, a monitoring system that collects application metrics and evaluates alerting rules on them.
All metrics and alerts can be viewed in Prometheus or in the Grafana web interface. See Integrate with Canonical Observability Stack for more information.
The following table lists every Charmed HPC charm that exposes metrics and alerts to Prometheus, together with the upstream documentation that describes the exported metrics. The last column gives the query that lists those metrics in Prometheus or the Grafana UI.
| charm | upstream docs | query |
|---|---|---|
| slurmctld | | |
| mysql | | |
| postgresql-k8s | | |
| glauth-k8s | | |
| traefik-k8s | | |
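As a rough illustration of what such a query does, the following Python sketch asks the Prometheus HTTP API endpoint `/api/v1/label/__name__/values` for every metric name on series that match a selector. It assumes the scraped series carry the Juju topology label `juju_charm`; the Prometheus address and the helper names are hypothetical placeholders, not part of Charmed HPC.

```python
"""Sketch: list the metric names Prometheus holds for one charm.

Assumptions (hypothetical): Prometheus is reachable at PROMETHEUS_URL, and
scraped series carry the Juju topology label `juju_charm`.
"""
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://prometheus.example:9090"  # hypothetical address


def label_values_url(base_url: str, selector: str) -> str:
    """Build a /api/v1/label/__name__/values URL restricted by a series selector."""
    query = urllib.parse.urlencode({"match[]": selector})
    return f"{base_url.rstrip('/')}/api/v1/label/__name__/values?{query}"


def parse_metric_names(body: bytes) -> list[str]:
    """Extract the sorted metric names from a label-values API response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    # On success, "data" is a plain list of label values (here: metric names).
    return sorted(payload["data"])


def list_charm_metrics(charm: str, base_url: str = PROMETHEUS_URL) -> list[str]:
    """Return every metric name on series labelled with the given charm."""
    # Doubled braces produce a literal selector such as {juju_charm="slurmctld"}.
    url = label_values_url(base_url, f'{{juju_charm="{charm}"}}')
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_metric_names(response.read())
```

For example, `list_charm_metrics("slurmctld")` would return the metric names found for the `slurmctld` charm. Passing the selector as `match[]` lets Prometheus filter server-side instead of returning every metric name it stores.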
Slurmctld
The `slurmctld` charm exposes metrics related to:

- Resource usage per partition, account, or user.
- Job statuses.
- RPC messages for `slurmctld`.
- Prometheus Slurm Exporter statistics.
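To break a resource-usage metric down per partition, account, or user, a PromQL `sum by (...)` aggregation can be sent to the Prometheus instant-query endpoint `/api/v1/query`. The sketch below is only illustrative: the metric name `slurm_cpus_alloc` and the label names `partition`, `account`, and `user` are hypothetical placeholders, and the real names should be taken from the exporter's upstream documentation.

```python
"""Sketch: aggregate a Slurm metric per partition, account, or user via PromQL.

Metric and label names used with these helpers (e.g. `slurm_cpus_alloc`,
`partition`) are hypothetical placeholders, as is the Prometheus base URL.
"""
import json
import urllib.parse
import urllib.request


def per_label_query(metric: str, label: str) -> str:
    """Return a PromQL expression summing `metric` across series, grouped by `label`."""
    return f"sum by ({label}) ({metric})"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus /api/v1/query URL for an instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(body: bytes, label: str) -> dict[str, float]:
    """Map each value of `label` to its sample value in an instant-vector response."""
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector result, got {data['resultType']}")
    # Each entry is {"metric": {labels...}, "value": [timestamp, "value"]};
    # Prometheus encodes sample values as strings, so convert them to float.
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in data["result"]}


def usage_by(base_url: str, metric: str, label: str) -> dict[str, float]:
    """Query Prometheus and return the per-label totals for `metric`."""
    url = instant_query_url(base_url, per_label_query(metric, label))
    with urllib.request.urlopen(url, timeout=10) as response:
        return parse_vector(response.read(), label)
```

Calling `usage_by(url, "slurm_cpus_alloc", "account")` would group the same hypothetical metric by account instead. Letting Prometheus perform the `sum by` aggregation means only one series per group is returned to the client.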