Prometheus metrics and alerts

This page gives an overview of the charms in Charmed HPC that provide metrics and alerts for Prometheus, a metrics aggregator and alert manager for applications.

All metrics and alerts can be viewed from Prometheus or from the Grafana web interface. See Integrate with Canonical Observability Stack for more information.

The following table lists the charms in Charmed HPC that expose metrics and alerts to Prometheus, along with links to the upstream documentation that describes the exported metrics. The last column gives the query that lists each charm's exported metrics in Prometheus or the Grafana UI.

Charm	Upstream docs	Query
slurmctld	Documentation	`{juju_charm="slurmctld"}`
mysql	Documentation	`{juju_charm="mysql"}`
postgresql-k8s	Documentation	`{juju_charm="postgresql-k8s"}`
glauth-k8s	Documentation	`{juju_charm="glauth-k8s"}`
traefik-k8s	Documentation	`{juju_charm="traefik-k8s"}`
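
The queries in the table can also be sent to the Prometheus HTTP API rather than typed into the web UI. The following is a minimal sketch, not an official Charmed HPC tool: it builds the `juju_charm` selector for each charm listed above and the matching instant-query URL for the `/api/v1/query` endpoint. The base URL `http://localhost:9090` and the helper names `charm_selector` and `instant_query_url` are assumptions for illustration; substitute the address of your own Prometheus deployment.

```python
from urllib.parse import urlencode

# Charms from the table above that expose metrics to Prometheus.
CHARMS = ["slurmctld", "mysql", "postgresql-k8s", "glauth-k8s", "traefik-k8s"]


def charm_selector(charm: str) -> str:
    """Return the PromQL selector matching every series exported by a charm."""
    return f'{{juju_charm="{charm}"}}'


def instant_query_url(base_url: str, charm: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API.

    base_url is a hypothetical Prometheus address, not a Charmed HPC default.
    """
    params = urlencode({"query": charm_selector(charm)})
    return f"{base_url.rstrip('/')}/api/v1/query?{params}"


if __name__ == "__main__":
    # Assumed address; replace with your Prometheus endpoint.
    for charm in CHARMS:
        print(instant_query_url("http://localhost:9090", charm))
```

Fetching one of the printed URLs with any HTTP client returns a JSON document listing the series that carry the given `juju_charm` label, provided the endpoint is reachable from where the request is made.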

Slurmctld

The slurmctld charm exposes metrics related to:

Resource usage per partition, account or user.
Job statuses.
RPC messages for slurmctld.
Prometheus Slurm Exporter statistics.