Prometheus metrics and alerts

This page gives an overview of the charms in Charmed HPC that provide monitoring metrics and alerts for Prometheus, a metrics aggregator and alert manager for applications.

All metrics and alerts can be viewed from Prometheus or from the Grafana web interface. See Integrate with Canonical Observability Stack for more information.

The following table lists the charms in Charmed HPC that expose metrics and alerts to Prometheus, along with links to the upstream documentation describing the exported metrics. The last column gives the query that lists each charm's exported metrics in Prometheus or the Grafana UI.

Charm           Upstream docs   Query
slurmctld       Documentation   {juju_charm="slurmctld"}
mysql           Documentation   {juju_charm="mysql"}
postgresql-k8s  Documentation   {juju_charm="postgresql-k8s"}
glauth-k8s      Documentation   {juju_charm="glauth-k8s"}
traefik-k8s     Documentation   {juju_charm="traefik-k8s"}
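
The selectors in the query column can also be sent to the Prometheus HTTP API instead of being typed into a UI. The following is a minimal sketch in Python that builds the selector for a charm and a request URL for the API's `/api/v1/series` endpoint, which returns the series matching a selector. The base URL `http://localhost:9090` and the helper names `charm_selector` and `series_url` are assumptions made for illustration; the address of Prometheus depends on how the observability stack is deployed.

```python
from urllib.parse import urlencode

# Assumption: Prometheus is reachable at this address. Replace it with the
# address of the Prometheus instance in your own deployment.
PROMETHEUS_URL = "http://localhost:9090"


def charm_selector(charm: str) -> str:
    """Return the selector that matches every series exported by a charm."""
    return f'{{juju_charm="{charm}"}}'


def series_url(charm: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build a URL for the /api/v1/series endpoint for one charm.

    The selector is passed in the `match[]` parameter and URL-encoded.
    """
    params = urlencode({"match[]": charm_selector(charm)})
    return f"{base_url}/api/v1/series?{params}"


if __name__ == "__main__":
    # Print one request URL per charm listed in the table.
    for charm in ("slurmctld", "mysql", "postgresql-k8s", "glauth-k8s", "traefik-k8s"):
        print(series_url(charm))
```

Each printed URL can be fetched with any HTTP client that has network access to Prometheus; the sketch only builds the URLs and does not contact the server.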

Slurmctld

The slurmctld charm exposes metrics related to:

  • Resource usage per partition, account or user.

  • Job statuses.

  • RPC messages for slurmctld.

  • Prometheus Slurm Exporter statistics.