Grafana dashboards

This page is an overview of the charms in Charmed HPC that provide dashboards for Grafana. Grafana acts as a web interface for visualizing data from aggregators such as Prometheus and Loki. See Grafana dashboards for general information on dashboards, and Prometheus metrics and alerts for details on the metrics that the dashboards display.

Note

To see the exact query that supplies a panel with data, open that panel in the panel inspect view.
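When many panels need checking at once, the same information can be read from a dashboard's exported JSON. The following is a minimal sketch, assuming the usual Grafana dashboard JSON layout in which each panel carries a `targets` list and Prometheus or Loki targets store their query under `expr`. The dashboard content, panel titles, and metric names below are hypothetical placeholders and are not taken from any Charmed HPC dashboard.

```python
def iter_panels(panels):
    """Yield every panel, descending into row panels that nest children."""
    for panel in panels:
        yield panel
        # Collapsed rows keep their child panels in a nested "panels" list.
        yield from iter_panels(panel.get("panels", []))


def panel_queries(dashboard):
    """Map each panel title to the query expressions of its targets."""
    queries = {}
    for panel in iter_panels(dashboard.get("panels", [])):
        exprs = [t["expr"] for t in panel.get("targets", []) if "expr" in t]
        if exprs:
            queries[panel.get("title", "<untitled>")] = exprs
    return queries


# Hypothetical dashboard export; titles and expressions are placeholders.
EXAMPLE_DASHBOARD = {
    "title": "Example",
    "panels": [
        {
            "type": "row",
            "title": "Nodes",
            "panels": [
                {
                    "title": "Node state count",
                    "targets": [{"expr": "example_node_state_count"}],
                }
            ],
        },
        {
            "title": "Raw logs",
            "targets": [{"expr": '{job="example"}'}],
        },
    ],
}

if __name__ == "__main__":
    for title, exprs in panel_queries(EXAMPLE_DASHBOARD).items():
        print(f"{title}: {exprs}")
```

In practice, `EXAMPLE_DASHBOARD` would be replaced by the result of `json.load` on a dashboard exported from Grafana. Panels that use other data sources may store their query under a different key, so the panel inspect view remains the authoritative place to check.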

Slurmctld

The dashboard from the slurmctld charm displays an overall view of the cluster, including the following information:

CPU and memory usage per partition.
Node state count.
CPU and memory usage per account.
Statistics on Slurmctld RPC messages.

Figures: Slurm partition dashboard, Slurm account dashboard, Slurm RPC dashboard.

MySQL

The dashboard from the mysql charm displays metrics for the database that Slurmdbd uses for storage, including:

Uptime.
Queries per second.
Current cache size.
Maximum number of concurrent connections.
Thread resource usage.
Network traffic statistics.

MySQL dashboard

Traefik K8s

The dashboard from the traefik-k8s charm displays metrics about the reverse proxy that handles communication between the compute plane cluster and the monitoring and identity Kubernetes clusters. This includes:

Uptime.
HTTP response code statistics.
Response times.
Open connections statistics.
Raw logs for every proxied endpoint.

Traefik dashboard