How to deploy Slurm¶
This how-to guide shows you how to deploy the Slurm workload manager as the resource management and job scheduling service of your Charmed HPC cluster. The deployment, management, and operations of Slurm are controlled by the Slurm charms.
Prerequisites¶
To successfully deploy Slurm in your Charmed HPC cluster, you will need at least:
The Juju CLI client installed on your machine.
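If you do not have the Juju CLI installed yet, it is distributed as a snap. As a minimal sketch (see the Juju documentation for the recommended install method on your platform):
sudo snap install juju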
Once you have verified that you have met the prerequisites above, proceed to the instructions below.
Deploy Slurm¶
You have two options for deploying Slurm:
Using the Juju CLI client.
Using the Juju Terraform client.
If you want to use Terraform to deploy Slurm, see the Install and manage the client (terraform juju) how-to in the Juju documentation for additional requirements.
If you are deploying Slurm on LXD, see Deploying Slurm on LXD for more information on additional constraints that must be passed to Juju.
To deploy Slurm using the Juju CLI client, first create the slurm model that will hold the deployment. The slurm model is the abstraction that holds the resources (machines, integrations, network spaces, storage, etc.) that are provisioned as part of your Slurm deployment.
Run the following command to create the slurm model in your charmed-hpc machine cloud:
juju add-model slurm charmed-hpc
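You can confirm that the model was created, and that it is now the current model, by listing your models:
juju models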
Now, with the slurm model created, run the following set of commands to deploy the Slurm daemons with MySQL as the storage back-end for slurmdbd:
juju deploy sackd --base "ubuntu@24.04" --channel "edge"
juju deploy slurmctld --base "ubuntu@24.04" --channel "edge"
juju deploy slurmd --base "ubuntu@24.04" --channel "edge"
juju deploy slurmdbd --base "ubuntu@24.04" --channel "edge"
juju deploy slurmrestd --base "ubuntu@24.04" --channel "edge"
juju deploy mysql --channel "8.0/stable"
juju deploy only deploys the Slurm charms. juju integrate integrates the charms together, which will trigger the necessary events for the Slurm daemons to reach active status. Run the following set of commands to integrate the Slurm daemons together:
juju integrate slurmctld sackd
juju integrate slurmctld slurmd
juju integrate slurmctld slurmdbd
juju integrate slurmctld slurmrestd
juju integrate slurmdbd mysql:database
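While the charms are being set up, you can follow progress by polling juju status. Recent Juju releases also support a --watch flag that refreshes the output at a fixed interval:
juju status --watch 5s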
After a few minutes, your Slurm deployment will become active. The output of the juju status command should be similar to the following:
user@host:~$ juju status
Model Controller Cloud/Region Version SLA Timestamp
slurm charmed-hpc localhost/localhost 3.6.0 unsupported 17:16:37Z
App Version Status Scale Charm Channel Rev Exposed Message
mysql 8.0.39-0ubun... active 1 mysql 8.0/stable 313 no
sackd 23.11.4-1.2u... active 1 sackd latest/edge 4 no
slurmctld 23.11.4-1.2u... active 1 slurmctld latest/edge 86 no
slurmd 23.11.4-1.2u... active 1 slurmd latest/edge 107 no
slurmdbd 23.11.4-1.2u... active 1 slurmdbd latest/edge 78 no
slurmrestd 23.11.4-1.2u... active 1 slurmrestd latest/edge 80 no
Unit Workload Agent Machine Public address Ports Message
mysql/0* active idle 5 10.32.18.127 3306,33060/tcp Primary
sackd/0* active idle 4 10.32.18.203
slurmctld/0* active idle 0 10.32.18.15
slurmd/0* active idle 1 10.32.18.207
slurmdbd/0* active idle 2 10.32.18.102
slurmrestd/0* active idle 3 10.32.18.9
Machine State Address Inst id Base AZ Message
0 started 10.32.18.15 juju-d566c2-0 ubuntu@24.04 Running
1 started 10.32.18.207 juju-d566c2-1 ubuntu@24.04 Running
2 started 10.32.18.102 juju-d566c2-2 ubuntu@24.04 Running
3 started 10.32.18.9 juju-d566c2-3 ubuntu@24.04 Running
4 started 10.32.18.203 juju-d566c2-4 ubuntu@24.04 Running
5 started 10.32.18.127 juju-d566c2-5 ubuntu@22.04 Running
To deploy Slurm using the Juju Terraform client, first configure Terraform to use the Juju provider in your deployment plan.
main.tf¶
terraform {
  required_providers {
    juju = {
      source  = "juju/juju"
      version = ">= 0.16.0"
    }
  }
}
Now create the slurm model that will hold the deployment. The slurm model is the abstraction that holds the resources (machines, integrations, network spaces, storage, etc.) that are provisioned as part of your Slurm deployment. This resource will direct Juju to create the model slurm:
main.tf¶
resource "juju_model" "slurm" {
  name = "slurm"

  cloud {
    name = "charmed-hpc"
  }
}
With the slurm juju_model resource defined, declare the following set of modules in your Terraform plan. These modules will direct Juju to deploy the Slurm daemons with MySQL as the storage back-end for slurmdbd:
main.tf¶
module "sackd" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/sackd/terraform"
  model_name = juju_model.slurm.name
}

module "slurmctld" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmctld/terraform"
  model_name = juju_model.slurm.name
}

module "slurmd" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmd/terraform"
  model_name = juju_model.slurm.name
}

module "slurmdbd" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmdbd/terraform"
  model_name = juju_model.slurm.name
}

module "slurmrestd" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmrestd/terraform"
  model_name = juju_model.slurm.name
}

module "mysql" {
  source          = "git::https://github.com/canonical/mysql-operator//terraform"
  juju_model_name = juju_model.slurm.name
}
Declaring the modules only deploys the Slurm charms. Integrations are still required to trigger the necessary events for the Slurm daemons to reach active status. Declare the following set of resources in your deployment plan. These resources will direct Juju to integrate the Slurm daemons together:
main.tf¶
resource "juju_integration" "sackd-to-slurmctld" {
  model = juju_model.slurm.name

  application {
    name     = module.sackd.app_name
    endpoint = module.sackd.provides.slurmctld
  }

  application {
    name     = module.slurmctld.app_name
    endpoint = module.slurmctld.requires.login-node
  }
}

resource "juju_integration" "slurmd-to-slurmctld" {
  model = juju_model.slurm.name

  application {
    name     = module.slurmd.app_name
    endpoint = module.slurmd.provides.slurmctld
  }

  application {
    name     = module.slurmctld.app_name
    endpoint = module.slurmctld.requires.slurmd
  }
}

resource "juju_integration" "slurmdbd-to-slurmctld" {
  model = juju_model.slurm.name

  application {
    name     = module.slurmdbd.app_name
    endpoint = module.slurmdbd.provides.slurmctld
  }

  application {
    name     = module.slurmctld.app_name
    endpoint = module.slurmctld.requires.slurmdbd
  }
}

resource "juju_integration" "slurmrestd-to-slurmctld" {
  model = juju_model.slurm.name

  application {
    name     = module.slurmrestd.app_name
    endpoint = module.slurmrestd.provides.slurmctld
  }

  application {
    name     = module.slurmctld.app_name
    endpoint = module.slurmctld.requires.slurmrestd
  }
}

resource "juju_integration" "slurmdbd-to-mysql" {
  model = juju_model.slurm.name

  application {
    name     = module.mysql.application_name
    endpoint = module.mysql.provides.database
  }

  application {
    name     = module.slurmdbd.app_name
    endpoint = module.slurmdbd.requires.database
  }
}
With all the charm modules, juju_model, and juju_integration resources declared in your deployment plan, you are now ready to deploy Slurm. Expand the dropdown below to see the full deployment plan:
Full Slurm deployment plan
main.tf¶
terraform {
  required_providers {
    juju = {
      source  = "juju/juju"
      version = ">= 0.16.0"
    }
  }
}

resource "juju_model" "slurm" {
  name = "slurm"

  cloud {
    name = "charmed-hpc"
  }
}

module "sackd" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/sackd/terraform"
  model_name = juju_model.slurm.name
}

module "slurmctld" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmctld/terraform"
  model_name = juju_model.slurm.name
}

module "slurmd" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmd/terraform"
  model_name = juju_model.slurm.name
}

module "slurmdbd" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmdbd/terraform"
  model_name = juju_model.slurm.name
}

module "slurmrestd" {
  source     = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmrestd/terraform"
  model_name = juju_model.slurm.name
}

module "mysql" {
  source          = "git::https://github.com/canonical/mysql-operator//terraform"
  juju_model_name = juju_model.slurm.name
}

resource "juju_integration" "sackd-to-slurmctld" {
  model = juju_model.slurm.name

  application {
    name     = module.sackd.app_name
    endpoint = module.sackd.provides.slurmctld
  }

  application {
    name     = module.slurmctld.app_name
    endpoint = module.slurmctld.requires.login-node
  }
}

resource "juju_integration" "slurmd-to-slurmctld" {
  model = juju_model.slurm.name

  application {
    name     = module.slurmd.app_name
    endpoint = module.slurmd.provides.slurmctld
  }

  application {
    name     = module.slurmctld.app_name
    endpoint = module.slurmctld.requires.slurmd
  }
}

resource "juju_integration" "slurmdbd-to-slurmctld" {
  model = juju_model.slurm.name

  application {
    name     = module.slurmdbd.app_name
    endpoint = module.slurmdbd.provides.slurmctld
  }

  application {
    name     = module.slurmctld.app_name
    endpoint = module.slurmctld.requires.slurmdbd
  }
}

resource "juju_integration" "slurmrestd-to-slurmctld" {
  model = juju_model.slurm.name

  application {
    name     = module.slurmrestd.app_name
    endpoint = module.slurmrestd.provides.slurmctld
  }

  application {
    name     = module.slurmctld.app_name
    endpoint = module.slurmctld.requires.slurmrestd
  }
}

resource "juju_integration" "slurmdbd-to-mysql" {
  model = juju_model.slurm.name

  application {
    name     = module.mysql.application_name
    endpoint = module.mysql.provides.database
  }

  application {
    name     = module.slurmdbd.app_name
    endpoint = module.slurmdbd.requires.database
  }
}
After verifying that your plan is correct, run the following set of commands to deploy Slurm using Terraform and the Juju provider:
terraform init
terraform apply -auto-approve
Tip
You can run terraform validate to validate your Slurm deployment plan before applying it. You can also run terraform plan to see the speculative execution plan that Terraform will follow to deploy the Slurm charms; however, note that terraform plan will not actually execute the plan.
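For example, a quick pre-flight check before applying the plan:
terraform validate
terraform plan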
After a few minutes, your Slurm deployment will become active. The output of the juju status command should be similar to the following:
user@host:~$ juju status
Model Controller Cloud/Region Version SLA Timestamp
slurm charmed-hpc localhost/localhost 3.6.0 unsupported 17:16:37Z
App Version Status Scale Charm Channel Rev Exposed Message
mysql 8.0.39-0ubun... active 1 mysql 8.0/stable 313 no
sackd 23.11.4-1.2u... active 1 sackd latest/edge 4 no
slurmctld 23.11.4-1.2u... active 1 slurmctld latest/edge 86 no
slurmd 23.11.4-1.2u... active 1 slurmd latest/edge 107 no
slurmdbd 23.11.4-1.2u... active 1 slurmdbd latest/edge 78 no
slurmrestd 23.11.4-1.2u... active 1 slurmrestd latest/edge 80 no
Unit Workload Agent Machine Public address Ports Message
mysql/0* active idle 5 10.32.18.127 3306,33060/tcp Primary
sackd/0* active idle 4 10.32.18.203
slurmctld/0* active idle 0 10.32.18.15
slurmd/0* active idle 1 10.32.18.207
slurmdbd/0* active idle 2 10.32.18.102
slurmrestd/0* active idle 3 10.32.18.9
Machine State Address Inst id Base AZ Message
0 started 10.32.18.15 juju-d566c2-0 ubuntu@24.04 Running
1 started 10.32.18.207 juju-d566c2-1 ubuntu@24.04 Running
2 started 10.32.18.102 juju-d566c2-2 ubuntu@24.04 Running
3 started 10.32.18.9 juju-d566c2-3 ubuntu@24.04 Running
4 started 10.32.18.203 juju-d566c2-4 ubuntu@24.04 Running
5 started 10.32.18.127 juju-d566c2-5 ubuntu@22.04 Running
Deploying Slurm on LXD¶
The Slurm charms can deploy, manage, and operate Slurm on any supported machine cloud; however, each cloud has its own permutations. On LXD, if you deploy the charms to system containers rather than virtual machines, Slurm cannot use the recommended process tracking plugin proctrack/cgroup, and additional modifications must be made to the default LXD profile.
To deploy the Slurm charms to virtual machines rather than system containers, pass the constraint "virt-type=virtual-machine" to Juju when deploying the charms:
juju deploy sackd --base "ubuntu@24.04" --channel "edge" --constraints="virt-type=virtual-machine"
juju deploy slurmctld --base "ubuntu@24.04" --channel "edge" --constraints="virt-type=virtual-machine"
juju deploy slurmd --base "ubuntu@24.04" --channel "edge" --constraints="virt-type=virtual-machine"
juju deploy slurmdbd --base "ubuntu@24.04" --channel "edge" --constraints="virt-type=virtual-machine"
juju deploy slurmrestd --base "ubuntu@24.04" --channel "edge" --constraints="virt-type=virtual-machine"
juju deploy mysql --channel "8.0/stable" --constraints="virt-type=virtual-machine"
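Alternatively, if every machine in the model should be a virtual machine, you can set the constraint once at the model level so that subsequent deploys inherit it:
juju set-model-constraints -m slurm virt-type=virtual-machine
If you are deploying with Terraform instead, pass the constraints input to each module as shown below: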
main.tf¶
module "sackd" {
  source      = "git::https://github.com/charmed-hpc/slurm-charms//charms/sackd/terraform"
  model_name  = juju_model.slurm.name
  constraints = "arch=amd64 virt-type=virtual-machine"
}

module "slurmctld" {
  source      = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmctld/terraform"
  model_name  = juju_model.slurm.name
  constraints = "arch=amd64 virt-type=virtual-machine"
}

module "slurmd" {
  source      = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmd/terraform"
  model_name  = juju_model.slurm.name
  constraints = "arch=amd64 virt-type=virtual-machine"
}

module "slurmdbd" {
  source      = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmdbd/terraform"
  model_name  = juju_model.slurm.name
  constraints = "arch=amd64 virt-type=virtual-machine"
}

module "slurmrestd" {
  source      = "git::https://github.com/charmed-hpc/slurm-charms//charms/slurmrestd/terraform"
  model_name  = juju_model.slurm.name
  constraints = "arch=amd64 virt-type=virtual-machine"
}

module "mysql" {
  source          = "git::https://github.com/canonical/mysql-operator//terraform"
  juju_model_name = juju_model.slurm.name
  constraints     = "arch=amd64 virt-type=virtual-machine"
}
Set compute nodes to IDLE¶
Compute nodes are initially enlisted with their state set to DOWN after your Slurm deployment becomes active. To set the compute nodes’ state to IDLE so that they can start having jobs scheduled on them, use juju run to run the resume action on the leading controller:
juju run slurmctld/leader resume nodename="<machine-instance-id/hostname>"
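For example, using the slurmd machine from the sample juju status output above, whose instance id (which is typically also its hostname on LXD) is juju-d566c2-1:
juju run slurmctld/leader resume nodename="juju-d566c2-1"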
Tips
You can get the hostname of all your compute nodes with juju exec:
juju exec --application slurmd -- hostname -s
The nodename parameter of the resume action also accepts node ranges for setting the state of compute nodes to IDLE in bulk:
juju run slurmctld/leader resume nodename="<machine-instance-id/hostname>[range]"
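For example, assuming five compute nodes with the hypothetical hostnames juju-d566c2-1 through juju-d566c2-5:
juju run slurmctld/leader resume nodename="juju-d566c2-[1-5]"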
Verify compute nodes are IDLE¶
The sackd charm installs the Slurm client commands. To use sinfo to verify that a compute node’s state is IDLE, run the following command with juju exec on your sackd unit:
juju exec -u sackd/0 -- sinfo --nodes $(juju exec -u slurmd/0 -- hostname)
To verify that the entire partition is IDLE, run sinfo without the --nodes flag:
juju exec -u sackd/0 -- sinfo
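As an optional end-to-end check, assuming at least one compute node is IDLE, you can submit a trivial job from the sackd unit with srun; if scheduling works, the command prints the hostname of the compute node the job ran on:
juju exec -u sackd/0 -- srun --nodes=1 hostname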