Getting started with Charmed HPC

This tutorial takes you through multiple aspects of Charmed HPC, such as:

  • Building a small Charmed HPC cluster with a shared filesystem

  • Preparing and submitting a multi-node batch job to your Charmed HPC cluster’s workload scheduler

  • Creating and using a container image to provide the runtime environment for a submitted batch job

By the end of this tutorial, you will have worked with a variety of open source projects, such as:

  • Multipass

  • Juju

  • Charms

  • Apptainer

  • Ceph

  • Slurm

This tutorial assumes that you have had some exposure to high-performance computing concepts such as batch scheduling, but does not assume prior experience building HPC clusters. This tutorial also does not expect you to have any prior experience with Multipass, Juju, Apptainer, Ceph, or Slurm.

Using Charmed HPC in production

The Charmed HPC cluster built in this tutorial is for learning purposes and should not be used as the basis for a production HPC cluster. For more in-depth steps on how to deploy a fully operational Charmed HPC cluster, see Charmed HPC’s How-to guides.

Prerequisites

To successfully complete this tutorial, you will need:

  • At least 8 CPU cores, 16GB RAM, and 40GB storage available

  • Multipass installed

  • An active internet connection

Create a virtual machine with Multipass

First, download a copy of the cloud initialization (cloud-init) file, charmed-hpc-tutorial-cloud-init.yml, which defines the underlying cloud infrastructure for the virtual machine. For this tutorial, the file includes instructions for creating and configuring your LXD machine cloud, localhost, bootstrapping the charmed-hpc-controller Juju controller, and writing the workload and submission scripts for the example jobs. The cloud-init step runs automatically as part of the virtual machine launch, so you do not need to set anything up manually. You can expand the dropdown below to view the full cloud-init file before downloading it onto your local system:

charmed-hpc-tutorial-cloud-init.yml
#cloud-config

# Ensure VM is fully up-to-date; multipass does not support reboots.
# See: https://github.com/canonical/multipass/issues/4199
# Package management
package_reboot_if_required: false
package_update: true
package_upgrade: true

# Install prerequisites
snap:
  commands:
    00: snap install juju --channel=3/stable
    01: snap install lxd --channel=6/stable

# Configure and initialize prerequisites
lxd:
  init:
    storage_backend: dir

# Commands to run at the end of the cloud-init process
runcmd:
  - lxc network set lxdbr0 ipv6.address none
  - su ubuntu -c 'juju bootstrap localhost charmed-hpc-controller'

# Write files to the Multipass instance
write_files:
  # MPI workload dependencies
  - path: /home/ubuntu/mpi_hello_world.c
    owner: ubuntu:ubuntu
    permissions: !!str "0664"
    defer: true
    content: |
      #include <mpi.h>
      #include <stdio.h>

      int main(int argc, char** argv) {
          // Initialize the MPI environment
          MPI_Init(NULL, NULL);

          // Get the number of nodes
          int size;
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          // Get the rank of the process
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          // Get the name of the node
          char node_name[MPI_MAX_PROCESSOR_NAME];
          int name_len;
          MPI_Get_processor_name(node_name, &name_len);

          // Print hello world message
          printf("Hello world from node %s, rank %d out of %d nodes\n",
                 node_name, rank, size);

          // Finalize the MPI environment.
          MPI_Finalize();
      }
  - path: /home/ubuntu/submit_hello.sh
    owner: ubuntu:ubuntu
    permissions: !!str "0664"
    defer: true
    content: |
      #!/usr/bin/env bash
      #SBATCH --job-name=hello_world
      #SBATCH --partition=tutorial-partition
      #SBATCH --nodes=2
      #SBATCH --error=error.txt
      #SBATCH --output=output.txt

      mpirun ./mpi_hello_world
  # Container workload dependencies.
  - path: /home/ubuntu/generate.py
    owner: ubuntu:ubuntu
    permissions: !!str "0664"
    defer: true
    content: |
      #!/usr/bin/env python3

      """Generate example dataset for workload."""

      import argparse

      from faker import Faker
      from faker.providers import DynamicProvider
      from pandas import DataFrame


      faker = Faker()
      favorite_lts_mascot = DynamicProvider(
          provider_name="favorite_lts_mascot",
          elements=[
              "Dapper Drake",
              "Hardy Heron",
              "Lucid Lynx",
              "Precise Pangolin",
              "Trusty Tahr",
              "Xenial Xerus",
              "Bionic Beaver",
              "Focal Fossa",
              "Jammy Jellyfish",
              "Noble Numbat",
          ],
      )
      faker.add_provider(favorite_lts_mascot)


      def main(rows: int) -> None:
          df = DataFrame(
              [
                  [faker.email(), faker.country(), faker.favorite_lts_mascot()]
                  for _ in range(rows)
              ],
              columns=["email", "country", "favorite_lts_mascot"],
          )
          df.to_csv("favorite_lts_mascot.csv")


      if __name__ == "__main__":
          parser = argparse.ArgumentParser()
          parser.add_argument(
              "--rows", type=int, default=1, help="Rows of fake data to generate"
          )
          args = parser.parse_args()

          main(rows=args.rows)
  - path: /home/ubuntu/workload.py
    owner: ubuntu:ubuntu
    permissions: !!str "0664"
    defer: true
    content: |
      #!/usr/bin/env python3

      """Plot the most popular Ubuntu LTS mascot."""

      import argparse
      import os

      import pandas as pd
      import plotext as plt

      def main(dataset: str | os.PathLike, file: str | os.PathLike) -> None:
          df = pd.read_csv(dataset)
          mascots = df["favorite_lts_mascot"].value_counts().sort_index()

          plt.simple_bar(
              mascots.index,
              mascots.values,
              title="Favorite LTS mascot",
              color="orange",
              width=150,
          )

          if file:
              plt.save_fig(
                  file if os.path.isabs(file) else f"{os.getcwd()}/{file}",
                  keep_colors=True
              )
          else:
              plt.show()

      if __name__ == "__main__":
          parser = argparse.ArgumentParser()
          parser.add_argument("dataset", type=str, help="Path to CSV dataset to plot")
          parser.add_argument(
              "-o",
              "--output",
              type=str,
              default="",
              help="Output file to save plotted graph",
          )
          args = parser.parse_args()

          main(args.dataset, args.output)
  - path: /home/ubuntu/workload.def
    owner: ubuntu:ubuntu
    permissions: !!str "0664"
    defer: true
    content: |
      bootstrap: docker
      from: ubuntu:24.04

      %files
          generate.py /usr/bin/generate
          workload.py /usr/bin/workload

      %environment
          export PATH=/usr/bin/venv/bin:${PATH}
          export PYTHONPATH=/usr/bin/venv:${PYTHONPATH}

      %post
          export DEBIAN_FRONTEND=noninteractive
          apt-get update -y
          apt-get install -y python3-dev python3-venv

          python3 -m venv /usr/bin/venv
          alias python3=/usr/bin/venv/bin/python3
          alias pip=/usr/bin/venv/bin/pip

          pip install -U faker
          pip install -U pandas
          pip install -U plotext

          chmod 755 /usr/bin/generate
          chmod 755 /usr/bin/workload

      %runscript
          exec workload "$@"
  - path: /home/ubuntu/submit_apptainer_mascot.sh
    owner: ubuntu:ubuntu
    permissions: !!str "0664"
    defer: true
    content: |
      #!/usr/bin/env bash
      #SBATCH --job-name=favorite-lts-mascot
      #SBATCH --partition=tutorial-partition
      #SBATCH --nodes=2
      #SBATCH --error=mascot_error.txt
      #SBATCH --output=mascot_output.txt

      apptainer exec workload.sif generate --rows 1000000
      apptainer run workload.sif favorite_lts_mascot.csv --output graph.out

From the local directory holding the cloud-init file, launch a virtual machine using Multipass:

ubuntu@local:~$
multipass launch 24.04 --name charmed-hpc-tutorial --cloud-init charmed-hpc-tutorial-cloud-init.yml --memory 16G --disk 40G --cpus 8 --timeout 1000

The virtual machine launch should take five minutes or less to complete, but may take longer depending on your network speed. Once the launch has finished, check the status of cloud-init to confirm that all steps completed successfully.

Enter the virtual machine:

ubuntu@local:~$
multipass shell charmed-hpc-tutorial

Then check cloud-init status:

ubuntu@charmed-hpc-tutorial:~$
cloud-init status --long
status: done
extended_status: done
boot_status_code: enabled-by-generator
last_update: Thu, 01 Jan 1970 00:03:45 +0000
detail: DataSourceNoCloud [seed=/dev/sr0]
errors: []
recoverable_errors: {}

If the status shows done and there are no errors, then you are ready to move on to deploying the cluster charms.
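
If cloud-init is still running, or if a step reports an error, you can wait for it to finish and inspect its log before continuing. This is an optional check; the log path below is the standard cloud-init output log on Ubuntu:

ubuntu@charmed-hpc-tutorial:~$
cloud-init status --wait
ubuntu@charmed-hpc-tutorial:~$
less /var/log/cloud-init-output.log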

Deploy Slurm and shared filesystem

Next, you will deploy Slurm and the shared filesystem. The Slurm portion of your deployment is composed of:

  • The Slurm management daemon: slurmctld

  • Two Slurm compute daemons: slurmd, grouped in a partition named tutorial-partition

  • The authentication and credential kiosk daemon: sackd, which provides the login node

First, create the slurm model on your localhost cloud:

ubuntu@charmed-hpc-tutorial:~$
juju add-model slurm localhost

Then deploy the Slurm components:

ubuntu@charmed-hpc-tutorial:~$
juju deploy slurmctld --base "ubuntu@24.04" --channel "edge" --constraints="virt-type=virtual-machine"
ubuntu@charmed-hpc-tutorial:~$
juju deploy slurmd tutorial-partition -n 2 --base "ubuntu@24.04" --channel "edge" --constraints="virt-type=virtual-machine"
ubuntu@charmed-hpc-tutorial:~$
juju deploy sackd --base "ubuntu@24.04" --channel "edge" --constraints="virt-type=virtual-machine"

And integrate them together:

ubuntu@charmed-hpc-tutorial:~$
juju integrate slurmctld sackd
ubuntu@charmed-hpc-tutorial:~$
juju integrate slurmctld tutorial-partition

Next, you will deploy the filesystem pieces, which are:

  • The distributed storage system: microceph

  • ceph-fs, which exposes the MicroCeph cluster as a shared filesystem using CephFS

  • filesystem-client, deployed under the application name scratch, which mounts the filesystem at /scratch

Deploy the filesystem components and add loop-device storage to MicroCeph:

ubuntu@charmed-hpc-tutorial:~$
juju deploy microceph --channel latest/edge --constraints="virt-type=virtual-machine mem=4G root-disk=20G"
ubuntu@charmed-hpc-tutorial:~$
juju deploy ceph-fs --channel latest/edge
ubuntu@charmed-hpc-tutorial:~$
juju deploy filesystem-client scratch --channel latest/edge --config mountpoint=/scratch
ubuntu@charmed-hpc-tutorial:~$
juju add-storage microceph/0 osd-standalone=loop,2G,3

And then integrate the filesystem components together:

ubuntu@charmed-hpc-tutorial:~$
juju integrate scratch ceph-fs
ubuntu@charmed-hpc-tutorial:~$
juju integrate ceph-fs microceph
ubuntu@charmed-hpc-tutorial:~$
juju integrate scratch tutorial-partition
ubuntu@charmed-hpc-tutorial:~$
juju integrate sackd scratch

After a few minutes, the Slurm deployment will become active. The output of the juju status command should be similar to the following:

ubuntu@charmed-hpc-tutorial:~$
juju status
Model  Controller              Cloud/Region         Version  SLA          Timestamp
slurm  charmed-hpc-controller  localhost/localhost  3.6.9    unsupported  10:53:50-04:00

App                 Version          Status  Scale  Charm              Channel      Rev  Exposed  Message
ceph-fs             19.2.1           active      1  ceph-fs            latest/edge  196  no       Unit is ready
scratch                              active      3  filesystem-client  latest/edge   20  no       Integrated with `cephfs` provider
microceph                            active      1  microceph          latest/edge  159  no       (workload) charm is ready
sackd               23.11.4-1.2u...  active      1  sackd              latest/edge   38  no
slurmctld           23.11.4-1.2u...  active      1  slurmctld          latest/edge  120  no       primary - UP
tutorial-partition  23.11.4-1.2u...  active      2  slurmd             latest/edge  141  no

Unit                   Workload  Agent  Machine  Public address  Ports          Message
ceph-fs/0*             active    idle   5        10.248.240.129                 Unit is ready
microceph/0*           active    idle   4        10.248.240.102                 (workload) charm is ready
sackd/0*               active    idle   3        10.248.240.49   6818/tcp
  scratch/0*           active    idle            10.248.240.49                  Mounted filesystem at `/scratch`
slurmctld/0*           active    idle   0        10.248.240.162  6817,9092/tcp  primary - UP
tutorial-partition/0   active    idle   1        10.248.240.218  6818/tcp
  scratch/2            active    idle            10.248.240.218                 Mounted filesystem at `/scratch`
tutorial-partition/1*  active    idle   2        10.248.240.130  6818/tcp
  scratch/1            active    idle            10.248.240.130                 Mounted filesystem at `/scratch`

Machine  State    Address         Inst id        Base          AZ                    Message
0        started  10.248.240.162  juju-2586ad-0  ubuntu@24.04  charmed-hpc-tutorial  Running
1        started  10.248.240.218  juju-2586ad-1  ubuntu@24.04  charmed-hpc-tutorial  Running
2        started  10.248.240.130  juju-2586ad-2  ubuntu@24.04  charmed-hpc-tutorial  Running
3        started  10.248.240.49   juju-2586ad-3  ubuntu@24.04  charmed-hpc-tutorial  Running
4        started  10.248.240.102  juju-2586ad-4  ubuntu@24.04  charmed-hpc-tutorial  Running
5        started  10.248.240.129  juju-2586ad-5  ubuntu@24.04  charmed-hpc-tutorial  Running
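
If some applications are still settling, you can keep an eye on the deployment until every unit reports active. One simple approach, using the standard watch utility available in the virtual machine, is to refresh juju status every five seconds:

ubuntu@charmed-hpc-tutorial:~$
watch -n 5 juju status

Press Ctrl+C to stop watching once all applications and units show active.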

Get compute nodes ready for jobs

Now that Slurm and the filesystem have been successfully deployed, the next step is to set up the compute nodes themselves. The compute nodes must be moved from the down state to the idle state so that jobs can run on them. First, check that the compute nodes are still down, which will show output similar to:

ubuntu@charmed-hpc-tutorial:~$
juju exec -u sackd/0 -- sinfo
PARTITION         AVAIL  TIMELIMIT  NODES  STATE NODELIST
tutorial-partition    up   infinite      2   down juju-e16200-[1-2]

Then, bring up the compute nodes:

ubuntu@charmed-hpc-tutorial:~$
juju run tutorial-partition/0 node-configured
ubuntu@charmed-hpc-tutorial:~$
juju run tutorial-partition/1 node-configured

And verify that the STATE column is now set to idle, which should show:

ubuntu@charmed-hpc-tutorial:~$
juju exec -u sackd/0 -- sinfo
PARTITION         AVAIL  TIMELIMIT  NODES  STATE NODELIST
tutorial-partition    up   infinite      2   idle juju-e16200-[1-2]
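
If a node stays in the down state after the node-configured action has run, you can also ask Slurm to resume it directly from the login node. This is a troubleshooting sketch, assuming the scontrol client is available alongside sinfo on the login node; replace juju-e16200-1 with the node name reported by sinfo in your deployment:

ubuntu@charmed-hpc-tutorial:~$
juju exec -u sackd/0 -- sudo scontrol update NodeName=juju-e16200-1 State=RESUME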

Copy files onto cluster

The workload files that were created during the cloud initialization step now need to be copied from the virtual machine filesystem onto the cluster filesystem. First, you will make the new example directories, then set appropriate permissions, and finally copy the files over:

ubuntu@charmed-hpc-tutorial:~$
juju exec -u sackd/0 -- sudo mkdir /scratch/mpi_example /scratch/apptainer_example
ubuntu@charmed-hpc-tutorial:~$
juju exec -u sackd/0 -- sudo chown $USER: /scratch/*
ubuntu@charmed-hpc-tutorial:~$
juju scp submit_hello.sh mpi_hello_world.c sackd/0:/scratch/mpi_example
ubuntu@charmed-hpc-tutorial:~$
juju scp submit_apptainer_mascot.sh generate.py workload.py workload.def sackd/0:/scratch/apptainer_example

The /scratch directory is mounted on the compute nodes and will be used for reading and writing during the batch jobs.
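
To confirm that the files landed where the batch jobs expect them, you can, for example, list the contents of the shared filesystem from the login node:

ubuntu@charmed-hpc-tutorial:~$
juju exec -u sackd/0 -- ls -R /scratch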

Run a batch job

In the following steps, you will compile a small Hello World MPI script and run it by submitting a batch job to Slurm.

Compile

First, SSH into the login node, sackd/0:

ubuntu@charmed-hpc-tutorial:~$
juju ssh sackd/0

This will place you in your home directory, /home/ubuntu. Next, move to the /scratch/mpi_example directory, install the Open MPI libraries needed for compiling, and then compile the mpi_hello_world.c file by running the mpicc command:

ubuntu@login:~$
cd /scratch/mpi_example
ubuntu@login:~$
sudo apt install build-essential openmpi-bin libopenmpi-dev
ubuntu@login:~$
mpicc -o mpi_hello_world mpi_hello_world.c
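
Before submitting to Slurm, you can optionally confirm that the binary built correctly by running a single MPI rank locally on the login node; this quick check is not a required step of the tutorial:

ubuntu@login:~$
mpirun -np 1 ./mpi_hello_world

You should see a single hello-world line reporting rank 0 out of 1 nodes.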

For quick reference, the two files for the MPI Hello World example are provided in dropdowns here:

mpi_hello_world.c
#include <mpi.h>
#include <stdio.h>
      
int main(int argc, char** argv) {
  // Initialize the MPI environment
  MPI_Init(NULL, NULL);

  // Get the number of nodes
  int size;
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  // Get the rank of the process
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Get the name of the node
  char node_name[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  MPI_Get_processor_name(node_name, &name_len);

  // Print hello world message
  printf("Hello world from node %s, rank %d out of %d nodes\n",
      node_name, rank, size);

  // Finalize the MPI environment.
  MPI_Finalize();
}
submit_hello.sh
#!/usr/bin/env bash
#SBATCH --job-name=hello_world
#SBATCH --partition=tutorial-partition
#SBATCH --nodes=2
#SBATCH --error=error.txt
#SBATCH --output=output.txt

mpirun ./mpi_hello_world

Submit batch job

Now, submit your batch job to the queue using sbatch:

ubuntu@login:~$
sbatch submit_hello.sh

Your job will complete after a few seconds. The generated output.txt file will look similar to the following:

ubuntu@login:~$
cat output.txt
Hello world from node juju-640476-1, rank 0 out of 2 nodes
Hello world from node juju-640476-2, rank 1 out of 2 nodes

The batch job successfully spread the MPI job across two nodes that were able to report back their MPI rank to a shared output file.
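
If you would like to see the scheduler spread work across the partition without writing a batch script, you can also launch a quick interactive command with srun from the login node. This is an optional sketch; the node names in your output will differ:

ubuntu@login:~$
srun --partition=tutorial-partition --nodes=2 hostname

With only --nodes specified, Slurm runs one task per node, so the command should print the hostname of each compute node.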

Run a container job

Next, you will go through the steps to generate a random sample of Ubuntu mascot votes and plot the results. The process requires Python and a few specific libraries, so you will use Apptainer to build a container image and run the job on the cluster.

Set up Apptainer

Apptainer must be deployed and integrated with the existing Slurm deployment using Juju. These steps need to be completed from the charmed-hpc-tutorial environment; to return to that environment from within sackd/0, use the exit command.

Deploy and integrate Apptainer:

ubuntu@charmed-hpc-tutorial:~$
juju deploy apptainer
ubuntu@charmed-hpc-tutorial:~$
juju integrate apptainer tutorial-partition
ubuntu@charmed-hpc-tutorial:~$
juju integrate apptainer sackd
ubuntu@charmed-hpc-tutorial:~$
juju integrate apptainer slurmctld

After a few minutes, juju status should look similar to the following:

ubuntu@charmed-hpc-tutorial:~$
juju status
Model  Controller              Cloud/Region         Version  SLA          Timestamp
slurm  charmed-hpc-controller  localhost/localhost  3.6.9    unsupported  17:34:46-04:00

App                 Version          Status  Scale  Charm              Channel        Rev  Exposed  Message
apptainer           1.4.2            active      3  apptainer          latest/stable    6  no       
ceph-fs             19.2.1           active      1  ceph-fs            latest/edge    196  no       Unit is ready
scratch                              active      3  filesystem-client  latest/edge     20  no       Integrated with `cephfs` provider
microceph                            active      1  microceph          latest/edge    161  no       (workload) charm is ready
sackd               23.11.4-1.2u...  active      1  sackd              latest/edge     38  no       
slurmctld           23.11.4-1.2u...  active      1  slurmctld          latest/edge    120  no       primary - UP
tutorial-partition  23.11.4-1.2u...  active      2  slurmd             latest/edge    141  no       

Unit                   Workload  Agent  Machine  Public address  Ports          Message
ceph-fs/0*             active    idle   5        10.196.78.232                  Unit is ready
microceph/1*           active    idle   6        10.196.78.238                  (workload) charm is ready
sackd/0*               active    idle   3        10.196.78.117   6818/tcp       
  apptainer/2          active    idle            10.196.78.117                  
  scratch/2            active    idle            10.196.78.117                  Mounted filesystem at `/scratch`
slurmctld/0*           active    idle   0        10.196.78.49    6817,9092/tcp  primary - UP
tutorial-partition/0   active    idle   1        10.196.78.244   6818/tcp       
  apptainer/0          active    idle            10.196.78.244                  
  scratch/0*           active    idle            10.196.78.244                  Mounted filesystem at `/scratch`
tutorial-partition/1*  active    idle   2        10.196.78.26    6818/tcp       
  apptainer/1*         active    idle            10.196.78.26                   
  scratch/1            active    idle            10.196.78.26                   Mounted filesystem at `/scratch`

Machine  State    Address        Inst id        Base          AZ                    Message
0        started  10.196.78.49   juju-808105-0  ubuntu@24.04  charmed-hpc-tutorial  Running
1        started  10.196.78.244  juju-808105-1  ubuntu@24.04  charmed-hpc-tutorial  Running
2        started  10.196.78.26   juju-808105-2  ubuntu@24.04  charmed-hpc-tutorial  Running
3        started  10.196.78.117  juju-808105-3  ubuntu@24.04  charmed-hpc-tutorial  Running
5        started  10.196.78.232  juju-808105-5  ubuntu@24.04  charmed-hpc-tutorial  Running
6        started  10.196.78.238  juju-808105-6  ubuntu@24.04  charmed-hpc-tutorial  Running

Build the container image using Apptainer

Before you can submit your container workload to your Charmed HPC cluster, you must build the container image from the build recipe. The build recipe file workload.def defines the environment and libraries that will be in the container image.

To build the image, return to the cluster login node, move to the example directory, and call apptainer build:

ubuntu@charmed-hpc-tutorial:~$
juju ssh sackd/0
ubuntu@login:~$
cd /scratch/apptainer_example
ubuntu@login:~$
apptainer build workload.sif workload.def
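
The build can take a few minutes while the base image is pulled and the Python dependencies are installed. Once it finishes, you can optionally verify the image metadata before submitting any jobs:

ubuntu@login:~$
apptainer inspect workload.sif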

The files for the Apptainer Mascot Vote example are provided here for reference.

generate.py
#!/usr/bin/env python3

"""Generate example dataset for workload."""

import argparse

from faker import Faker
from faker.providers import DynamicProvider
from pandas import DataFrame


faker = Faker()
favorite_lts_mascot = DynamicProvider(
    provider_name="favorite_lts_mascot",
    elements=[
        "Dapper Drake",
        "Hardy Heron",
        "Lucid Lynx",
        "Precise Pangolin",
        "Trusty Tahr",
        "Xenial Xerus",
        "Bionic Beaver",
        "Focal Fossa",
        "Jammy Jellyfish",
        "Noble Numbat",
    ],
)
faker.add_provider(favorite_lts_mascot)


def main(rows: int) -> None:
    df = DataFrame(
        [
            [faker.email(), faker.country(), faker.favorite_lts_mascot()]
            for _ in range(rows)
        ],
        columns=["email", "country", "favorite_lts_mascot"],
    )
    df.to_csv("favorite_lts_mascot.csv")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--rows", type=int, default=1, help="Rows of fake data to generate"
    )
    args = parser.parse_args()

    main(rows=args.rows)
workload.py
#!/usr/bin/env python3

"""Plot the most popular Ubuntu LTS mascot."""

import argparse
import os

import pandas as pd
import plotext as plt

def main(dataset: str | os.PathLike, file: str | os.PathLike) -> None:
    df = pd.read_csv(dataset)
    mascots = df["favorite_lts_mascot"].value_counts().sort_index()

    plt.simple_bar(
        mascots.index,
        mascots.values,
        title="Favorite LTS mascot",
        color="orange",
        width=150,
    )

    if file:
        plt.save_fig(
            file if os.path.isabs(file) else f"{os.getcwd()}/{file}",
            keep_colors=True
        )
    else:
        plt.show()

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("dataset", type=str, help="Path to CSV dataset to plot")
    parser.add_argument(
        "-o",
        "--output",
        type=str,
        default="",
        help="Output file to save plotted graph",
    )
    args = parser.parse_args()

    main(args.dataset, args.output)
workload.def
bootstrap: docker
from: ubuntu:24.04

%files
    generate.py /usr/bin/generate
    workload.py /usr/bin/workload

%environment
    export PATH=/usr/bin/venv/bin:${PATH}
    export PYTHONPATH=/usr/bin/venv:${PYTHONPATH}

%post
    export DEBIAN_FRONTEND=noninteractive
    apt-get update -y
    apt-get install -y python3-dev python3-venv

    python3 -m venv /usr/bin/venv
    alias python3=/usr/bin/venv/bin/python3
    alias pip=/usr/bin/venv/bin/pip

    pip install -U faker
    pip install -U pandas
    pip install -U plotext

    chmod 755 /usr/bin/generate
    chmod 755 /usr/bin/workload

%runscript
    exec workload "$@"
submit_apptainer_mascot.sh
#!/usr/bin/env bash
#SBATCH --job-name=favorite-lts-mascot
#SBATCH --partition=tutorial-partition
#SBATCH --nodes=2
#SBATCH --error=mascot_error.txt
#SBATCH --output=mascot_output.txt

apptainer exec workload.sif generate --rows 1000000
apptainer run workload.sif favorite_lts_mascot.csv --output graph.out

Use the image to run jobs

Now that you have built the container image, you can submit a job to the cluster that uses the new workload.sif image to generate a table with one million rows and then uses the resulting favorite_lts_mascot.csv to build the bar plot:

ubuntu@login:~$
sbatch submit_apptainer_mascot.sh

To view the status of the job while it is running, run squeue.
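
For example, from the login node:

ubuntu@login:~$
squeue --partition=tutorial-partition

A running job is shown with state R; once the job finishes, it disappears from the queue.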

Once the job has completed, view the generated bar plot, which will look similar to the following:

ubuntu@login:~$
cat graph.out
────────────────────── Favorite LTS mascot ───────────────────────
│Bionic Beaver    ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 101124.00
│Dapper Drake     ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99889.00
│Focal Fossa      ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99956.00
│Hardy Heron      ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99872.00
│Jammy Jellyfish  ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99848.00
│Lucid Lynx       ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99651.00
│Noble Numbat     ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 100625.00
│Precise Pangolin ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99670.00
│Trusty Tahr      ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99366.00
│Xenial Xerus     ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇ 99999.00

Summary and clean up

In this tutorial, you:

  • Deployed and integrated Slurm and a shared filesystem

  • Launched an MPI batch job and viewed results reported from both compute nodes

  • Built a container image with Apptainer and used it to run a batch job and generate a bar plot

Now that you have completed the tutorial, if you would like to completely remove the virtual machine, return to your local terminal and delete the virtual machine with Multipass as follows:

ubuntu@local:~$
multipass delete -p charmed-hpc-tutorial
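
The -p flag purges the instance after deleting it. To confirm that the virtual machine has been removed, you can list your remaining Multipass instances:

ubuntu@local:~$
multipass list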

Next steps

Now that you have gotten started with Charmed HPC, check out the Explanation section for details on important concepts and the How-to guides for how to use more of Charmed HPC’s features.