How to deploy a shared filesystem

Charmed HPC supports automatic integration with shared filesystems through the filesystem-client charm. This how-to guide shows you how to deploy filesystem-client and integrate it with externally managed shared filesystems.

Note

If you plan on using Terraform to handle your deployment, the charmed-hpc-terraform repository also provides Terraform modules to set up an NFS server managed by Azure, along with an example of how to deploy the module.
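
As a rough sketch of that workflow, assuming the repository is published under the charmed-hpc GitHub organization, you would clone it, change into the example's directory (see the repository for its exact path and required variables), and run the usual Terraform commands:

git clone https://github.com/charmed-hpc/charmed-hpc-terraform
cd charmed-hpc-terraform/<path to the NFS example>
terraform init
terraform apply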

Prerequisites

Deploy an external filesystem server

External servers that provide a shared filesystem cannot be integrated directly. Instead, a proxy charm is used to expose the required connection information to applications managed by Juju.

To integrate with an external NFS server, you will require:

  • An externally managed NFS server.

  • The server’s hostname.

  • The exported path.

  • (Optional) The server's port.

Each public cloud has its own procedure for deploying an NFS server. Links to the setup procedures for a few well-known public clouds are provided below.

  • Amazon Web Services: setup information at docs.aws.amazon.com

  • Microsoft Azure: setup information at learn.microsoft.com

However, if only a minimal server for testing is necessary, a small NFS server can be set up with LXD.

Deploy an NFS server on LXD

First, launch a virtual machine using LXD:

snap install lxd
lxd init --auto
lxc launch ubuntu:24.04 nfs-server --vm
lxc shell nfs-server

Inside the LXD virtual machine, set up an NFS kernel server that exports a /data directory:

apt update && apt -y upgrade
apt -y install nfs-kernel-server
mkdir -p /data
cat << 'EOF' > /etc/exports
/srv     *(ro,sync,subtree_check)
/data    *(rw,sync,no_subtree_check,no_root_squash)
EOF
exportfs -a
systemctl restart nfs-kernel-server

Note

You can verify if the NFS server is exporting the desired directories by using the command showmount -e localhost while inside the LXD virtual machine.
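
For the exports configured above, the output should look roughly like the following:

Export list for localhost:
/data *
/srv  *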

Grab the network address of the LXD virtual machine and exit the current shell session:

hostname -I
exit

After gathering all the required information, you can deploy the nfs-server-proxy charm to expose the externally managed server inside a Juju model.

juju deploy nfs-server-proxy \
    --config hostname=<server hostname> \
    --config path=<exported path> \
    --config port=<server port>
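
Once the deployment settles, you can confirm that the proxy application is active with:

juju status nfs-server-proxy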

To integrate with an external CephFS share, you will require:

  • The unique identifier of the cluster (commonly known as fsid).

  • The name of the filesystem within the Ceph cluster.

  • The exported path of the filesystem.

  • The list of hostnames for MON nodes of the Ceph cluster.

  • The username with permissions to access the filesystem.

  • The cephx key for the username.

Deploy a CephFS server on LXD

Here, a Ceph cluster will be set up using MicroCeph.

First, launch a virtual machine using LXD:

snap install lxd
lxd init --auto
lxc launch ubuntu:24.04 cephfs-server --vm
lxc shell cephfs-server

Inside the LXD virtual machine, set up MicroCeph to export a Ceph filesystem:

# Set up the environment
ln -s /bin/true /usr/local/bin/udevadm
apt-get -y update
apt-get -y install ceph-common jq
snap install microceph

# Bootstrap MicroCeph
microceph cluster bootstrap

# Add three 2G loop disks to MicroCeph for storage
microceph disk add loop,2G,3
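
Before creating the filesystem, it is worth confirming that the cluster is healthy and that the loop disks were added as OSDs. The standard Ceph status command is available through the microceph.ceph alias:

microceph.ceph -s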

We will create two new pools, one for data and one for metadata, then assign them to a new filesystem named cephfs.

# Create a new data pool for our filesystem...
microceph.ceph osd pool create cephfs_data

# ... and a metadata pool for the same filesystem.
microceph.ceph osd pool create cephfs_metadata

# Create a new filesystem that uses the two created data pools.
microceph.ceph fs new cephfs cephfs_metadata cephfs_data
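
To confirm that the filesystem was created with the expected pools, you can list the filesystems:

microceph.ceph fs ls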

We will also use fs-client as the username for the clients and authorize it with read-write (rw) access to the whole directory tree (/):

microceph.ceph fs authorize cephfs client.fs-client / rw

Note

You can verify if the CephFS server is working correctly by using the command microceph.ceph fs status cephfs while inside the LXD virtual machine.

To gather the required information for proxying the externally managed Ceph filesystem:

export HOST=$(hostname -I | tr -d '[:space:]'):6789
export FSID=$(microceph.ceph -s -f json | jq -r '.fsid')
export CLIENT_KEY=$(microceph.ceph auth print-key client.fs-client)

Print the required information for reference and then exit the current shell session:

echo $HOST
echo $FSID
echo $CLIENT_KEY
exit

Having collected all the required information, you can deploy the cephfs-server-proxy charm to expose the externally managed Ceph filesystem inside a Juju model.

juju deploy cephfs-server-proxy \
    --config fsid=<value of $FSID> \
    --config sharepoint=cephfs:/ \
    --config monitor-hosts="<value of $HOST>" \
    --config auth-info=fs-client:<value of $CLIENT_KEY>
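
As with the NFS proxy, you can check that the application deployed correctly with:

juju status cephfs-server-proxy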

Deploy the filesystem-client

To deploy the filesystem-client charm, which mounts a shared filesystem on the cluster nodes:

juju deploy filesystem-client --channel latest/edge \
    --config mountpoint='/scratch' \
    --config noexec=true

The mountpoint configuration option sets the path where the filesystem will be mounted, while noexec=true requests that the filesystem be mounted with the noexec option, which prevents binaries from being executed directly from the share.
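
If you need to check or adjust these options after deployment, the standard juju config command works, for example:

# Show the currently configured mount point
juju config filesystem-client mountpoint

# Allow executing binaries from the mounted filesystem again
juju config filesystem-client noexec=false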

filesystem-client is a subordinate charm that automatically mounts any shared filesystems for the application it is related to. In this case, we will relate it to the slurmd application to provide shared storage across all the compute nodes in the cluster:

juju integrate slurmd:juju-info filesystem-client:juju-info

Relate the filesystem client with the filesystem provider

Every filesystem provider can be integrated with the filesystem client using the filesystem endpoint.

juju integrate filesystem-client:filesystem <filesystem-provider>:filesystem
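
For example, to relate the client to the nfs-server-proxy application deployed earlier:

juju integrate filesystem-client:filesystem nfs-server-proxy:filesystem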

Afterwards, verify that the filesystem can be read and written from the slurmd application machines:

juju ssh slurmd/0 -- touch /scratch/script.py
juju ssh slurmd/1 -- stat /scratch/script.py