Advanced Examples

Using Containers

Researchers can use Docker or Singularity containers within the cluster. This is a great way to run difficult-to-compile applications or to share workflows among colleagues.

See also: Singularity

Running an interactive container

You can run interactively within a container, which is useful for testing code before submitting a batch job. Here is an example of running inside a Docker container that contains the blockchain analysis software BlockSci:

module load singularity
srun --pty -c 4 --mem=16G bash                 # start an interactive session on a compute node
singularity pull docker://tislaamo/blocksci    # pull the Docker image and convert it to a Singularity image
singularity shell blocksci.simg                # open a shell inside the container

Once you have run the singularity shell command you will be inside the container and can use the commands it provides, such as the BlockSci utility blocksci_parser.

Running a container in batch

Running a batch job with containers is similar to running a regular job, but the details ultimately depend on how the container was created, so your mileage may vary. Here is an example batch submission script that runs the autometa software from a Docker image; let's name the submit file runContainer.sh:

#!/bin/bash
#SBATCH -J autometa-job
#SBATCH -c 4
#SBATCH --mem=16G
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=myemail@email.net
#SBATCH --time=12:00:00

module load singularity
singularity pull docker://jasonkwan/autometa:latest
singularity exec autometa_latest.sif calculate_read_coverage.py somedata.dat

Now submit the script:

sbatch runContainer.sh

Note that singularity shell is primarily for interactive use, while singularity exec (or possibly singularity run) executes applications built into the container directly. It is important to know how the container was created in order to make effective use of its software.

Starting and Working with a Jupyter Notebook

Running Jupyter notebooks on Rāpoi is usually a two-step process.
First you start the Jupyter server on a compute node - either via an interactive session or an sbatch job.
Then you connect to Rāpoi again via a new ssh session, forwarding the port selected by Jupyter to your local machine for the web session.
There is a potentially simpler method at the end of this guide using firefox and tab containers.

For general information on using Python, see the Python users guide.

The first step is getting and modifying a submission script.

Example submission scripts are included at /home/software/vuwrc/examples/jupyter/

notebook-bare.sh       # The base notebook script - you manage your dependencies via pip
notebook-anaconda.sh   # A version for if you prefer Anaconda
R-notebook.sh          # Using the R kernel instead
R-notebook-anaconda.sh # R kernel, managed by Anaconda

All these scripts will need to be copied to your working directory and modified to suit your needs.

In each case you'll need to install your dependencies first - at a bare minimum you'll need Jupyter, installed either via pip or Anaconda.

Note: if you intend to do anything needing a GPU in your notebooks, you'll need to do all these installs on the gpu or highmem nodes, as you'll likely need the relevant CUDA modules loaded during the installs.

notebook-bare.sh example

Step 1: The best way to start Jupyter is with a batch submission script. We have created an example script, which you can copy from the cluster by typing the following:

cp /home/software/vuwrc/examples/jupyter/notebook-bare.sh notebook.sh
If you are using Anaconda and have installed it in the default location you need to use the following submit file instead:
cp /home/software/vuwrc/examples/jupyter/notebook-anaconda.sh notebook-anaconda.sh

If you have any Python dependencies you will need to install them before you run your script, and you will also have to install Jupyter. Currently you'll need to do that in an interactive session. You only need to do this once.

srun -c 4  --mem=8G --partition=quicktest --time=0-01:00:00 --pty bash # get a 1 hour interactive session on quicktest
#prompt changes to something like 
#<username@itl02n02> you are now "on" a quicktest node

# Load required module for the python version in the notebook-bare.sh
module load gompi/2022a
module load Python/3.10.4-bare

python3 -m venv env # set up a python virtual env in the env directory
source env/bin/activate # activate the virtual env so pip installs into it
pip install jupyterlab pandas plotnine # install dependencies - you *must* at least install jupyter

#exit the interactive session
exit
#prompt changes to something like 
#<username@raapoi-master> you are now back on the login/master node
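
If you want to double-check that the virtual environment is active before installing anything, this small sketch verifies that pip resolves to the env directory (the directory name env matches the example above; this check is optional and not part of the official workflow):

```shell
# Create and activate a virtual environment, then confirm that pip
# now points inside it rather than at a system-wide install.
python3 -m venv env
source env/bin/activate
command -v pip   # should print a path ending in env/bin/pip
```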

This script is ready to run as is, but we recommend editing it to match your own CPU, memory and time requirements. Once you have edited the file you can run it as follows:

sbatch notebook.sh

or if using Anaconda:

sbatch notebook-anaconda.sh
NOTE: If you copied the R notebook script, replace notebook.sh with R-notebook.sh

This will submit the job. It may take some time for the job to run, depending on how busy the cluster is at the time. Once the job begins to run you will see some information in a file called notebook-JOBID.out, where JOBID is the actual job ID of this job, e.g. notebook-478903.out. If you view this file (for example with cat notebook-JOBID.out), you will see lines such as:

    Or copy and paste one of these URLs:
        http://130.195.19.20:47033/lab?token=<some string of characters>
        http://127.0.0.1:47033/lab?token=<some string of characters>

The two important pieces of information here are the IP address, in this case 130.195.19.20, and the port number, 47033. These will likely be different for you since the port number is random, although the IP address may be the same since we have a limited number of compute nodes. Also notice the random hash after ?token=. This hash is a security feature and allows you to connect to the notebook. You will need all of these to view the notebook from your local machine.
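
If you prefer not to pick the numbers out by eye, the IP address and port can be extracted with standard shell tools. This is an optional sketch, not part of the official workflow: the URL below is hard-coded for illustration, and in practice you would grep the matching line out of your own notebook-JOBID.out file.

```shell
# Hypothetical helper: extract the IP and port from the URL line Jupyter
# printed, then build the SSH tunnel command used in Step 2 below.
line='http://130.195.19.20:47033/lab?token=abc123'   # example line from notebook-JOBID.out
ip=$(echo "$line" | sed -E 's#http://([0-9.]+):.*#\1#')
port=$(echo "$line" | sed -E 's#http://[0-9.]+:([0-9]+)/.*#\1#')
echo "ssh -L ${port}:${ip}:${port} username@raapoi.vuw.ac.nz"
```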

Step 2: To start working with the notebook you will need to tunnel a ssh session. In your SSH tunnel you will use the cluster login node (raapoi.vuw.ac.nz) to connect to the compute node (in the example above the compute node is at address 130.195.19.20) and transfer all the traffic back and forth between your computer and the compute node.

Step 2a: from a Mac

Open a new session window from Terminal.app or other terminal utility such as Xquartz and type the following:

ssh -L <PORT_NUMBER>:<IP_ADDRESS>:<PORT_NUMBER> username@raapoi.vuw.ac.nz

For example:

ssh -L 47033:130.195.19.20:47033 harrelwe@raapoi.vuw.ac.nz

Once you are at a prompt you can go to Step 3

Step 2b: from Windows

We recommend tunnelling using Git Bash, which is part of the Git for Windows project, or MobaXterm. There are two methods for tunnelling in MobaXterm: one is command-line, the other is GUI-based.

Method 1 (Git Bash or MobaXterm): Command-line, start a local Git Bash or MobaXterm terminal (or try the GUI method, below)

From the command prompt type:

ssh -L <PORT_NUMBER>:<IP_ADDRESS>:<PORT_NUMBER> username@raapoi.vuw.ac.nz

For example:

ssh -L 47033:130.195.19.20:47033 harrelwe@raapoi.vuw.ac.nz

Once you are at a prompt you can go to Step 3

Method 2 (MobaXterm): GUI-based, go to the Tunneling menu:

Now click on New SSH Tunnel

When you complete the creation of your tunnel click Start all tunnels. Enter your password and reply "yes" to any questions asked about accepting hostkeys or opening firewalls. You can safely exit the tunnel building menu.

Step 3

Now open your favorite web browser and paste the URL from your job output file into your browser's location bar. For example, my full URL was:

http://130.195.19.20:47033/?token=badef11b1371945b314e2e89b9a182f68e39dc40783ed68e

Step 4

One last thing you need to do is replace the IP address with the word localhost. This allows your browser to follow the tunnel you just opened and connect to the notebook running on a Rāpoi compute node. In my case the address line will now look like this:

http://localhost:47033/?token=badef11b1371945b314e2e89b9a182f68e39dc40783ed68e

Now you can hit return and you should see your notebook running on a Rāpoi compute node.
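
The substitution in Step 4 can also be done with sed rather than by hand. This is just a convenience sketch using the example URL from this guide:

```shell
# Swap the compute-node IP for localhost so the browser goes through
# the SSH tunnel instead of trying to reach the node directly.
url='http://130.195.19.20:47033/?token=badef11b1371945b314e2e89b9a182f68e39dc40783ed68e'
local_url=$(echo "$url" | sed -E 's#http://[0-9.]+:#http://localhost:#')
echo "$local_url"
```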

NOTE: when you are done with your notebook, please remember to cancel your job with scancel to free up the resources for others.

If you want more information on working with Jupyter, there is good documentation here: Jupyter Notebooks

Working with notebooks using Firefox tab containers

There is a perhaps simpler, single-step process for working with Jupyter notebooks. It relies on some nice features in Firefox. Firefox has tab containers: categories of tabs that are essentially independent from each other, with separate web cookies and, importantly for our case, separate proxy settings. You will also currently need the Firefox add-on Container-Proxy (see its GitHub page).

Set up a tab container in Firefox called something like Raapoi. Use the container-proxy extension to assign a proxy to that tab set. I chose port 9001, but you can use any fairly high port number - note it doesn't matter if many people connect on the same port.

When you connect to Rāpoi, use ssh SOCKS5 proxy settings. On MacOS/Linux/WSL2:

ssh -D 9001 <username>@raapoi.vuw.ac.nz

In Firefox, open a new tab by holding down the new-tab + button and selecting your Raapoi tab container. Any tabs opened in that container will have all their web traffic directed via the Rāpoi login node. Your laptop/desktop can't directly see the compute nodes, but the login node can. When you start a Jupyter notebook and get the message:

    Or copy and paste one of these URLs:
        http://130.195.19.20:47033/lab?token=<some string of characters>
        http://127.0.0.1:47033/lab?token=<some string of characters>

You can immediately open http://130.195.19.20:47033/lab?token=<some string of characters> in your Raapoi container tab.

OpenMPI users guide

Which versions of OpenMPI are working on Rāpoi?

There are a number of versions of OpenMPI on Rāpoi, although many of these are old installations (prior to an OS update and changes to the module system) and may no longer work. Generally speaking, your best bet is to try a version which appears when you search via module spider OpenMPI (noting that the capital letters 'O', 'M', 'P', 'I' are important here). A few examples of relatively recent versions of OpenMPI which are available (as of April 2024) are OpenMPI/4.1.1, OpenMPI/4.1.4 and OpenMPI/4.1.6.

Each of these OpenMPI modules has one or more prerequisite modules that need to be loaded first (generally a specific version of the GCC compilers). To find out what you need to load first for a specific version of OpenMPI, check the output of module spider OpenMPI/x.y.z (with the appropriate values for x, y, z). One of the examples below shows how to use OpenMPI/4.1.6. If your code uses software from another module which also requires a specific GCC module, that will dictate which version of OpenMPI to load (i.e. whichever one depends on the same GCC version). Otherwise, you are free to use any desired OpenMPI module.

Known issues and workarounds

There is a known issue with the communication/networking interfaces with several of the installations of OpenMPI. The error/warning messages occur sporadically, making the issue difficult to pin down and resolve, but it is likely that a combination of internal and external factors causes it (OpenMPI is a very complex beast). The warning messages take the form:

Failed to modify UD QP to INIT on mlx5_0: Operation not permitted
A workaround is described below; this page will be updated when a more permanent solution is found.

Execute your MPI jobs using the additional arguments:

mpirun -mca pml ucx -mca btl '^uct,ofi' -mca mtl '^ofi' -np $SLURM_NTASKS <your executable>
This will ensure OpenMPI avoids trying to use the communication libraries which are problematic. If your executable is launched without using mpirun (i.e. it implements its own wrapper/launcher), you will instead need to set the following environment variables:
export OMPI_MCA_btl='^uct,ofi'
export OMPI_MCA_pml='ucx'
export OMPI_MCA_mtl='^ofi'
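
Putting it together, a minimal sbatch script using the workaround might look like the sketch below. The GCC version and the executable name (my_mpi_prog) are assumptions for illustration; check module spider OpenMPI/4.1.6 for the actual prerequisite module on Rāpoi and substitute your own program:

```shell
#!/bin/bash
#SBATCH -J mpi-job
#SBATCH --ntasks=8
#SBATCH --mem=16G
#SBATCH --time=01:00:00

# GCC/13.2.0 as the prerequisite for OpenMPI/4.1.6 is an assumption --
# confirm with `module spider OpenMPI/4.1.6` before using this script
module load GCC/13.2.0
module load OpenMPI/4.1.6

# MCA options from the workaround above, steering OpenMPI away from the
# problematic communication libraries
mpirun -mca pml ucx -mca btl '^uct,ofi' -mca mtl '^ofi' -np $SLURM_NTASKS ./my_mpi_prog
```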