Section 5: High Performance Computing with Bluecrystal
Introduction
We’ve all been in the situation where we want to perform some computationally intensive task, such as running MCMC or manipulating an extremely large data frame many times over, but our laptops just aren’t up to it. They have too few cores, no dedicated GPU and only a small amount of memory. Luckily for academics and PhD students, universities generally sympathise with us. A high performance computing (HPC) cluster is a collection of very powerful computers that can be accessed remotely and used to run code submitted through the terminal.
This portfolio will detail the use of Bluecrystal, the supercomputer available at the University of Bristol. These machines are accessed through the terminal, so a basic knowledge of manipulating and navigating file structures in bash is required. Section 4 details the fundamentals of bash if the reader is not already familiar.
Bluecrystal Phase 3
This tutorial will focus on Bluecrystal Phase 3, that is, the third generation of Bluecrystal machines. The cluster is made up of 312 compute nodes, which are where the jobs are actually run. The basic nodes have the following specification:
- 64GB RAM
- 300TB storage
- 16 Cores
- Infiniband High Speed Network
In addition to the basic nodes, there are large memory nodes, which have 256GB of RAM, and GPU nodes, each of which has an NVIDIA Tesla K20, an extremely powerful GPU.
Connecting to the HPC Cluster
To access the HPC cluster, you need to log in via SSH (secure shell) from a University of Bristol connection (or a VPN). The ssh command in bash is your friend here. To log in (assuming you have an account), run the following command:
ssh your_username@bluecrystalp3.bris.ac.uk
You will then be asked for your password, which you supply immediately after running this command. From here you will be on the log-in node. Note that you should not run any code on the log-in node, as it is only intended for connecting to the compute nodes; if you run any heavy code on the log-in node you will slow down the HPC cluster for everyone else.
File System
Within the log-in node, you have your own personal directory where you can store your files and your code. By default, after logging in you will be in this directory, so if you use the ls command you will immediately see its contents. You can write code here through the terminal, or copy it into your directory from your own computer. You are free to make directories and files here, as the contents of your directory will be read by the compute nodes when you run a job.
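For example, a minimal sketch of setting up a project directory on the log-in node (the directory name is just a placeholder):
mkdir my_project    # create a directory for the job
cd my_project       # move into it
ls                  # list its (currently empty) contents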
To copy a file from your own computer to your file system on the HPC cluster, you can use the scp command in bash. This command is run from your own computer only, and looks like
scp path_to_my_file/file.txt your_username@bluecrystalp3.bris.ac.uk:path_inside_hpc/
Performing this operation would copy file.txt from the folder path_to_my_file into the path_inside_hpc folder of your directory on the HPC cluster. If you want to do this the other way around, and copy something from your HPC file system to your personal computer, just switch the order of the two arguments to scp, but always run the command from your own machine.
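As a sketch of the reverse copy (the file and folder names here are placeholders):
scp your_username@bluecrystalp3.bris.ac.uk:path_inside_hpc/file.txt path_to_my_file/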
Running Jobs
To run a job, you must write a bash script that tells the compute node what to do. This script is interpreted on the compute node and run accordingly. Below is a general template for what this bash script might look like:
#!/bin/bash
#
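# request 2 nodes with 1 processor per node, and a maximum runtime of 24 hours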
#PBS -l nodes=2:ppn=1,walltime=24:00:00
# working directory
export WORK_DIR=$HOME
cd $WORK_DIR
# print to output
echo JOB ID: $PBS_JOBID
echo Working Directory `pwd`
# run something
/bin/hostname
The first line, #!/bin/bash, tells the system to interpret the script as bash, and the #PBS -l nodes=2:ppn=1,walltime=24:00:00 line tells the scheduler what resources you want for the job: here, two nodes with one processor each and a maximum runtime (walltime) of 24 hours. You can change these arguments to your suiting, e.g. increase the walltime if you think your code will run for more than 24 hours.
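For instance, a sketch of an alternative resource request asking for a single node, all 16 of its cores, and a 72-hour walltime (the numbers are just an example):
#PBS -l nodes=1:ppn=16,walltime=72:00:00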
You must save this bash script as something like run_R.sh, and then, when logged into Bluecrystal, use the qsub command to submit the job to the queue, i.e.
qsub run_R.sh
which would add this job to the queue. Since many people use the HPC cluster, your job may not start immediately, and you might have to wait. You may have to wait longer if your walltime is particularly long, as the scheduler has to find a slot in which enough nodes are free for that length of time.
Other Functions
As well as qsub, there are other commands that you can use to interact with the HPC cluster. Some notable ones are:
- qstat gives a list of jobs currently running and those in the queue
- qstat -u user_name gives a list of jobs queued and running by user_name
- qstat job_id gives information about the job job_id
- qdel job_id deletes the job with the given job_id
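For example, a quick check on your own jobs might look like this (the username and job id are placeholders):
qstat -u your_username    # list your queued and running jobs
qdel 123456               # cancel the job with id 123456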
Running Different Code
Running other programming languages on the HPC cluster can be a bit of a faff. To run code such as Python or R, you must first load the module associated with the particular language. On the log-in node, you can run
module avail
to get a list of all available modules. There will be a lot. Choose one that you like; for example, I am a personal fan of languages/R-3.6.2-gcc9.1.0. You can load a module with module load module_name, for example
module load languages/R-3.6.2-gcc9.1.0
which will allow you to run R and R scripts. To submit a job that runs an R script, you must add this module load line to the job script before the line that runs your code. To run an R script from bash, you use
Rscript script_name.R
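Putting these pieces together, a minimal sketch of a job script for an R job might look like the following (the working directory and script name are placeholders):
#!/bin/bash
#
#PBS -l nodes=1:ppn=1,walltime=12:00:00
# load R so that Rscript is available on the compute node
module load languages/R-3.6.2-gcc9.1.0
# move to the directory containing the script
cd $HOME/my_project
# run the analysis
Rscript my_analysis.R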
Using R Packages
Since packages cannot be installed globally from the log-in node, you can install them locally instead. You first type the command R into bash to start an interactive R session, and then run install.packages("package_name"). It will ask whether you want the package to be installed in a local, personal library, to which you say yes.
After this, all packages installed on your local file system on the log-in node will be accessible as normal when running job scripts.
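As a sketch, the sequence looks like this (run R from bash, then install at the R prompt; the package name is the same placeholder as above):
R
install.packages("package_name")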