Intro to OpenMP in C++
Introduction
‘Normal’ programming deals with performing operations one at a time, i.e. writing code as a set of instructions to be performed sequentially. A more efficient approach is to maximise the use of the cores in your processor so that multiple operations are performed at once, with different parts of the work running on different cores.
OpenMP is a way of performing parallel computation in C, C++ and Fortran, but this portfolio will go over the use of OpenMP in C++.
Basics
A simple C++ program which uses OpenMP for parallel computing will look like this:
#include <iostream>
#ifdef _OPENMP
#include <omp.h>
#else
#define omp_get_num_threads() 1
#define omp_get_thread_num() 0
#endif
int main(int argc, const char **argv)
{
std::cout << "Hello I am here safe and sound home in the main thread.\n";
#pragma omp parallel
{
int nthreads = omp_get_num_threads();
int thread_id = omp_get_thread_num();
std::cout << "Help I am trapped in thread number " << thread_id
<< " out of a total " << nthreads
<< std::endl;
}
std::cout << "Thank god I'm safe back home now.\n";
return 0;
}
The important parts here are the #ifdef _OPENMP section at the start, and the #pragma omp parallel line before the body of the program. A #pragma directive is a ‘hint’ to the compiler about what to do; the compiler is free to ignore it if need be. The #ifdef _OPENMP section checks whether we are compiling with OpenMP and, if so, includes the omp.h header file; otherwise it defines serial fallbacks for the two OpenMP functions we use. The #pragma omp parallel line tells the compiler to run the section enclosed in braces {} once on each thread, with the number of threads determined by the OMP_NUM_THREADS environment variable.
We can compile this code (which is saved in basic_openmp.cpp) to see what happens.
g++ -fopenmp basic_openmp.cpp -o basic_openmp
export OMP_NUM_THREADS=1
./basic_openmp
## Hello I am here safe and sound home in the main thread.
## Help I am trapped in thread number 0 out of a total 1
## Thank god I'm safe back home now.
g++ -fopenmp basic_openmp.cpp -o basic_openmp
export OMP_NUM_THREADS=8
./basic_openmp
## Hello I am here safe and sound home in the main thread.
## Help I am trapped in thread number Help I am trapped in thread number Help I am trapped in thread number Help I am trapped in thread number Help I am trapped in thread number 462 out of a total 8 out of a total out of a total 8
##
## 8
## 3 out of a total 8
## Help I am trapped in thread number 0 out of a total 8
## Help I am trapped in thread number 5 out of a total 8
## 7 out of a total 8
## Help I am trapped in thread number 1 out of a total 8
## Thank god I'm safe back home now.
This person really got trapped in a time loop. This is because the threads do not run sequentially: each thread prints its message as soon as it can, and these writes to std::cout happen at the same time, so the output from different threads gets interleaved.
Sections and Loops
OpenMP sections are a way of telling the compiler that different blocks of code can be run on different threads. This is done by enclosing the blocks in a #pragma omp sections region, and marking each block with #pragma omp section and braces {}; each section is then executed exactly once, by whichever thread picks it up. This is a way of breaking your code into parallel ‘chunks’ without having every thread execute the same code.
A loop can be run in parallel by writing the line #pragma omp for above the loop. This specifies that the iterations of the loop are independent and can be run separately; the iterations are then divided between the threads, which work on their shares at the same time. Below is an example of a loop running in parallel.
#include <iostream>
#ifdef _OPENMP
#include <omp.h>
#else
#define omp_get_thread_num() 0
#endif
int main(int argc, const char **argv)
{
#pragma omp parallel
{
int nloops = 0;
#pragma omp for
for (int i=0; i<1000; ++i)
{
++nloops;
}
int thread_id = omp_get_thread_num();
#pragma omp critical
{
std::cout << "Thread " << thread_id << " performed "
<< nloops << " iterations of the loop.\n";
}
}
return 0;
}
This code will display how many iterations of the loop each thread performed. Note that the #pragma omp critical line specifies that only one thread at a time can enter the code in braces {}, just so that the printed output does not get jumbled up like it did previously. In most code this directive is used sparingly, as forcing threads to take turns comes with a significant slowdown; I have only included it here for illustration purposes. We can compile and then run this code.
g++ -fopenmp loop_openmp.cpp -o loop_openmp
export OMP_NUM_THREADS=4
./loop_openmp
## Thread 0 performed 250 iterations of the loop.
## Thread 1 performed 250 iterations of the loop.
## Thread 2 performed 250 iterations of the loop.
## Thread 3 performed 250 iterations of the loop.
So these threads have split the job evenly, but in some cases (for example, when the number of iterations is not divisible by the number of threads) the job would not be split exactly evenly.
Monte Carlo Exercise
For an exercise, consider using an OpenMP parallel program to calculate \(\pi\) using a Monte Carlo algorithm. We can do this by simulating random points inside a square which encloses a circle, since the ratio of the area of the circle to the area of its bounding square is \(\pi/4\). Provided we have a large enough number of simulations, the fraction of points landing inside the circle estimates this ratio, and hence \(\pi\).
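In symbols, for a circle of radius \(r\) inscribed in a square of side \(2r\), a uniformly random point in the square lands inside the circle with probability

\[
\frac{\pi r^2}{(2r)^2} = \frac{\pi}{4},
\qquad\text{so}\qquad
\pi \approx 4 \times \frac{\text{points inside the circle}}{\text{total points}}.
\]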
#include <cmath>
#include <cstdlib>
#include <iostream>
#ifdef _OPENMP
#include <omp.h>
#else
#define omp_get_thread_num() 0
#endif
// note: std::rand() shares hidden state between threads, which is not ideal in parallel code
double rand_one()
{
return std::rand() / (RAND_MAX + 1.0);
}
int main()
{
// declare variables
int circle_points = 0;
int square_points = 0;
// set up parallel OpenMP
#pragma omp parallel
{
// each thread needs its own private counters, so declare them
// inside the parallel region to avoid a data race
int circle_points_loop = 0;
int square_points_loop = 0;
// run for loop in parallel
#pragma omp for
for(int ii=0; ii < 100000; ii++)
{
// get random x and y coordinates
double x_coord = (2*rand_one() - 1);
double y_coord = (2*rand_one() - 1);
// calculate radius
double r = std::sqrt(std::pow(x_coord,2) + std::pow(y_coord,2));
// if r is less than 1 then the point is within the circle
if(r < 1.0)
{
++circle_points_loop;
} else
{
++square_points_loop;
}
}
// use critical when counting the final number of counts for each thread
#pragma omp critical
{
circle_points += circle_points_loop;
square_points += square_points_loop;
}
}
// calculate final value of pi using ratios
double pi = (4.0*circle_points)/(square_points+circle_points);
// print pi
std::cout << pi << std::endl;
return 0;
}
The comments in the code above explain why each section of code is run at each point. To check this result is valid, we can compile and run it.
g++ -fopenmp pi.cpp -o pi
export OMP_NUM_THREADS=8
./pi
## 3.13372
Which is not a bad approximation!