OpenMP and MPI program producing duplicated output
I have an MPI program that runs fine on my local machine but behaves weirdly when I run it on a cluster.
In a nutshell, rank zero scatters arrays of numbers to the other ranks. These ranks are responsible for finding the averages of the arrays. The averages are then gathered by process zero and printed.
When I run the code on my local machine, rank 0 prints all the averages once they have been worked out. However, when I run this program on a cluster (with four nodes, for example), the output file contains duplicate averages.
Does this happen because each node has a rank 0 process? How would I go about having only one rank 0 process for the whole cluster? Or am I using my job manager incorrectly? My script for SLURM is shown below.
#SBATCH --output=results.out
#SBATCH --nodes=4
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --cpus-per-task=1
#SBATCH --partition=partition_name
# Number of threads *per* process..
export OMP_NUM_THREADS=8
mpirun ./a.out
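For comparison, here is a sketch of a request whose numbers agree with each other. The script above asks for 4 nodes and 4 tasks with 2 tasks per node, and sets 8 OpenMP threads while reserving 1 CPU per task. The variant below assumes the intent is 4 MPI ranks spread over 2 nodes with 8 threads per rank; the node count and the CPU count are assumptions, not values taken from the original job, and this is not claimed to be the cause of the duplicated output.

```shell
#!/bin/bash
#SBATCH --output=results.out
# Assumption: 2 nodes x 2 ranks per node = 4 ranks in total.
#SBATCH --nodes=2
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
# Assumption: reserve one CPU for each OpenMP thread of a rank.
#SBATCH --cpus-per-task=8
#SBATCH --partition=partition_name

# Take the thread count from the allocation instead of hard-coding it.
export OMP_NUM_THREADS="$SLURM_CPUS_PER_TASK"

# Pass the rank count explicitly so it matches --ntasks.
mpirun -np "$SLURM_NTASKS" ./a.out
```

If every process still behaves as rank 0, it may be worth checking that the `mpirun` found on the cluster belongs to the same MPI installation that was used to compile `a.out`.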
There is some OpenMP code in the program as well, but I have removed that and made a separate version that only uses MPI so I can try to narrow down the issue.
Any help is much appreciated; I just need someone to point me in the right direction. I'm sorry if this is a simple problem, as I'm new to running code on a cluster. I have searched online but can't find anything that applies to my situation. It seems like nobody else has an issue with executing blocks of code only on a particular rank in a cluster. Is this perhaps an issue with my code? Unfortunately, I cannot post much code since it is a class assignment.
My code makes use of scatterv, scatter, and broadcast (I have run the code on my machine and made sure that all the processes are communicating correctly). I'm using the default communicator (MPI_COMM_WORLD).
My suspicion is that I need to do something with the communicator, but I'm not entirely sure what.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
