Assignment 9: Parallel histogram computation with MPI
This week's assignment is similar to assignment 8: we again wish to compute the distribution of the logarithm of the number of time steps that walkers needed to reach the bottom of a porous medium. However, we now have a 200x larger data set, which you can download below or find on the Teach cluster at /home/l/lcl_uotphy1610s1001/morestepnumbers.dat.
To process this larger data set in parallel, we will distribute the data points over MPI processes. With the data distributed in this way, the histogram can be computed in parallel.
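As an illustration of what distributing the data points can look like, the sketch below splits n points over a number of processes as evenly as possible and computes, for each rank, how many points it receives and where its share starts. The helper name split_evenly and the struct Partition are hypothetical and not part of the assignment. The resulting integer arrays have the shape of the sendcounts and displs arguments of MPI_Scatterv, which is one possible way to handle a batch that does not divide evenly.

```cpp
#include <vector>

// Hypothetical helper type (an assumption for this sketch):
// how many items each rank gets and where its share starts.
struct Partition {
    std::vector<int> counts;  // number of items for each rank
    std::vector<int> displs;  // index of each rank's first item
};

// Split n items over nprocs ranks as evenly as possible.
// The first (n % nprocs) ranks receive one extra item.
Partition split_evenly(int n, int nprocs) {
    Partition p;
    p.counts.resize(nprocs);
    p.displs.resize(nprocs);
    const int base = n / nprocs;
    const int rem = n % nprocs;
    int offset = 0;
    for (int r = 0; r < nprocs; ++r) {
        p.counts[r] = base + (r < rem ? 1 : 0);
        p.displs[r] = offset;
        offset += p.counts[r];
    }
    return p;
}
```

For a full batch of 100'000 points, the process counts 1, 8, 20, 40 and 80 all divide the batch evenly, so only a final, shorter batch could produce unequal shares.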
Your task is to write and run an MPI program that performs the following:
1) First, a root process reads the command line arguments: the log base, the name of the file with the data, and the batch size Z.
2) Next, the root process reads a batch of Z numbers. After each batch is read, its data points are to be distributed to the MPI processes using a scatter. This is repeated until all numbers have been distributed.
3) Once all data points have been distributed over the MPI processes, each process should compute a histogram of its points (using the same log base).
4) The results of the distributed histograms should be collected by the root process. The normalized histogram should then be printed out to the console as two columns (the start of each histogram bin in the first column and the fraction of the data points in the second).
5) Create a second version where, instead of steps (2) and (3), the numbers are read in parallel by the MPI processes.
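The logic of steps (3) and (4) can be sketched in serial C++, without any MPI calls, to show why per-process histograms can simply be added. The function names and the binning convention here are assumptions, not taken from assignment 8: bin k is assumed to hold the values whose logarithm in the given base lies in [k, k+1), the number of bins nbins is assumed to be fixed and identical on all processes, and values that fall outside the range are placed in the first or last bin.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Histogram of the points held by one process.
// Assumption: all data values are positive; bin k covers log values [k, k+1).
std::vector<long long> local_histogram(const std::vector<double>& data,
                                       double logbase, int nbins) {
    std::vector<long long> counts(nbins, 0);
    const double log_of_base = std::log(logbase);
    for (double x : data) {
        int bin = static_cast<int>(std::floor(std::log(x) / log_of_base));
        if (bin < 0) bin = 0;                   // clamp (assumption)
        if (bin >= nbins) bin = nbins - 1;      // clamp (assumption)
        ++counts[bin];
    }
    return counts;
}

// Element-wise sum of the per-process histograms (all of equal length).
std::vector<long long> merge_histograms(
        const std::vector<std::vector<long long>>& parts) {
    std::vector<long long> total(parts.empty() ? 0 : parts.front().size(), 0);
    for (const auto& part : parts)
        for (std::size_t k = 0; k < total.size(); ++k)
            total[k] += part[k];
    return total;
}

// Convert counts to fractions of the total number of points.
std::vector<double> normalize(const std::vector<long long>& counts) {
    long long n = 0;
    for (long long c : counts) n += c;
    std::vector<double> frac(counts.size(), 0.0);
    if (n == 0) return frac;
    for (std::size_t k = 0; k < counts.size(); ++k)
        frac[k] = static_cast<double>(counts[k]) / static_cast<double>(n);
    return frac;
}
```

In an MPI program, the element-wise sum would typically be done by a reduction such as MPI_Reduce with MPI_SUM on the count arrays rather than by a hand-written loop. Because the counts are integers, the sum does not depend on how the points were divided among the processes.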
Your program should use a batch size Z=100'000 and a log base of 1.1. Write job scripts to run both versions of this code for P=1, 8, 20, 40 and 80 processes on the Teach cluster, timing each run. Submit these jobs to the queue and save their output. The output of these five runs should be identical except for the timing.
As before, we expect you to use make and git, with several meaningful commits, and to have added a README file to the project.
Submit your work (code, Makefile, job scripts, README, git directory and job script outputs) in one zip file by April 7th, 23:59. (Do not include the data!) The usual late penalty applies.
- 31 March 2025, 14:44