department of cse the university of texas at arlington cse5351 cse4351
Search for question
Question
Department of CSE, The University of Texas at Arlington
CSE5351/CSE4351: Parallel Processing
Spring Semester, 2023
Homework Assignment 5-6 combined
Due Date May 2, 2023 (no Resubmission)
Write and test the following programs using C and MPI on the Stampede. (Email the Word file
with the results and c code files to the TA)
This is a combined homework for 5 and 6. The total points are 200/200 (20% of grade).
1. PROBLEM DESCRIPTION (40 points)
Write a data-parallel program using distributed non-shared memory model as taught in the class
for the sieve of Eratosthenes. Use MPI for message passing and test the program on the Stampede
supercomputer.
The program has two inputs: The largest number, n, up to which the prime numbers are to be
found, and, p, the number of processors. The program should run for any number of cores
(processors) ranging from 1 to 32.
Make the rest of the assumptions yourself but your grade will be based on how good your
parallelization and communication scheme is. So feel free to make optimizations. The output of
your program is the prime numbers found by your program. Attach a note on your parallelization
scheme, and include a speedup vs processor plot (include a number of curves, each with a
reasonable number n, to be chosen by you) against the sequential algorithm implemented on one
of the Stampede processors.
2. PROBLEM DESCRIPTION (60 points)
You are familiar with the numerical integration for calculating using the rectangle rule.
Simpson's Rule is a better integration algorithm than the rectangle rule because it converges
quickly. Suppose we want to compute for f (x) dx. We divide the interval [a, b] into n sub
intervals where n is even. Let x; denote the end of the ith interval, for 1≤i≤ n, and let xo denote the
beginning of the first interval. According to Simpson's rule:
1
ſº ƒ (x)dx = ± fx ¸ − fx „ +Ỷ (4ƒ (x21-1) + 2ƒ (x2))
f
3n
-
n
fx„+Ź
i=1
4
In the case of л calculation problem, f(x)=
-, a = 0, b = 1, and n is an input parameter.
(1+x²)
Write a parallel program using MPI to compute л using Simpson's Rule. The program should be
able to run on any number of processors (to be specified at run-time). Run and test your program
on the Stampede.
1 (a) Attach a note on your parallelization scheme, and include a time vs processors (include a
number of curves, each with a reasonable number n, 1-32 with step size 2) plot for the
simple and Simpson's rule (r curves on the same plot).
Also, draw a scalability table for
(b) Include a set of scalability vs processors (include a number of curves, each with a
reasonable number n, to be chosen by you) plots
(b1) for the Simpson's rule (r curves on the same plot)
(b2) for the Simple Scheme
(c) Include an accuracy vs processors (include a number of curves, each with a reasonable
number n, to be chosen by you) plot for the simple and Simpson's rule (r curves on the
same plot)
INPUTS
The number of intervals, n, and the number of processors, p.
3. PROBLEM DESCRIPTION (40 points)
Q 1. Write an MPI program for calculating the latency and communication time between two
nodes using the ping pong algorithm as taught in class.
Generate a curve by varying the data size between 0 and 512 bytes with increments of 32 bytes
Generate another curve by varying the data size between 1kbytes to 128kbytes, with increments
of 1k.
Q 2. Write another MPI program for calculating the latency and communication time between
two nodes using the hot potato algorithm as taught in class.
Generate a curve by varying the data size between 0 and 512 bytes with increments of 32 bytes
Generate another curve by varying the data size between 1kbytes to 128kbytes, with increments
of 1k.
4. PROBLEM DESCRIPTION (60 points)
Write an MPI parallel program for solving the Back Substitution algorithm as taught in class.
However, you will implement this on a distributed memory machine (the Stampede), using a
matrix size (diagonal) of size 128 by 128. Also, use 256 by 256, and 1024 by 1024 matrixes.
First, you will create a matrix (no need to make the matrix generation program parallel), in which
you will assume the values of X0, X1, .....X127 as X0 = 1.0, X₁ = 2.0, X2 = 3.0, ..... .X127 = 128.0
(all numbers will be double precision).
Then you will randomly generate the values (plus or minus) of Ai, j coefficients and calculate the
value of Bis
2 For example, the last equation in the 128 by 128 matrix will be A127,127 *X127= B127
X127 128.0, so generate A 127, 127 to be 3.25 (as an example), and thus B127 will be equal to 416.0.
Similarly, you can assume the value of X126 in the next above equation to be 127.0, generate
random values for A126, 126, and A 126, 127 on the left side of the equation and calculate the value of
B126 on the right side of the equation so that the left and rights sides of the equation are equal if
we substitute the values of X 127 and X128.
After generating the matrix and the vector Bi, you can partition this matrix row-blocks across
processors (assume the matrix is divisible by the number of processors).
Calculate the speedup for the following combinations
a.
Matrix size
=
128 by 128, number of processors = 4
b.
C.
Matrix size = 256 by 256, number of processors = 4
Matrix size = 256 by 256, number of processors = 8
d. Matrix size=1024 by 1024, number of processors = 4
e.
Matrix size=1024 by 1024, number of processors = 8
f. Matrix size=1024 by 1024, number of processors = 16
Submissions; Submit your homework with a Word file and text files of the C code by email to the
TA Addison Clark, addison.clark@mavs.uta.edu (not to the professor).
Please send all code and all documents in a WORD file. No hand written material will be
accepted.
Deadline, May 2, 2023.
3