Search for question
Question

Department of CSE, The University of Texas at Arlington CSE5351/CSE4351: Parallel Processing Spring Semester, 2023 Homework Assignment 5-6 combined Due Date May 2, 2023 (no Resubmission) Write and test the following programs using C and MPI on the Stampede. (Email the Word file with the results and c code files to the TA) This is a combined homework for 5 and 6. The total points are 200/200 (20% of grade). 1. PROBLEM DESCRIPTION (40 points) Write a data-parallel program using distributed non-shared memory model as taught in the class for the sieve of Eratosthenes. Use MPI for message passing and test the program on the Stampede supercomputer. The program has two inputs: The largest number, n, up to which the prime numbers are to be found, and, p, the number of processors. The program should run for any number of cores (processors) ranging from 1 to 32. Make the rest of the assumptions yourself but your grade will be based on how good your parallelization and communication scheme is. So feel free to make optimizations. The output of your program is the prime numbers found by your program. Attach a note on your parallelization scheme, and include a speedup vs processor plot (include a number of curves, each with a reasonable number n, to be chosen by you) against the sequential algorithm implemented on one of the Stampede processors. 2. PROBLEM DESCRIPTION (60 points) You are familiar with the numerical integration for calculating using the rectangle rule. Simpson's Rule is a better integration algorithm than the rectangle rule because it converges quickly. Suppose we want to compute for f (x) dx. We divide the interval [a, b] into n sub intervals where n is even. Let x; denote the end of the ith interval, for 1≤i≤ n, and let xo denote the beginning of the first interval. According to Simpson's rule: 1 ſº ƒ (x)dx = ± fx ¸ − fx „ +Ỷ (4ƒ (x21-1) + 2ƒ (x2)) f 3n - n fx„+Ź i=1 4 In the case of л calculation problem, f(x)= -, a = 0, b = 1, and n is an input parameter. (1+x²) Write a parallel program using MPI to compute л using Simpson's Rule. The program should be able to run on any number of processors (to be specified at run-time). Run and test your program on the Stampede. 1 (a) Attach a note on your parallelization scheme, and include a time vs processors (include a number of curves, each with a reasonable number n, 1-32 with step size 2) plot for the simple and Simpson's rule (r curves on the same plot). Also, draw a scalability table for (b) Include a set of scalability vs processors (include a number of curves, each with a reasonable number n, to be chosen by you) plots (b1) for the Simpson's rule (r curves on the same plot) (b2) for the Simple Scheme (c) Include an accuracy vs processors (include a number of curves, each with a reasonable number n, to be chosen by you) plot for the simple and Simpson's rule (r curves on the same plot) INPUTS The number of intervals, n, and the number of processors, p. 3. PROBLEM DESCRIPTION (40 points) Q 1. Write an MPI program for calculating the latency and communication time between two nodes using the ping pong algorithm as taught in class. Generate a curve by varying the data size between 0 and 512 bytes with increments of 32 bytes Generate another curve by varying the data size between 1kbytes to 128kbytes, with increments of 1k. Q 2. Write another MPI program for calculating the latency and communication time between two nodes using the hot potato algorithm as taught in class. Generate a curve by varying the data size between 0 and 512 bytes with increments of 32 bytes Generate another curve by varying the data size between 1kbytes to 128kbytes, with increments of 1k. 4. PROBLEM DESCRIPTION (60 points) Write an MPI parallel program for solving the Back Substitution algorithm as taught in class. However, you will implement this on a distributed memory machine (the Stampede), using a matrix size (diagonal) of size 128 by 128. Also, use 256 by 256, and 1024 by 1024 matrixes. First, you will create a matrix (no need to make the matrix generation program parallel), in which you will assume the values of X0, X1, .....X127 as X0 = 1.0, X₁ = 2.0, X2 = 3.0, ..... .X127 = 128.0 (all numbers will be double precision). Then you will randomly generate the values (plus or minus) of Ai, j coefficients and calculate the value of Bis 2 For example, the last equation in the 128 by 128 matrix will be A127,127 *X127= B127 X127 128.0, so generate A 127, 127 to be 3.25 (as an example), and thus B127 will be equal to 416.0. Similarly, you can assume the value of X126 in the next above equation to be 127.0, generate random values for A126, 126, and A 126, 127 on the left side of the equation and calculate the value of B126 on the right side of the equation so that the left and rights sides of the equation are equal if we substitute the values of X 127 and X128. After generating the matrix and the vector Bi, you can partition this matrix row-blocks across processors (assume the matrix is divisible by the number of processors). Calculate the speedup for the following combinations a. Matrix size = 128 by 128, number of processors = 4 b. C. Matrix size = 256 by 256, number of processors = 4 Matrix size = 256 by 256, number of processors = 8 d. Matrix size=1024 by 1024, number of processors = 4 e. Matrix size=1024 by 1024, number of processors = 8 f. Matrix size=1024 by 1024, number of processors = 16 Submissions; Submit your homework with a Word file and text files of the C code by email to the TA Addison Clark, addison.clark@mavs.uta.edu (not to the professor). Please send all code and all documents in a WORD file. No hand written material will be accepted. Deadline, May 2, 2023. 3