Question

Department of CSE, The University of Texas at Arlington
CSE5351/CSE4351: Parallel Processing, Spring Semester 2023
Homework Assignment 5-6 combined
Due Date: May 2, 2023 (no resubmission)

Write and test the following programs using C and MPI on Stampede. (Email the Word file with the results and the C code files to the TA.) This is a combined homework for assignments 5 and 6. The total is 200 points (20% of the grade).

1. PROBLEM DESCRIPTION (40 points)
Write a data-parallel program for the sieve of Eratosthenes using the distributed (non-shared) memory model taught in class. Use MPI for message passing and test the program on the Stampede supercomputer. The program has two inputs: n, the largest number up to which primes are to be found, and p, the number of processors. The program should run on any number of cores (processors) from 1 to 32. Make the remaining assumptions yourself. Your grade will be based on how good your parallelization and communication scheme is, so feel free to make optimizations. The output of your program is the list of prime numbers it finds. Attach a note on your parallelization scheme. Include a speedup vs. processors plot with several curves, each for a reasonable value of n chosen by you, measured against the sequential algorithm run on one Stampede processor.

2. PROBLEM DESCRIPTION (60 points)
You are familiar with calculating π by numerical integration using the rectangle rule. Simpson's Rule is a better integration algorithm than the rectangle rule because it converges more quickly. Suppose we want to compute $\int_a^b f(x)\,dx$. We divide the interval [a, b] into n subintervals, where n is even. Let $x_i$ denote the end of the ith interval, for 1 ≤ i ≤ n, and let $x_0$ denote the beginning of the first interval. According to Simpson's Rule:

$$\int_a^b f(x)\,dx \approx \frac{b-a}{3n}\left[f(x_0) - f(x_n) + \sum_{i=1}^{n/2}\bigl(4f(x_{2i-1}) + 2f(x_{2i})\bigr)\right]$$

In the case of the π calculation problem, $f(x) = \frac{4}{1+x^2}$, a = 0, b = 1, and n is an input parameter.

Write a parallel program using MPI to compute π using Simpson's Rule. The program should be able to run on any number of processors (specified at run time). Run and test your program on Stampede.
(a) Attach a note on your parallelization scheme. Include a time vs. processors plot for the simple rule and Simpson's Rule (r curves on the same plot). Use several curves, each for a reasonable value of n, with 1-32 processors in steps of 2. Also, draw a scalability table for
(b) Include a set of scalability vs. processors plots, with several curves, each for a reasonable value of n chosen by you:
(b1) for Simpson's Rule (r curves on the same plot)
(b2) for the simple scheme
(c) Include an accuracy vs. processors plot for the simple rule and Simpson's Rule (r curves on the same plot). Use several curves, each for a reasonable value of n chosen by you.
INPUTS: the number of intervals, n, and the number of processors, p.

3. PROBLEM DESCRIPTION (40 points)
Q1. Write an MPI program that calculates the latency and communication time between two nodes using the ping-pong algorithm as taught in class. Generate one curve by varying the data size from 0 to 512 bytes in increments of 32 bytes. Generate another curve by varying the data size from 1 KB to 128 KB in increments of 1 KB.
Q2. Write another MPI program that calculates the latency and communication time between two nodes using the hot-potato algorithm as taught in class. Generate one curve by varying the data size from 0 to 512 bytes in increments of 32 bytes. Generate another curve by varying the data size from 1 KB to 128 KB in increments of 1 KB.

4. PROBLEM DESCRIPTION (60 points)
Write an MPI parallel program for the back substitution algorithm as taught in class. Implement it on a distributed-memory machine (Stampede), using a matrix (diagonal) of size 128 by 128. Also use 256 by 256 and 1024 by 1024 matrices.
First, create the matrix (there is no need to make the matrix generation program parallel). Assume the values of X0, X1, ..., X127 are X0 = 1.0, X1 = 2.0, X2 = 3.0, ..., X127 = 128.0 (all numbers are double precision). Then randomly generate the values (positive or negative) of the coefficients Ai,j and calculate the values of Bi.

For example, the last equation of the 128 by 128 system is A127,127 * X127 = B127. With X127 = 128.0, generate A127,127 = 3.25 (as an example); B127 is then 416.0. Similarly, in the equation above it, X126 = 127.0. Generate random values for A126,126 and A126,127 on the left side, then calculate B126 on the right side. Both sides must be equal when the values of X126 and X127 are substituted.

After generating the matrix and the vector B, partition the matrix into row blocks across the processors (assume the matrix size is divisible by the number of processors). Calculate the speedup for the following combinations:
a. Matrix size = 128 by 128, number of processors = 4
b. Matrix size = 256 by 256, number of processors = 4
c. Matrix size = 256 by 256, number of processors = 8
d. Matrix size = 1024 by 1024, number of processors = 4
e. Matrix size = 1024 by 1024, number of processors = 8
f. Matrix size = 1024 by 1024, number of processors = 16

Submissions: Submit your homework by email to the TA, Addison Clark, addison.clark@mavs.uta.edu (not to the professor). Include a Word file and text files of the C code. Please send all code and all documents in a Word file. No handwritten material will be accepted. Deadline: May 2, 2023.
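For Problem 1, here is a minimal sketch of the block-decomposed sieve often used on distributed memory. Each process owns one contiguous block of the candidates 2..n. Rank 0 picks the next sieving prime k and broadcasts it. Every rank then marks the multiples of k in its own block, and a final sum reduction counts the primes. So that the sketch compiles without MPI, the ranks are simulated by loops over `id`; comments mark where `MPI_Bcast` and `MPI_Reduce` would go. The names `BLOCK_LOW`, `BLOCK_HIGH` and `block_sieve_count` are hypothetical. The sketch also assumes rank 0's block reaches sqrt(n), i.e. 1 + (n - 1) / p >= sqrt(n), so that every sieving prime lives on rank 0.

```c
#include <stdlib.h>

/* Hypothetical block-decomposition helpers: process `id` of `p` owns
   elements BLOCK_LOW..BLOCK_HIGH of an array of m elements. */
#define BLOCK_LOW(id, p, m)  ((long)(id) * (m) / (p))
#define BLOCK_HIGH(id, p, m) (BLOCK_LOW((id) + 1, p, m) - 1)

/* Count primes in 2..n with a block-decomposed sieve, simulating the p
   processes with loops over `id`.  Assumes n >= 2, p <= n - 1 and
   1 + (n - 1) / p >= sqrt(n) (rank 0 holds every sieving prime). */
long block_sieve_count(long n, int p)
{
    long m = n - 1;                          /* candidates are 2..n */
    char **marked = malloc(p * sizeof *marked);
    long *low = malloc(p * sizeof *low);
    long *size = malloc(p * sizeof *size);
    for (int id = 0; id < p; id++) {         /* each rank allocates its block */
        low[id] = 2 + BLOCK_LOW(id, p, m);
        size[id] = BLOCK_HIGH(id, p, m) - BLOCK_LOW(id, p, m) + 1;
        marked[id] = calloc(size[id], 1);
    }
    long k = 2, index0 = 0;                  /* index0: position of k on rank 0 */
    while (k * k <= n) {
        for (int id = 0; id < p; id++) {     /* every rank marks multiples of k */
            long first;
            if (k * k > low[id])       first = k * k - low[id];
            else if (low[id] % k == 0) first = 0;
            else                       first = k - low[id] % k;
            for (long i = first; i < size[id]; i += k) marked[id][i] = 1;
        }
        /* Rank 0 finds the next unmarked value; in MPI: MPI_Bcast(&k, ...). */
        do { index0++; } while (index0 < size[0] && marked[0][index0]);
        if (index0 >= size[0]) break;        /* by the assumption, next prime > sqrt(n) */
        k = index0 + 2;
    }
    long count = 0;                          /* in MPI: MPI_Reduce with MPI_SUM */
    for (int id = 0; id < p; id++) {
        for (long i = 0; i < size[id]; i++) count += !marked[id][i];
        free(marked[id]);
    }
    free(marked);
    free(low);
    free(size);
    return count;
}
```

For example, `block_sieve_count(1000, 8)` counts the 168 primes below 1000. In a real MPI program, each rank would allocate only its own `marked` block. The only per-prime communication would then be the one-integer broadcast, which is why this block scheme communicates little.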
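For Problem 2, the Simpson formula above splits naturally into independent terms. The sketch below applies it with a = 0, b = 1 and f(x) = 4 / (1 + x²). The n/2 terms of the sum are dealt out cyclically to p processes, and the partial sums are added as `MPI_Reduce` would add them on rank 0. For the "simple" scheme it assumes the midpoint rectangle rule, because the assignment does not spell out which rectangle rule was taught. All function names are hypothetical, and the ranks are simulated by loops so the code runs without MPI.

```c
/* Integrand for the pi problem: the integral of 4 / (1 + x^2) over [0, 1] is pi. */
static double f(double x) { return 4.0 / (1.0 + x * x); }

/* One process's share of the Simpson sum: terms i = id + 1, id + 1 + p, ...
   up to n / 2 (cyclic distribution).  n must be even; b - a = 1 here. */
double simpson_partial(long n, int id, int p)
{
    double sum = 0.0;
    for (long i = id + 1; i <= n / 2; i += p) {
        double x_odd = (double)(2 * i - 1) / n;    /* x_{2i-1} */
        double x_even = (double)(2 * i) / n;       /* x_{2i}   */
        sum += 4.0 * f(x_odd) + 2.0 * f(x_even);
    }
    return sum;
}

/* Add the partial sums (what MPI_Reduce would do on rank 0), then apply the
   end-point terms f(x_0) - f(x_n) and the factor 1 / (3n). */
double simpson_pi(long n, int p)
{
    double total = 0.0;
    for (int id = 0; id < p; id++) total += simpson_partial(n, id, p);
    return (f(0.0) - f(1.0) + total) / (3.0 * n);
}

/* Assumed "simple" scheme: the midpoint rectangle rule with n rectangles,
   distributed cyclically over p processes. */
double rectangle_pi(long n, int p)
{
    double total = 0.0;
    for (int id = 0; id < p; id++)
        for (long i = id; i < n; i += p) total += f((i + 0.5) / n);
    return total / n;
}
```

With n = 1000, the Simpson result agrees with π far more closely than the midpoint result does. Comparing `simpson_pi(n, p) - 3.141592653589793` against the same difference for `rectangle_pi` over several n and p gives the data for the accuracy plot in part (c).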
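For Problem 4, the sketch below has two parts. `make_system` generates the upper-triangular system as described, with X_i = i + 1 and each B_i computed from the random coefficients. `back_substitute_blocks` runs one common row-block scheme. For each i from n - 1 down to 0, the rank owning row i computes x_i, which it would broadcast in MPI. Every rank then subtracts A[k][i] * x_i from its own B_k for rows k < i. This is a hedged illustration, not necessarily the scheme taught in class.

Choosing diagonal magnitudes in [n, 2n] is an added assumption that keeps the random system diagonally dominant so round-off stays small. The assignment itself only asks for random signed coefficients. Function names are hypothetical, and the ranks are simulated by loops.

```c
#include <stdlib.h>

/* Build an n x n upper-triangular system (row-major A) whose exact solution
   is x_i = i + 1 (X0 = 1.0, ..., X127 = 128.0 for n = 128).  Off-diagonal
   coefficients are random in [-1, 1]; each diagonal gets a random sign and a
   magnitude in [n, 2n] (assumption: keeps the system diagonally dominant). */
void make_system(int n, double *A, double *b, unsigned seed)
{
    srand(seed);
    for (int i = 0; i < n; i++) {
        b[i] = 0.0;
        for (int j = 0; j < n; j++) {
            double r = 2.0 * rand() / RAND_MAX - 1.0;     /* in [-1, 1] */
            double mag = r < 0 ? -r : r;
            if (j < i)       A[i * n + j] = 0.0;
            else if (j == i) A[i * n + j] = (r < 0 ? -1.0 : 1.0) * n * (1.0 + mag);
            else             A[i * n + j] = r;
            b[i] += A[i * n + j] * (j + 1.0);             /* B_i = sum_j A_ij X_j */
        }
    }
}

/* Row-block back substitution, simulated on one process.  Rank r holds rows
   [r*nb, (r+1)*nb).  Assumes n % p == 0.  b is overwritten. */
void back_substitute_blocks(int n, int p, const double *A, double *b, double *x)
{
    int nb = n / p;                                       /* rows per process */
    for (int i = n - 1; i >= 0; i--) {
        int owner = i / nb;
        x[i] = b[i] / A[i * n + i];       /* owner computes x_i; MPI: MPI_Bcast */
        for (int rank = 0; rank <= owner; rank++)         /* ranks with rows < i */
            for (int k = rank * nb; k < (rank + 1) * nb && k < i; k++)
                b[k] -= A[k * n + i] * x[i];              /* local update */
    }
}
```

For the 1024 by 1024 case, allocate `A` with `malloc` rather than on the stack. Checking that every `x[i]` comes back close to `i + 1.0` is a quick correctness test before timing.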


Most Viewed Questions Of Distributed Computing

Question 6: Your goal is to navigate a robot out of a maze. The robot starts in the center of the maze facing north. You can turn the robot to face north, east, south, or west. You can direct the robot to move forward a certain distance, although it will stop after hitting a wall. (a) Formulate this problem; that is, describe the initial state, goal test, successor function, and cost function. The successor function describes the robot's possible actions from a given state. We'll define the coordinate system so that the center of the maze is at (0, 0), and the maze itself is a square from (-1, -1) to (1, 1).
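One way to make such a formulation concrete is to write the four pieces as code. The sketch below makes several assumptions beyond the question:
- A state is a position plus a heading.
- The goal is reached once the robot is outside the square from (-1, -1) to (1, 1).
- Turning costs nothing.
- Moving costs the distance actually travelled.

The wall oracle `WallDistance` is a hypothetical stand-in for the maze map, which the question does not give. `no_walls` is only a test oracle.

```c
/* Headings, indexed so that dx[h], dy[h] give the unit step. */
typedef enum { NORTH, EAST, SOUTH, WEST } Heading;
typedef struct { double x, y; Heading h; } State;

/* Hypothetical wall oracle: how far the robot can move from s along its
   heading before hitting a wall.  A real maze map would supply this. */
typedef double (*WallDistance)(State s);

/* Example oracle with no walls at all, for testing. */
double no_walls(State s) { (void)s; return 1e9; }

/* Initial state: centre of the maze, facing north. */
State initial_state(void) { State s = {0.0, 0.0, NORTH}; return s; }

/* Goal test: the robot is outside the square from (-1, -1) to (1, 1). */
int is_goal(State s) { return s.x < -1.0 || s.x > 1.0 || s.y < -1.0 || s.y > 1.0; }

/* Successor: turn to face h.  Position unchanged; assumed cost 0. */
State turn(State s, Heading h) { s.h = h; return s; }

/* Successor: move forward up to d, stopping at a wall.  The assumed cost is
   the distance actually travelled, returned through *cost. */
State move_forward(State s, double d, WallDistance wall, double *cost)
{
    static const double dx[4] = {0.0, 1.0, 0.0, -1.0};
    static const double dy[4] = {1.0, 0.0, -1.0, 0.0};
    double limit = wall(s);
    double step = d < limit ? d : limit;
    s.x += step * dx[s.h];
    s.y += step * dy[s.h];
    *cost = step;
    return s;
}
```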


5. (P31) In modern packet-switched networks, including the Internet, the source host segments long application-layer messages (for example, an image or a music file) into smaller packets and sends the packets into the network. The receiver then reassembles the packets into the original message. We refer to this process as message segmentation. The figure below illustrates the end-to-end transport of a message with and without message segmentation. Consider a message that is 8 × 10^6 bits long that is to be sent from source to destination in the figure. Suppose each link in the figure is 2 Mbps. Ignore propagation, queuing, and processing delays.
(a) Consider sending the message from source to destination without message segmentation. How long does it take to move the message from the source host to the first packet switch? Keeping in mind that each switch uses store-and-forward packet switching, what is the total time to move the message from source host to destination host?
(b) Now suppose that the message is segmented into 800 packets, each 10,000 bits long. How long does it take to move the first packet from the source host to the first switch? When the first packet is being sent from the first switch to the second switch, the second packet is being sent from the source host to the first switch. At what time will the second packet be fully received at the first switch?
(c) How long does it take to move the file from source host to destination host when message segmentation is used? Compare this result with your answer in part (a) and comment.
(d) In addition to reducing delay, what are reasons to use message segmentation?
(e) Discuss the drawbacks of message segmentation.
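The store-and-forward arithmetic behind parts (a) to (c) fits one formula. With N equal packets crossing K links of equal rate, the last packet leaves the source after N packet times and then needs K - 1 more hops. The sketch assumes the missing figure shows three links (source, two packet switches, destination), as the question's mention of a first and second switch suggests. Under that assumption it gives 12 s without segmentation and about 4.01 s with 800 packets. The function name is hypothetical.

```c
/* End-to-end delay of a message sent over `links` store-and-forward links of
   equal rate, split into packets of pkt_bits (pass pkt_bits = msg_bits for
   no segmentation).  Propagation, queuing and processing delays are ignored,
   as in the problem; msg_bits is assumed to be a multiple of pkt_bits. */
double store_and_forward_delay(double msg_bits, double pkt_bits,
                               double rate_bps, int links)
{
    double npkts = msg_bits / pkt_bits;    /* number of packets */
    double t_pkt = pkt_bits / rate_bps;    /* one packet over one link */
    return (npkts + links - 1) * t_pkt;
}
```

With `links = 1`, the same function gives the per-hop times asked about in part (b). One 10,000-bit packet takes 5 ms to reach the first switch. The first two packets take 10 ms, so the second packet is fully received at the first switch at 10 ms.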


[10 points] Suppose a process in Host C has a UDP socket with port number 6789. Suppose both Host A and Host B each send a UDP segment to Host C with destination port number 6789. Will both of these segments be directed to the same socket at Host C? If so, how will the process at Host C know that these two segments originated from two different hosts?
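A UDP socket is identified only by its destination IP address and port. Both segments therefore land in the same socket, and the receiving process tells the senders apart from the source address that `recvfrom()` returns with each datagram. The POSIX sketch below demonstrates this on the loopback interface. An OS-chosen port stands in for 6789, and the two "hosts" are two sending sockets on the same machine, so they differ only by source port. Between real hosts, `from.sin_addr` would differ as well. The function name `udp_demux_demo` is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if two datagrams from two different sending sockets both arrive
   on the single receiving socket with distinct source addresses, else 0. */
int udp_demux_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                    /* ephemeral port stands in for 6789 */

    int rx = socket(AF_INET, SOCK_DGRAM, 0);     /* the one socket on "Host C" */
    if (rx < 0) return 0;
    if (bind(rx, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        getsockname(rx, (struct sockaddr *)&addr, &len) < 0) {
        close(rx);
        return 0;
    }

    int tx_a = socket(AF_INET, SOCK_DGRAM, 0);   /* "Host A" */
    int tx_b = socket(AF_INET, SOCK_DGRAM, 0);   /* "Host B" */
    int ok = sendto(tx_a, "A", 1, 0, (struct sockaddr *)&addr, sizeof addr) == 1 &&
             sendto(tx_b, "B", 1, 0, (struct sockaddr *)&addr, sizeof addr) == 1;

    unsigned short src_port[2] = {0, 0};
    for (int k = 0; ok && k < 2; k++) {
        char buf[8];
        struct sockaddr_in from;
        socklen_t flen = sizeof from;
        /* recvfrom() fills `from` with the sender's IP address and port. */
        if (recvfrom(rx, buf, sizeof buf, 0, (struct sockaddr *)&from, &flen) < 0) ok = 0;
        else src_port[k] = ntohs(from.sin_port);
    }
    close(tx_a);
    close(tx_b);
    close(rx);
    return ok && src_port[0] != src_port[1];
}
```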


A particular system is controlled by an operator through commands entered from a keyboard. The average number of commands entered in an 8-hour interval is 60. Show your work to receive full credit. a. Suppose the CPU scans the keyboard every 100 ms. How many times will the keyboard be checked in an 8-hour period? b. By what fraction would the number of CPU visits to the keyboard be reduced if interrupt-driven I/O were used?
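The arithmetic for both parts can be checked with two small functions. An 8-hour period is 28,800 s, so polling every 100 ms checks the keyboard 288,000 times. The sketch assumes that with interrupt-driven I/O the CPU is visited exactly once per command, i.e. 60 times. Under that assumption, visits drop by 1 - 60/288,000, about 99.98% (a factor of 4,800). Function names are hypothetical.

```c
/* Number of keyboard polls in `hours` when the CPU polls every poll_ms ms. */
long polls_in_interval(double hours, double poll_ms)
{
    return (long)(hours * 3600.0 * 1000.0 / poll_ms + 0.5);   /* rounded */
}

/* Fractional reduction in CPU visits when interrupts replace polling:
   assumed one visit per command instead of one per poll. */
double visit_reduction(long polls, long commands)
{
    return 1.0 - (double)commands / (double)polls;
}
```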


22. Many CPU-scheduling algorithms are parameterized. For example, the RR algorithm requires a parameter to indicate the time slice. Multilevel feedback queues require parameters to define the number of queues, the scheduling algorithms for each queue, the criteria used to move processes between queues, and so on. These algorithms are thus really sets of algorithms (for example, the set of RR algorithms for all time slices, and so on). One set of algorithms may include another (for example, the FCFS algorithm is the RR algorithm with an infinite time quantum). What (if any) relation holds between the following pairs of algorithm sets? a. Priority and SJF b. Multilevel feedback queues and FCFS c. Priority and FCFS
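Pairs (a) and (c) can be seen by writing priority scheduling as "run the ready process with the smallest key". SJF is then the priority algorithm whose key is the (predicted) next CPU burst. FCFS is the priority algorithm whose key is the arrival time. The sketch below shows this. It assumes that a lower key means higher priority. The struct and function names are hypothetical. Pair (b) is not modelled here.

```c
/* A ready process; fields are illustrative. */
typedef struct { int id; double arrival; double burst; int priority; } Proc;

/* Key function: lower key = higher priority (assumed convention). */
typedef double (*KeyFn)(const Proc *);

/* Generic priority scheduler: return the id of the ready process with the
   smallest key (ties go to the earlier entry). */
int pick_next(const Proc *ready, int n, KeyFn key)
{
    int best = 0;
    for (int i = 1; i < n; i++)
        if (key(&ready[i]) < key(&ready[best])) best = i;
    return ready[best].id;
}

double priority_key(const Proc *p) { return p->priority; }
double sjf_key(const Proc *p)      { return p->burst; }    /* SJF: key = next burst */
double fcfs_key(const Proc *p)     { return p->arrival; }  /* FCFS: key = arrival time */
```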


23. Suppose that a CPU-scheduling algorithm favors those processes that have used the least processor time in the recent past. Why will this algorithm favor I/O-bound programs and yet not permanently starve CPU-bound programs?
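A small simulation makes the behaviour visible. The sketch assumes one concrete policy: every tick, each process's recent CPU usage is halved and the running process is charged one unit (exponential decay, similar in spirit to classic UNIX schedulers). It pits an I/O-bound process against a CPU-bound one. The I/O-bound process is picked almost every time it becomes ready, because its short bursts keep its usage low. The CPU-bound process still runs whenever the I/O-bound one is blocked, so it is not starved. All names are hypothetical.

```c
/* Index of the ready process with the least recent CPU usage (-1 if none). */
int pick_least_used(const double *usage, const int *ready, int n)
{
    int best = -1;
    for (int i = 0; i < n; i++)
        if (ready[i] && (best < 0 || usage[i] < usage[best])) best = i;
    return best;
}

/* Assumed accounting: every tick all usage values are halved (old history
   fades) and the process that just ran is charged one unit. */
void charge_and_decay(double *usage, int n, int running)
{
    for (int i = 0; i < n; i++) usage[i] /= 2.0;
    if (running >= 0) usage[running] += 1.0;
}

/* Process 0 is CPU-bound (always ready); process 1 is I/O-bound (runs one
   tick, then blocks for io_ticks ticks).  ran[] counts ticks each process
   ran; *missed counts ticks process 1 was ready but was not chosen. */
void simulate(int ticks, int io_ticks, int ran[2], int *missed)
{
    double usage[2] = {0.0, 0.0};
    int blocked_left = 0;
    ran[0] = ran[1] = 0;
    *missed = 0;
    for (int t = 0; t < ticks; t++) {
        int ready[2] = {1, blocked_left == 0};
        int run = pick_least_used(usage, ready, 2);
        if (ready[1] && run != 1) (*missed)++;
        if (!ready[1]) blocked_left--;            /* I/O in progress */
        if (run == 1) blocked_left = io_ticks;    /* short burst, then I/O */
        ran[run]++;
        charge_and_decay(usage, 2, run);
    }
}
```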


1) In a computer instruction format, the instruction length is 11 bits and the size of an address field is 4 bits. Is it possible to have 5 two-address instructions, 45 one-address instructions, and 32 zero-address instructions using the specified format? Justify your answer. b) Assume that a computer architect has already designed 6 two-address and 24 zero-address instructions using the instruction format above. What is the maximum number of one-address instructions that can be added to the instruction set?
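The usual way to reason about this is expanding opcodes. Two address fields leave 11 - 8 = 3 opcode bits, i.e. 8 two-address codes. Every unused code at one level becomes a prefix that gains 4 more opcode bits at the next level. The sketch below counts this way (function names hypothetical). It reports that the mix in the first part fits: 3 spare codes × 16 = 48 one-address slots, then 3 × 16 = 48 zero-address slots. For part b it finds at most 30 one-address instructions, since 2 spare prefixes must still cover 24 zero-address codes.

```c
/* Expanding-opcode capacity check for an instruction of `width` bits with
   `addr`-bit address fields.  Unused codes at one level become prefixes for
   the next level, each prefix gaining `addr` more opcode bits. */
int opcodes_fit(int width, int addr, long two, long one, long zero)
{
    long free2 = 1L << (width - 2 * addr);    /* two-address opcodes */
    if (two > free2) return 0;
    long free1 = (free2 - two) << addr;       /* one-address opcodes */
    if (one > free1) return 0;
    long free0 = (free1 - one) << addr;       /* zero-address opcodes */
    return zero <= free0;
}

/* Largest number of one-address instructions that still leaves room for
   `zero` zero-address instructions (-1 if none fits). */
long max_one_address(int width, int addr, long two, long zero)
{
    long free2 = 1L << (width - 2 * addr);
    if (two > free2) return -1;
    for (long one = (free2 - two) << addr; one >= 0; one--)
        if (opcodes_fit(width, addr, two, one, zero)) return one;
    return -1;
}
```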


(1) In a computer system, memory operations currently take up 30% of execution time. A new gadget called a cache (i.e., an L1 cache) speeds up 80% of the memory operations by a factor of 4. What is the speedup due to the cache? (2) [5] A second new gadget called an L2 cache speeds up half of the remaining 20% of the memory operations by a factor of 2. What is the total speedup with the L1 and L2 caches together?
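Both parts are applications of Amdahl's law. The L1 cache affects 0.30 × 0.80 = 0.24 of execution time. The sketch reads "half of the remaining 20%" as 10% of memory operations, i.e. 0.03 of execution time; that reading is an assumption. This gives 1/0.82 ≈ 1.22 for the L1 cache alone and 1/0.805 ≈ 1.24 with both caches. Function names are hypothetical.

```c
/* Amdahl's law: overall speedup when `fraction` of execution time is sped
   up by `factor` and the rest is unchanged. */
double amdahl(double fraction, double factor)
{
    return 1.0 / ((1.0 - fraction) + fraction / factor);
}

/* Generalised form: disjoint fractions f[i] of the original time, each sped
   up by s[i]. */
double amdahl_multi(const double *f, const double *s, int k)
{
    double t = 1.0;                          /* normalised original time */
    for (int i = 0; i < k; i++) t -= f[i] - f[i] / s[i];
    return 1.0 / t;
}
```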


Consider the figure (a) below. Assume that we know the bottleneck link along the path from the server to the client is the first link, with rate Rs bits/sec. Suppose we send a pair of packets back to back from the server to the client, and there is no other traffic on this path. Assume each packet is L bits, and both links have the same propagation delay dprop.
(a) What is the packet inter-arrival time at the destination? That is, how much time elapses from when the last bit of the first packet arrives until the last bit of the second packet arrives?
(b) Now assume that the second link is the bottleneck link (i.e., Rc < Rs). Is it possible that the second packet queues at the input queue of the second link? Explain. Now suppose that the server sends the second packet T seconds after sending the first packet. How large must T be to ensure no queuing before the second link? Explain.
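A small timeline model helps with both parts. The sketch follows one packet through the two store-and-forward links. It records when the packet is fully received at the router, whether it must wait for the first packet to finish on the second link, and when its last bit reaches the client. The function name and parameters are hypothetical.

Under this model, when the first link is the bottleneck, the inter-arrival time at the client is L/Rs. When Rc < Rs, a back-to-back second packet queues. Queuing is avoided once the second transmission starts at least L/Rc after the first one starts, which is L/Rc - L/Rs after the first packet has fully left the server.

```c
/* Two store-and-forward links: server --(rs)--> router --(rc)--> client,
   each with propagation delay dprop.  `send` is when the packet's first bit
   leaves the server.  *link2_free is when the router's output link becomes
   idle (start it at 0 and reuse it for consecutive packets); *queued gets
   the time the packet waits at the router.  Returns the time the packet's
   last bit reaches the client. */
double deliver(double send, double L, double rs, double rc, double dprop,
               double *link2_free, double *queued)
{
    double at_router = send + L / rs + dprop;                 /* fully received */
    double start = at_router > *link2_free ? at_router : *link2_free;
    double done = start + L / rc;                              /* last bit sent on link 2 */
    *queued = start - at_router;
    *link2_free = done;
    return done + dprop;
}
```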


Let a denote the rate of packets arriving at a link in packets/sec, and let µ denote the link's transmission rate in packets/sec. Based on the formula for the total delay (i.e., the queuing delay plus the transmission delay) derived in the previous problem, derive a formula for the total delay in terms of a and µ. Note that the link's transmission rate in packets/sec is R/L.
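The previous problem is not reproduced here. Assume it gave a queuing delay of IL/(R(1 - I)) with traffic intensity I = La/R, as in the standard textbook version of this exercise. The substitution then takes two lines:

$$d_{\text{total}} = \frac{IL}{R(1-I)} + \frac{L}{R} = \frac{L/R}{1-I}, \qquad I = \frac{La}{R}$$

$$\mu = \frac{R}{L} \;\Rightarrow\; \frac{L}{R} = \frac{1}{\mu},\quad I = \frac{a}{\mu} \;\Rightarrow\; d_{\text{total}} = \frac{1/\mu}{1 - a/\mu} = \frac{1}{\mu - a}, \qquad a < \mu$$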