Question

Assignment 6. Write CUDA code that parallelizes the sequential pseudocode given below so that each thread updates a sub-matrix of size n/p x n, where p is the total number of threads. Use multiple thread blocks and multiple threads in each block. You may assume n is divisible by the total number of threads.

Input: D, an n x n matrix with 0 on the diagonal and positive values elsewhere
Output: D

    /*** hbuf[n], vbuf[n]: buffers used in the code ***/
    for k starting from 0 through n-1
        for i starting from 0 through n-1
            vbuf[i] = D[i][k]
        end for i-loop
        for j starting from 0 through n-1
            hbuf[j] = D[k][j]
        end for j-loop
        for i starting from 0 through n-1
            for j starting from 0 through n-1
                D[i][j] = min{ D[i][j], vbuf[i] + hbuf[j] }
            end for j
        end for i
    end for k

1. Install CUDA on your laptop if it has an NVIDIA card. If no one in your group has an NVIDIA card, merge two groups so that someone in the merged group has one, or get an HPCC account and access to a queue with NVIDIA nodes.
2. Learn CUDA programming by finding a programming tutorial online.
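One possible way to approach the partitioning asked for above is sketched here; it is an illustration under stated assumptions, not a reference solution. It assumes D holds int values stored row-major in a flat array, that sums of two entries do not overflow int, and that thread t owns the n/p consecutive rows starting at t * (n/p). The names fill_buffers, update_rows, and shortest_paths_gpu are hypothetical. Each k iteration uses two kernel launches: the first snapshots column k and row k into vbuf and hbuf, as in the pseudocode, and the second updates the rows. Launches on the same default stream run in order, which provides the synchronization needed between the two steps and between successive values of k.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Step 1 of iteration k: snapshot column k into vbuf and row k into hbuf.
// Thread t copies the entries whose index falls in its own block of rows.
__global__ void fill_buffers(const int *D, int *vbuf, int *hbuf,
                             int n, int k, int rows_per_thread) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;  // global thread id, 0..p-1
    int first = t * rows_per_thread;
    for (int i = first; i < first + rows_per_thread; ++i) {
        vbuf[i] = D[i * n + k];  // D[i][k]
        hbuf[i] = D[k * n + i];  // D[k][i]
    }
}

// Step 2 of iteration k: thread t updates its (n/p) x n sub-matrix.
__global__ void update_rows(int *D, const int *vbuf, const int *hbuf,
                            int n, int rows_per_thread) {
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    int first = t * rows_per_thread;
    for (int i = first; i < first + rows_per_thread; ++i) {
        int v = vbuf[i];
        for (int j = 0; j < n; ++j) {
            int candidate = v + hbuf[j];  // assumes no int overflow
            if (candidate < D[i * n + j]) D[i * n + j] = candidate;
        }
    }
}

// Host driver (hypothetical helper). Returns false on bad arguments or on a
// CUDA error; on success D is overwritten with the result.
bool shortest_paths_gpu(std::vector<int> &D, int n,
                        int blocks, int threads_per_block) {
    if (n <= 0 || blocks <= 0 || threads_per_block <= 0) return false;
    int p = blocks * threads_per_block;  // total number of threads
    if (n % p != 0 || D.size() != static_cast<size_t>(n) * n) return false;
    int rows_per_thread = n / p;

    size_t mat_bytes = static_cast<size_t>(n) * n * sizeof(int);
    size_t vec_bytes = static_cast<size_t>(n) * sizeof(int);
    int *dD = nullptr, *dv = nullptr, *dh = nullptr;

    bool ok = cudaMalloc(&dD, mat_bytes) == cudaSuccess &&
              cudaMalloc(&dv, vec_bytes) == cudaSuccess &&
              cudaMalloc(&dh, vec_bytes) == cudaSuccess &&
              cudaMemcpy(dD, D.data(), mat_bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        for (int k = 0; k < n; ++k) {
            fill_buffers<<<blocks, threads_per_block>>>(dD, dv, dh, n, k,
                                                        rows_per_thread);
            update_rows<<<blocks, threads_per_block>>>(dD, dv, dh, n,
                                                       rows_per_thread);
        }
        // The blocking copy back also waits for all queued kernels.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(D.data(), dD, mat_bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dD);  // cudaFree(nullptr) is a no-op
    cudaFree(dv);
    cudaFree(dh);
    return ok;
}
```

For example, with n = 8, calling shortest_paths_gpu(D, 8, 2, 2) launches 2 blocks of 2 threads, so p = 4 and each thread owns 2 rows. Launching one pair of kernels per k keeps the sketch simple; it trades launch overhead for not needing any grid-wide barrier inside a kernel.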