Question

8) For a vector addition, assume that the vector length is 8000, each thread calculates one output element, and the thread block size is 1024 threads. The programmer configures the kernel

launch to have a minimal number of thread blocks to cover all output elements. How many threads will be in the grid? (a) 8000 (b) 8196 (c) 8192 (d) 8200

Fig: 1