1. [ ] Functions annotated with the __device__ qualifier may be called on the host or the
device.
2. [ ] Page faults cannot be handled by software because the overhead is too large.
3. [ ] Virtual memory space has to be bigger than the physical memory space.
4. [ ] You can have a miss in the TLB, a hit in the page table, and a miss in the cache for a
single memory access.
5. [ ] Shared memory in CUDA is accessible to both the host and the GPU.
6. [ ] In the case of warp divergence, all possible execution paths are run serially by all
threads in a warp so that thread instructions do not diverge.
7. [ ] All thread blocks involved in the same computation use the same kernel.
8. [ ] Is it possible to multiply two 1024×1024 matrices using a tiled matrix multiplication
code with 1,024 thread blocks on a device with a block size of 512 threads? Note that each
thread in a thread block calculates one element of the result matrix.
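
The arithmetic behind item 8 can be checked with a short sketch. It assumes that the grid contains exactly 1,024 blocks, that 512 is the largest number of threads a block can hold, and that each thread writes exactly one output element, as the item states. The helper names are hypothetical and used only for illustration.

```python
# Hypothetical helpers; the numbers are taken from item 8.

def threads_needed(n: int) -> int:
    """One thread per element of an n x n result matrix."""
    return n * n


def threads_available(num_blocks: int, threads_per_block: int) -> int:
    """Total number of threads launched across the whole grid."""
    return num_blocks * threads_per_block


needed = threads_needed(1024)               # elements in the 1024 x 1024 result
available = threads_available(1024, 512)    # assumed launch: 1,024 blocks of 512 threads

print(needed, available, available >= needed)
# prints: 1048576 524288 False
```

Under these assumptions the launch supplies half as many threads as there are output elements, so the one-element-per-thread rule is the constraint that decides the item.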
Fig: 1