Question

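2. Assuming that each thread block contains 64 threads, write the host code that launches the kernel (including memory allocation on the device and host-device data transfers).

Solution:

#include <cuda_runtime.h>

void host_code(float *A, int N) {  // hypothetical name: main() must return int and cannot take these parameters
    float *A_h;  // host copy of the array
    float *A_d;  // device copy of the array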

Fig: 1
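The solution fragment stops after the pointer declarations. One way the remaining host code could look is sketched below. The kernel from the first part of the exercise is not shown here, so `kernel_op` is a hypothetical placeholder that doubles each element. The names `launch_on_device`, `check` and `THREADS_PER_BLOCK` are also assumptions chosen for illustration. The sketch allocates device memory, copies the input to the GPU, launches enough 64-thread blocks to cover all N elements, copies the result back, and frees the device buffer.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 64  // block size required by the exercise

// Hypothetical placeholder kernel: the exercise's real kernel is not shown.
// Each thread doubles one element; the bounds check covers N not divisible by 64.
__global__ void kernel_op(float *A, int N) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < N) {
        A[i] = 2.0f * A[i];
    }
}

// Abort with a readable message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical host routine: copies A (N floats) to the device, runs the
// kernel with 64-thread blocks, and copies the result back into A.
void launch_on_device(float *A, int N) {
    if (N <= 0) return;  // a grid of zero blocks is an invalid launch configuration

    float *A_d = NULL;
    size_t bytes = (size_t)N * sizeof(float);

    check(cudaMalloc((void **)&A_d, bytes), "cudaMalloc");
    check(cudaMemcpy(A_d, A, bytes, cudaMemcpyHostToDevice), "cudaMemcpy host->device");

    // Round up so every element gets a thread: ceil(N / 64).
    int blocks = (N + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    kernel_op<<<blocks, THREADS_PER_BLOCK>>>(A_d, N);
    check(cudaGetLastError(), "kernel launch");

    // cudaMemcpy on the default stream waits for the kernel to finish first.
    check(cudaMemcpy(A, A_d, bytes, cudaMemcpyDeviceToHost), "cudaMemcpy device->host");
    check(cudaFree(A_d), "cudaFree");
}

int main(void) {
    int N = 1000;
    float *A_h = (float *)malloc((size_t)N * sizeof(float));  // host array
    if (A_h == NULL) return EXIT_FAILURE;
    for (int i = 0; i < N; ++i) A_h[i] = (float)i;

    launch_on_device(A_h, N);

    printf("A_h[999] = %f\n", A_h[999]);
    free(A_h);
    return 0;
}
```

With N = 1000 the grid has (1000 + 63) / 64 = 16 blocks, i.e. 1024 threads; the last 24 threads fail the `i < N` test and do nothing. Rounding up instead of computing N / 64 ensures the final partial block is still launched when N is not a multiple of 64.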