with thread blocks of size 32 by 32. Analyze this case: what is the performance impact of
divergent warps?
Solution:
Fig: 1
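
To make the analysis concrete, the following host-side Python sketch counts how many warps diverge for a 32 by 32 block. It is only an illustration under stated assumptions: the kernel is assumed to guard its work with a bounds check of the form `row < height && col < width`, threads are linearized with x varying fastest, and the warp size is 32. The function name `count_divergent_warps` and the image sizes in the usage note are hypothetical and not taken from the original problem.

```python
WARP_SIZE = 32  # threads per warp on current NVIDIA GPUs


def count_divergent_warps(width, height, block_x=32, block_y=32):
    """Return (total_warps, divergent_warps) for a 2D grid covering width x height.

    Assumes (hypothetically) that each thread evaluates the guard
    `row < height and col < width`; a warp is counted as divergent
    when its threads disagree on that guard.
    """
    grid_x = -(-width // block_x)   # ceiling division: blocks along x
    grid_y = -(-height // block_y)  # ceiling division: blocks along y
    threads_per_block = block_x * block_y
    total = 0
    divergent = 0
    for by in range(grid_y):
        for bx in range(grid_x):
            # Walk the block one warp at a time in linear thread order.
            for warp_start in range(0, threads_per_block, WARP_SIZE):
                outcomes = set()
                warp_end = min(warp_start + WARP_SIZE, threads_per_block)
                for lin in range(warp_start, warp_end):
                    tx = lin % block_x   # threadIdx.x (varies fastest)
                    ty = lin // block_x  # threadIdx.y
                    col = bx * block_x + tx
                    row = by * block_y + ty
                    outcomes.add(row < height and col < width)
                total += 1
                if len(outcomes) == 2:  # both True and False seen in one warp
                    divergent += 1
    return total, divergent
```

With `block_x = 32`, each warp is exactly one row of the block (warp index equals `threadIdx.y`), so a warp can only diverge on the column check. For a hypothetical 100 by 70 image, `count_divergent_warps(100, 70)` reports 70 divergent warps out of 384, all in the rightmost column of blocks; when the width is a multiple of 32, for example 128 by 70, no warp diverges under this guard. Warps that lie entirely below the last row fail the guard uniformly and therefore do not diverge.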