Question

Consider the following sequence of instructions, which computes x² + 4x + 1 for each element x of a vector stored in V0:

    multvv.d V1, V0, V0   # V1 = x^2
    multvs.d V2, V0, 4    # V2 = 4x
    addvv.d  V3, V1, V2   # V3 = x^2 + 4x
    addvs.d  V4, V3, 1    # V4 = x^2 + 4x + 1

Assume a vector processor with two vector multiplication units, each with a latency of 7 cycles, and two vector addition units, each with a latency of 6 cycles. Let n = 32 be the vector length supported by the processor. The vector execution units are fully pipelined, and the processor supports chaining. How many clock cycles will this instruction sequence take?

a. 48 cycles
b. 32 cycles
c. 52 cycles
d. 120 cycles
e. 51 cycles
f. 100 cycles
g. 200 cycles
h. 150 cycles
i. 74 cycles
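
One way to reason about the timing is a small simulation of chaining. The sketch below is only a model under stated assumptions, not an official solution: the input vector is ready at cycle 0, an instruction starts as soon as the first element of each dependent source is available and a unit of its type is free, and the total time is counted as the start-up latency of the last instruction plus n (a common textbook convention). The names `VecInstr` and `chained_cycles` are hypothetical, introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class VecInstr:
    dest: str        # destination vector register
    unit: str        # functional unit type: "mul" or "add"
    sources: tuple   # vector registers produced by earlier instructions


def chained_cycles(program, latency, units, n):
    """Estimate total cycles under a simple chaining model (an assumption).

    An instruction starts once the first element of every dependent source
    exists and a unit of its type is free. Its first result appears
    `latency` cycles later; completion is counted as start + latency + n.
    """
    first_result = {}                                    # register -> cycle of first element
    unit_free = {u: [0] * c for u, c in units.items()}   # cycle at which each unit is free
    finish = 0
    for ins in program:
        # Registers not produced in this program (the input) are ready at cycle 0.
        ready = max((first_result.get(s, 0) for s in ins.sources), default=0)
        pool = unit_free[ins.unit]
        k = min(range(len(pool)), key=lambda i: pool[i])  # earliest free unit
        start = max(ready, pool[k])
        pool[k] = start + n            # fully pipelined: one element enters per cycle
        first_result[ins.dest] = start + latency[ins.unit]
        finish = max(finish, start + latency[ins.unit] + n)
    return finish


program = [
    VecInstr("V1", "mul", ()),            # V1 = x^2 (depends only on the input)
    VecInstr("V2", "mul", ()),            # V2 = 4x  (depends only on the input)
    VecInstr("V3", "add", ("V1", "V2")),  # V3 = x^2 + 4x
    VecInstr("V4", "add", ("V3",)),       # V4 = x^2 + 4x + 1
]
latency = {"mul": 7, "add": 6}
units = {"mul": 2, "add": 2}

print(chained_cycles(program, latency, units, n=32))  # 51 under this model
```

In this model the two multiplies run in parallel on the two multiplication units, the first add chains off them at cycle 7, and the second add chains off the first at cycle 13, giving 7 + 6 + 6 + 32 = 51. A convention that counts the last element as arriving n - 1 cycles after the first would give one cycle fewer, so the result depends on how the course defines completion time.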