VMM) and your favorite programming language, which demonstrates that the timings of matrix
addition differ substantially for large enough matrices, depending whether you use Version 1 or
Version 2:
for i:=1 to n do
for j:=1 to n do
C[i,j]:=A[i,j]+B[i,j]
for j:=1 to n do
for i:=1 to n do
C[i,j]:=A[i,j]+B[i,j]
Version 1
Version 2
Specifically, use this sequence of values for n, 128, 256, 512, 1024, 2048, 4096, 8192, 16384,
32768, and 65536, and study the timings of both versions. (Be aware that some runs may take
longer than you are willing, or able, to wait!) Keep in mind that theoretically the two versions
should have the same timings and the doubling the value of n should result in a quadrupling of time
spent to do the addition, assuming everything were done in core (which is of course not the case,
since the last value corresponds to a memory requirement of about 40 Gigabytes, assuming four
bytes per word). Note that you must initialize your matrices A and B but the time required for this
should not be part of the measurements. (c++ and python)
Fig: 1