SWIM/ping-based failure detection protocol, for an asynchronous distributed
system, that works as follows. Assume there are N=M*K*R processes in the
system (M, K, R, are positive integers, each > 2). Arrange these N processes in a
MxKxR 3-dimensional matrix (tesseract), with M processes in each column, and
K processes in each row, and R processes in the 3rd dimension (aisles). All
processes maintain a full membership list, however pinging is partial. Each
process Pijk (in i-th row and j-th column and k-th aisle) periodically (every T time
units) marks a subset of its membership list as its Monitoring Set. The monitoring
set of a given process, once selected, does not change. The monitoring set of Pijk
contains: i) all the processes in in its own column Pjk, ii) all the other processes in
its own row Prk, and ii) all the processes in in its own aisle Pij. At this point,
there are two options available to you: Option 1 - Each process sends heartbeats
to its monitoring set members. Option 2 - Each process periodically pings all its
monitoring set members; pings are responded to by acks, just like in the SWIM
protocol (but there are no indirect pings or indirect acks.). Failure detection
timeouts work as usual: Option 1 has the heartbeat receiver timeout waiting for a
heartbeat, while Option 2 has the pinging process (pinger) time out. The
suspected process is immediately marked as failed. This is run in an
asynchronous distributed system.
a. How many failures does Option 1 take to violate completeness? That is,
find the value L so that if there are (L-1) simultaneous failures, all of them
will be detected, but if there are L simultaneous failures then not all of
them may be detected.
b. Answer the same above question for Option 2.
c. An opposition party candidate claims that for K-R=2, both Option 1 and
Option 2 provide completeness for all scenarios with up to (and including)
9 simultaneous failures. You gently respond that they are wrong and that
it also depends on M. What are all the values of M (given K=R-2) for
which your opponent's claim above is true? Justify your answer clearly.
Fig: 1