Problem 4 (9 marks, 1.5 pages). Problem Statement: You have two servers, X and Y,
each containing a database of n records, where each record is quite large (m bits).
Occasionally, a record may be updated on one server and not the other. To ensure both
servers have identical databases, they need to synchronize their data. During
synchronization, they exchange data in multiple rounds of communication.
Existing Algorithm: The simplest way to synchronize the data is for X to send its
entire database x to Y via the internet. Y then checks every record in x against its own
database y. If a record in x is the same as the corresponding record in y, Y does nothing.
If a record in x is different from the one in y, Y updates its database with the newer
record or sends the newer record to X. X, upon receiving a newer record from Y, updates
its own database.
Challenges with Existing Algorithm: While this method works, it consumes a lot
of bandwidth because it sends all n records over the network.
Goal: Your task is to design a more efficient synchronization algorithm that
minimizes communication complexity (sending data) and computation complexity
(calculating hash values). The key is to use cryptographic hash functions, which take
data as input and produce a fixed-size string (hash) as output. These hash functions are
collision-resistant, meaning it's hard to find two different inputs that produce the same
hash.
Requirements:
a) Overview (2 marks): Explain how your algorithm works, why it's a better
solution, and what the communication and computation complexities are.
b) Detailed Description (3 marks): Provide a detailed description of the main
steps of your algorithm, specifying what X and Y do to synchronize the two databases.
You can use pseudocode for this. The pseudocode format is not essential; clarity is key.
c) Demonstration (2 marks): Illustrate how your algorithm works with a small
example, like when n = 8.
d) Analysis (2 marks): State the communication and computation complexities of
your algorithm and explain why it has these complexities. The input size is (n, m).
The given marks are maximum possible for the parts. Your actual marks depend on
how efficient your algorithm is (regarding the communication complexity (major
criteria) AND the computation complexity) and also how well it is presented. A correct
and efficient but poorly written solution could still attract a low mark. An incorrect
solution will get a zero mark.
Fig: 1