Question

2 Part II: Advanced Problem 4 (9 marks, 1.5 pages). Problem Statement: You have two servers, X and Y, each containing a database of n records, where each record is quite large (m bits). Occasionally, a record may be updated on one server and not the other. To ensure both servers have identical databases, they need to synchronize their data. During synchronization, they exchange data in multiple rounds of communication. Existing Algorithm: The simplest way to synchronize the data is for X to send its entire database x to Y via the internet. Y then checks every record in x against its own database y. If a record in x is the same as the corresponding record in y, Y does nothing. If a record in x is different from the one in y, Y updates its database with the newer record or sends the newer record to X. X, upon receiving a newer record from Y, updates its own database. Challenges with Existing Algorithm: While this method works, it consumes a lot of bandwidth because it sends all n records over the network. Goal: Your task is to design a more efficient synchronization algorithm that minimizes communication complexity (sending data) and computation complexity (calculating hash values). The key is to use cryptographic hash functions, which take data as input and produce a fixed-size string (hash) as output. These hash functions are collision-resistant, meaning it's hard to find two different inputs that produce the same hash. Requirements: a) Overview (2 marks): Explain how your algorithm works, why it's a better solution, and what the communication and computation complexities are. b) Detailed Description (3 marks): Provide a detailed description of the main steps of your algorithm, specifying what X and Y do to synchronize the two databases. You can use pseudocode for this. The pseudocode format is not essential; clarity is key. c) Demonstration (2 marks): Illustrate how your algorithm works with a small example, like when n = 8. d) Analysis (2 marks): State the communication and computation complexities of your algorithm and explain why it has these complexities. The input size is (n, m). The given marks are maximum possible for the parts. Your actual marks depend on how efficient your algorithm is (regarding the communication complexity (major criteria) AND the computation complexity) and also how well it is presented. A correct and efficient but poorly written solution could still attract a low mark. An incorrect solution will get a zero mark.

Fig: 1