Question

5. One of the social media billionaires is considering running for President. They run a social media platform named Quitter, and they have access to a lot of data inside the company. As an intern in this campaign, you have the same social network dataset (named D1) specified in the previous question ((a,b) directed pairs indicating a follows b), but you also have an additional dataset (named D2) with entries (a, start_time, end_time) indicating that user a was online starting at start_time and ending at end_time. The data is only for one day. All times are hh:mm:ss. However, each user a may have multiple entries in D2 (since users log in simultaneously). Write a MapReduce program that extracts all pairs of users (a,b) such that: (i) a and b follow each other, and (ii) a and b were online simultaneously at least once during that day. The same instructions as the first MapReduce question in this series apply. Please ensure that a Map stage reads data from only one input dataset (i.e., if a Map reads directly from D2, don't use it to also read from D1, and vice versa). This is consistent with good Map programming practice.
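One possible way to approach this is a chain of three MapReduce stages; the sketch below simulates it in plain Python and is not tied to any real framework. All names (`run_stage`, `map_follow`, `reduce_attach`, `find_pairs`, and so on) are hypothetical helpers introduced for illustration. It assumes user ids are strings, that intervals include their endpoints (so two sessions that touch at a single second count as simultaneous), and that a Map function reading the output of an earlier stage is allowed, since each Map function here still reads from exactly one dataset.

```python
from collections import defaultdict

def run_stage(inputs, reducer):
    """Toy single-process MapReduce stage (stand-in for a real framework).
    inputs: list of (records, mapper); each mapper reads only its own records."""
    groups = defaultdict(list)
    for records, mapper in inputs:
        for rec in records:
            for key, value in mapper(rec):
                groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out

def to_seconds(hms):
    h, m, s = (int(x) for x in hms.split(":"))
    return 3600 * h + 60 * m + s

# Stage 1 (Map reads D1 only): keep pairs that follow each other.
def map_follow(rec):
    a, b = rec
    yield (min(a, b), max(a, b)), (a, b)

def reduce_mutual(pair, edges):
    if len(set(edges)) == 2:  # both (a,b) and (b,a) were seen
        yield pair

# Stage 2: attach each user's online intervals to their mutual pairs.
def map_mutual(pair):  # reads stage-1 output only
    a, b = pair
    yield a, ("F", b)
    yield b, ("F", a)

def map_online(rec):  # reads D2 only
    a, start, end = rec
    yield a, ("T", (to_seconds(start), to_seconds(end)))

def reduce_attach(user, values):
    friends = [v[1] for v in values if v[0] == "F"]
    spans = [v[1] for v in values if v[0] == "T"]
    for f in friends:
        yield (min(user, f), max(user, f)), (user, spans)

# Stage 3 (identity Map on stage-2 output): test for overlapping intervals.
def reduce_overlap(pair, values):
    spans = dict(values)  # user -> list of (start, end) in seconds
    a, b = pair
    if any(s1 <= e2 and s2 <= e1
           for s1, e1 in spans.get(a, [])
           for s2, e2 in spans.get(b, [])):
        yield pair

def find_pairs(d1, d2):
    mutual = run_stage([(d1, map_follow)], reduce_mutual)
    attached = run_stage([(mutual, map_mutual), (d2, map_online)], reduce_attach)
    return run_stage([(attached, lambda kv: [kv])], reduce_overlap)
```

For example, with made-up data in which ann and bob follow each other and overlap between 09:30:00 and 09:45:00, `find_pairs` returns only that pair, listed once with the smaller id first. Keying stage 1 on the sorted pair is what lets a single reducer see both follow directions; the same trick in stage 3 brings both users' interval lists together.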

Fig: 1