run a social media platform named Quitter, and they have access to a lot of data inside the
company. As an intern in this campaign, you have the same social network
dataset (named D1) specified in the previous question ((a,b) directed pairs
indicating a follows b), but you also have an additional dataset (named D2) with
entries (a, start_time, end_time) indicating that user a was online starting
start_time and ending at end_time. The data is only for one day. All times are
hh:mm:ss. However, each user a may have multiple entries in D2 (since a user can log
in several times, possibly with overlapping sessions). Write a MapReduce program that extracts all pairs of users
(a,b) such that: (i) a and b follow each other, and (ii) a and b were online
simultaneously at least once during that day. The same instructions as in the first
MapReduce question in this series apply. Please ensure that each Map stage reads
data from only one input dataset (i.e., if a Map reads directly from D2, do not use
it to also read from D1, and vice versa); this is consistent with good Map
programming practice.
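
One possible approach, sketched here as an in-memory Python simulation rather than as a reference solution, chains three stages. Stage 1 reads only D1 and keeps mutual follows. Stage 2 uses two separate Map functions, one reading only the Stage 1 output and one reading only D2, both feeding the same Reduce, which attaches each user's sessions to every mutual pair the user belongs to. Stage 3 uses an identity Map and checks the two users' sessions for overlap. The names run_stage, to_seconds, and mutual_online_pairs are hypothetical helpers, and the sketch assumes that "simultaneously" means the two sessions share more than a single instant (sessions that only touch at an endpoint do not count).

```python
from collections import defaultdict
from itertools import chain


def run_stage(mapped, reducer):
    """Simulated shuffle: group (key, value) pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


def to_seconds(hms):
    """Convert an hh:mm:ss string to seconds since midnight."""
    h, m, s = map(int, hms.split(":"))
    return 3600 * h + 60 * m + s


# Stage 1: Map reads D1 only. Key each edge by its unordered pair.
def map_follow(a, b):
    yield (min(a, b), max(a, b)), (a, b)


def reduce_mutual(pair, edges):
    if len(set(edges)) == 2:  # both (a, b) and (b, a) were seen
        yield pair


# Stage 2: one Map per input; each reads exactly one dataset.
def map_pair(a, b):  # reads Stage 1 output only
    yield a, ("partner", b)
    yield b, ("partner", a)


def map_session(a, start, end):  # reads D2 only
    yield a, ("session", (to_seconds(start), to_seconds(end)))


def reduce_attach(user, values):
    partners = [v for tag, v in values if tag == "partner"]
    sessions = [v for tag, v in values if tag == "session"]
    for p in partners:
        yield (min(user, p), max(user, p)), (user, sessions)


# Stage 3: identity Map; Reduce compares the two users' session lists.
def reduce_overlap(pair, values):
    by_user = dict(values)
    a, b = pair
    # Assumption: strict overlap, so touching endpoints do not count.
    if any(s1 < e2 and s2 < e1
           for s1, e1 in by_user.get(a, [])
           for s2, e2 in by_user.get(b, [])):
        yield pair


def mutual_online_pairs(d1, d2):
    mutual = run_stage(
        chain.from_iterable(map_follow(a, b) for a, b in d1), reduce_mutual)
    joined = run_stage(
        chain(chain.from_iterable(map_pair(a, b) for a, b in mutual),
              chain.from_iterable(map_session(a, s, e) for a, s, e in d2)),
        reduce_attach)
    return run_stage(joined, reduce_overlap)
```

Each pair is emitted once, with the two user names in sorted order. Stage 3 compares every session of one user against every session of the other, which is quadratic in the number of sessions per user; sorting the sessions first would allow a linear sweep if that mattered.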
Fig: 1