There are some ways distributed systems folks can help with elections. Someone
at the election office thinks MapReduce could be useful for “instant runoff
voting" in primaries. (Fun fact: several states, including Alaska, now use instant
runoff voting!) Here's how instant runoff voting works. Consider an election
with three candidates on the ballot - A, B, C. Every voter ranks the three
candidates as first preference, second preference, and last preference. Between
any two candidates X and Y, if a majority of voters ranked X above Y, then X
dominates Y (and vice versa)-note that this only takes into account X and Y's
relative rankings, not where they appear in the preference order, or where the
third candidate appears. A Condorcet winner is a candidate that dominates all
other candidates (pair wise) on the ballot. By definition an election can have at
most one Condorcet winner (however, there may be zero).
You are given a dataset of votes from N voters (N is odd and large, and so
dataset is sharded), where each vote V has three fields V.1, V.2, V.3, respectively
for the first, second, and third preference votes of that voter. Each line of input is
one such voter's vote V (input to initial Map function). Write a MapReduce
program that outputs either the unique single Condorcet winner among the three
candidates A, B, or C, or if there is no single Condorcet winner, then it outputs
the list of candidate(s) with the highest Condorcet count (those that dominate the
most number of candidates). For background -- in MapReduce, one writes a
program for Map that processes one input line at a time and outputs zero or
more (key, value) pairs; and one writes a program for Reduce that processes an
input of (key, all values for key). The iteration over input lines is done
automatically by the MapReduce framework. You can assume this data is
already sharded in HDFS and can be loaded from there. Each line is one vote V
Fig: 1