1. Take a look at the data. A training example (in this case, a negatively labeled one) looks like this:
37, Private, Bachelors, Separated, Other-service, White, Male, 70, England, <=50K
1
which includes the following 9 input fields plus one output field (y):
age, sector, education, marital-status, occupation, race, sex, hours-per-week, country-of-origin, target
Q: What percentage of the training data has a positive label (>50K)? (This is known as the positive %). What
about the dev set? Does it make sense given your knowledge of the average US per capita income? (0.5 pts)
Fig: 1