
Hands-on Exploration of the Income Dataset (3 pts) (watch video 1)

1. Take a look at the data. A training example (in this case, a negatively labeled one) looks like this:

37, Private, Bachelors, Separated, Other-service, White, Male, 70, England, <=50K

which includes the following 9 input fields plus one output field (y):

age, sector, education, marital-status, occupation, race, sex, hours-per-week, country-of-origin, target
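
As a minimal sketch, one such line could be split into the ten named fields like this. It assumes fields are separated by a comma plus optional whitespace, as in the example above; `FIELDS` and `parse_example` are hypothetical names used for illustration, not part of the assignment.

```python
# Field names in the order given above (9 inputs + the target).
FIELDS = ["age", "sector", "education", "marital-status", "occupation",
          "race", "sex", "hours-per-week", "country-of-origin", "target"]


def parse_example(line):
    """Split one comma-separated line into a dict keyed by field name.

    Assumption: no field value itself contains a comma.
    """
    values = [v.strip() for v in line.split(",")]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# The training example quoted in the question.
example = parse_example(
    "37, Private, Bachelors, Separated, Other-service, White, Male, 70, England, <=50K"
)
print(example["age"], example["target"])
```

All values are kept as strings here; numeric fields such as age and hours-per-week would need an explicit `int(...)` conversion before any arithmetic.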

Q: What percentage of the training data has a positive label (>50K)? (This is known as the positive %.) What about the dev set? Does it make sense given your knowledge of the average US per capita income? (0.5 pts)
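
One possible way to compute the positive % is sketched below. It assumes each line ends with the label (`>50K` or `<=50K`) as its last comma-separated field; `positive_percentage` is a hypothetical helper, and the second sample row is made up purely so the snippet runs without the data files.

```python
def positive_percentage(lines):
    """Return the percentage of non-empty lines whose last field is '>50K'."""
    total = 0
    positive = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        total += 1
        label = line.split(",")[-1].strip()
        if label == ">50K":
            positive += 1
    if total == 0:
        raise ValueError("no examples found")
    return 100.0 * positive / total


# Tiny illustrative input: the first row is the example from the question,
# the second is a made-up positive row (NOT real data).
sample = [
    "37, Private, Bachelors, Separated, Other-service, White, Male, 70, England, <=50K",
    "45, Private, Masters, Married-civ-spouse, Exec-managerial, White, Female, 40, United-States, >50K",
]
print(f"positive %: {positive_percentage(sample):.1f}")
```

Because an open file is iterable line by line, the same function can be applied to the training and dev files by passing the file object from `open(...)`; the actual file names depend on the course materials and are not given here.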
