H=\bigcup_{i=1}^{k} H_{i} Consider two alternative approaches: 1. Learn on the sample using the ERM rule. 2. Divide that sample into a training set of size (1 – a)m, and a validation set of size amfor some a e (0,1). Then apply the approach of model selection using validation, i.e.: • First, train each class H¡ on the (1 – a)m training examples using the ERM rulew.r.t. H; and let h1,..,h be all the resulting hypotheses. • Second, apply the ERM rule w.r.t. to the finite class {h1,..,hg} on the am vali-dation examples. Under which conditions is the second approach better? Justify your answerformally (using maximum 2 pages). Let H1, … · ,Hk be hypothesis classes such that H1 C H2 C … c Hk and |H1 = 2', for every i E {1,..,k}. Suppose you are given a sample of size m (with each element chosen, and you want to learn the union of all these classes, that is you would like to learn the hypothesis class

Fig: 1

Fig: 2

Fig: 3

Fig: 4

Fig: 5

Fig: 6

Fig: 7

Fig: 8