3. (a) Give two possible advantages of discarding irrelevant attributes before performing linear regression. [4 marks]

   (b) Write pseudocode for the algorithm of best subset selection.

   (c) For a given training set of size n = 100, Model A selects d = 10 attributes giving the Residual Sum of Squares RSS = 100 with the estimate σ̂ = 3 of the standard deviation of the noise, whereas Model B selects d = 6 attributes giving RSS = 200 with σ̂ = 2. Answer the following questions, showing all your calculations (if any):
       i. Which model is better according to C_p?
       ii. Which model is better according to BIC?
       iii. Which model is better according to adjusted R² when the Total Sum of Squares is TSS = 600?

   (d) Describe how best subset selection is modified to obtain forward stepwise selection and backward stepwise selection. [8 marks]

   (e) How many models need to be fitted for a regression problem with p = 6 attributes when using:
       i. best subset selection?
       ii. forward stepwise selection?
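
The calculations asked for in parts (c) and (e) can be checked with a short Python sketch. It is not part of the question paper and it rests on assumptions: the criteria are taken in one common textbook form, C_p = (RSS + 2·d·σ̂²)/n, BIC = (RSS + ln(n)·d·σ̂²)/n and adjusted R² = 1 − [RSS/(n − d − 1)] / [TSS/(n − 1)], and the model counts include the null (intercept-only) model. If the course defines these quantities differently, the numbers change accordingly. All function names are hypothetical.

```python
import math


def mallows_cp(rss, n, d, sigma_hat):
    """C_p in the assumed form (RSS + 2 d sigma_hat^2) / n; lower is better."""
    return (rss + 2 * d * sigma_hat ** 2) / n


def bic(rss, n, d, sigma_hat):
    """BIC in the assumed form (RSS + ln(n) d sigma_hat^2) / n; lower is better."""
    return (rss + math.log(n) * d * sigma_hat ** 2) / n


def adjusted_r2(rss, tss, n, d):
    """Adjusted R^2 = 1 - [RSS/(n-d-1)] / [TSS/(n-1)]; higher is better."""
    return 1 - (rss / (n - d - 1)) / (tss / (n - 1))


def n_models_best_subset(p):
    """Every subset of p attributes is fitted, null model included: 2^p."""
    return 2 ** p


def n_models_forward_stepwise(p):
    """Null model plus p + (p-1) + ... + 1 candidate fits: 1 + p(p+1)/2."""
    return 1 + p * (p + 1) // 2


if __name__ == "__main__":
    n, tss = 100, 600  # values given in part (c)
    models = {
        "A": {"d": 10, "rss": 100, "sigma_hat": 3},
        "B": {"d": 6, "rss": 200, "sigma_hat": 2},
    }
    for name, m in models.items():
        cp = mallows_cp(m["rss"], n, m["d"], m["sigma_hat"])
        b = bic(m["rss"], n, m["d"], m["sigma_hat"])
        r2 = adjusted_r2(m["rss"], tss, n, m["d"])
        print(f"Model {name}: Cp={cp:.3f}  BIC={b:.3f}  adjR2={r2:.3f}")

    p = 6  # value given in part (e)
    print("best subset:", n_models_best_subset(p))
    print("forward stepwise:", n_models_forward_stepwise(p))
```

Under these assumed definitions, Model B has the lower C_p and BIC while Model A has the higher adjusted R², so the criteria need not agree on which model is better.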