Question

2. Assume we have K different classes in a multi-class Softmax Regression model. The posterior probability is

$$p_k = \sigma(s(x))_k = \frac{\exp(s_k(x))}{\sum_{j=1}^{K} \exp(s_j(x))} \quad \text{for } k = 1, 2, \ldots, K,$$

where $s_k(x) = \theta_k^T x$, the input $x$ is an n-dimensional vector, and K is the total number of classes.

1) To learn this Softmax Regression model, how many parameters do we need to estimate? What are these parameters?

2) Consider the cross-entropy cost function $J(\theta)$ of m training samples $\{(x_i, y_i)\}_{i=1,2,\ldots,m}$ given below. Derive the gradient of $J(\theta)$ with respect to $\theta_k$.

$$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K} y_k^{(i)} \log\left(p_k^{(i)}\right)$$

where $y_k^{(i)} = 1$ if the i-th instance belongs to class k and 0 otherwise, and $p_k^{(i)}$ is $p_k$ evaluated at $x_i$.
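The definitions in the question can be checked numerically. The following is a minimal sketch in plain Python, not a reference solution: it assumes there is no separate bias term (so the parameters are the K vectors θ_k of length n, K·n numbers in total), that labels are one-hot encoded, and that the cost carries the usual leading −1/m. The function names and the toy data are hypothetical. The analytic gradient used here, (1/m) Σ_i (p_k^{(i)} − y_k^{(i)}) x_i, is the standard result for this cost, and the sketch compares it against finite differences rather than taking it on trust.

```python
import math

def softmax_probs(theta, x):
    """Return [p_1, ..., p_K] for one input x, with s_k(x) = theta_k . x."""
    scores = [sum(t * xi for t, xi in zip(theta_k, x)) for theta_k in theta]
    shift = max(scores)  # subtracting the max leaves the softmax unchanged, avoids overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(theta, X, Y):
    """J(theta) = -(1/m) * sum_i sum_k y_k^(i) * log(p_k^(i)); Y is one-hot."""
    cost = 0.0
    for x, y in zip(X, Y):
        p = softmax_probs(theta, x)
        cost -= sum(yk * math.log(pk) for yk, pk in zip(y, p))
    return cost / len(X)

def gradient(theta, X, Y):
    """Analytic gradient: dJ/dtheta_k = (1/m) * sum_i (p_k^(i) - y_k^(i)) * x_i."""
    m, K, n = len(X), len(theta), len(theta[0])
    grad = [[0.0] * n for _ in range(K)]
    for x, y in zip(X, Y):
        p = softmax_probs(theta, x)
        for k in range(K):
            for j in range(n):
                grad[k][j] += (p[k] - y[k]) * x[j] / m
    return grad

def numeric_gradient(theta, X, Y, eps=1e-6):
    """Central finite differences, used only to sanity-check gradient()."""
    grad = [[0.0] * len(row) for row in theta]
    for k in range(len(theta)):
        for j in range(len(theta[k])):
            original = theta[k][j]
            theta[k][j] = original + eps
            plus = cross_entropy(theta, X, Y)
            theta[k][j] = original - eps
            minus = cross_entropy(theta, X, Y)
            theta[k][j] = original  # restore the parameter before moving on
            grad[k][j] = (plus - minus) / (2 * eps)
    return grad

# Hypothetical toy problem: K = 3 classes, n = 2 features, m = 4 samples.
theta = [[0.1, -0.2], [0.0, 0.3], [-0.4, 0.5]]
X = [[1.0, 2.0], [0.5, -1.0], [-1.5, 0.3], [2.0, 1.0]]
Y = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]

analytic = gradient(theta, X, Y)
numeric = numeric_gradient(theta, X, Y)
max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(analytic, numeric)
    for a, b in zip(row_a, row_b)
)
print("parameters to estimate:", len(theta) * len(theta[0]))
print("largest analytic/numeric gap:", max_gap)
```

If the input vector were augmented with a constant 1 to absorb a bias, n in the parameter count would include that extra component; the question as stated does not say whether it is.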
