why GLM
- extend the linear model to multi-class classification
- generalize the maximum likelihood procedure to a generic distribution (any exponential family distribution)
exponential family distribution
$p(y_n|\eta_n) := \exp[y_n \eta_n - A(\eta_n)]\, h(y_n)$
The Bernoulli, Gaussian, and multinomial distributions can all be written in this form.
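As a worked example of the form above, the Bernoulli case can be rewritten step by step. The mean parameter $\mu = P(y_n = 1)$ is notation introduced here for illustration; it does not appear elsewhere in these notes.
$$p(y_n|\mu) = \mu^{y_n}(1-\mu)^{1-y_n} = \exp\left[y_n \log\frac{\mu}{1-\mu} + \log(1-\mu)\right]$$
Matching terms with $\exp[y_n \eta_n - A(\eta_n)]\, h(y_n)$ gives
$$\eta_n = \log\frac{\mu}{1-\mu}, \qquad A(\eta_n) = -\log(1-\mu) = \log(1 + e^{\eta_n}), \qquad h(y_n) = 1$$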
several things to keep in mind about logistic regression
where does logistic regression come from? what assumptions are made?
- binary output: $y$ is either 1 or 0
- the $y_n$ are assumed to be independent of each other
- a probabilistic model is used: $P(y=1|\mathbf{x},\boldsymbol{\beta})$ and $P(y=0|\mathbf{x},\boldsymbol{\beta})$, which can be written together as:
$$P(y_{\mathrm{predict}}=y_{\mathrm{true}}|\mathbf{x},\boldsymbol{\beta})$$
- the log-likelihood is
$$\log P(y_{\mathrm{predict}}=y_{\mathrm{true}}|\mathbf{x},\boldsymbol{\beta}) = \log(\mathrm{fun}(\boldsymbol{\beta})) = L(\boldsymbol{\beta})$$
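- pay attention to the difference between $p(y_n|\eta_n)$ (a single observation) and $p(\mathbf{y}|\boldsymbol{\eta})$ (all observations jointly); under the independence assumption, $p(\mathbf{y}|\boldsymbol{\eta}) = \prod_n p(y_n|\eta_n)$
- each $p(y_n|\eta_n)$ is indeed a Bernoulli distribution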
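The points above can be sketched numerically. This is a minimal illustration, not a reference implementation: it assumes the usual linear link $\eta_n = \mathbf{x}_n^\top \boldsymbol{\beta}$ (not stated explicitly in these notes), uses NumPy, and the function names `sigmoid`, `log_likelihood`, and `log_likelihood_direct` are hypothetical. It computes the joint log-likelihood two ways: from the exponential family form $\sum_n [y_n \eta_n - \log(1 + e^{\eta_n})]$, and directly from the Bernoulli probabilities.

```python
import numpy as np


def sigmoid(z):
    """Map the natural parameter eta to P(y = 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def log_likelihood(beta, X, y):
    """Joint log-likelihood via the exponential family form.

    Assumes eta_n = x_n^T beta and independent y_n, so the joint
    log-likelihood is the sum of per-observation terms
    y_n * eta_n - A(eta_n), with A(eta) = log(1 + exp(eta)).
    """
    eta = X @ beta
    # np.logaddexp(0, eta) computes log(1 + exp(eta)) in a stable way.
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def log_likelihood_direct(beta, X, y):
    """Same quantity written with Bernoulli probabilities."""
    p = sigmoid(X @ beta)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


# Toy data, made up for illustration only (first column is an intercept).
X_toy = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.0]])
y_toy = np.array([1.0, 0.0, 1.0])
beta_toy = np.array([0.1, -0.3])

ll_expfam = log_likelihood(beta_toy, X_toy, y_toy)
ll_direct = log_likelihood_direct(beta_toy, X_toy, y_toy)
```

The two values should agree up to floating-point error, which is a quick check that the Bernoulli model and the exponential family form describe the same likelihood.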