Problem 1.
Consider the training objective $J = \lVert Xw - t \rVert^2$ subject to $\lVert w \rVert^2 \le C$ for some constant $C$.
How would the hypothesis class capacity, overfitting/underfitting behavior, and bias/variance vary with $C$?
                                 Larger $C$              Smaller $C$
Model capacity (large/small?)    _____                   _____
Overfitting/Underfitting?        __fitting               __fitting
Bias/variance (high/low?)        __ bias / __ variance   __ bias / __ variance
Note: No proof is needed.
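A quick empirical sanity check, not required for the problem (the data, polynomial degree, and α values below are all illustrative assumptions). It uses the penalized objective $\lVert Xw - t\rVert^2 + \alpha\lVert w\rVert^2$, the Lagrangian counterpart of the constraint $\lVert w\rVert^2 \le C$: a small penalty α plays the role of a large $C$, and a large α the role of a small $C$.

```python
# Illustrative sketch only: ridge regression as the Lagrangian form of the
# constrained objective. Small alpha ~ large C; large alpha ~ small C.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(60, 1))
t = np.sin(3.0 * x).ravel() + 0.3 * rng.standard_normal(60)  # noisy 1-D targets

# Degree-9 polynomial features give the model enough capacity to overfit.
X = PolynomialFeatures(degree=9, include_bias=False).fit_transform(x)
X_tr, X_te, t_tr, t_te = train_test_split(X, t, random_state=0)

for alpha in (1e-6, 1e-2, 1e2):  # loose -> tight constraint on ||w||^2
    model = Ridge(alpha=alpha).fit(X_tr, t_tr)
    print(f"alpha={alpha:g}  ||w||^2={np.sum(model.coef_ ** 2):.3f}  "
          f"train R^2={model.score(X_tr, t_tr):.3f}  "
          f"test R^2={model.score(X_te, t_te):.3f}")
```

With a loose constraint (small α, large $C$) the weight norm and training fit grow while test fit can degrade; tightening the constraint shrinks $\lVert w\rVert^2$ and the train/test gap, which is the capacity/overfitting trade-off the table above asks about.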
Problem 2.
Consider a one-dimensional linear regression model $t^{(i)} \sim \mathcal{N}(w x^{(i)}, \sigma_\epsilon^2)$ with a Gaussian prior $w \sim \mathcal{N}(0, \sigma^2)$. Show that the posterior of $w$ is also a Gaussian distribution, i.e., $w \mid x^{(1)}, t^{(1)}, \cdots, x^{(N)}, t^{(N)} \sim \mathcal{N}(\mu_{\mathrm{post}}, \sigma_{\mathrm{post}}^2)$. Give the formulas for $\mu_{\mathrm{post}}$ and $\sigma_{\mathrm{post}}^2$.
Hint: Work with $p(w \mid \mathcal{D}) \propto p(w)\, p(\mathcal{D} \mid w)$. You do not need to handle the normalizing term.
Note: If a prior has the same functional form as the posterior (but typically with different parameters), it is known as a conjugate prior. The above conjugacy also applies to multi-dimensional Gaussians, but the formulas for the mean vector and the covariance matrix are more complicated.
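For checking your algebra once you have a result (the derivation itself is what the problem asks for), here is a sketch of the completing-the-square step under the model above:

```latex
% Sketch: log-posterior of w under the 1-D model above.
\log p(w \mid \mathcal{D})
  = -\frac{w^2}{2\sigma^2}
    - \sum_{i=1}^{N} \frac{\bigl(t^{(i)} - w x^{(i)}\bigr)^2}{2\sigma_\epsilon^2}
    + \text{const}
  = -\frac{1}{2}\left(\frac{1}{\sigma^2}
      + \frac{\sum_i \bigl(x^{(i)}\bigr)^2}{\sigma_\epsilon^2}\right) w^2
    + \frac{\sum_i x^{(i)} t^{(i)}}{\sigma_\epsilon^2}\, w
    + \text{const}
```

The log-posterior is quadratic in $w$, so the posterior is Gaussian; matching coefficients identifies $1/\sigma_{\mathrm{post}}^2$ with the bracketed precision and $\mu_{\mathrm{post}}$ with $\sigma_{\mathrm{post}}^2 \sum_i x^{(i)} t^{(i)} / \sigma_\epsilon^2$.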
Problem 3.
Give the prior distribution of $w$ for linear regression such that the max a posteriori estimation is equivalent to $\ell_1$-penalized mean square loss.
Note: Such a prior is known as the Laplace distribution. Also, you are not required to get the normalization factor of the distribution right.
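As a sketch for verifying the correspondence, reusing the noise model from Problem 2 and taking the prior to be $p(w) \propto \exp(-|w|/b)$ (here $b > 0$ is an assumed scale parameter, not given in the problem):

```latex
% Sketch: negative log-posterior under a Laplace prior p(w) ∝ exp(-|w|/b).
-\log p(w \mid \mathcal{D})
  = \frac{1}{2\sigma_\epsilon^2} \sum_{i=1}^{N} \bigl(t^{(i)} - w x^{(i)}\bigr)^2
    + \frac{|w|}{b}
    + \text{const}
```

The MAP estimate therefore minimizes a mean square loss plus an $\ell_1$ penalty whose strength is controlled by $\sigma_\epsilon^2 / b$, mirroring how the Gaussian prior of Problem 2 corresponds to an $\ell_2$ penalty.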
END OF W5