Problem 1.
Consider the training objective $J = \|Xw - t\|_2^2$ subject to $\|w\|_2^2 \le C$ for some constant $C$.
How would the hypothesis class capacity, overfitting/underfitting, and bias/variance vary
with $C$?
                                 Larger $C$               Smaller $C$
Model capacity (large/small?)    _____                    _____
Overfitting/Underfitting?        __fitting                __fitting
Bias/variance (high/low?)        __ bias / __ variance    __ bias / __ variance
Note: No proof is needed
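As a numerical illustration (not part of the assignment): the constraint $\|w\|_2^2 \le C$ corresponds, via a Lagrange multiplier, to ridge regression with some penalty $\lambda$, where a smaller $C$ maps to a larger $\lambda$. The sketch below, with illustrative toy data and $\lambda$ values, shows that a tighter constraint shrinks the learned weights, i.e., reduces effective capacity.

```python
# Sketch, assuming the constraint/penalty correspondence above; the data
# and the two lambda values are illustrative, not prescribed by the problem.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
t = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=50)

def ridge(X, t, lam):
    """Closed-form ridge solution w = (X^T X + lam I)^{-1} X^T t."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

w_loose = ridge(X, t, lam=0.01)   # behaves like a large C (weak constraint)
w_tight = ridge(X, t, lam=100.0)  # behaves like a small C (tight constraint)

# A tighter constraint (smaller C) shrinks the weight norm: lower capacity,
# more underfitting, higher bias, lower variance.
print(np.linalg.norm(w_loose), np.linalg.norm(w_tight))
```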
Problem 2.
Consider a one-dimensional linear regression model $t^{(i)} \sim \mathcal{N}(w x^{(i)}, \sigma_\epsilon^2)$ with a Gaussian prior
$w \sim \mathcal{N}(0, \sigma^2)$. Show that the posterior of $w$ is also a Gaussian distribution, i.e.,
$w \mid x^{(1)}, t^{(1)}, \cdots, x^{(N)}, t^{(N)} \sim \mathcal{N}(\mu_{\text{post}}, \sigma_{\text{post}}^2)$. Give the formulas for $\mu_{\text{post}}$ and $\sigma_{\text{post}}^2$.
Hint: Work with $p(w \mid D) \propto p(w)\,p(D \mid w)$. Do not handle the normalizing term.
Note: If a prior has the same functional form (but typically with different parameters) as the posterior, it
is known as a conjugate prior. The above conjugacy also applies to the multi-dimensional Gaussian,
but the formulas for the mean vector and the covariance matrix will be more complicated.
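A quick numerical sanity check (without deriving $\mu_{\text{post}}$ or $\sigma_{\text{post}}^2$, which the problem asks for): the unnormalized log-posterior $\log p(w) + \log p(D \mid w)$ should be exactly quadratic in $w$, which is what makes the posterior Gaussian. The data, $\sigma$, and $\sigma_\epsilon$ below are illustrative assumptions.

```python
# Sketch: verify numerically that the unnormalized log-posterior is
# quadratic in w (constant second difference, zero third difference),
# so the posterior must be Gaussian. Toy data; values are assumptions.
import numpy as np

rng = np.random.default_rng(1)
sigma, sigma_eps = 1.0, 0.5
x = rng.normal(size=20)
t = 2.0 * x + sigma_eps * rng.normal(size=20)

def log_unnorm_posterior(w):
    log_prior = -w**2 / (2 * sigma**2)                      # log p(w) + const
    log_lik = -np.sum((t - w * x) ** 2) / (2 * sigma_eps**2)  # log p(D|w) + const
    return log_prior + log_lik

# A quadratic sampled on a uniform grid has constant second differences
# and vanishing third differences.
ws = np.linspace(-3, 3, 7)
vals = np.array([log_unnorm_posterior(w) for w in ws])
second = np.diff(vals, n=2)
third = np.diff(vals, n=3)
print(second, third)
```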
Problem 3.
Give the prior distribution of $w$ for linear regression such that the maximum a posteriori estimate is
equivalent to the $\ell_1$-penalized mean squared loss.
Note: Such a prior is known as the Laplace distribution. Also, getting the normalization factor in
the distribution is not required.
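Since the Note already names the Laplace distribution, here is a hedged numerical check of the equivalence (the scale $b$ and noise level $\sigma_\epsilon$ are illustrative assumptions): with a factorized prior $p(w_j) \propto \exp(-|w_j|/b)$, the negative log-posterior equals the squared-error loss plus an $\ell_1$ penalty, up to a constant scale.

```python
# Sketch: with a Laplace prior (scale b, an assumed value), the MAP
# objective matches an L1-penalized squared loss up to the factor
# sigma_eps^2, with lam = sigma_eps^2 / b. Toy data for illustration.
import numpy as np

b, sigma_eps = 1.0, 0.5
rng = np.random.default_rng(2)
X = rng.normal(size=(30, 3))
t = X @ np.array([1.0, 0.0, -1.5]) + 0.1 * rng.normal(size=30)

def neg_log_posterior(w):
    # -log p(D|w) - log p(w), dropping w-independent constants
    return np.sum((t - X @ w) ** 2) / (2 * sigma_eps**2) + np.sum(np.abs(w)) / b

def l1_penalized_loss(w, lam):
    return 0.5 * np.sum((t - X @ w) ** 2) + lam * np.sum(np.abs(w))

w = rng.normal(size=3)
lam = sigma_eps**2 / b
# Scaling the negative log-posterior by sigma_eps^2 recovers the L1 loss.
print(sigma_eps**2 * neg_log_posterior(w), l1_penalized_loss(w, lam))
```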
END OF W5