A Mathematical Introduction to Data Science
Homework 3. MLE and James-Stein Estimator

The problem below marked by ∗ is optional, with bonus credit. For the experimental problem, include source code that is runnable under standard settings. Since there is NO grader assigned for this class, homework will not be graded. But if you would like to submit your exercise, please send your homework to the address (datascience.hw@gmail.com) with the title “CSIC5011: Homework #”. I’ll read it and give you bonus credits.
1. Maximum Likelihood Method: consider $n$ random samples from a multivariate normal distribution, $X_i \in \mathbb{R}^p$, $X_i \sim \mathcal{N}(\mu, \Sigma)$, $i = 1, \ldots, n$.
(a) Show the log-likelihood function
$$\ell_n(\mu, \Sigma) = -\frac{n}{2}\,\mathrm{trace}(\Sigma^{-1} S_n) - \frac{n}{2}\log\det(\Sigma) + C,$$
where $S_n = \frac{1}{n}\sum_{i=1}^{n}(X_i - \mu)(X_i - \mu)^T$ and some constant $C$ does not depend on $\mu$ and $\Sigma$;
(b) Show that $f(X) = \mathrm{trace}(AX^{-1})$ with $A, X \succ 0$ has the first-order approximation
$$f(X + \Delta) \approx f(X) - \mathrm{trace}(X^{-1} A X^{-1} \Delta),$$
hence formally $df(X)/dX = -X^{-1} A X^{-1}$ (note $(I + X)^{-1} \approx I - X$);
(c) Show that $g(X) = \log\det(X)$ with $X \succ 0$ has the first-order approximation
$$g(X + \Delta) \approx g(X) + \mathrm{trace}(X^{-1}\Delta),$$
hence $dg(X)/dX = X^{-1}$ (note: consider the eigenvalues of $X^{-1/2}\Delta X^{-1/2}$);
(d) Use these formal derivatives with respect to positive semi-definite matrix variables to show that the maximum likelihood estimator of $\Sigma$ is $\hat{\Sigma}^{\mathrm{MLE}}_n = S_n$.
A reference for (b) and (c) can be found in Convex Optimization by Boyd and Vandenberghe, examples in Appendix A.4.1 and A.4.3:
https://web.stanford.edu/~boyd/cvxbook/bv_cvxbook.pdf
(A numerical sanity check of the approximations in (b) and (c), and of the MLE in (d), is sketched below.)
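
The Python sketch below is not part of the assignment; it is a minimal finite-difference check of the first-order approximations in (b) and (c), and a spot check that $S_n$ maximizes the log-likelihood in (d). The matrix size, sample size, and random seed are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
p = 4

def random_spd(k):
    # random symmetric positive definite matrix
    B = rng.standard_normal((k, k))
    return B @ B.T + k * np.eye(k)

A = random_spd(p)
X = random_spd(p)
Delta = 1e-5 * random_spd(p)          # small symmetric perturbation

# (b): f(X) = trace(A X^{-1});  f(X+Delta) - f(X) ~ -trace(X^{-1} A X^{-1} Delta)
f = lambda M: np.trace(A @ np.linalg.inv(M))
print("(b):", f(X + Delta) - f(X),
      "~", -np.trace(np.linalg.inv(X) @ A @ np.linalg.inv(X) @ Delta))

# (c): g(X) = log det(X);  g(X+Delta) - g(X) ~ trace(X^{-1} Delta)
g = lambda M: np.linalg.slogdet(M)[1]
print("(c):", g(X + Delta) - g(X), "~", np.trace(np.linalg.inv(X) @ Delta))

# (d): with mu fixed at the sample mean, the log-likelihood
#      l(Sigma) = -(n/2) [trace(Sigma^{-1} S_n) + log det(Sigma)]
#      should be largest at Sigma = S_n.
n = 500
Xs = rng.multivariate_normal(np.zeros(p), random_spd(p), size=n)
S_n = np.cov(Xs, rowvar=False, bias=True)   # (1/n) sum (X_i - mean)(X_i - mean)^T
loglik = lambda S: -0.5 * n * (np.trace(np.linalg.inv(S) @ S_n) + np.linalg.slogdet(S)[1])
print("(d):", loglik(S_n), ">=", loglik(S_n + 0.1 * random_spd(p)))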
2. Shrinkage: Suppose $y \sim \mathcal{N}(\mu, I_p)$.
(a) Consider the ridge regression problem
$$\min_{\mu} \; \frac{1}{2}\|y - \mu\|_2^2 + \frac{\lambda}{2}\|\mu\|_2^2.$$
Show that the solution is given by
$$\hat{\mu}^{\mathrm{ridge}}_i = \frac{1}{1 + \lambda}\, y_i.$$
Compute the risk (mean square error) of this estimator. The risk of the MLE is given by the case $C = I$, i.e. $\hat{\mu} = y$.
(b) Consider the LASSO problem
$$\min_{\mu} \; \frac{1}{2}\|y - \mu\|_2^2 + \lambda\|\mu\|_1.$$
Show that the solution is given by soft-thresholding,
$$\hat{\mu}^{\mathrm{soft}}_i = \mu^{\mathrm{soft}}(y_i; \lambda) := \mathrm{sign}(y_i)\,(|y_i| - \lambda)_+.$$
For the choice $\lambda = \sqrt{2\log p}$, show that the risk is bounded by
$$\mathbb{E}\|\hat{\mu}^{\mathrm{soft}}(y) - \mu\|^2 \le 1 + (2\log p + 1)\sum_{i=1}^{p}\min(\mu_i^2, 1).$$
Under what conditions on $\mu$ is such a risk smaller than that of the MLE? Note: see Gaussian Estimation by Iain Johnstone, Lemma 2.9 and the reasoning before it.
(c) Consider the $\ell_0$ regularization problem
$$\min_{\mu} \; \|y - \mu\|_2^2 + \lambda^2\|\mu\|_0,$$
where $\|\mu\|_0 := \sum_{i=1}^{p} I(\mu_i \ne 0)$. Show that the solution is given by hard-thresholding,
$$\hat{\mu}^{\mathrm{hard}}_i = \mu^{\mathrm{hard}}(y_i; \lambda) := y_i\, I(|y_i| > \lambda).$$
Rewriting $\hat{\mu}^{\mathrm{hard}}(y) = (1 - g(y))\,y$, is $g(y)$ weakly differentiable? Why?
(d) Consider the James-Stein estimator
$$\hat{\mu}^{\mathrm{JS}}(y) = \left(1 - \frac{\alpha}{\|y\|^2}\right) y.$$
Show that the risk is
$$\mathbb{E}\|\hat{\mu}^{\mathrm{JS}}(y) - \mu\|^2 = \mathbb{E}\, U_\alpha(y),$$
where $U_\alpha(y) = p - (2\alpha(p - 2) - \alpha^2)/\|y\|^2$. Find the optimal $\alpha^* = \arg\min_\alpha U_\alpha(y)$. Show that for $p > 2$, the risk of the James-Stein estimator is smaller than that of the MLE for all $\mu \in \mathbb{R}^p$.
(e) In general, an odd monotone unbounded function $\Theta_\lambda : \mathbb{R} \to \mathbb{R}$ with parameter $\lambda \ge 0$ is called a shrinkage rule if it satisfies
[shrinkage] $0 \le \Theta_\lambda(|t|) \le |t|$;
[odd] $\Theta_\lambda(-t) = -\Theta_\lambda(t)$;
[monotone] $\Theta_\lambda(t) \le \Theta_\lambda(t')$ for $t \le t'$;
[unbounded] $\lim_{t \to \infty} \Theta_\lambda(t) = \infty$.
Which of the rules above are shrinkage rules? (A numerical comparison of these rules is sketched below.)
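
As a quick numerical companion to Problem 2 (not part of the assignment), the sketch below implements the ridge, soft-thresholding, hard-thresholding, and James-Stein rules for $y \sim \mathcal{N}(\mu, I_p)$ and estimates their risks by Monte Carlo on one sparse $\mu$. The dimension, sparsity pattern, penalty choices, and trial count are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
p, n_trials = 100, 2000
mu = np.zeros(p)
mu[:5] = 3.0                          # a sparse mean: 5 spikes of height 3

lam_soft = np.sqrt(2 * np.log(p))     # the threshold sqrt(2 log p) from (b)
lam_ridge = 1.0                       # an arbitrary ridge penalty

def ridge(y, lam):
    return y / (1.0 + lam)

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def hard(y, lam):
    return y * (np.abs(y) > lam)

def james_stein(y):
    # alpha = p - 2, the minimizer of U_alpha from (d)
    return (1.0 - (len(y) - 2) / np.sum(y ** 2)) * y

risks = {"MLE": 0.0, "ridge": 0.0, "soft": 0.0, "hard": 0.0, "James-Stein": 0.0}
for _ in range(n_trials):
    y = mu + rng.standard_normal(p)
    risks["MLE"]         += np.sum((y - mu) ** 2)
    risks["ridge"]       += np.sum((ridge(y, lam_ridge) - mu) ** 2)
    risks["soft"]        += np.sum((soft(y, lam_soft) - mu) ** 2)
    risks["hard"]        += np.sum((hard(y, lam_soft) - mu) ** 2)
    risks["James-Stein"] += np.sum((james_stein(y) - mu) ** 2)

for name, total in risks.items():
    print(f"{name:12s} risk ~ {total / n_trials:7.2f}")
# For this sparse mu all four shrinkage rules should have risk well below
# the MLE risk p = 100, and the soft rule should satisfy the bound in (b).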
3. Necessary Condition for Admissibility of Linear Estimators. Consider the linear estimator for $y \sim \mathcal{N}(\mu, \sigma^2 I_p)$,
$$\hat{\mu}_C(y) = C y.$$
Show that $\hat{\mu}_C$ is admissible only if
(a) $C$ is symmetric;
(b) $0 \le \rho_i(C) \le 1$, where the $\rho_i(C)$ are the eigenvalues of $C$;
(c) $\rho_i(C) = 1$ for at most two indices $i$.
These conditions are satisfied by the MLE when $p = 1$ and $p = 2$.
Reference: Theorem 2.3 in Gaussian Estimation by Iain Johnstone,
http://statweb.stanford.edu/~imj/Book100611.pdf
(A small numerical illustration of condition (b) follows.)
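
The sketch below (not part of the assignment) illustrates why condition (b) is necessary: using the exact risk formula $\mathbb{E}\|Cy - \mu\|^2 = \sigma^2\,\mathrm{trace}(CC^T) + \|(C - I)\mu\|^2$, a symmetric $C$ with an eigenvalue above 1 is dominated by the same matrix with that eigenvalue clipped to 1. The particular matrices and mean vectors are illustrative.

import numpy as np

sigma2, p = 1.0, 3

def risk(C, mu):
    # exact risk of the linear estimator Cy for y ~ N(mu, sigma2 * I_p)
    return sigma2 * np.trace(C @ C.T) + np.sum(((C - np.eye(p)) @ mu) ** 2)

C_bad  = np.diag([1.3, 0.5, 0.5])     # eigenvalue 1.3 violates 0 <= rho_i(C) <= 1
C_clip = np.diag([1.0, 0.5, 0.5])     # same matrix with that eigenvalue clipped to 1

for mu in [np.zeros(p), np.array([5.0, 0.0, 0.0]), np.array([-2.0, 3.0, 1.0])]:
    print("mu =", mu, " risk(C_bad) =", risk(C_bad, mu), " risk(C_clip) =", risk(C_clip, mu))
# risk(C_bad) - risk(C_clip) = sigma2*(1.3^2 - 1) + (0.3*mu_1)^2 > 0 for every mu,
# so C_bad is uniformly dominated and hence inadmissible, consistent with (b).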
4. *James-Stein Estimator for $p = 1, 2$ and an upper bound:
If we use SURE to calculate the risk of the James-Stein estimator,
$$R(\hat{\mu}^{\mathrm{JS}}, \mu) = \mathbb{E}\, U(Y) = p - \mathbb{E}_\mu \frac{(p - 2)^2}{\|Y\|^2} < p = R(\hat{\mu}^{\mathrm{MLE}}, \mu),$$
it seems that for $p = 1$ the James-Stein estimator should still have lower risk than the MLE for any $\mu$. Can you find out what happens in the $p = 1$ and $p = 2$ cases?
Moreover, can you derive the following upper bound on the risk of the James-Stein estimator?
$$R(\hat{\mu}^{\mathrm{JS}}, \mu) \le p - \frac{(p - 2)^2}{p - 2 + \|\mu\|^2} = 2 + \frac{(p - 2)\|\mu\|^2}{p - 2 + \|\mu\|^2}.$$
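
The Monte Carlo sketch below is only a spot check of the stated upper bound, not a derivation: it estimates the James-Stein risk for a few values of $\|\mu\|$ and compares each estimate with $p - (p-2)^2/(p - 2 + \|\mu\|^2)$. The dimension and trial count are arbitrary choices.

import numpy as np

rng = np.random.default_rng(0)
p, n_trials = 10, 200_000

for norm_mu in [0.0, 1.0, 3.0, 10.0]:
    mu = np.zeros(p)
    mu[0] = norm_mu                   # by rotational invariance, only ||mu|| matters
    Y = mu + rng.standard_normal((n_trials, p))
    shrink = 1.0 - (p - 2) / np.sum(Y ** 2, axis=1)   # James-Stein factor, alpha = p - 2
    risk = np.mean(np.sum((shrink[:, None] * Y - mu) ** 2, axis=1))
    bound = p - (p - 2) ** 2 / (p - 2 + norm_mu ** 2)
    print(f"||mu|| = {norm_mu:5.1f}   MC risk ~ {risk:6.3f}   bound = {bound:6.3f}")
# For p > 2 the Monte Carlo risk should not exceed the bound (up to sampling error,
# with equality at mu = 0), and both tend to p as ||mu|| grows.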