$29.99
Assignment #4 STA410H1F/2102H1F
1. Consider the model
Yi = θi + εi (i = 1, · · · , n)
where {εi} is a sequence of random variables with mean 0 and finite variance representing
noise. We will assume that θ1, · · · , θn are dependent or “smooth” in the sense that the
absolute differences {|θi −θi−1|} are small for most values of i. Rather than penalizing a lack
of smoothness by P
i
(θi − 2θi−1 + θi−2)
2
(as in Assignment #2), we will consider estimating
{θi} given data {yi} by minimizing
Xn
i=1
(yi − θi)
2 + λ
Xn
i=2
|θi − θi−1| (1)
where λ > 0 is a tuning parameter and Xn
i=2
|θi − θi−1| represents the total variation of {θi}.
The resulting estimates bθ1, · · · ,
bθn, are sometimes called fusion estimates and are useful if
{θi} contain “jumps”, that is, θi = g(i/n) where g is a smooth function with a small number
of discontinuities (i.e. jumps).
The non-differentiable part of the objective function in (1) can be made separable by defining
φi = θi − θi−1 for i = 2, · · · , n and then minimizing
Xn
i=1
(yi − θi)
2 + λ
Xn
i=2
|φi
| (2)
where now each θi (for i = 2, · · · , n) will be a function of θ1, φ2, · · · , φi
. The representation of
the objective function in (2) can be used to compute the parameter estimates using coordinate
descent although there is must faster algorithm. However, (2) is useful for deriving properties
of the estimates.
(a) Show that θk = θ1 +
Pk
i=2 φi
for k ≥ 2.
(b) Show that if bθ1, · · · ,
bθn minimize (1) (or (2)) then
Xn
i=1
(yi − bθi) = 0.
(Hint: Use the representation (2) and compute its partial derivative with respect to θ1.)
1
(c) Show that |yi − bθi
| ≤ λ for all i. (Hint: Show that
∂
(
λ
Xn
i=2
|θi − θi−1|
)
⊂ [−2λ, 2λ]
n
for any θ1, · · · , θn.)
(d) For λ sufficiently large, we will have bθ1 = · · · = bθn = ¯y or equivalently bφ2 = · · · = bφn = 0.
How large must λ be in order to have bθ1 = · · · = bθn = ¯y? (Hint: Look at the sub-gradient
of (2) with respect to (θ1, φ2, · · · , φn); when is (0, 0, · · · , 0) an element of this sub-gradient at
(¯y, 0, · · · , 0)?)
Note: An R function tvsmooth is available on Quercus in a file tvsmooth.txt. You may
find it useful to simulate data from a discontinous function with additive noise and estimate
the function using tvsmooth to gain some insight into this method.
2. Suppose that X1, · · · , Xn are sampled from the following truncated Poisson distribution:
Pλ(Xi = x) = exp(−λ)λ
x
x!κλ(r)
for x = r + 1, r + 2, · · ·
for some integer r ≥ 0 where
κλ(r) = X∞
x=r+1
exp(−λ)λ
x
x!
= 1 −
Xr
x=0
exp(−λ)λ
x
x!
.
Such a sample might arise if we were sampling from a Poisson population but were unable
to observe data less than or equal to r.
The EM algorithm can be employed to estimate λ from the observed X1, · · · , Xn. The
key is to think of the observed data as a subset of some larger (“complete”) data set
X1, · · · , Xn, Xn+1, · · · , Xn+M where M ≥ 0 is a random variable and Xn+1, · · · , Xn+M ≤ r;
given M = m, this complete data set is now assumed to be m + n independent observations
from a Poisson distribution with mean λ. The log-likelihood for the complete data is
lnL(λ) = ln(λ)
nX
+m
i=1
xi − (n + m)λ,
which depends on two unknowns
nX
+m
i=n+1
xi and m. To use the EM algorithm, we need to
estimate these two unknowns.
(a) The probability distribution of M is
Pλ(M = m) =
n + m − 1
m
!
(1 − κλ(r))mκλ(r)
n
for m = 0, 1, 2, · · ·
2
Show that Eλ(M) = n(1 − κλ(r))/κλ(r).
(b) Show that
Eλ
nX
+M
i=n+1
Xi
X1 = x1, · · · , Xn = xn
= Eλ(M)Eλ(Xi
|Xi ≤ r).
(Hint: Note that (a) Xn+1, · · · , Xn+M are independent of X1, · · · , Xn and (b) Xn+1, · · · , Xn+M ≤
r.)
(c) Consider the data given in the table below. They represent the accident claims submitted
to La Royale Belge Insurance Company during a single year. A crude model for the number
of claims submitted for a given policy is Poisson. However, the data below does not provide
the number of policies for which no claims were submitted. We want to estimate λ as well
as to impute (estimate) the number M of policies with no claims.
Number of claims 1 2 3 4 5 6 7
Number of policies 1317 239 42 14 4 4 1
Assume a truncated Poisson model for these data taking r = 0 and estimate λ as well as M
using the EM algorithm (which in this case has a particularly simple form). Do you think the
truncated Poisson model is useful for these data? (For example, do you think your estimate
of M is reasonable?)
3