AMS 274 – Generalized Linear Models
Homework 1
1. The list below comprises a number of distributions, giving in each case the support,
parameter space, and density or probability mass function. Determine whether each of
the distributions belongs to the exponential dispersion family, and likewise whether it belongs to the two-parameter exponential family of distributions. In both cases, justify your answers.
(a) Double exponential (or Laplace) distribution.
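$$f(y \mid \theta, \sigma) = \frac{1}{2\sigma} \exp\left(-\frac{|y - \theta|}{\sigma}\right), \qquad y \in \mathbb{R},\ \theta \in \mathbb{R},\ \sigma > 0.$$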
(b) Uniform distribution.
$$f(y \mid \theta, \sigma) = \frac{1}{2\sigma}, \qquad \theta - \sigma < y < \theta + \sigma,\ \theta \in \mathbb{R},\ \sigma > 0.$$
(c) Logistic distribution.
$$f(y \mid \theta, \sigma) = \frac{\exp((y - \theta)/\sigma)}{\sigma \{1 + \exp((y - \theta)/\sigma)\}^{2}}, \qquad y \in \mathbb{R},\ \theta \in \mathbb{R},\ \sigma > 0.$$
(d) Cauchy distribution.
$$f(y \mid \theta, \sigma) = \frac{1}{\pi\sigma \{1 + ((y - \theta)/\sigma)^{2}\}}, \qquad y \in \mathbb{R},\ \theta \in \mathbb{R},\ \sigma > 0.$$
(e) Pareto distribution.
$$f(y \mid \alpha, \beta) = \frac{\beta \alpha^{\beta}}{y^{\beta + 1}}, \qquad y \geq \alpha,\ \alpha > 0,\ \beta > 0.$$
(f) Beta distribution.
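$$f(y \mid \alpha, \beta) = \frac{y^{\alpha - 1} (1 - y)^{\beta - 1}}{B(\alpha, \beta)}, \qquad 0 \leq y \leq 1,\ \alpha > 0,\ \beta > 0,$$
where $B(\alpha, \beta) = \int_{0}^{1} u^{\alpha - 1} (1 - u)^{\beta - 1}\, du$ is the beta function.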
(g) Negative binomial distribution.
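$$f(y \mid \alpha, p) = \frac{\Gamma(y + \alpha)}{\Gamma(\alpha)\, y!}\, p^{\alpha} (1 - p)^{y}, \qquad y \in \{0, 1, 2, \dots\},\ \alpha > 0,\ 0 < p < 1,$$
where $\Gamma(c) = \int_{0}^{\infty} u^{c - 1} \exp(-u)\, du$ is the gamma function.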
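For reference when working through Problem 1, one common way of writing the two families is sketched here. The symbols $\vartheta$ (natural parameter), $\phi$ (dispersion parameter), $a$, $b$, $c$, $h$, $\eta_j$, $T_j$, and $A$ are notation assumed for this sketch and do not come from the problem statement; the course notes may use a slightly different parametrization. The exponential dispersion family is often written as

$$f(y \mid \vartheta, \phi) = \exp\left\{ \frac{y\vartheta - b(\vartheta)}{a(\phi)} + c(y, \phi) \right\},$$

and a two-parameter exponential family, with parameter vector $\psi = (\psi_1, \psi_2)$, as

$$f(y \mid \psi) = h(y) \exp\left\{ \eta_1(\psi) T_1(y) + \eta_2(\psi) T_2(y) - A(\psi) \right\}.$$

In both forms the support of $y$ does not depend on the parameters, which is worth checking first for each distribution in the list.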
2. Consider the linear regression setting where the responses $Y_i$, $i = 1, \dots, n$, are assumed
independent with means $\mu_i = E(Y_i) = x_i^{T} \beta = \sum_{j=1}^{p} x_{ij} \beta_j$ for (known) covariates $x_{ij}$ and
(unknown) regression coefficients $\beta = (\beta_1, \dots, \beta_p)^{T}$.
(i) Show that if the response distribution is normal,
$$Y_i \overset{\text{ind.}}{\sim} f(y_i \mid \mu_i, \sigma) = (2\pi\sigma^{2})^{-1/2} \exp\left(-\frac{(y_i - \mu_i)^{2}}{2\sigma^{2}}\right), \qquad i = 1, \dots, n,$$
then the maximum likelihood estimate (MLE) of $\beta$ is obtained by minimizing the $L_2$-norm,
$$S_2(\beta) = \sum_{i=1}^{n} (y_i - x_i^{T} \beta)^{2}.$$
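(ii) Show that if the response distribution is double exponential,
$$Y_i \overset{\text{ind.}}{\sim} f(y_i \mid \mu_i, \sigma) = (2\sigma)^{-1} \exp\left(-\frac{|y_i - \mu_i|}{\sigma}\right), \qquad i = 1, \dots, n,$$
then the MLE of $\beta$ is obtained by minimizing the $L_1$-norm,
$$S_1(\beta) = \sum_{i=1}^{n} |y_i - x_i^{T} \beta|.$$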
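(iii) Show that if the response distribution is uniform over the range $[\mu_i - \sigma, \mu_i + \sigma]$,
$$Y_i \overset{\text{ind.}}{\sim} f(y_i \mid \mu_i, \sigma) = (2\sigma)^{-1}, \qquad \text{for } \mu_i - \sigma \leq y_i \leq \mu_i + \sigma,\ i = 1, \dots, n,$$
then the MLE of $\beta$ is obtained by minimizing the $L_\infty$-norm,
$$S_\infty(\beta) = \max_{i} |y_i - x_i^{T} \beta|.$$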
(iv) Obtain the MLE of σ under each one of the response distributions in (i) – (iii) and
show that, in all cases, it is a function of the minimized norm.
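To make the three criteria in Problem 2 concrete, the following Python sketch evaluates them in the simplest special case, an intercept-only model (p = 1 and every covariate equal to 1, so each mean equals a single coefficient b). In that case the minimizers have well-known closed forms: the sample mean for S2, a sample median for S1, and the midrange for S∞. The data vector and all function names are hypothetical, chosen only for illustration; this is not a solution to the general regression case.

```python
import statistics

def s2(y, b):
    """Sum of squared deviations, the S2 criterion with mean b for every response."""
    return sum((yi - b) ** 2 for yi in y)

def s1(y, b):
    """Sum of absolute deviations, the S1 criterion."""
    return sum(abs(yi - b) for yi in y)

def s_inf(y, b):
    """Largest absolute deviation, the S-infinity criterion."""
    return max(abs(yi - b) for yi in y)

# Hypothetical responses (not from the assignment).
y = [1.0, 2.0, 4.0, 9.0]

b_mean = statistics.mean(y)          # minimizes s2
b_median = statistics.median(y)      # one minimizer of s1
b_midrange = (min(y) + max(y)) / 2   # minimizes s_inf

for name, crit, b in [("S2", s2, b_mean), ("S1", s1, b_median), ("Sinf", s_inf, b_midrange)]:
    print(name, "minimizer:", b, "minimized value:", crit(y, b))
```

The single outlying value pulls the three minimizers apart, which mirrors how differently the three response distributions treat large residuals.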
3. Consider the special case of the Cauchy distribution, $C(\theta, 1)$, with scale parameter $\sigma = 1$,
and density function
$$f(y \mid \theta) = \frac{1}{\pi \{1 + (y - \theta)^{2}\}}, \qquad y \in \mathbb{R},\ \theta \in \mathbb{R},$$
where $\theta$ is the median of the distribution.
(a) Let $y = (y_1, \dots, y_n)$ be a random sample from the $C(\theta, 1)$ distribution. Develop the
Newton-Raphson method and the method of scoring to approximate the maximum likelihood estimate of $\theta$ based on the sample $y$. (For the method of scoring, you can use the
result $\int_{0}^{\infty} (1 - x^{2}) / \{(1 + x^{2})^{3}\}\, dx = \pi/8$.)
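As a reminder of the generic form of the two iterations, here is a sketch in assumed notation: $\ell(\theta)$ for the log-likelihood, $\theta^{(t)}$ for the $t$-th iterate, and $I(\theta)$ for the expected Fisher information. None of these symbols appear in the problem statement.

$$\ell(\theta) = -n \log \pi - \sum_{i=1}^{n} \log\{1 + (y_i - \theta)^{2}\},$$

$$\text{Newton-Raphson:}\quad \theta^{(t+1)} = \theta^{(t)} - \frac{\ell'(\theta^{(t)})}{\ell''(\theta^{(t)})}, \qquad \text{scoring:}\quad \theta^{(t+1)} = \theta^{(t)} + \frac{\ell'(\theta^{(t)})}{I(\theta^{(t)})}, \quad I(\theta) = -E\{\ell''(\theta)\}.$$

The two updates differ only in whether the observed second derivative or its expectation scales the step. The integral given in the hint is what the expectation reduces to.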
(b) Consider a sample, assumed to arise from the C(θ, 1) distribution, with n = 9 and
y = (−0.774, 0.597, 7.575, 0.397, −0.865, −0.318, −0.125, 0.961, 1.039). Apply both
methods from (a) to estimate θ. To check your results, try a few different starting values
and also plot the likelihood function for θ.
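One possible implementation of both iterations for the sample in (b) is sketched below in plain Python. It assumes the first and second derivatives of the $C(\theta, 1)$ log-likelihood, $\sum_i 2(y_i - \theta)/\{1 + (y_i - \theta)^2\}$ and $\sum_i 2\{(y_i - \theta)^2 - 1\}/\{1 + (y_i - \theta)^2\}^2$, together with expected information $n/2$. These should be checked against your own derivation in (a). The function names, tolerance, iteration cap, and the choice of the sample median as starting value are assumptions of this sketch.

```python
import statistics

# Sample from part (b).
y = [-0.774, 0.597, 7.575, 0.397, -0.865, -0.318, -0.125, 0.961, 1.039]

def score(theta, y):
    """First derivative of the C(theta, 1) log-likelihood."""
    return sum(2 * (yi - theta) / (1 + (yi - theta) ** 2) for yi in y)

def second_derivative(theta, y):
    """Second derivative of the C(theta, 1) log-likelihood."""
    return sum(2 * ((yi - theta) ** 2 - 1) / (1 + (yi - theta) ** 2) ** 2 for yi in y)

def newton_raphson(theta, y, tol=1e-10, max_iter=200):
    """Newton-Raphson: scale the score by the observed second derivative."""
    for _ in range(max_iter):
        step = score(theta, y) / second_derivative(theta, y)
        theta -= step
        if abs(step) < tol:
            break
    return theta

def scoring(theta, y, tol=1e-10, max_iter=200):
    """Method of scoring: scale the score by the expected information n/2."""
    info = len(y) / 2
    for _ in range(max_iter):
        step = score(theta, y) / info
        theta += step
        if abs(step) < tol:
            break
    return theta

start = statistics.median(y)  # assumed starting value
theta_nr = newton_raphson(start, y)
theta_fs = scoring(start, y)
print("Newton-Raphson:", theta_nr)
print("Scoring:       ", theta_fs)
```

Calling `newton_raphson` and `scoring` with other starting values, as the problem suggests, shows how sensitive each method is to the starting point. Newton-Raphson in particular can move away from the maximum when the second derivative is positive at the current iterate.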
(c) Now consider a sample (again, assumed to arise from the $C(\theta, 1)$ distribution) with
$n = 3$ and $y = (0, 5, 9)$. Apply again the methods from (a) to estimate $\theta$, using three
different starting values, $\theta^{0} = -1$, $\theta^{0} = 4.67$, $\theta^{0} = 10$. Comment on the results.
4. The data in the table below show the number of cases of AIDS in Australia by date of
diagnosis for successive 3-month periods from 1984 to 1988.
          Quarter
Year     1     2     3     4
1984     1     6    16    23
1985    27    39    31    30
1986    43    51    63    70
1987    88    97    91   104
1988   110   113   149   159
Let $x_i = \log(i)$, where $i$ denotes the time period for $i = 1, \dots, 20$. Consider a GLM for this
data set based on a Poisson response distribution with mean $\mu$, systematic component $\beta_1 + \beta_2 x_i$, and logarithmic link function $g(\mu) = \log(\mu)$.
(a) Fit this GLM to the data working from first principles, that is, derive the expressions that are needed for the scoring method, and implement the algorithm to obtain the
maximum likelihood estimates for β1 and β2.
(b) Use function “glm” in R to verify your results from part (a).
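For part (a), the scoring iteration can be sketched in plain Python (used here instead of R so that the arithmetic is fully explicit). The sketch assumes the standard results for a Poisson response with log link: the score is $X^{T}(y - \mu)$, the expected information is $X^{T} W X$ with $W = \mathrm{diag}(\mu_i)$, and each scoring step is equivalent to a weighted least squares fit of the working response $z_i = \eta_i + (y_i - \mu_i)/\mu_i$ on $x_i$. The function name, the starting values $\mu_i = y_i + 0.1$, the tolerance, and the iteration cap are assumptions of this sketch, not part of the assignment.

```python
import math

# Quarterly counts from the table, read row by row (periods i = 1, ..., 20).
counts = [1, 6, 16, 23, 27, 39, 31, 30, 43, 51,
          63, 70, 88, 97, 91, 104, 110, 113, 149, 159]
x = [math.log(i) for i in range(1, 21)]

def fit_poisson_log(x, y, tol=1e-10, max_iter=50):
    """Scoring for log(mu_i) = b1 + b2 * x_i, written as iterated weighted least squares."""
    mu = [yi + 0.1 for yi in y]            # assumed starting values
    eta = [math.log(m) for m in mu]
    b1 = b2 = 0.0
    for _ in range(max_iter):
        # Working response and weights for the log link.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Entries of the 2x2 weighted normal equations.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        new_b1 = (swxx * swz - swx * swxz) / det
        new_b2 = (sw * swxz - swx * swz) / det
        converged = abs(new_b1 - b1) + abs(new_b2 - b2) < tol
        b1, b2 = new_b1, new_b2
        eta = [b1 + b2 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        if converged:
            break
    return b1, b2, mu

b1_hat, b2_hat, mu_hat = fit_poisson_log(x, counts)
print("b1 =", b1_hat, "b2 =", b2_hat)
```

Because the log link is canonical for the Poisson distribution, observed and expected information coincide, so this iteration is also Newton-Raphson. The printed coefficients are the values to compare with the output of `glm` in part (b).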