AMS 274 – Generalized Linear Models
Homework 2
1. Let $y_i$ be realizations of independent random variables $Y_i$ with Poisson($\mu_i$) distributions, where
$E(Y_i) = \mu_i$, for $i = 1, \ldots, n$.
(a) Obtain the expression for the deviance for comparison of the full model, which assumes a
different $\mu_i$ for each $y_i$, with a reduced model defined by a Poisson GLM with link function $g(\cdot)$.
That is, under the reduced model, $g(\mu_i) = \eta_i = x_i^T \beta$, where $\beta = (\beta_1, \ldots, \beta_p)^T$ (with $p < n$) is
the vector of regression coefficients corresponding to covariates $x_i = (x_{i1}, \ldots, x_{ip})^T$.
(b) Show that the expression for the deviance simplifies to $2 \sum_{i=1}^{n} y_i \log(y_i/\hat{\mu}_i)$, for the special
case of the reduced model in part (a) with $g(\mu_i) = \log(\mu_i)$, and linear predictor that includes
an intercept, that is, $\eta_i = \beta_1 + \sum_{j=2}^{p} x_{ij}\beta_j$, for $i = 1, \ldots, n$.
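As a numerical sanity check on part (b), the following is a minimal Python sketch (the assignment itself asks for R only in Problem 3; Python is used here purely for illustration). It assumes the standard textbook form of the Poisson deviance, $2 \sum_i [y_i \log(y_i/\hat{\mu}_i) - (y_i - \hat{\mu}_i)]$, and uses the simplest log-link model with an intercept, $\log(\mu_i) = \beta_1$, for which the MLE is $\hat{\mu}_i = \bar{y}$. The counts and the function names are made up for the example.

```python
import math

def poisson_deviance(y, mu):
    """General Poisson deviance, using the convention 0 * log(0) = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def simplified_deviance(y, mu):
    """Simplified form from part (b): 2 * sum of y_i * log(y_i / mu_i)."""
    return 2.0 * sum(yi * math.log(yi / mi) for yi, mi in zip(y, mu) if yi > 0)

# Made-up counts, for illustration only.
y = [2, 0, 5, 3, 1]

# Intercept-only log-link model: the fitted mean is the sample mean for every i.
mu_hat = [sum(y) / len(y)] * len(y)

general = poisson_deviance(y, mu_hat)
simplified = simplified_deviance(y, mu_hat)
residual_sum = sum(yi - mi for yi, mi in zip(y, mu_hat))
print(general, simplified, residual_sum)
```

The two forms agree here because the fitted means sum to the observed total, so the $\sum_i (y_i - \hat{\mu}_i)$ term vanishes. The sketch only illustrates the claim for one model; it is not a substitute for the general argument the problem asks for.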
2. Let $y_i$, $i = 1, \ldots, n$, be realizations of independent random variables $Y_i$ following gamma($\mu_i, \nu$)
distributions, with densities given by
$$f(y_i \mid \mu_i, \nu) = \frac{(\nu/\mu_i)^{\nu} \, y_i^{\nu-1} \exp(-\nu y_i/\mu_i)}{\Gamma(\nu)}, \quad y_i > 0;\ \nu > 0,\ \mu_i > 0,$$
where $\Gamma(\nu) = \int_0^{\infty} t^{\nu-1} \exp(-t)\,dt$ is the Gamma function.
(a) Express the gamma distribution as a member of the exponential dispersion family.
(b) Obtain the scaled deviance and the deviance for the comparison of the full model, which
includes a different $\mu_i$ for each $y_i$, with a gamma GLM based on link function $g(\cdot)$, so that $g(\mu_i) = x_i^T \beta$,
where $\beta = (\beta_1, \ldots, \beta_p)^T$ ($p < n$) is the vector of regression coefficients corresponding to a set of $p$
covariates.
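Before working with this parametrization, it can help to confirm numerically what $\mu_i$ and $\nu$ represent. The sketch below, in Python with illustrative values $\mu = 2$ and $\nu = 3$ chosen arbitrarily, integrates the density above with a midpoint rule and reports the total mass, mean, and variance. The function names, integration limit, and step count are assumptions of the example.

```python
import math

def gamma_density(y, mu, nu):
    """Density f(y | mu, nu) from Problem 2, evaluated on the log scale for stability."""
    log_f = (nu * math.log(nu / mu) + (nu - 1.0) * math.log(y)
             - nu * y / mu - math.lgamma(nu))
    return math.exp(log_f)

def numeric_moments(mu, nu, upper=80.0, steps=200_000):
    """Midpoint-rule approximations to the total mass, mean, and variance."""
    h = upper / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        w = gamma_density(y, mu, nu) * h
        m0 += w
        m1 += y * w
        m2 += y * y * w
    return m0, m1, m2 - m1 * m1

# Arbitrary illustrative values, not taken from the assignment.
mu, nu = 2.0, 3.0
mass, mean, var = numeric_moments(mu, nu)
print(mass, mean, var)
```

For these values the numerical mean matches $\mu$ and the variance matches $\mu^2/\nu$, which is consistent with $\mu_i$ being the mean and $1/\nu$ acting as a dispersion parameter.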
3. Consider the data set from:
http://www.stat.columbia.edu/~gelman/book/data/fabric.asc
on the incidence of faults in the manufacturing of rolls of fabric. The first column contains the
length of each roll (the covariate with values xi), and the second contains the number of faults
(the response with means µi).
(a) Use R to fit a Poisson GLM, with logarithmic link,
$$\log(\mu_i) = \beta_1 + \beta_2 x_i \qquad (1)$$
to explain the number of faults in terms of length of roll.
(b) Fit the regression model for the response means in (1) using the quasi-likelihood estimation
method, which allows for a dispersion parameter in the response variance function. (Use the
quasipoisson “family” in R.) Discuss the results.
(c) Derive point estimates and asymptotic interval estimates for the linear predictor, $\eta_0 = \beta_1 + \beta_2 x_0$,
at a new value $x_0$ for length of roll, under the standard (likelihood) estimation method
from part (a), and the quasi-likelihood estimation method from part (b). Evaluate the point and
interval estimates at $x_0 = 500$ and $x_0 = 995$. (Under both cases, use the asymptotic bivariate
normality of $(\hat{\beta}_1, \hat{\beta}_2)$ to obtain the asymptotic distribution of $\hat{\eta}_0 = \hat{\beta}_1 + \hat{\beta}_2 x_0$.)
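The mechanics behind parts (a) to (c) can be sketched without any GLM library. The Python sketch below is an illustration only (the assignment asks for R): it fits model (1) by iteratively reweighted least squares, estimates the dispersion by the Pearson statistic divided by $n - 2$, and forms a Wald interval for $\hat{\eta}_0$ from the asymptotic covariance of $(\hat{\beta}_1, \hat{\beta}_2)$. The data, the value of `x0`, the function names, and the normal quantile 1.96 are all assumptions of the example, not the fabric data or a prescribed method.

```python
import math

def info_inverse(x, mu):
    """Inverse of X'WX for the design (1, x_i) with weights W = diag(mu_i)."""
    s0 = sum(mu)
    s1 = sum(m * xi for m, xi in zip(mu, x))
    s2 = sum(m * xi * xi for m, xi in zip(mu, x))
    det = s0 * s2 - s1 * s1
    return [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]

def fit_poisson_log(x, y, iters=50, tol=1e-12):
    """IRLS for log(mu_i) = b1 + b2 * x_i; returns coefficients, covariance, fitted means."""
    b1, b2 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        eta = [b1 + b2 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        t0 = sum(m * zi for m, zi in zip(mu, z))
        t1 = sum(m * xi * zi for m, xi, zi in zip(mu, x, z))
        inv = info_inverse(x, mu)
        n1 = inv[0][0] * t0 + inv[0][1] * t1
        n2 = inv[1][0] * t0 + inv[1][1] * t1
        step = abs(n1 - b1) + abs(n2 - b2)
        b1, b2 = n1, n2
        if step < tol:
            break
    mu = [math.exp(b1 + b2 * xi) for xi in x]
    return (b1, b2), info_inverse(x, mu), mu

def eta_interval(beta, cov, x0, phi=1.0, zcrit=1.96):
    """Point estimate and approximate 95% Wald interval for eta0 = b1 + b2 * x0."""
    eta0 = beta[0] + beta[1] * x0
    var = phi * (cov[0][0] + 2.0 * x0 * cov[0][1] + x0 * x0 * cov[1][1])
    se = math.sqrt(var)
    return eta0, eta0 - zcrit * se, eta0 + zcrit * se

# Made-up toy data, NOT the fabric data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1, 3, 2, 6, 5, 9, 8, 14]
x0 = 4.5  # hypothetical new covariate value

beta, cov, mu = fit_poisson_log(x, y)
phi_hat = sum((yi - m) ** 2 / m for yi, m in zip(y, mu)) / (len(y) - 2)

ml_interval = eta_interval(beta, cov, x0)                 # likelihood-based, phi = 1
ql_interval = eta_interval(beta, cov, x0, phi=phi_hat)    # quasi-likelihood
print(beta, phi_hat, ml_interval, ql_interval)
```

Both methods give the same point estimate; only the interval width changes, by a factor of $\sqrt{\hat{\phi}}$. Software may use a $t$ quantile rather than 1.96 in the quasi-likelihood case, so results from R need not match this sketch exactly.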
4. This problem deals with data collected as the number of Ceriodaphnia organisms counted in a
controlled environment in which reproduction is occurring among the organisms. The experimenter places into the containers a varying concentration of a particular component of jet fuel
that impairs reproduction. It is anticipated that as the concentration of jet fuel grows, the number of organisms should decrease. The problem also includes a categorical covariate introduced
through use of two different strains of the organism.
The data set is available from the course website
https://ams274-fall16-01.courses.soe.ucsc.edu/node/4
where the first column includes the number of organisms, the second the concentration of jet
fuel (in grams per liter), and the third the strain of the organism (with covariate values 0 and 1).
Build a Poisson GLM to study the effect of the covariates (jet fuel concentration and organism
strain) on the number of Ceriodaphnia organisms. Use graphical exploratory data analysis to
motivate possible choices for the link function and the linear predictor. Use classical measures of
goodness-of-fit and model comparison (deviance, AIC and BIC), as well as Pearson and deviance
residuals, to assess model fit and to compare different model formulations. Provide a plot of the
estimated regression functions under your proposed model.
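The fit diagnostics named above have simple closed forms for a Poisson model. The following Python sketch is for illustration only; it uses the standard definitions of Pearson residuals, deviance residuals, AIC, and BIC. The counts, the fitted means `mu_hat`, and the parameter count `p` are made-up placeholders, not values from the Ceriodaphnia data.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood, including the -log(y!) term."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1) for yi, mi in zip(y, mu))

def pearson_residuals(y, mu):
    """(y_i - mu_i) / sqrt(mu_i), since the Poisson variance function is V(mu) = mu."""
    return [(yi - mi) / math.sqrt(mi) for yi, mi in zip(y, mu)]

def deviance_residuals(y, mu):
    """Signed square roots of the per-observation deviance contributions."""
    out = []
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        d = 2.0 * (log_term - (yi - mi))
        out.append(math.copysign(math.sqrt(max(d, 0.0)), yi - mi))
    return out

def aic_bic(y, mu, p):
    """AIC = -2 loglik + 2p and BIC = -2 loglik + p log(n)."""
    ll = poisson_loglik(y, mu)
    return -2.0 * ll + 2.0 * p, -2.0 * ll + p * math.log(len(y))

# Placeholder values standing in for observed counts and fitted means.
y = [2, 0, 5, 3, 1]
mu_hat = [1.5, 0.5, 4.0, 3.5, 1.5]
p = 2  # hypothetical number of regression coefficients

r_pearson = pearson_residuals(y, mu_hat)
r_deviance = deviance_residuals(y, mu_hat)
aic, bic = aic_bic(y, mu_hat, p)
print(r_pearson, r_deviance, aic, bic)
```

The squared deviance residuals sum to the model deviance, and the squared Pearson residuals sum to the Pearson chi-square statistic, so plotting either set against fitted values or covariates shows where a candidate model fits poorly.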