550.413 Assignment 3

Instructions: This assignment consists of 4 problems. If you cannot make it
to class, please leave the assignment under the door at Whitehead Hall 306E
and email the course instructor. If possible, please type up your assignments,
preferably using LaTeX.
Problem 1: (10pts)
Let $y = X\beta + \varepsilon$ be a linear model where $X$ is of size $n \times p$ and the error terms
$\varepsilon$ are independent, normally distributed with mean $0$ and variance $\sigma^2$. Suppose
furthermore that the columns of $X$ can be partitioned as $X = [\, W \;\; Z \,]$,
where $W$ is of size $n \times q$ and is of full column rank and $Z$ is of size $n \times (p - q)$
and is of full column rank, for some $q$ satisfying $1 \le q \le p$, and that $W^T Z = 0$.
We now partition $\beta$ as $\beta = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}$, where $\beta_1$ is of size $q \times 1$ and $\beta_2$ is of size
$(p - q) \times 1$. Let $\hat\beta = \begin{bmatrix} \hat\beta_1 \\ \hat\beta_2 \end{bmatrix}$ be the least squares estimate of $\beta$.
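As a hint on how the orthogonality condition enters, one possible starting point is to write out the Gram matrix of the partitioned design matrix. The display below only restates the assumptions above in block form; it is a sketch of the setup, not a full solution.

```latex
\[
X^T X
= \begin{bmatrix} W^T \\ Z^T \end{bmatrix}
  \begin{bmatrix} W & Z \end{bmatrix}
= \begin{bmatrix} W^T W & W^T Z \\ Z^T W & Z^T Z \end{bmatrix}
= \begin{bmatrix} W^T W & 0 \\ 0 & Z^T Z \end{bmatrix},
\qquad
X^T y = \begin{bmatrix} W^T y \\ Z^T y \end{bmatrix}.
\]
```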
(a) Show that $\hat\beta_1 = (W^T W)^{-1} W^T y$ and $\hat\beta_2 = (Z^T Z)^{-1} Z^T y$.
(b) Show that $\hat\beta_1$ and $\hat\beta_2$ are independent.
(c) Let $a$ be a $q \times 1$ vector and $b$ be a $(p - q) \times 1$ vector. Let $(l_1, u_1)$ and $(l_2, u_2)$
be the individual 95% confidence intervals for $a^T \beta_1$ and $b^T \beta_2$ based on
$\hat\beta_1$ and $\hat\beta_2$, respectively. Is the confidence interval $(l_1, u_1)$ independent of
the confidence interval $(l_2, u_2)$? Justify your answer.
Problem 2: (10pts)
Let $W$ be an $n \times p$ matrix and $X$ be an $n \times q$ matrix such that $C(W) \subseteq C(X)$. Denote
by $P_X$ and $P_W$ the symmetric idempotent matrices projecting onto $C(X)$ and
$C(W)$, respectively. Show that $P_X - P_W$ is the symmetric orthogonal projection
onto $C((I - P_W)X)$.
You can do it by arguing as follows.
• First show that $P_X - P_W$ is idempotent. Hint: $P_X P_W z = P_W z$ for all
$z$; in addition $P_W P_X = (P_X P_W)^T$.
• Next, show that for any $z$, $(P_X - P_W)z \in C((I - P_W)X)$. Hint: Since
$C(X)$ and $N(X^T)$ are orthogonal complements, any vector $z \in \mathbb{R}^n$ can be
written as $z = Xv + w$ for some vector $v$ and some $w \in N(X^T)$; what
is the relationship between $N(W^T)$ and $N(X^T)$?
• Finally, show that if $z \in C((I - P_W)X)^{\perp}$ then $(P_X - P_W)z = 0$.
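Before writing the proof, it can be reassuring to check the claim numerically. The sketch below is only a sanity check on one randomly generated example, not a proof; the sizes, the seed, and the helper name proj are arbitrary choices made for illustration, and W is built as X times a random matrix so that C(W) ⊆ C(X) holds by construction.

```r
# Numerical sanity check on hypothetical random matrices (not a proof).
set.seed(1)
n <- 8; q <- 4; p <- 2
X <- matrix(rnorm(n * q), n, q)          # n x q
W <- X %*% matrix(rnorm(q * p), q, p)    # n x p, columns lie in C(X)

# Orthogonal projection onto C(A) for a full-column-rank matrix A
proj <- function(A) A %*% solve(crossprod(A), t(A))

PX <- proj(X)
PW <- proj(W)
D  <- PX - PW

# (I - PW) X is rank deficient, so build its projection from an
# orthonormal basis of its column space obtained via the SVD
M  <- (diag(n) - PW) %*% X
s  <- svd(M)
U  <- s$u[, s$d > 1e-8 * s$d[1], drop = FALSE]
PM <- tcrossprod(U)

print(all.equal(D %*% D, D))   # idempotent?
print(all.equal(D, t(D)))      # symmetric?
print(all.equal(D, PM))        # same as projection onto C((I - PW) X)?
```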
Problem 3: (20pts)
The kidiq.dta dataset is available from the URL
http://www.stat.columbia.edu/~gelman/arm/examples/child.iq/kidiq.dta
accompanying the book “Data Analysis Using Regression and Multilevel/Hierarchical Models” by Gelman and
Hill. The dataset contains observations from a sample of 434 children. The variables include the child’s cognitive test score at age 3 or 4, whether the mother
finished high school (coded as 1) or not (coded as 0), the mother’s IQ, the age of the mother
at the child’s birth, and whether the mother worked in the first three years of
the child’s life. More specifically, the variable mom.work takes on the following values:
• mom.work = 1 if mother did not work in first three years of child’s life
• mom.work = 2 if mother worked in second or third year of child’s life
• mom.work = 3 if mother worked part-time in first year of child’s life
• mom.work = 4 if mother worked full-time in first year of child’s life
After downloading the kidiq.dta file, you can read the data into R using the
following snippet of code:
library("foreign")
iq.data <- read.dta("kidiq.dta")
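Since mom.work is a category code rather than a quantity, it may help to store it as a factor before fitting any model. The sketch below uses simulated stand-in data, not the kidiq data, and the column names kid_score, mom_iq, mom_age and mom_work are assumptions about how the file names its variables; check names(iq.data) after loading.

```r
# Simulated stand-in data; NOT the kidiq data. Column names are assumed.
set.seed(413)
n <- 200
toy <- data.frame(
  mom_iq   = rnorm(n, mean = 100, sd = 15),
  mom_age  = sample(17:29, n, replace = TRUE),
  mom_work = sample(1:4, n, replace = TRUE)
)
toy$kid_score <- 20 + 0.6 * toy$mom_iq + rnorm(n, sd = 18)

# Treat the 1-4 code as categorical, with level 1 as the baseline
toy$mom_work <- factor(toy$mom_work, levels = 1:4)

fit <- lm(kid_score ~ mom_iq + mom_age + mom_work, data = toy)
print(coef(fit))   # one coefficient per non-baseline level of mom_work
```

With the code left as a number, lm would instead fit a single slope across the four categories, which assumes they are equally spaced.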
Using this dataset, answer the following questions.
(a) Perform a regression with kid_score as the response variable and the
remaining variables except mom_hs as predictor variables.
(b) Briefly discuss the coefficients of the predictor variables. What do they say?
(c) Using the model in part (a), test the hypothesis that the predictor variables mom_work and mom_age are associated with the response variable.
When do you recommend mothers should give birth? What are your
assumptions for making this recommendation?
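A joint hypothesis about several predictors can be tested by comparing nested models with an F-test. The sketch below is only an illustration on R's built-in mtcars data, not the kidiq data; the variables chosen are arbitrary stand-ins, with cyl playing the role of a coded categorical predictor.

```r
# Illustration on the built-in mtcars data (a stand-in, not kidiq).
cars_df <- transform(mtcars, cyl = factor(cyl))

full    <- lm(mpg ~ wt + hp + cyl, data = cars_df)
reduced <- lm(mpg ~ wt, data = cars_df)   # drops hp and cyl together

# F-test of H0: all coefficients on hp and cyl are zero
cmp <- anova(reduced, full)
print(cmp)
```

The reduced model must be nested in the full one, and the degrees of freedom of the test equal the number of coefficients dropped.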
(d) What happens when you add mom_hs as a predictor variable to the model
in part (a)? Has your conclusion about the timing of birth changed?
(e) Using the model in part (d), perform some diagnostics, e.g., check the
constant variance assumption and the normality of the errors. Look for outliers, influential points, and points with high leverage.
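The standard diagnostic tools in base R can be sketched as follows. This is an illustration on the built-in mtcars data, not the kidiq data, and the 2k/n leverage cutoff is one common rule of thumb rather than a requirement of the assignment.

```r
# Illustration on the built-in mtcars data (a stand-in, not kidiq).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Residuals vs fitted (constant variance) and normal Q-Q plot (normality)
op <- par(mfrow = c(1, 2))
plot(fit, which = 1:2)
par(op)

lev  <- hatvalues(fit)        # leverage of each observation
cook <- cooks.distance(fit)   # influence on the fitted coefficients
rstu <- rstudent(fit)         # externally studentized residuals

# Rule of thumb: flag leverage above 2 * (number of coefficients) / n
k <- length(coef(fit)); n <- nrow(mtcars)
print(names(lev)[lev > 2 * k / n])
print(names(rstu)[which.max(abs(rstu))])   # most extreme residual
print(names(cook)[which.max(cook)])        # most influential point
```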
(f) Consider augmenting the model in part (d) with interaction terms, say between mom.hs and mom_age or
between mom.work and mom_age. Write down the “formula” for the resulting model and discuss how it differs from the “formula” for the model
in part (d). Test the hypothesis that the interaction term in the augmented
model is not significant.
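In R's formula notation, an interaction between a numeric and a categorical predictor lets the slope differ across categories. The sketch below illustrates this on the built-in mtcars data, not the kidiq data; the names fit_add and fit_int are hypothetical.

```r
# Illustration on the built-in mtcars data (a stand-in, not kidiq).
cars_df <- transform(mtcars, am = factor(am))

fit_add <- lm(mpg ~ wt + am, data = cars_df)   # common slope for wt
fit_int <- lm(mpg ~ wt * am, data = cars_df)   # expands to wt + am + wt:am

# The only extra coefficient is the difference in wt slope between groups
print(setdiff(names(coef(fit_int)), names(coef(fit_add))))

# F-test of H0: the interaction coefficient is zero
print(anova(fit_add, fit_int))
```

When the categorical predictor has more than two levels, the interaction contributes several coefficients, and the nested-model F-test assesses them jointly.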
Problem 4: (20pts)
The file at http://www.amstat.org/publications/jse/v16n3/kuiper.xls contains
data collected from the Kelley Blue Book for several hundred 2005 used GM
cars. Do something with this data.
This is meant to be an open-ended question. For some ideas of the kind of analysis one can attempt, see the article
http://www.amstat.org/publications/jse/v16n3/datasets.kuiper.html.