Assignment #1 STA355H1S

Instructions: Solutions to problems 1 and 2 are to be submitted on Quercus (PDF files
only). You are strongly encouraged to do problems 3 through 7 but these are not to be
submitted for grading.
1. Suppose that $Y = (Y_1, \ldots, Y_n)^T$ where $Y_1, \ldots, Y_n$ are independent Normal
random variables with $Y_i \sim N(\mu_i, \sigma^2)$. If $\Gamma$ is an $n \times n$ orthogonal
matrix (that is, $\Gamma^{-1} = \Gamma^T$) then $Z = \Gamma Y$ is a random vector whose elements
$Z_1, \ldots, Z_n$ are independent Normal random variables, each with variance $\sigma^2$,
whose means $\nu = (\nu_1, \ldots, \nu_n)^T$ are defined by $\nu = \Gamma\mu$. It is often
convenient to assume that the mean vector $\nu$ is “sparse” in the sense that all but a small
fraction of its components are exactly 0. (In practice, the matrix $\Gamma$ is chosen so that
the sparsity of $\nu = \Gamma\mu$ is a reasonable assumption.)
Half-normal plots (which are often called Daniel plots) are used in some statistical models
to distinguish values of $Z_1, \ldots, Z_n$ coming from a $N(0, \sigma^2)$ distribution from
those coming from Normal distributions with non-zero means. Suppose for example that
$\nu_{i_1}, \ldots, \nu_{i_k}$ are non-zero with the remaining components equal to 0; then we
would expect the values of $|Z_{i_1}|, \ldots, |Z_{i_k}|$ to be larger than other values of
$\{|Z_i|\}$. Defining $W_i = |Z_i|$, we plot the ordered values $W_{(1)} \le \cdots \le W_{(n)}$
versus the corresponding quantiles of a standard “half-normal” distribution (the distribution
of the absolute value of a $N(0, 1)$ random variable). If $Z_1, \ldots, Z_n$ come from a
$N(0, \sigma^2)$ distribution then the points should lie close to a straight line whose slope
is $\sigma$; on the other hand, if $\nu_{i_1}, \ldots, \nu_{i_k}$ are non-zero then we might
expect the largest values $W_{(n-k+1)}, \ldots, W_{(n)}$ to lie noticeably above the line whose
slope is $\sigma$. However, since $\sigma$ is unknown, we need to estimate it and we do not want
this estimate influenced (that is, biased upwards) by larger values of $W_i$; in part (b)
below, we define possible “robust” estimators of $\sigma$.
(a) If $Z \sim N(0, \sigma^2)$, show that
(i) the cdf of $|Z|$ is $G(x) = 2\Phi(x/\sigma) - 1$ for $x \ge 0$, where $\Phi(t)$ is the cdf
of a $N(0, 1)$ random variable;
(ii) the $\tau$ quantile of the distribution of $|Z|$ is
$G^{-1}(\tau) = \sigma\Phi^{-1}((\tau + 1)/2)$.
(b) Suppose that $Z_1, \ldots, Z_n$ are independent $N(0, \sigma^2)$ random variables and define
$W_i = |Z_i|$ for $i = 1, \ldots, n$ and the order statistics
$W_{(1)} \le W_{(2)} \le \cdots \le W_{(n)}$. The result of part (a) suggests that we could
estimate $\sigma$ using an order statistic $W_{(k)}$ as follows:
$$\hat{\sigma}_k = \frac{W_{(k)}}{\Phi^{-1}((\tau_k + 1)/2)}$$
where (for example) $\tau_k = k/(n + 1)$. If $\tau_k \to \tau \in (0, 1)$ as $k, n \to \infty$ then
$$\sqrt{n}(\hat{\sigma}_k - \sigma) \xrightarrow{d} N(0, \gamma^2(\tau)).$$
Give an expression for $\gamma^2(\tau)$. For what value of $\tau$ is $\gamma^2(\tau)$ minimized?
(You can determine the minimizing value of $\tau$ graphically.)
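The estimator in part (b) can be sketched in R as follows. This is only an illustration under stated assumptions: the function name sigma_hat, the choice of k as the integer nearest to τ(n + 1), and the simulated data are not part of the course material.

```r
# Hypothetical helper (not from the course files): estimate sigma from the
# k-th order statistic of W = |Z|, using tau_k = k / (n + 1).
sigma_hat <- function(z, tau = 0.5) {
  w <- sort(abs(z))                          # order statistics W(1) <= ... <= W(n)
  n <- length(w)
  k <- min(max(round(tau * (n + 1)), 1), n)  # index k with k/(n+1) close to tau
  tau_k <- k / (n + 1)
  w[k] / qnorm((tau_k + 1) / 2)              # W(k) / Phi^{-1}((tau_k + 1)/2)
}

set.seed(355)                                # arbitrary seed for reproducibility
z <- rnorm(10000, mean = 0, sd = 2)          # simulated data with sigma = 2
est <- sigma_hat(z, tau = 0.5)
print(est)                                   # should be close to 2
```

Calling sigma_hat over a grid of tau values on repeated simulated samples is one way to explore how the variability of the estimator changes with τ.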
(c) A random variable $U$ is said to be stochastically greater than a random variable $V$ if
$P(U \le x) \le P(V \le x)$ for all $x$ with $P(U \le x) < P(V \le x)$ for some $x$. (This
definition seems strange but note that it implies that values of $V$ will tend to be less than
values of $U$.) Suppose that $U \sim N(\mu_1, \sigma^2)$ and $V \sim N(\mu_2, \sigma^2)$ where
$|\mu_1| > |\mu_2|$. Show that $|U|$ is stochastically greater than $|V|$. (Hint: First of all,
show that the distribution of $|U|$ depends on $\mu_1$ only through $|\mu_1|$ so that we can
assume that $\mu_1 > \mu_2 \ge 0$. Then show that if $X \sim N(\mu, \sigma^2)$ for $\mu \ge 0$
then $P(|X| \le x)$ decreases as $\mu$ increases. Calculus is your friend here!)
(d) The file halfnormal.txt on Quercus contains a function to do half-normal plots.
This function, halfnormal, has three arguments: the data x; tau, the value of $\tau$ used to
estimate $\sigma$ (which defaults to $\tau = 0.5$); and an optional parameter ylim, which allows
you to define the minimum and maximum y-axis values. The file data.txt contains 1000
observations from Normal distributions whose means are almost all 0. Using half-normal
plots, try to estimate how many of the 1000 means are non-zero. There is no right or wrong
approach here so feel free to be creative.
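The course's halfnormal function is not reproduced here. As a rough stand-in, the sketch below builds a half-normal plot from the description in this problem: ordered absolute values against half-normal quantiles, with a reference line through the origin whose slope is the order-statistic estimate of σ. The function name halfnormal_sketch, the plotting positions k/(n + 1), and the simulated data are all assumptions made for illustration; data.txt is not used.

```r
# Hypothetical stand-in for a half-normal plot (not the Quercus function).
halfnormal_sketch <- function(x, tau = 0.5, ylim = NULL) {
  w <- sort(abs(x))                          # ordered absolute values
  n <- length(w)
  q <- qnorm(((1:n) / (n + 1) + 1) / 2)      # half-normal quantiles at k/(n+1)
  k <- min(max(round(tau * (n + 1)), 1), n)  # order statistic used for sigma
  sigma <- w[k] / q[k]                       # W(k) / Phi^{-1}((tau_k + 1)/2)
  plot(q, w, ylim = ylim,
       xlab = "Half-normal quantiles", ylab = "Ordered |x|")
  abline(0, sigma)                           # reference line with estimated slope
  invisible(list(q = q, w = w, sigma = sigma))
}

set.seed(355)                                # arbitrary seed
mu <- c(rep(0, 95), rep(4, 5))               # simulated: 5 of 100 means non-zero
x <- rnorm(100, mean = mu, sd = 1)
out <- halfnormal_sketch(x)
```

In this simulated example the observations with non-zero means would be expected to appear at the upper right of the plot, above the reference line.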
2. The hazard or failure rate function of a non-negative continuous random variable $X$ is
defined to be
$$h(x) = \frac{f(x)}{1 - F(x)} \quad \text{for } x \ge 0$$
where $f(x)$ is the pdf of $X$ and $F(x)$ is its cdf. We can also define $h(x)$ by
$$h(x) = \lim_{\delta \downarrow 0} \frac{1}{\delta} P(x \le X \le x + \delta \mid X \ge x).$$
(a) A useful formula for the expected value of any non-negative random variable is
$$E(X) = \int_0^\infty (1 - F(x))\,dx.$$
If $X$ is also continuous with pdf $f(x)$ then this formula can be derived as follows:
$$\begin{aligned}
E(X) &= \int_0^\infty x f(x)\,dx \\
&= \int_0^\infty \int_0^x f(x)\,dt\,dx \\
&= \int_0^\infty \int_t^\infty f(x)\,dx\,dt \\
&= \int_0^\infty (1 - F(t))\,dt.
\end{aligned}$$
If $h(x)$ is the hazard function of $X$, show that
$$E(X) = \int_0^1 \frac{1}{h(F^{-1}(\tau))}\,d\tau.$$
(Hint: Make the change of variables $u = F^{-1}(\tau)$.)
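As a numerical sanity check of this identity (not a proof), the sketch below evaluates both sides for a Weibull distribution with shape 2 and scale 1. The choice of distribution is arbitrary and made here only for illustration; the code uses the base R functions dweibull, pweibull, qweibull and integrate.

```r
# Numerical check of E(X) = integral over (0, 1) of 1 / h(F^{-1}(tau)) d tau
# for an illustrative Weibull(shape = 2, scale = 1) distribution.
shape <- 2
scale <- 1

# Hazard function h(x) = f(x) / (1 - F(x))
hazard <- function(x) {
  dweibull(x, shape, scale) / pweibull(x, shape, scale, lower.tail = FALSE)
}

# Right-hand side: integrate 1 / h(F^{-1}(tau)) over tau in (0, 1)
rhs <- integrate(function(tau) 1 / hazard(qweibull(tau, shape, scale)),
                 lower = 0, upper = 1)$value

# Left-hand side: integrate 1 - F(x) over x in (0, Inf)
lhs <- integrate(function(x) pweibull(x, shape, scale, lower.tail = FALSE),
                 lower = 0, upper = Inf)$value

print(c(lhs = lhs, rhs = rhs))   # the two values should agree closely
```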
(b) Suppose that $X_{(k)}$ is the $k$-th order statistic where $k \approx \tau n$ (for some
$\tau \in (0, 1)$) and define $D_k = X_{(k)} - X_{(k-1)}$. From lecture, we know that the
distribution of $n D_k$ is approximately Exponential with mean $1/f(F^{-1}(\tau))$. Use this
fact to show that the distribution of $(n - k + 1) D_k$ is approximately Exponential with mean
$1/h(F^{-1}(\tau))$. (Hint: Note that $h(F^{-1}(\tau)) = f(F^{-1}(\tau))/(1 - \tau)$.)
(c) The shape of h(x) provides useful information about the distribution not readily obvious
from the pdf and cdf; for example, if X represents the lifetime of some (say) electronic
component then a decreasing hazard function would indicate that the component improves
with age.
The total time on test (TTT) plot provides one way to assess the rough shape of $h(x)$ based
on a sample $x_1, \ldots, x_n$. To construct this plot, we define
$$d_1 = n x_{(1)}, \qquad d_k = (n - k + 1)(x_{(k)} - x_{(k-1)}) \quad \text{for } k = 2, \ldots, n$$
and plot $(d_1 + \cdots + d_k)/(x_1 + \cdots + x_n)$ versus $k/n$ for $k = 1, \ldots, n$. Using
the result from part (b), we might argue that $(d_1 + \cdots + d_k)/(x_1 + \cdots + x_n)$ is an
estimate of
$$\frac{1}{E(X)} \int_0^\tau \frac{1}{h(F^{-1}(s))}\,ds$$
for $\tau = k/n$. If the underlying hazard function $h(x)$ is decreasing then the shape of these
points will be roughly convex (and lie below the 45° line) while if $h(x)$ is increasing then
the shape of the points will be roughly concave (and lie above the 45° line).
Given data in a vector x, the TTT plot can be constructed as follows:
> x <- sort(x) # order elements from smallest to largest
> n <- length(x) # find length of x
> d <- c(n:1)*c(x[1],diff(x)) # d[k] = (n-k+1)*(x(k) - x(k-1)), with d[1] = n*x(1)
> plot(c(1:n)/n, cumsum(d)/sum(x), xlab="t", ylab="TTT") # scaled TTT versus k/n
> abline(0,1) # add 45 degree line to plot
Data on the lifetimes (in hours) of Kevlar 373/epoxy strands (subjected to constant pressure
at 90% stress level) are contained in the file kevlar.txt. Construct a TTT plot for these
data. Does the hazard function appear to be increasing or decreasing with time?
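The file kevlar.txt is not reproduced here, so the sketch below applies the same construction to simulated Weibull data, where the direction of the hazard is known: shape 2 gives an increasing hazard and shape 0.5 a decreasing one. The function name ttt_curve, the sample size, and the seed are illustrative choices rather than part of the assignment.

```r
# Hypothetical helper wrapping the TTT construction described above.
ttt_curve <- function(x) {
  x <- sort(x)                          # order statistics x(1) <= ... <= x(n)
  n <- length(x)
  d <- (n:1) * c(x[1], diff(x))         # d_k = (n - k + 1)(x(k) - x(k-1))
  list(t = (1:n) / n, ttt = cumsum(d) / sum(x))
}

set.seed(355)                                 # arbitrary seed
inc <- ttt_curve(rweibull(500, shape = 2))    # increasing hazard
dec <- ttt_curve(rweibull(500, shape = 0.5))  # decreasing hazard

plot(inc$t, inc$ttt, type = "l", xlab = "t", ylab = "TTT")
lines(dec$t, dec$ttt, lty = 2)          # dashed curve: decreasing hazard
abline(0, 1)                            # 45 degree reference line
```

Comparing a TTT plot of real data against reference shapes like these is one way to judge whether the points look concave or convex.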
Supplemental problems (not to be handed in):
3. (a) Suppose that $X$ has a Gamma distribution with shape parameter $\alpha$ and scale
parameter $\lambda$; the density of $X$ is
$$f(x) = \frac{\lambda^\alpha x^{\alpha - 1} \exp(-\lambda x)}{\Gamma(\alpha)} \quad \text{for } x > 0.$$
Find expressions for the skewness and kurtosis of X in terms of α and λ. (Do these depend
on λ?) What happens to the skewness and kurtosis as α → ∞?
(b) Suppose that $X_1, \ldots, X_n$ are independent and define $S_n = X_1 + \cdots + X_n$.
Assuming that $E(X_i^3)$ is well-defined for all $i$, show that the skewness of $S_n$ is given by
$$\text{skew}(S_n) = \left(\sum_{i=1}^n \sigma_i^2\right)^{-3/2} \sum_{i=1}^n \sigma_i^3\,\text{skew}(X_i)$$
where $\sigma_i^2 = \text{Var}(X_i)$. (Hint: Follow the proof given for the kurtosis identity,
assuming for simplicity that $E(X_i) = 0$; this is simpler since $E(S_n^3)$ involves a triple
summation, most of whose terms are 0.)
4. Suppose that $X_1, \ldots, X_n$ are independent random variables with distribution function
$F$ where $\mu = E(X_i)$ and $\sigma^2 = \text{Var}(X_i)$. For some families of distributions,
the variance is a function of the mean so that $\sigma^2 = \sigma^2(\mu)$. A function $g$ is
said to be a variance stabilizing transformation for the family of distributions if
$$\sqrt{n}(g(\bar{X}_n) - g(\mu)) \xrightarrow{d} N(0, 1).$$
(a) Show that $g$ defined above must satisfy the differential equation
$$g'(\mu) = \pm\frac{1}{\sigma(\mu)}.$$
(Note that $g$ is not unique.)
(b) Find variance stabilizing transformations for
(i) Poisson distributions;
(ii) Exponential distributions;
(iii) Bernoulli distributions.
5. Suppose that $X_1, \ldots, X_n$ are independent random variables with some continuous
distribution function $F$. Given data $x_1, \ldots, x_n$ (outcomes of $X_1, \ldots, X_n$), we can
make a boxplot to graphically represent the data; observations beyond the “whiskers” (which
extend to at most 1.5 × interquartile range from the upper and lower quartiles) are flagged as
possible outliers. When $n$ is large enough, we can obtain a crude estimate for the expected
number of outliers as follows:
(i) Compute the lower and upper quartiles of $F$, $F^{-1}(1/4)$ and $F^{-1}(3/4)$, and define
$\text{IQR} = F^{-1}(3/4) - F^{-1}(1/4)$.
(ii) Compute the probability of an outlier by
$$F(F^{-1}(1/4) - 1.5 \times \text{IQR}) + 1 - F(F^{-1}(3/4) + 1.5 \times \text{IQR}).$$
(iii) The expected number of outliers is simply $n$ times the probability in part (ii).
Compute the expected number of outliers for the following distributions.
(a) Normal distribution – note that the probability in (ii) will not depend on the mean and
variance so you can assume a standard normal distribution. (The R functions pnorm and
qnorm can be used to compute the distribution function and quantiles, respectively, for the
normal distribution.)
(b) Laplace distribution with density
$$f(x) = \frac{1}{2}\exp(-|x|).$$
(No R functions for the distribution functions and quantiles seem to exist for the Laplace
distribution. However, both are easy to evaluate analytically.)
(c) Cauchy distribution with density
$$f(x) = \frac{1}{\pi(1 + x^2)}.$$
(The R functions pcauchy and qcauchy can be used to compute the distribution function
and quantiles, respectively, for the Cauchy distribution.)
(d) Comment on the differences between the 3 distributions considered in parts (a)–(c). In
particular, how does the proportion of outliers change as the “tails” (i.e. the rate at which
f(x) goes to 0 as |x| → ∞) of the distributions change?
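Steps (i) to (iii) can be sketched for any distribution whose cdf and quantile function are available in R. The example below uses the standard logistic distribution (plogis, qlogis), which is not one of the cases in parts (a)–(c); the function name outlier_prob and the sample size n = 1000 are illustrative assumptions.

```r
# Hypothetical helper: probability that an observation falls beyond the
# boxplot whiskers, given a cdf (pfun) and a quantile function (qfun).
outlier_prob <- function(pfun, qfun) {
  q1 <- qfun(0.25)                      # lower quartile F^{-1}(1/4)
  q3 <- qfun(0.75)                      # upper quartile F^{-1}(3/4)
  iqr <- q3 - q1                        # interquartile range
  pfun(q1 - 1.5 * iqr) + 1 - pfun(q3 + 1.5 * iqr)
}

p <- outlier_prob(plogis, qlogis)       # standard logistic distribution
n <- 1000                               # illustrative sample size
print(c(prob = p, expected = n * p))
```

Passing the cdf and quantile function as arguments keeps the three steps in one place, so the same helper could be reused with other distribution pairs.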
6. Suppose that $X_1, X_2, \ldots$ is a sequence of independent random variables with mean $\mu$
and variance $\sigma^2 < \infty$; define $\bar{X}_n = n^{-1}(X_1 + \cdots + X_n)$. Describe the
limiting behaviour (that is, either convergence in probability or convergence in distribution
as well as the limit as $n \to \infty$) of the following random variables.
(a) $S_n^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - \bar{X}_n)^2$.
(b) $\sqrt{n}(\bar{X}_n - \mu)/S_n$.
(c) $\sqrt{n}(\exp(\bar{X}_n) - \exp(\mu))/S_n$.
(d) $\frac{1}{n} \sum_{i=1}^n |X_i - \bar{X}_n|$. (The limit here should be intuitively clear;
however, proving it is not easy!)
7. Suppose that $a_n(X_n - \theta) \xrightarrow{d} Z$ (where $a_n \uparrow \infty$) and that
$g(x)$ is an infinitely differentiable function (that is, it has derivatives of all orders).
The Delta Method says that
$$a_n(g(X_n) - g(\theta)) \xrightarrow{d} g'(\theta) Z;$$
if $g'(\theta) = 0$ then the right hand side above is 0 and so
$a_n(g(X_n) - g(\theta)) \xrightarrow{p} 0$.
(a) Suppose that $g'(\theta) = 0$ and $g''(\theta) \ne 0$. Use the Taylor series expansion
$$g(x) = g(\theta) + g'(\theta)(x - \theta) + \frac{1}{2} g''(\theta)(x - \theta)^2 + r_n$$
(where $r_n/(x - \theta)^2 \to 0$ as $x \to \theta$) to find the limiting distribution of
$a_n^2(g(X_n) - g(\theta))$.
(b) Extend the result of part (a) to the case where
$g'(\theta) = g''(\theta) = \cdots = g^{(k-1)}(\theta) = 0$ but $g^{(k)}(\theta) \ne 0$
($g^{(k)}$ denotes the $k$-th derivative of $g$).
