Statistical Methods for Data Science
Mini Project 3
Instructions:
• Total points = 20.
• Submit a typed report.
• Justify all steps and provide all relevant explanations.
• Do a good job.
• You must use the following template for your report:
Mini Project #
Name
Names of group members (if applicable)
Contribution of each group member
Section 1: Answers to the specific questions asked
Section 2: R code. Your code must be annotated. No points may be given if a brief
look at the code does not tell us what it is doing.
1. (8 points) Suppose we would like to estimate the parameter $\theta$ ($> 0$) of a Uniform$(0, \theta)$
population based on a random sample $X_1, \ldots, X_n$ from the population. In
class, we discussed two estimators for $\theta$: the maximum likelihood estimator,
$\hat{\theta}_1 = X_{(n)}$, where $X_{(n)}$ is the maximum of the sample, and the method of moments
estimator, $\hat{\theta}_2 = 2\bar{X}$, where $\bar{X}$ is the sample mean. The goal of this exercise is to
compare the mean squared errors of the two estimators to determine which estimator
is better. Recall that the mean squared error of an estimator $\hat{\theta}$ of a parameter $\theta$ is
defined as $E\{(\hat{\theta} - \theta)^2\}$. For the comparison, we will focus on $n = 1, 2, 3, 5, 10, 30$ and
$\theta = 1, 5, 50, 100$.
(a) Explain how you will compute the mean squared error of an estimator using
Monte Carlo simulation.
(b) For a given combination of $(n, \theta)$, compute the mean squared errors of both $\hat{\theta}_1$
and $\hat{\theta}_2$ using Monte Carlo simulation with $N = 1000$ replications. Be sure to
compute both estimates from the same data.
(c) Repeat (b) for the remaining combinations of (n, θ). Summarize your results
graphically.
(d) Based on (c), which estimator is better? Does the answer depend on n or θ?
Explain. Provide justification for all your conclusions.
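The simulation described in parts (a) to (c) can be sketched in R as follows. This is only a minimal illustration under stated assumptions, not a model answer: the function name `mse_uniform`, its argument names, and the seed are hypothetical choices that do not come from the assignment. Each replication draws one sample, computes both estimators from that same sample, and the squared errors are then averaged over the `N` replications.

```r
# Minimal sketch (illustrative names): Monte Carlo estimate of the MSE of
# the MLE, max(x), and the method of moments estimator, 2 * mean(x).
mse_uniform <- function(n, theta, N = 1000) {
  # Each replication draws one Uniform(0, theta) sample of size n and
  # computes BOTH estimators from that same sample.
  est <- replicate(N, {
    x <- runif(n, min = 0, max = theta)
    c(mle = max(x), mom = 2 * mean(x))
  })
  # est is a 2 x N matrix; average the squared errors for each estimator.
  rowMeans((est - theta)^2)
}

set.seed(1)  # arbitrary seed, used only so the run is reproducible

# One (n, theta) combination, as in part (b)
mse_uniform(n = 5, theta = 5)

# All combinations, as in part (c)
grid <- expand.grid(n = c(1, 2, 3, 5, 10, 30), theta = c(1, 5, 50, 100))
res <- t(mapply(mse_uniform, grid$n, grid$theta))
results <- cbind(grid, res)
```

The `results` data frame has one row per $(n, \theta)$ combination and could then be plotted, for example MSE against $n$ with one panel per value of $\theta$.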
2. (12 points) Suppose the lifetime, in years, of an electronic component can be modeled
by a continuous random variable with probability density function
$$f(x) = \begin{cases} \dfrac{\theta}{x^{\theta+1}}, & x \ge 1, \\ 0, & x < 1, \end{cases}$$
where $\theta > 0$ is an unknown parameter. Let $X_1, \ldots, X_n$ be a random sample of size
$n$ from this population.
(a) Derive an expression for the maximum likelihood estimator of $\theta$.
(b) Suppose $n = 5$ and the sample values are $x_1 = 21.72$, $x_2 = 14.65$, $x_3 = 50.42$,
$x_4 = 28.78$, $x_5 = 11.23$. Use the expression in (a) to provide the maximum likelihood
estimate for $\theta$ based on these data.
(c) Even though we know the maximum likelihood estimate from (b), use the data in
(b) to obtain the estimate by numerically maximizing the log-likelihood function
using the optim function in R. Do your answers match?
(d) Use the output of numerical maximization in (c) to provide an approximate
standard error of the maximum likelihood estimate and an approximate 95%
confidence interval for θ. Are these approximations going to be good? Justify
your answer.
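The numerical steps in parts (c) and (d) might look roughly like the sketch below. It assumes the log-likelihood for a density equal to $\theta / x^{\theta+1}$ on $x \ge 1$, namely $n \log\theta - (\theta + 1) \sum_i \log x_i$, whose maximizer is $n / \sum_i \log x_i$. The starting value, the lower bound, the choice of `method = "L-BFGS-B"`, and all variable names are illustrative assumptions, not requirements of the assignment. Since `optim` minimizes, the negative log-likelihood is passed in, and the returned Hessian then approximates the observed information.

```r
# Minimal sketch (illustrative names and settings): numerical MLE with optim
x <- c(21.72, 14.65, 50.42, 28.78, 11.23)  # sample values from part (b)
n <- length(x)

# Negative log-likelihood; optim minimizes, so the sign is flipped.
negloglik <- function(theta) {
  -(n * log(theta) - (theta + 1) * sum(log(x)))
}

# Box-constrained search keeps theta positive; par = 1 is an arbitrary start.
fit <- optim(par = 1, fn = negloglik, method = "L-BFGS-B",
             lower = 1e-6, hessian = TRUE)

theta_hat <- fit$par

# The Hessian of the negative log-likelihood approximates the observed
# information; its inverse approximates the variance of the MLE.
se <- sqrt(1 / fit$hessian[1, 1])

# Approximate 95% confidence interval from the normal approximation
ci <- theta_hat + c(-1, 1) * qnorm(0.975) * se

# Closed-form maximizer, for comparison with the numerical result
theta_closed <- n / sum(log(x))
```

Comparing `theta_hat` with `theta_closed` gives the check asked for in (c). Whether the interval in `ci` can be trusted with only five observations is the judgment that part (d) asks for.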