Math 525 – Statistics I
Assignment 1
Problem 1: With this problem, you will improve your coding and data visualization skills.
Write code that visualizes all components of the provided dataset. Your end result should
contain a histogram of each component and all pairwise scatter plots, similar to the figure shown below.
[Figure: histograms of W1, W2, and W3 (counts on the vertical axis), together with the pairwise scatter plots W2 vs. W1, W3 vs. W1, and W3 vs. W2.]
Associated data: In this problem, use the dataset in vis data.mat. This dataset contains a
single 2D array W whose rows are the data points and whose columns are their components.
Problem 2: With this problem, you will improve your simulation skills and test your understanding
of random variables.
(i) Develop a function that applies the fundamental theorem of simulation and simulates draws
from $\text{Categorical}_{\sigma_1,\dots,\sigma_M}(\pi_{\sigma_1},\dots,\pi_{\sigma_M})$. This function should be able to generate variables with
any number of categories $M$.
(ii) Use simulations to verify that the function you developed in (i) simulates variables with the
correct statistics.
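One way to approach both parts is inverse-CDF sampling: draw a single uniform variate and return the first category whose cumulative probability exceeds it. The block below is a minimal sketch under stated assumptions, not a reference solution. It uses Python with the standard library only (the assignment does not prescribe a language), and the names `sample_categorical` and `empirical_frequencies`, the labels, and the probabilities are all hypothetical.

```python
import random
from collections import Counter


def sample_categorical(labels, probs, rng=random):
    """Draw one label by inverse-CDF sampling from a single uniform variate."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    if any(p < 0 for p in probs) or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be non-negative and sum to 1")
    u = rng.random()  # u ~ Uniform[0, 1)
    cumulative = 0.0
    for label, p in zip(labels, probs):
        cumulative += p
        if u < cumulative:
            return label
    return labels[-1]  # guards against floating-point round-off in the sum


def empirical_frequencies(labels, probs, n_draws, seed=0):
    """Relative frequency of each label over n_draws simulated draws."""
    rng = random.Random(seed)
    counts = Counter(sample_categorical(labels, probs, rng) for _ in range(n_draws))
    return {label: counts[label] / n_draws for label in labels}


# Hypothetical example with M = 3 categories.
labels = ["a", "b", "c"]
probs = [0.2, 0.5, 0.3]
freqs = empirical_frequencies(labels, probs, n_draws=100_000)
```

Comparing `freqs` with `probs` is one simple check for part (ii). With many draws, the relative frequencies should approach the target probabilities.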
Problem 3: In this problem, you will investigate the Box-Muller algorithm.
The Box-Muller algorithm proceeds as follows:
• Generate $u$ and $v$ by drawing independent uniform random variables over the interval $[0, 1]$.
• Set $x = \mu + \sigma\sqrt{-2\log u}\,\cos(2\pi v)$ and $y = \mu + \sigma\sqrt{-2\log u}\,\sin(2\pi v)$.
Upon completion, the values of $x$ and $y$ each follow a $\text{Normal}(\mu, \sigma^2)$ distribution.
(i) Show mathematically that the Box-Muller algorithm indeed produces values with the correct
statistics.
(ii) Develop a function that takes as input µ, σ and implements the Box-Muller algorithm.
(iii) Use simulations to verify that the function you developed in (ii) simulates variables with the
correct statistics.
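The two steps of the algorithm translate almost line for line into code. The following is a minimal sketch in Python (standard library only), assuming hypothetical names `box_muller` and `sample_moments` and illustrative values of µ and σ. It is one possible starting point for parts (ii) and (iii), not a definitive implementation.

```python
import math
import random


def box_muller(mu, sigma, rng=random):
    """Return one pair (x, y) produced by the Box-Muller transform."""
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    v = rng.random()
    radius = sigma * math.sqrt(-2.0 * math.log(u))
    x = mu + radius * math.cos(2.0 * math.pi * v)
    y = mu + radius * math.sin(2.0 * math.pi * v)
    return x, y


def sample_moments(mu, sigma, n_pairs, seed=0):
    """Sample mean and variance of 2 * n_pairs Box-Muller draws."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_pairs):
        draws.extend(box_muller(mu, sigma, rng))
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    return mean, var


# Hypothetical check with mu = 2 and sigma = 3.
mean, var = sample_moments(mu=2.0, sigma=3.0, n_pairs=50_000)
```

The sample mean and variance should land close to µ and σ². A histogram of the draws compared against the Normal density would be a stronger check than the first two moments alone.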
Problem 4: The following questions test your understanding of finding maximum likelihood estimators (MLEs).
• Let $X_1, X_2, \dots, X_n$ be i.i.d. with pdf $f(x|\theta) = \theta x^{\theta-1}$, $0 \le x \le 1$, $0 < \theta < \infty$. Find the
MLE of $\theta$.
• Let $X$ and $Y$ be independent exponential random variables, with $f(x|\lambda) = \frac{1}{\lambda}\exp(-x/\lambda)$, $x > 0$,
and $f(y|\mu) = \frac{1}{\mu}\exp(-y/\mu)$, $y > 0$. Let $Z = \min(X, Y)$ and $W = 1$ if $Z = X$ or $W = 0$ if
$Z = Y$. Assume that $(Z_i, W_i)$, $i = 1, \dots, n$ are $n$ i.i.d. observations. Find the MLEs of $\lambda$
and $\mu$.
• A random sample $X_1, X_2, \dots, X_n$ is drawn from a Pareto population with pdf
$f(x|\theta, \nu) = \frac{\theta\nu^{\theta}}{x^{\theta+1}}\,1_{[\nu,\infty)}(x)$, $\theta > 0$, $\nu > 0$. Find the MLEs of $\theta$ and $\nu$.
Problem 5: In this problem, you will use the maximum likelihood principle to analyze experimental data.
Spectroscopic experiments, held in TTTR mode, proceed as follows: Pulses of light are sent to
a chemical or biological sample. With each pulse, a molecule within the sample becomes excited.
Following a short period of time, the excited molecule emits a photon which is collected and
detected with appropriate equipment. In such an experiment, the measurements consist of the
time elapsed between a pulse and the photon detection. Each measurement w encodes the time
d that the molecule remained excited and some error r caused by the detection hardware. The two
contributions are additive, so each measurement is given by the sum w = d + r.
By fundamental laws of physics, it is known that d is an exponential random variable with a rate
λ characteristic of the sample under investigation. By detector engineering, it is ensured that the
error r is normally distributed around zero with some variance υ characteristic of each detector.
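The measurement model above can be simulated directly, which is a convenient way to produce synthetic data for testing an estimator before applying it to the provided files. The block below is a minimal sketch in Python (standard library only). The function name `simulate_measurements` and the values chosen for the rate and the variance are hypothetical and are not taken from the datasets.

```python
import math
import random


def simulate_measurements(rate, variance, n, seed=0):
    """Simulate n measurements w = d + r.

    d ~ Exponential with the given rate, r ~ Normal(0, variance).
    """
    rng = random.Random(seed)
    sd = math.sqrt(variance)
    return [rng.expovariate(rate) + rng.gauss(0.0, sd) for _ in range(n)]


# Hypothetical parameter values, in units consistent with ns.
w = simulate_measurements(rate=0.5, variance=0.04, n=100_000)
w_mean = sum(w) / len(w)
w_var = sum((x - w_mean) ** 2 for x in w) / len(w)
```

Because d and r are independent, the mean of w should be close to 1/λ and its variance close to 1/λ² + υ, which gives a quick sanity check on the simulated sample.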
(i) The dataset TTTR calibration.mat contains calibration measurements of a detector. These
measurements are obtained under artificial conditions that ensure d ≈ 0. Accordingly, they
encode only the error of the detector. Use these measurements to estimate the error variance υ of
the detector. To do so, apply the maximum likelihood principle and carry out the calculations
involved analytically.
(ii) Derive an analytic expression for the probability density p(w) of each individual measurement
of an actual experiment.
(iii) Given that the detector is already calibrated from step (i), describe in detail the remaining
steps for the estimation of λ from the measurements of an actual experiment.
(iv) Implement the method you developed in (iii) and use the measurements in the provided
dataset TTTR experiment.mat to find the value of λ. To carry out the involved optimization,
you might use fminsearch or any other optimization strategy you prefer.
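For a single unknown parameter, the optimization amounts to minimizing a negative log-likelihood over one variable. The sketch below illustrates this with golden-section search, a technique swapped in here as a standard-library stand-in; it is not the Nelder-Mead simplex method that fminsearch uses. To avoid pre-empting step (ii), it is demonstrated on a toy exponential sample whose MLE is known in closed form rather than on the density p(w). The names `golden_section_minimize` and `nll`, the bracketing interval, and the toy rate are hypothetical.

```python
import math
import random


def golden_section_minimize(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            # The minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Toy data: exponential draws with a hypothetical rate of 1.5.
rng = random.Random(0)
data = [rng.expovariate(1.5) for _ in range(5_000)]
total = sum(data)


def nll(rate):
    """Negative log-likelihood of an exponential sample with the given rate."""
    return rate * total - len(data) * math.log(rate)


rate_hat = golden_section_minimize(nll, 1e-3, 10.0)
closed_form = len(data) / total  # analytic MLE for comparison
```

Here `rate_hat` should agree with `closed_form` to several digits. For the actual experiment, `nll` would be replaced by the negative log-likelihood built from p(w), with υ fixed at its calibrated value.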
Associated data: The datasets TTTR calibration.mat and TTTR experiment.mat, for steps
(i) and (iv) respectively, contain measurements from the same detector. All measurements are reported in
units of ns.