EECS 126: Probability and Random Processes
Problem Set 6
1. The CLT Implies the WLLN
(a) Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of random variables. Show that if $X_n \xrightarrow{d} c$, where $c$ is a constant, then $X_n \xrightarrow{P} c$.
(b) Let $\{X_n\}_{n \in \mathbb{N}}$ be a sequence of i.i.d. random variables, with mean $\mu$ and finite variance $\sigma^2$. Show that the CLT implies the WLLN, i.e. if
$$\frac{1}{\sigma \sqrt{n}} \sum_{i=1}^{n} (X_i - \mu) \xrightarrow{d} \mathcal{N}(0, 1),$$
then
$$\frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{P} \mu.$$
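As an illustrative aside (not part of the required proof), here is a minimal Python/NumPy sketch of the convergence in part (b): it estimates $P(|\bar{X}_n - \mu| \geq \epsilon)$ by simulation and shows it shrinking as $n$ grows. The Exponential(1) distribution, tolerance, and trial count are arbitrary choices:

import numpy as np

rng = np.random.default_rng(0)
mu, eps, trials = 1.0, 0.05, 1000  # Exp(1) has mean 1; eps is an arbitrary tolerance

for n in [10, 100, 1000, 10000]:
    samples = rng.exponential(scale=1.0, size=(trials, n))
    sample_means = samples.mean(axis=1)
    # Empirical estimate of P(|Xbar_n - mu| >= eps); should shrink toward 0
    print(n, np.mean(np.abs(sample_means - mu) >= eps))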
2. Confidence Intervals: Chebyshev vs. Chernoff vs. CLT
Let $X_1, \ldots, X_n$ be i.i.d. Bernoulli($q$) random variables, with common mean $\mu = E[X_1] = q$ and variance $\sigma^2 = \mathrm{var}(X_1) = q(1 - q)$. We want to estimate the mean $\mu$, and towards this goal we use the sample mean estimator
$$\bar{X}_n \triangleq \frac{X_1 + \cdots + X_n}{n}.$$
Given some confidence level $a \in (0, 1)$, we want to construct a confidence interval around $\bar{X}_n$ such that $\mu$ lies in this interval with probability at least $1 - a$.
(a) Use Chebyshev’s inequality in order to show that $\mu$ lies in the interval
$$\left[ \bar{X}_n - \frac{\sigma}{\sqrt{n}} \frac{1}{\sqrt{a}},\ \bar{X}_n + \frac{\sigma}{\sqrt{n}} \frac{1}{\sqrt{a}} \right]$$
with probability at least $1 - a$.
(b) A Chernoff bound for this setting can be computed to be:
$$P(|\bar{X}_n - q| \geq \epsilon) \leq 2e^{-2n\epsilon^2}, \quad \text{for any } \epsilon > 0.$$
Use this inequality in order to show that $\mu$ lies in the interval
$$\left[ \bar{X}_n - \frac{1}{2\sqrt{n}} \sqrt{2 \ln \frac{2}{a}},\ \bar{X}_n + \frac{1}{2\sqrt{n}} \sqrt{2 \ln \frac{2}{a}} \right]$$
with probability at least $1 - a$.
(c) Show that if $Z \sim \mathcal{N}(0, 1)$, then
$$P(|Z| \geq \epsilon) \leq 2e^{-\epsilon^2/2}, \quad \text{for any } \epsilon > 0.$$
(d) Use the Central Limit Theorem and part (c) to heuristically argue that $\mu$ lies in the interval
$$\left[ \bar{X}_n - \frac{\sigma}{\sqrt{n}} \sqrt{2 \ln \frac{2}{a}},\ \bar{X}_n + \frac{\sigma}{\sqrt{n}} \sqrt{2 \ln \frac{2}{a}} \right]$$
with probability at least $1 - a$.
(e) Compare the three confidence intervals.
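As a numerical aside for part (e) (a sketch, not a substitute for the comparison itself), the half-widths of the three intervals from parts (a), (b), and (d) can be tabulated side by side; the values of $n$, $a$, and $q$ below are arbitrary:

import numpy as np

n, a, q = 1000, 0.05, 0.5  # arbitrary illustrative values
sigma = np.sqrt(q * (1 - q))

chebyshev = sigma / (np.sqrt(n) * np.sqrt(a))               # part (a)
chernoff = np.sqrt(2 * np.log(2 / a)) / (2 * np.sqrt(n))    # part (b)
clt = sigma * np.sqrt(2 * np.log(2 / a)) / np.sqrt(n)       # part (d)

print(f"Chebyshev half-width: {chebyshev:.4f}")
print(f"Chernoff  half-width: {chernoff:.4f}")
print(f"CLT       half-width: {clt:.4f}")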
3. Transform Practice
Consider a random variable $Z$ with transform
$$M_Z(s) = \frac{a - 3s}{s^2 - 6s + 8}, \quad \text{for } |s| < 2.$$
Calculate the following quantities:
(a) The numerical value of the parameter a.
(b) E[Z].
(c) var(Z).
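If you want to sanity-check a hand computation for this problem, the following SymPy sketch evaluates the transform and its derivatives at $s = 0$, using the standard facts $M_Z(0) = 1$, $E[Z] = M_Z'(0)$, and $E[Z^2] = M_Z''(0)$ (illustrative only; the exercise is meant to be done by hand):

import sympy as sp

s, a = sp.symbols('s a')
M = (a - 3*s) / (s**2 - 6*s + 8)

# M_Z(0) = 1 pins down the parameter a
a_val = sp.solve(sp.Eq(M.subs(s, 0), 1), a)[0]
M = M.subs(a, a_val)

EZ = sp.diff(M, s).subs(s, 0)      # E[Z]   = M'(0)
EZ2 = sp.diff(M, s, 2).subs(s, 0)  # E[Z^2] = M''(0)
print(a_val, sp.simplify(EZ), sp.simplify(EZ2 - EZ**2))  # a, E[Z], var(Z)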
4. Rotationally Invariant Random Variables
Suppose random variables $X$ and $Y$ are i.i.d., with zero mean, such that their joint density is rotation invariant, i.e. for any orthogonal matrix $U$ (a matrix with orthonormal rows and orthonormal columns),
$$U \begin{bmatrix} X \\ Y \end{bmatrix} \stackrel{d}{=} \begin{bmatrix} X \\ Y \end{bmatrix}.$$
(a) Let $\varphi(t)$ be the characteristic function of $X$. Show that $\varphi(t)^n = \varphi(\sqrt{n}\, t)$.
(b) Show that $\varphi(t) = \exp(ct^2)$ for some constant $c$, and all $t$ such that $t^2 \in \mathbb{Q}$. Hint: Let $t^2 = a/b$, where $a, b$ are positive integers.
(c) Conclude that $X$ and $Y$ must be Gaussian.
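As an illustrative numerical check of the rotation-invariance hypothesis (not required for the proof), the sketch below rotates i.i.d. standard Gaussian pairs by an arbitrary angle and compares empirical characteristic functions before and after; both should approximate $\exp(-\|t\|^2/2)$:

import numpy as np

rng = np.random.default_rng(0)
theta = 0.7  # arbitrary rotation angle
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # an orthogonal matrix

XY = rng.standard_normal((2, 100000))  # i.i.d. N(0,1) pairs, stacked as columns
UXY = U @ XY

# Compare empirical characteristic functions at an arbitrary test point t
t = np.array([0.5, -1.2])
phi_orig = np.mean(np.exp(1j * t @ XY))
phi_rot = np.mean(np.exp(1j * t @ UXY))
print(phi_orig, phi_rot)  # both should be close to exp(-|t|^2 / 2)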
5. Matrix Sketching
Matrix sketching is an important technique in randomized linear algebra for doing large computations efficiently. For example, to compute the product $A^T B$ of two large matrices $A$ and $B$, we can use a random sketch matrix $S$ to compute a "sketch" $SA$ of $A$ and a "sketch" $SB$ of $B$. Such a sketching matrix has the property that $S^T S \approx I$, so that the approximate product $A^T S^T S B$ is close to $A^T B$.
In this problem, we will discuss two popular sketching schemes and understand how they help in approximate computation. Let $\hat{I} = S^T S$, and let the dimensions of the sketch matrix $S$ be $d \times n$ (typically $d \ll n$).
(a) (Gaussian-sketch) Define
$$S = \frac{1}{\sqrt{d}} \begin{bmatrix} S_{11} & \cdots & S_{1n} \\ \vdots & \ddots & \vdots \\ S_{d1} & \cdots & S_{dn} \end{bmatrix}$$
such that the $S_{ij}$ are chosen i.i.d. from $\mathcal{N}(0, 1)$ for all $i \in [1, d]$ and $j \in [1, n]$. Find the element-wise mean and variance (as a function of $d$) of the matrix $\hat{I} = S^T S$, that is, find $E[\hat{I}_{ij}]$ and $\mathrm{Var}[\hat{I}_{ij}]$ for all $i \in [1, n]$ and $j \in [1, n]$.
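A Monte Carlo sketch (illustrative only; the problem asks for exact expressions) can estimate these element-wise statistics for the Gaussian sketch; the sizes $d$, $n$ and the trial count below are arbitrary:

import numpy as np

rng = np.random.default_rng(0)
d, n, trials = 50, 4, 5000  # arbitrary illustrative sizes

I_hats = np.empty((trials, n, n))
for t in range(trials):
    S = rng.standard_normal((d, n)) / np.sqrt(d)  # Gaussian sketch
    I_hats[t] = S.T @ S

print(I_hats.mean(axis=0))  # element-wise mean: should be close to the identity
print(I_hats.var(axis=0))   # element-wise variance: compare with your answer in terms of d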
(b) (Count-sketch) For each column $j \in [1, n]$ of $S$, choose a row $i$ uniformly at random from $[1, d]$, set
$$S_{ij} = \begin{cases} 1, & \text{with probability } 0.5 \\ -1, & \text{with probability } 0.5 \end{cases}$$
and assign $S_{kj} = 0$ for all $k \neq i$. An example of a $3 \times 8$ count-sketch is
$$S = \begin{bmatrix} 0 & -1 & 1 & 0 & 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 & -1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 & -1 \end{bmatrix}.$$
Again, find the element-wise mean and variance (as a function of $d$) of the matrix $\hat{I} = S^T S$.
Note that for sufficiently large $d$, the matrix $\hat{I}$ is close to the identity matrix in both cases. We will use this fact in the lab to do approximate matrix multiplication.
Note: You can use the fact that the fourth moment of a standard Gaussian is 3 without
proof.
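As a preview of that lab computation (a sketch under assumed, arbitrary dimensions; the count_sketch_apply helper below is our own illustrative function, not a library API), the following Python builds a count-sketch and compares $A^T S^T S B$ with $A^T B$:

import numpy as np

rng = np.random.default_rng(0)
n, d, p, q = 100000, 2000, 5, 3  # arbitrary sizes with d much smaller than n

def count_sketch_apply(A, rows, signs, d):
    # Compute S @ A, where S has one signed nonzero per column, without forming S
    SA = np.zeros((d, A.shape[1]))
    np.add.at(SA, rows, signs[:, None] * A)
    return SA

rows = rng.integers(0, d, size=n)        # nonzero row chosen for each of the n columns of S
signs = rng.choice([-1.0, 1.0], size=n)  # the corresponding random sign

A = rng.standard_normal((n, p))
B = A[:, :q] + 0.1 * rng.standard_normal((n, q))  # correlated with A so A^T B is nontrivial

exact = A.T @ B
approx = count_sketch_apply(A, rows, signs, d).T @ count_sketch_apply(B, rows, signs, d)
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))  # relative error of the sketch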