EECS 126: Probability and Random Processes
Problem Set 11
1. Midterm
Solve all of the problems on the midterm again (including the ones you got correct).
2. Compression of a Random Source
Suppose I’m trying to send a text message to my friend. In general, I need $\log_2(26)$ bits for every letter I want to send, because there are 26 letters in the alphabet. However, it turns out that if I have some information about the distribution of the letters, I can do better. For example, I might give the letter e a shorter bit representation because I know it’s the most common. In fact, the number of bits I need on average is the entropy, and in this problem we try to show why this is true in general.
Let $(X_i)_{i=1}^{\infty} \overset{\text{i.i.d.}}{\sim} p(\cdot)$, where $p$ is a discrete PMF on a finite set $\mathcal{X}$. We know the entropy of a random variable $X$ is
\[
H(X) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x).
\]
Since entropy is really a function of the distribution, we could write the entropy as $H(p)$.
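For concreteness (a small illustrative example, not part of the problem): if $\mathcal{X} = \{a, b, c, d\}$ with $p(a) = \tfrac{1}{2}$, $p(b) = \tfrac{1}{4}$, $p(c) = p(d) = \tfrac{1}{8}$, then
\[
H(p) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{8} \cdot 3 + \tfrac{1}{8} \cdot 3 = 1.75 \text{ bits},
\]
which is less than the $\log_2 4 = 2$ bits per symbol that a fixed-length code would use.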
(a) Show that
\[
-\frac{1}{n} \log_2 p(X_1, \ldots, X_n) \xrightarrow{\ n \to \infty\ } H(X_1) \quad \text{almost surely}.
\]
(Here, we are extending the notation $p(\cdot)$ to denote the joint PMF of $(X_1, \ldots, X_n)$:
$p(x_1, \ldots, x_n) := p(x_1) \cdots p(x_n)$.)
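The claim in (a) can be sanity-checked numerically. Below is a minimal Python sketch, assuming a hypothetical 4-letter PMF (the distribution and sample size are illustrative, not part of the problem):

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-letter alphabet with PMF p (illustrative only).
p = np.array([0.5, 0.25, 0.125, 0.125])
H = -np.sum(p * np.log2(p))            # entropy in bits (1.75 for this p)

n = 100_000
x = rng.choice(len(p), size=n, p=p)    # i.i.d. draws X_1, ..., X_n, as indices into p

# By independence, -(1/n) log2 p(X_1, ..., X_n) = -(1/n) sum_i log2 p(X_i).
running = -np.cumsum(np.log2(p[x])) / np.arange(1, n + 1)

print(f"H(X_1) = {H:.4f} bits")
print(f"-(1/n) log2 p(X_1, ..., X_n) at n = {n}: {running[-1]:.4f}")

The two printed numbers should agree to a couple of decimal places; of course, this is only a simulation, not a proof.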
(b) Fix $\epsilon > 0$ and define $A_\epsilon^{(n)}$ as the set of all sequences $(x_1, \ldots, x_n) \in \mathcal{X}^n$ such that:
\[
2^{-n(H(X_1)+\epsilon)} \le p(x_1, \ldots, x_n) \le 2^{-n(H(X_1)-\epsilon)}.
\]
Show that $P((X_1, \ldots, X_n) \in A_\epsilon^{(n)}) > 1 - \epsilon$ for all $n$ sufficiently large. Consequently, $A_\epsilon^{(n)}$ is called the typical set because the observed sequences lie within $A_\epsilon^{(n)}$ with high probability.
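As a hedged numerical illustration of (b) (same hypothetical PMF as above; $\epsilon$, $n$, and the trial count are arbitrary choices), one can estimate the probability of landing in $A_\epsilon^{(n)}$ by Monte Carlo:

import numpy as np

rng = np.random.default_rng(1)

p = np.array([0.5, 0.25, 0.125, 0.125])   # hypothetical PMF, illustrative only
H = -np.sum(p * np.log2(p))
eps, n, trials = 0.1, 200, 10_000

x = rng.choice(len(p), size=(trials, n), p=p)
rate = -np.log2(p[x]).sum(axis=1) / n      # -(1/n) log2 p(x_1, ..., x_n) per sequence
# Membership in A_eps^(n) is equivalent to |rate - H| <= eps.
typical = np.abs(rate - H) <= eps

print(f"estimated P(typical) = {typical.mean():.3f}  (claim: > 1 - eps = {1 - eps})")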
(c) Show that $(1 - \epsilon)\, 2^{n(H(X_1)-\epsilon)} \le |A_\epsilon^{(n)}| \le 2^{n(H(X_1)+\epsilon)}$, for $n$ sufficiently large. Use the union bound.
Parts (b) and (c) are called the asymptotic equipartition property (AEP) because they say that there are $\approx 2^{nH(X_1)}$ observed sequences which each have probability $\approx 2^{-nH(X_1)}$. Thus, by discarding the sequences outside of $A_\epsilon^{(n)}$, we need only keep track of $2^{nH(X_1)}$ sequences, which means that a length-$n$ sequence can be compressed into $\approx nH(X_1)$ bits, requiring $H(X_1)$ bits per symbol.
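To see the counting statement in (c) concretely, here is a brute-force Python sketch (hypothetical PMF again; $n$ kept small so enumeration is feasible) that enumerates all of $\mathcal{X}^n$ and counts the typical sequences. Note that the bounds in (c) are only guaranteed for $n$ sufficiently large, so the lower bound may fail at this small $n$:

import itertools
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])   # hypothetical PMF, illustrative only
H = -np.sum(p * np.log2(p))
eps, n = 0.3, 8                            # small n so brute force stays cheap

# Enumerate all |X|^n sequences and count those inside A_eps^(n).
count = 0
for seq in itertools.product(range(len(p)), repeat=n):
    logp = float(np.log2(p[list(seq)]).sum())
    if -n * (H + eps) <= logp <= -n * (H - eps):
        count += 1

print(f"|A_eps^(n)| = {count} out of {len(p) ** n} sequences")
print(f"(1 - eps) 2^(n(H - eps)) = {(1 - eps) * 2 ** (n * (H - eps)):.1f}")
print(f"2^(n(H + eps))           = {2 ** (n * (H + eps)):.1f}")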
(d) Now show that for any $\delta > 0$ and any positive integer $n$, if $B_n \subseteq \mathcal{X}^n$ is a set with $|B_n| \le 2^{n(H(X_1)-\delta)}$, then $P((X_1, \ldots, X_n) \in B_n) \to 0$ as $n \to \infty$.
This says that we cannot compress the observed sequences of length $n$ into any set smaller than size $2^{nH(X_1)}$.
[Hint: Consider the intersection of $B_n$ and $A_\epsilon^{(n)}$.]
(e) Next we turn towards using the AEP for compression. Recall that encoding a set of size $n$ in binary requires $\lceil \log_2 n \rceil$ bits. Therefore, a naïve encoding requires $\lceil \log_2 |\mathcal{X}| \rceil$ bits per symbol.
From (b) and (d), if we use $\log_2 |A_\epsilon^{(n)}| \approx nH(X_1)$ bits to encode the sequences in $A_\epsilon^{(n)}$, ignoring all other sequences, then the probability of error with this encoding will tend to 0 as $n \to \infty$, and thus an asymptotically error-free encoding can be achieved using $H(X_1)$ bits per symbol.
Alternatively, we can create an error-free code by using $1 + \lceil \log_2 |A_\epsilon^{(n)}| \rceil$ bits to encode the sequences in $A_\epsilon^{(n)}$ and $1 + n \lceil \log_2 |\mathcal{X}| \rceil$ bits to encode all other sequences, where the first bit is used to indicate whether or not the sequence belongs to $A_\epsilon^{(n)}$. Let $L_n$ be the length of the encoding of $X_1, \ldots, X_n$ using this code; show that $\lim_{n \to \infty} E[L_n]/n \le H(X_1) + \epsilon$.
In other words, asymptotically, we can compress the sequence so that the number of bits per symbol is arbitrarily close to the entropy.
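A minimal Python sketch of this two-part code's average length, under the same hypothetical PMF (and using the bound $|A_\epsilon^{(n)}| \le 2^{n(H(X_1)+\epsilon)}$ in place of the exact size of the typical set, so the index length here is an upper bound):

import numpy as np

rng = np.random.default_rng(2)

p = np.array([0.5, 0.25, 0.125, 0.125])   # hypothetical PMF, illustrative only
H = -np.sum(p * np.log2(p))
eps, n, trials = 0.1, 500, 5_000

# 1 flag bit + index into A_eps^(n) (bounded via |A| <= 2^(n(H+eps))), or
# 1 flag bit + raw fixed-length encoding for atypical sequences.
len_typical = 1 + int(np.ceil(n * (H + eps)))
len_atypical = 1 + n * int(np.ceil(np.log2(len(p))))

x = rng.choice(len(p), size=(trials, n), p=p)
rate = -np.log2(p[x]).sum(axis=1) / n
typical = np.abs(rate - H) <= eps

L = np.where(typical, len_typical, len_atypical)
print(f"H(X_1) + eps       = {H + eps:.3f} bits/symbol")
print(f"estimated E[L_n]/n = {L.mean() / n:.3f} bits/symbol")

Because atypical sequences are rare, the long fallback codewords contribute little to the average, which is why the estimated rate stays near $H(X_1) + \epsilon$.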