CSE 250a. Assignment 3
3.1 Inference in a chain
[Figure: a Markov chain X1 → X2 → X3 → ⋯ → XT−2 → XT−1 → XT]
Consider the belief network shown above with random variables Xt ∈ {1, 2, …, m}. Suppose that the CPT
at each non-root node is given by the same m×m matrix; that is, for all t ≥ 1, we have:

    Aij = P(Xt+1 = j | Xt = i).
(a) Prove that P(Xt+1 = j | X1 = i) = [A^t]ij, where A^t is the t-th power of the matrix A.
Hint: use induction.
(b) Consider the computational complexity of this inference. Devise a simple algorithm, based on
matrix-vector multiplication, that scales as O(m²t).
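A minimal sketch of such an algorithm (one possibility, assuming the CPT is stored as a nested list A with A[r][c] = P(Xt+1 = c | Xt = r), and states indexed from 0 rather than 1):

    def forward_power(A, i, t):
        """Compute P(X_{t+1} | X_1 = i) by t matrix-vector products: O(m^2 t).

        Returns a length-m list whose entry j is P(X_{t+1} = j | X_1 = i).
        """
        m = len(A)
        v = [0.0] * m
        v[i] = 1.0                      # point mass at X_1 = i
        for _ in range(t):              # one O(m^2) vector-matrix product per step
            v = [sum(v[r] * A[r][c] for r in range(m)) for c in range(m)]
        return v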
(c) Show alternatively that the inference can also be done in O(m³ log₂ t).
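One standard way to achieve a bound of this form is exponentiation by squaring, which computes A^t with O(log₂ t) matrix-matrix products at O(m³) each. A minimal sketch (not necessarily the intended derivation), again with plain nested lists:

    def matmul(A, B):
        """O(m^3) product of two m x m matrices given as nested lists."""
        m = len(A)
        return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)]
                for i in range(m)]

    def matrix_power(A, t):
        """Compute A^t by repeated squaring: O(m^3 log2 t) total work."""
        m = len(A)
        result = [[float(i == j) for j in range(m)] for i in range(m)]  # identity
        base = A
        while t > 0:
            if t & 1:                   # include this power of two in the product
                result = matmul(result, base)
            base = matmul(base, base)   # square: A, A^2, A^4, ...
            t >>= 1
        return result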
(d) Suppose that the transition matrix Aij is sparse, with at most s ≪ m non-zero elements per row. Show
that in this case the inference can be done in O(smt).
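One possible realization of the sparse case, assuming each row r of A is stored as a list rows[r] of its (at most s) non-zero (column, probability) pairs, so that each propagation step costs O(sm):

    def forward_sparse(rows, i, t):
        """P(X_{t+1} | X_1 = i) when A is row-sparse: O(s m t) total.

        rows[r] is a list of (c, p) pairs with p = P(next = c | current = r) > 0,
        at most s pairs per row.
        """
        m = len(rows)
        v = [0.0] * m
        v[i] = 1.0
        for _ in range(t):
            w = [0.0] * m
            for r in range(m):
                if v[r] > 0.0:          # skip states carrying no probability mass
                    for c, p in rows[r]:
                        w[c] += v[r] * p    # at most s updates per row
            v = w
        return v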
(e) Show how to compute the posterior probability P(X1 = i | XT+1 = j) in terms of the matrix A and the
prior probability P(X1 = i). Hint: use Bayes rule and your answer from part (a).
3.2 More inference in a chain
[Figure: the chain X0 → X1 → Y1]

Consider the simple belief network shown above, with nodes X0, X1, and Y1.
To compute the posterior probability P(X1|Y1), we can use Bayes rule:

    P(X1|Y1) = P(Y1|X1) P(X1) / P(Y1).
(a) Show how to compute the conditional probability P(Y1|X1) that appears in the numerator of Bayes
rule from the CPTs of the belief network.
(b) Show how to compute the marginal probability P(Y1) that appears in the denominator of Bayes rule
from the CPTs of the belief network.
Next you will show how to generalize these simple computations when the basic structure of this DAG is
repeated to form an extended chain. Like the previous problem, this is another instance of efficient inference
in polytrees.
[Figure: the chain X0 → X1 → X2 → ⋯ → Xn−1 → Xn, with an observed node Yi attached as a child of each Xi, for i = 1, …, n]
Consider how to efficiently compute the posterior probability P(Xn|Y1, Y2, …, Yn) in the above belief
network. One approach is to derive a recursion from the conditionalized form of Bayes rule:

    P(Xn|Y1, Y2, …, Yn) = P(Yn|Xn, Y1, Y2, …, Yn−1) P(Xn|Y1, Y2, …, Yn−1) / P(Yn|Y1, …, Yn−1),
where the nodes Y1, Y2, . . . , Yn−1 are treated as background evidence. In this problem you will express the
conditional probabilities on the right hand side of this equation in terms of the CPTs of the network and the
probabilities P(Xn−1 =x|Y1, Y2, . . . , Yn−1), which you may assume have been computed at a previous step
of the recursion. Your answers to (a) and (b) should be helpful here.
(c) Simplify the term P(Xn|Y1, Y2, . . . , Yn−1) that appears in the numerator of Bayes rule.
(d) Show how to compute the conditional probability P(Yn|Xn, Y1, Y2, . . . , Yn−1) that appears in the
numerator of Bayes rule. Express your answer in terms of the CPTs of the belief network and the
probabilities P(Xn−1 =x|Y1, Y2, . . . , Yn−1), which you may assume have already been computed.
(e) Show how to compute the conditional probability P(Yn|Y1, Y2, . . . , Yn−1) that appears in the denominator of Bayes rule. Express your answer in terms of the CPTs of the belief network and the
probabilities P(Xn−1 =x|Y1, Y2, . . . , Yn−1), which you may assume have already been computed.
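Taken together, parts (c)–(e) assemble a forward recursion along the chain. For orientation only, a minimal Python sketch of that style of recursion (one possible organization, assuming the same CPTs at every step, stored as nested lists; it is not the unique way to arrange the computation):

    def forward_posterior(prior, trans, emit, ys):
        """Return P(Xn | Y1..Yn) after absorbing the observations ys.

        prior[x]    = P(X0 = x)
        trans[x][z] = P(Xn = z | Xn-1 = x)   (assumed shared across steps)
        emit[x][y]  = P(Yn = y | Xn = x)     (assumed shared across steps)
        """
        m = len(prior)
        belief = prior[:]               # P(X0); no evidence absorbed yet
        for y in ys:
            # predict: P(Xn | Y1..Yn-1) = sum_x P(Xn | Xn-1 = x) P(Xn-1 = x | Y1..Yn-1)
            predicted = [sum(belief[x] * trans[x][z] for x in range(m))
                         for z in range(m)]
            # correct: multiply in P(Yn = y | Xn) and renormalize (Bayes rule)
            joint = [predicted[z] * emit[z][y] for z in range(m)]
            norm = sum(joint)           # equals P(Yn = y | Y1..Yn-1)
            belief = [p / norm for p in joint]
        return belief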
3.3 Node clustering and polytrees
In the figure below, circle the DAGs that are polytrees. In the other DAGs, shade two nodes that could be
clustered so that the resulting DAG is a polytree.
3.4 Cutsets and polytrees
Clearly not all problems of inference are intractable in loopy belief networks. As a trivial example, consider
the query P(Q|E1, E2) in the network shown below:
[Figure: a loopy belief network in which E1 and E2 are the two parents of Q]
In this case, because E1 and E2 are the parents of Q, the query P(Q|E1, E2) can be answered directly from
the conditional probability table at node Q.
As a less trivial example, consider how to compute the posterior probability P(Q|E1, E2) in the belief network shown below:
[Figure: a loopy belief network with query node Q and evidence nodes E1 and E2]
In this belief network, the posterior probability P(Q|E1, E2) can be correctly computed by running the
polytree algorithm on the subgraph of nodes that are enclosed by the dotted rectangle:
[Figure: the same network, with a dotted rectangle enclosing the polytree subgraph that contains Q, E1, and E2]
In this example, the evidence nodes d-separate the query node from the loopy parts of the network. Thus for
this inference the polytree algorithm would terminate before encountering any loops.
For each of the five loopy belief networks shown below, consider how to compute the posterior probability
P(Q|E1, E2).
If the inference can be performed by running the polytree algorithm on a subgraph, enclose this subgraph
by a dotted line as shown on the previous page. (The subgraph should be a polytree.)
On the other hand, if the inference cannot be performed in this way, shade one node in the belief network
that can be instantiated to induce a polytree by the method of cutset conditioning.
[Figure: five loopy belief networks, each containing a query node Q and evidence nodes E1 and E2]
3.5 Node clustering
Consider the belief network shown below over binary variables X, Y1, Y2, Y3, Z1, and Z2. The network can
be transformed into a polytree by clustering the nodes Y1, Y2, and Y3 into a single node Y . From the CPTs
in the original belief network, fill in the missing elements of the CPTs for the polytree.
X P(Y1 = 1|X) P(Y2 = 1|X) P(Y3 = 1|X)
0 0.75 0.50 0.25
1 0.50 0.25 0.75
Y1 Y2 Y3 P(Z1 = 1|Y1, Y2, Y3) P(Z2 = 1|Y1, Y2, Y3)
0 0 0 0.9 0.1
1 0 0 0.8 0.2
0 1 0 0.7 0.3
0 0 1 0.6 0.4
1 1 0 0.5 0.5
1 0 1 0.4 0.6
0 1 1 0.3 0.7
1 1 1 0.2 0.8
Y1 Y2 Y3   Y   P(Y|X = 0)   P(Y|X = 1)   P(Z1 = 1|Y)   P(Z2 = 1|Y)
0  0  0    1   ____         ____         ____          ____
1  0  0    2   ____         ____         ____          ____
0  1  0    3   ____         ____         ____          ____
0  0  1    4   ____         ____         ____          ____
1  1  0    5   ____         ____         ____          ____
1  0  1    6   ____         ____         ____          ____
0  1  1    7   ____         ____         ____          ____
1  1  1    8   ____         ____         ____          ____
[Figure: left, the original network with X a parent of Y1, Y2, Y3, each of which is a parent of Z1 and Z2; right, the clustered polytree X → Y → {Z1, Z2}]
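For reference, a minimal sketch of how the CPT of a clustered node can be assembled in general, under the assumption that the member nodes are conditionally independent given their common parent and that the cluster value enumerates the joint settings (y1, y2, y3):

    from itertools import product

    def cluster_cpt(p_members_given_x, x):
        """P(Y = (y1, y2, y3) | X = x) for a clustered node Y.

        p_members_given_x[k][x] = P(Yk = 1 | X = x); since the Yk are
        conditionally independent given X, the cluster CPT is a product
        of the member CPTs. Returns a dict over joint settings.
        """
        cpt = {}
        for ys in product([0, 1], repeat=len(p_members_given_x)):
            p = 1.0
            for k, y in enumerate(ys):
                p1 = p_members_given_x[k][x]
                p *= p1 if y == 1 else (1.0 - p1)
            cpt[ys] = p
        return cpt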
3.6 Likelihood weighting
Consider the belief network shown below, with n binary random variables Bi ∈ {0, 1} and an integer random
variable Z. Let f(B) = Σ_{i=1}^{n} 2^{i−1} Bi denote the nonnegative integer whose binary representation is
given by Bn Bn−1 … B2 B1. Suppose that each bit has prior probability P(Bi = 1) = 1/2, and that

    P(Z | B1, B2, …, Bn) = ((1−α)/(1+α)) α^{|Z−f(B)|},
where 0 < α < 1 is a parameter measuring the amount of noise in the conversion from binary to decimal.
(Larger values of α indicate greater levels of noise.)
[Figure: bits B1, …, Bn, each a parent of the integer node Z]
(a) Show that the conditional distribution for binary to decimal conversion is normalized; namely, that
Σ_z P(Z = z | B1, B2, …, Bn) = 1, where the sum is over all integers −∞ < z < ∞.
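As a quick sanity check (not a proof), the normalization can be verified numerically by truncating the sum at a large cutoff; the snippet below assumes an arbitrary bit setting and illustrative values of n, α, and the cutoff:

    def check_normalization(n=4, alpha=0.3, cutoff=2000):
        """Sum P(Z = z | B) over z in [-cutoff, cutoff] for a fixed f(B);
        the result should be very close to 1."""
        f = 7                       # any fixed value of f(B) in [0, 2^n - 1]
        total = sum((1 - alpha) / (1 + alpha) * alpha ** abs(z - f)
                    for z in range(-cutoff, cutoff + 1))
        print(total)                # ~1.0 up to truncation error

    check_normalization()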
(b) Consider a network with n= 10 bits and noise level α= 0.1. Use the method of likelihood weighting
to estimate the probability P(Bi = 1|Z = 128) for i ∈ {2, 5, 8, 10}.
(c) Plot your estimates in part (b) as a function of the number of samples. You should be confident from
the plots that your estimates have converged to a good degree of precision (say, at least two significant
digits).
(d) Submit a hard-copy printout of your source code. You may program in the language of your choice,
and you may use any program at your disposal to plot the results.
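For parts (b)–(d), one possible skeleton is sketched below, an illustrative starting point only: it assumes the standard form of likelihood weighting (sample the bits from their prior, weight each sample by the evidence likelihood P(Z = 128 | B)), and omits the plotting and convergence bookkeeping that part (c) asks for.

    import random

    def likelihood_weighting(n=10, alpha=0.1, z_obs=128, num_samples=100000):
        """Estimate P(Bi = 1 | Z = z_obs) by likelihood weighting.

        Bits are sampled from P(Bi = 1) = 1/2; each sample is weighted by
        P(Z = z_obs | B), and the posterior is a ratio of weighted counts.
        """
        weighted_on = [0.0] * n     # sum of weights for samples with Bi = 1
        total_weight = 0.0
        for _ in range(num_samples):
            bits = [random.randint(0, 1) for _ in range(n)]    # bits[0] is B1
            f = sum(b << i for i, b in enumerate(bits))        # f(B)
            w = (1 - alpha) / (1 + alpha) * alpha ** abs(z_obs - f)
            total_weight += w
            for i in range(n):
                weighted_on[i] += w * bits[i]
        return [won / total_weight for won in weighted_on]

    estimates = likelihood_weighting()
    for i in (2, 5, 8, 10):
        print(i, estimates[i - 1])  # bits are indexed from 1 in the problem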
3.7 Even more inference
Show how to perform the desired inference in each of the belief networks shown below. Justify briefly each
step in your calculations.
(a) Markov blanket
Show how to compute the posterior probability P(B|A, C, D)
in terms of the CPTs of this belief network—namely, P(A),
P(B|A), P(C), and P(D|B, C).
[Figure: A → B → D, with C also a parent of D]
(b) Conditional independence
This belief network has conditional probability tables
for P(F|A) and P(E|C) in addition to those of the
previous problem. Assuming that all these tables are
given, show how to compute the posterior probability
P(B|A, C, D, E, F) in terms of these additional CPTs and
your answer to part (a).
[Figure: the network of part (a), plus edges A → F and C → E]
(c) More conditional independence
Assuming that all the conditional probability tables in
this belief network are given, show how to compute
the posterior probability P(B, E, F|A, C, D). Express
your answer in terms of the CPTs of the network, as
well as your earlier answers for parts (a) and (b).
[Figure: the same network as in part (b)]
3.8 More likelihood weighting
(a) Single node of evidence
[Figure: a belief network over nodes Q, X, Y, Z, and E]
Suppose that T samples {qt, xt, yt, zt}, t = 1, …, T, are drawn from the CPTs of the belief network shown
above (with fixed evidence E = e). Show how to estimate P(Q = q|E = e) from these samples using
the method of likelihood weighting. Express your answer in terms of sums over indicator functions,
such as:

    I(q, q′) = 1 if q = q′, and 0 otherwise.

In addition, all probabilities in your answer should be expressed in terms of CPTs of the belief
network (i.e., probabilities that do not require any additional computation).
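For reference, a likelihood-weighted estimate with a single evidence node generically takes the following form (a standard template, stated under the assumption that each sample's weight is the CPT of E evaluated at the sampled values of its parents; the figure determines which nodes those are):

    P̂(Q = q|E = e) = [ Σ_{t=1}^{T} I(qt, q) wt ] / [ Σ_{t=1}^{T} wt ],
    where wt = P(E = e | parents of E as sampled at step t).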
(b) Multiple nodes of evidence
Suppose that T samples {q1t, q2t, xt, yt, zt}, t = 1, …, T, are drawn from the CPTs of the network shown
below (with fixed evidence E1 = e1 and E2 = e2). Show how to estimate P(Q1 = q1, Q2 = q2|E1 = e1, E2 = e2)
from these samples using the method of likelihood weighting. Express your answer in terms of
indicator functions and CPTs of the belief network.
[Figure: a belief network over nodes Q1, Q2, X, Y, Z, E1, and E2]
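With multiple evidence nodes, the same generic template applies with a product weight (again an assumption-laden sketch; the network's edges determine which CPTs appear):

    wt = P(E1 = e1 | parents of E1 as sampled at t) · P(E2 = e2 | parents of E2 as sampled at t),

with I(q1t, q1) I(q2t, q2) replacing the single indicator in the numerator.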