
CSE 250A. Assignment 2

Reading: Russell & Norvig, Chapters 13-14.
2.1 Probabilistic inference
Recall the alarm belief network described in class. The directed acyclic graph (DAG) and conditional
probability tables (CPTs) are shown below:
[Figure: DAG with edges Earthquake → Alarm, Burglar → Alarm, Alarm → JohnCalls, Alarm → MaryCalls.]
P(E=1) = 0.002
P(B=1) = 0.001
P(A=1|E=0,B=0) = 0.001
P(A=1|E=0,B=1) = 0.94
P(A=1|E=1,B=0) = 0.29
P(A=1|E=1,B=1) = 0.95
P(J=1|A=0) = 0.05
P(J=1|A=1) = 0.90
P(M=1|A=0) = 0.01
P(M=1|A=1) = 0.70
Compute numeric values for the following probabilities, exploiting relations of marginal and conditional
independence as much as possible to simplify your calculations. You may re-use numerical results from
lecture, but otherwise show your work. Be careful not to drop significant digits in your answer.
(a) P(E = 1|A = 1)
(b) P(E = 1|A = 1, B = 0)
(c) P(A = 1|M = 1)
(d) P(A = 1|M = 1, J = 0)
(e) P(A = 1|M = 0)
(f) P(A = 1|M = 0, B = 1)
Consider your results in (b) versus (a), (d) versus (c), and (f) versus (e). Do they seem consistent with
commonsense patterns of reasoning?
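Sanity check (optional, not part of the assignment): the following Python sketch enumerates all 32 joint assignments of the alarm network to verify answers numerically; the function names and the dictionary encoding of the CPTs are our own.

    # Brute-force check of the alarm network: enumerate the joint
    # distribution defined by the CPTs above and compute conditionals.
    from itertools import product

    pE = {1: 0.002}; pE[0] = 1 - pE[1]
    pB = {1: 0.001}; pB[0] = 1 - pB[1]
    pA1 = {(0, 0): 0.001, (0, 1): 0.94, (1, 0): 0.29, (1, 1): 0.95}
    pJ1 = {0: 0.05, 1: 0.90}   # P(J=1|A)
    pM1 = {0: 0.01, 1: 0.70}   # P(M=1|A)

    def joint(e, b, a, j, m):
        """P(E=e, B=b, A=a, J=j, M=m), factored along the DAG."""
        pa = pA1[(e, b)] if a == 1 else 1 - pA1[(e, b)]
        pj = pJ1[a] if j == 1 else 1 - pJ1[a]
        pm = pM1[a] if m == 1 else 1 - pM1[a]
        return pE[e] * pB[b] * pa * pj * pm

    def prob(query, evidence):
        """P(query | evidence), e.g. prob({'E': 1}, {'A': 1})."""
        names = ['E', 'B', 'A', 'J', 'M']
        num = den = 0.0
        for vals in product([0, 1], repeat=5):
            x = dict(zip(names, vals))
            if all(x[k] == v for k, v in evidence.items()):
                p = joint(*vals)
                den += p
                if all(x[k] == v for k, v in query.items()):
                    num += p
        return num / den

    print(prob({'E': 1}, {'A': 1}))          # (a)
    print(prob({'A': 1}, {'M': 1, 'J': 0}))  # (d)

Brute-force enumeration defeats the purpose of exploiting independence relations, of course; it is only a check on the final numbers.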
2.2 Probabilistic reasoning
A patient is known to have contracted a rare disease which comes in two forms, represented by the values
of a binary random variable D ∈ {0, 1}. Symptoms of the disease are represented by the binary random
variables Sk ∈ {0, 1}, and knowledge of the disease is summarized by the belief network:
[Figure: DAG with edges D → S1, D → S2, . . . , D → Sn.]
The conditional probability tables (CPTs) for this belief network are as follows. In the absence of evidence,
both forms of the disease are equally likely, with prior probabilities P(D = 0) = P(D = 1) = 1/2. In the first form of the disease (D = 0), the first symptom occurs with probability one,

P(S1 = 1|D = 0) = 1,

while the kth symptom (with k ≥ 2) occurs with probability

P(Sk = 1|D = 0) = f(k − 1)/f(k),

where the function f(k) is defined by

f(k) = 2^k + (−1)^k.

By contrast, in the second form of the disease (D = 1), all the symptoms are uniformly likely to be observed, with P(Sk = 1|D = 1) = 1/2 for all k.
Suppose that on the kth day of the month, a test is done to determine whether the patient is exhibiting the kth symptom, and that each such test returns a positive result. Thus, on the kth day, the doctor observes the patient with symptoms {S1 = 1, S2 = 1, . . . , Sk = 1}. Based on the cumulative evidence, the doctor makes a new diagnosis each day by computing the ratio:

rk = P(D = 0|S1 = 1, S2 = 1, . . . , Sk = 1) / P(D = 1|S1 = 1, S2 = 1, . . . , Sk = 1).

If this ratio is greater than 1, the doctor diagnoses the patient with the D = 0 form of the disease; otherwise, with the D = 1 form.
(a) Compute the ratio rk as a function of k. How does the doctor’s diagnosis depend on the day of the
month? Show your work.
(b) Does the diagnosis become more or less certain as more symptoms are observed? Explain.
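A minimal Python sketch for checking your answer numerically (it recomputes rk directly from the CPTs above; the helper names are ours):

    # rk = P(D=0|S1..Sk=1) / P(D=1|S1..Sk=1); the uniform prior
    # P(D=0) = P(D=1) = 1/2 cancels in the ratio.
    def f(k):
        return 2**k + (-1)**k

    def r(k):
        like0 = 1.0                   # P(S1=1|D=0) = 1
        for j in range(2, k + 1):
            like0 *= f(j - 1) / f(j)  # P(Sj=1|D=0)
        like1 = 0.5**k                # P(S1=...=Sk=1|D=1)
        return like0 / like1

    for k in range(1, 9):
        print(k, r(k))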
2.3 Sigmoid function
Let Y ∈ {0, 1} denote a binary random variable that depends on k other random variables Xi as:
P(Y = 1|X1 = x1, X2 = x2, . . . , Xk = xk) = σ(w1 x1 + w2 x2 + · · · + wk xk),

where σ(z) = 1/(1 + e^−z).
The real-valued parameters wi in this CPT are known as weights. The so-called sigmoid function σ(z) arises in many contexts. In neural networks, it models the probability that a neuron Y fires given its input from other neurons Xi; the weights wi describe the connections between neurons. In statistics, the sigmoid function appears in models of logistic regression. Sketch the function σ(z), and verify the following properties:
(a) σ′(z) = σ(z) σ(−z).
(b) σ(−z) + σ(z) = 1.
(c) L(σ(z)) = z, where L(p) = log[p/(1 − p)] is the log-odds function.
(d) wi = L(pi), where pi = P(Y = 1|Xi = 1, Xj = 0 for all j ≠ i).
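A minimal numerical check of properties (a)-(c) (property (d) depends on the network, not just on σ); the derivative in (a) is approximated by a central difference:

    import math

    def sigma(z):
        return 1.0 / (1.0 + math.exp(-z))

    def L(p):
        """Log-odds function L(p) = log(p / (1 - p))."""
        return math.log(p / (1.0 - p))

    for z in [-2.0, -0.5, 0.0, 1.5, 3.0]:
        eps = 1e-6
        dsig = (sigma(z + eps) - sigma(z - eps)) / (2 * eps)  # ~ sigma'(z)
        assert abs(dsig - sigma(z) * sigma(-z)) < 1e-6        # (a)
        assert abs(sigma(-z) + sigma(z) - 1.0) < 1e-12        # (b)
        assert abs(L(sigma(z)) - z) < 1e-9                    # (c)
    print("checks passed")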
2.4 Conditional independence
Consider the DAG shown below, describing the following domain. Given the month of the year, there is
some probability of rain, and also some probability that the sprinkler is turned on. Either of these
events leads to some probability that a puddle forms on the sidewalk, which in turn leads to some probability that someone has a fall.
[Figure: DAG with edges month → rain, month → sprinkler, rain → puddle, sprinkler → puddle, puddle → fall.]
List all the conditional independence relations that must hold in any probability distribution represented by
this DAG. More specifically, list all tuples {X, Y, E} such that P(X, Y |E) = P(X|E) P(Y |E), where

X, Y ∈ {month, rain, sprinkler, puddle, fall},
E ⊆ {month, rain, sprinkler, puddle, fall},
X ≠ Y,
X, Y ∉ E.

Hint: There are sixteen such tuples, not counting those that are equivalent up to exchange of X and Y. Do any of the tuples contain the case E = ∅?
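One way to test a candidate tuple {X, Y, E} numerically, as a complement to a d-separation argument: assign random CPTs to the DAG, enumerate the joint distribution, and check the factorization. A tuple that fails for some random CPTs cannot hold in every distribution represented by the DAG; a tuple that passes for random CPTs is a good candidate but is not thereby proven. The sketch below makes month binary purely to keep the enumeration small; this is our simplification.

    import random
    from itertools import product

    random.seed(0)
    # Random CPTs for month -> {rain, sprinkler} -> puddle -> fall.
    pM = random.random()
    pR = {m: random.random() for m in (0, 1)}   # P(rain=1|month)
    pS = {m: random.random() for m in (0, 1)}   # P(sprinkler=1|month)
    pP = {rs: random.random() for rs in product((0, 1), repeat=2)}
    pF = {p: random.random() for p in (0, 1)}   # P(fall=1|puddle)

    names = ['month', 'rain', 'sprinkler', 'puddle', 'fall']

    def joint(m, r, s, p, f):
        def b(prob, v): return prob if v == 1 else 1 - prob
        return (b(pM, m) * b(pR[m], r) * b(pS[m], s)
                * b(pP[(r, s)], p) * b(pF[p], f))

    def marg(fixed):
        """Sum the joint over all assignments consistent with `fixed`."""
        return sum(joint(*vals) for vals in product([0, 1], repeat=5)
                   if all(dict(zip(names, vals))[k] == v
                          for k, v in fixed.items()))

    def indep(X, Y, E):
        """True if P(X,Y|e) = P(X|e) P(Y|e) for every setting e of E."""
        for evals in product([0, 1], repeat=len(E)):
            e = dict(zip(E, evals))
            pe = marg(e)
            for x, y in product([0, 1], repeat=2):
                lhs = marg({X: x, Y: y, **e}) / pe
                rhs = marg({X: x, **e}) / pe * marg({Y: y, **e}) / pe
                if abs(lhs - rhs) > 1e-9:
                    return False
        return True

    print(indep('rain', 'sprinkler', ['month']))   # expect True
    print(indep('rain', 'sprinkler', ['puddle']))  # expect False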
2.5 Markov blanket
The Markov blanket BX of a node X in a belief network includes its parents, its children, and the parents of its children (excluding the node X itself). Draw a picture of a Markov blanket; then, by appealing to the three conditions for d-separation, prove that for any node Y outside the Markov blanket of X (that is, satisfying Y ∉ BX and Y ≠ X), we have:

P(X, Y |BX) = P(X|BX) P(Y |BX).
Hint: A complete solution will consider the five different types of paths from nodes outside the Markov
blanket to the node X; the picture below illustrates these five cases (e.g., via a child of a spouse, via a parent
of a spouse, etc).
[Figure: the node X with its Markov blanket; the five path types (1-5) from nodes outside the blanket are labeled.]
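As a numeric illustration of the claim (not a proof), one can reuse the sprinkler network and the indep() helper from the sketch in section 2.4: there, the Markov blanket of rain is {month, puddle, sprinkler} (its parent, its child, and its child's other parent), and fall is the only node outside the blanket.

    # Reuses indep() and the random CPTs from the sketch in section 2.4.
    # Blanket of `rain`: parent month, child puddle, co-parent sprinkler.
    print(indep('rain', 'fall', ['month', 'puddle', 'sprinkler']))  # True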
2.6 True or false
For the belief network shown below, indicate whether the following statements of marginal or conditional
independence are true (T) or false (F).
[Figure: belief network over nodes A, B, C, D, E, F, G, H.]
P(B|G, C) = P(B|G)
P(F, G|D) = P(F|D) P(G|D)
P(A, C) = P(A) P(C)
P(D|B, F, G) = P(D|B, F, G, A)
P(F, H) = P(F) P(H)
P(D, E|F, H) = P(D|F) P(E|H)
P(F, C|G) = P(F|G) P(C|G)
P(D, E, G) = P(D) P(E) P(G|D, E)
P(H|C) = P(H|A, B, C, D, F)
P(H|A, C) = P(H|A, C, G)
2.7 Subsets
[Figure: belief network over nodes A, B, C, D, E, F.]
For the belief network shown above, consider the following statements of conditional independence. Indicate
the largest subset of nodes S ⊂ {A, B, C, D, E, F} for which each statement is true. Note that one possible answer is the empty set S = ∅ or S = {} (whichever notation you prefer). The first one is done as an example.
P(A) = P(A|S) S = {B, C, E}
P(A|D) = P(A|S)
P(A|B, D) = P(A|S)
P(B|D, E) = P(B|S)
P(E) = P(E|S)
P(E|F) = P(E|S)
P(E|D, F) = P(E|S)
P(E|B, C) = P(E|S)
P(F) = P(F|S)
P(F|D) = P(F|S)
P(F|D, E) = P(F|S)
2.8 Noisy-OR
[Figure: DAG with edges X → Z and Y → Z.]

Nodes: X ∈ {0, 1}, Y ∈ {0, 1}, Z ∈ {0, 1}
Noisy-OR CPT: P(Z = 1|X, Y) = 1 − (1 − px)^X (1 − py)^Y
Parameters: px ∈ [0, 1], py ∈ [0, 1], px < py
Suppose that the nodes in this network represent binary random variables and that the CPT for P(Z|X, Y )
is parameterized by a noisy-OR model, as shown above. Suppose also that
0 < P(X = 1) < 1,
0 < P(Y = 1) < 1,
while the parameters of the noisy-OR model satisfy:
0 < px < py < 1.
Consider the following pairs of probabilities. In each case, indicate whether the probability on the left is equal to (=), greater than (>), or less than (<) the probability on the right. The first one has been filled in for you as an example. (You should use your intuition for these problems; you are not required to show work.)
P(X = 1) = P(X = 1)
(a) P(Z = 1|X = 0, Y = 0) P(Z = 1|X = 0, Y = 1)
(b) P(Z = 1|X = 1, Y = 0) P(Z = 1|X = 0, Y = 1)
(c) P(Z = 1|X = 1, Y = 0) P(Z = 1|X = 1, Y = 1)
(d) P(X = 1) P(X = 1|Z = 1)
(e) P(X = 1) P(X = 1|Y = 1)
(f) P(X = 1|Z = 1) P(X = 1|Y = 1, Z = 1)
(g) P(X = 1) P(Y = 1) P(Z = 1) P(X = 1, Y = 1, Z = 1)
Challenge (optional): for each case, prove rigorously the correctness of your answer.
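For building intuition, a minimal sketch that tabulates the noisy-OR CPT for illustrative parameters (px = 0.2 and py = 0.6 are our own choices, satisfying 0 < px < py < 1):

    # Tabulate P(Z=1|X,Y) = 1 - (1 - px)^X (1 - py)^Y.
    px, py = 0.2, 0.6

    def p_z1(x, y):
        return 1 - (1 - px)**x * (1 - py)**y

    for x in (0, 1):
        for y in (0, 1):
            print(f"P(Z=1 | X={x}, Y={y}) = {p_z1(x, y):.3f}")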
2.9 Polytree inference
[Figure: polytree belief network over nodes A, B, C, D, E, F, G.]
Consider the belief network shown above. In this problem you will be guided through an efficient computation of the posterior probability P(F|A, B, D, G). You should perform these computations efficiently—that
is, by exploiting the structure of the DAG and not marginalizing over more variables than necessary.
(a) Bayes rule
Show how to compute the posterior probability P(C|A, B, D) in terms of the conditional probability
tables (CPTs) for these nodes—i.e., in terms of P(A), P(B), P(C|A), and P(D|B, C). For this
problem, it may help to consider just the part of the network shown below.
[Figure: subnetwork over nodes A, B, C, D.]
(b) Marginalization
Show how to compute the posterior probability P(E|A, B, D) in terms of your answer from part (a)
and the CPTs of the belief network. For this problem, it may help to consider just the part of the
network shown below.
[Figure: subnetwork over nodes A, B, C, D, E.]
(c) Marginalization
Show how to compute the posterior probability P(G|A, B, D) in terms of your answer from part (b)
and the CPTs of the belief network.
(d) Explaining away
Show how to compute the posterior probability P(F|A, B, D, G) in terms of your answers from
parts (b,c) and the CPTs of the belief network.
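A brute-force sketch for checking the final answer end to end. Caution: the edge set below (A → C, B → D, C → D, C → E, E → G, F → G) is our reading of the network, inferred from which CPTs appear in parts (a)-(d); adjust it, and the random CPTs, to match the actual figure.

    import random
    from itertools import product

    random.seed(1)
    pA, pB, pF = (random.random() for _ in range(3))   # root priors
    pC = {a: random.random() for a in (0, 1)}          # P(C=1|A)
    pD = {bc: random.random() for bc in product((0, 1), repeat=2)}
    pE = {c: random.random() for c in (0, 1)}          # P(E=1|C)
    pG = {ef: random.random() for ef in product((0, 1), repeat=2)}

    def joint(a, b, c, d, e, f, g):
        def bp(p, v): return p if v == 1 else 1 - p
        return (bp(pA, a) * bp(pB, b) * bp(pC[a], c) * bp(pD[(b, c)], d)
                * bp(pE[c], e) * bp(pF, f) * bp(pG[(e, f)], g))

    def posterior_F(a, b, d, g):
        """P(F=1|A=a, B=b, D=d, G=g), summing out the hidden C and E."""
        num = den = 0.0
        for c, e, f in product([0, 1], repeat=3):
            p = joint(a, b, c, d, e, f, g)
            den += p
            if f == 1:
                num += p
        return num / den

    print(posterior_F(1, 0, 1, 1))

Comparing this number against the output of your four-step derivation, evaluated with the same CPTs, is a useful end-to-end test.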