COS 529 Assignment 1: Probabilistic Graphical Model Inference
Collaboration policy: This assignment must be completed individually. No collaboration is allowed.
1 Task
In this assignment, you will implement algorithms for inference and learning in undirected graphical models (a.k.a. Markov random fields). Let $G = (V, E)$ be an undirected graph without self-loops, where $V$ is the set of vertices and $E \subseteq V \times V$ is the set of edges. We associate each vertex $v \in V$ with a random variable (also denoted by $v$) taking values from $D = \{0, 1, \ldots, K - 1\}$, where $K > 1$ is a fixed integer.
The probability distribution of these variables is defined by potential functions. Each vertex $v$ has a unary potential $\psi_v : D \to \mathbb{R}^+$, which maps any assignment of $v$ to a positive real number. Likewise, each edge $(u, v)$ is associated with a binary potential $\psi_{u,v} : D \times D \to \mathbb{R}^+$. Assuming there are $n$ vertices, their joint probability distribution is:
$$p(\mathbf{v}) = \frac{1}{Z} \prod_{v \in V} \psi_v(\mathbf{v}) \prod_{(u,v) \in E} \psi_{u,v}(\mathbf{v}) \tag{1}$$
where $\mathbf{v} = (v_1, v_2, \ldots, v_n) \in D^n$ is a particular assignment of all random variables, and $\psi_v(\mathbf{v})$ and $\psi_{u,v}(\mathbf{v})$ are the values of the potentials under $\mathbf{v}$. $Z$ is a normalizer that ensures the probabilities sum to 1; we thus have:
$$Z = \sum_{\mathbf{v} \in D^n} \prod_{v \in V} \psi_v(\mathbf{v}) \prod_{(u,v) \in E} \psi_{u,v}(\mathbf{v}) \tag{2}$$
Eqn. 1 represents a family of distributions parameterized by the values of the potential functions. Each $\psi_v$ contributes $K$ parameters and each $\psi_{u,v}$ contributes $K \times K$ parameters. Given a dataset of i.i.d. samples $S = \{\mathbf{v}^{(1)}, \mathbf{v}^{(2)}, \ldots, \mathbf{v}^{(m)}\}$, we would like to fit the family of distributions to the data by maximum likelihood estimation (MLE), in which we search for the parameters that maximize the likelihood of the data:
$$p(S \mid \psi_v, \psi_{u,v}) = \prod_{i=1}^{m} p(\mathbf{v}^{(i)}) \tag{3}$$
For numerical stability, we maximize the log-likelihood instead:
$$L = \log p(S \mid \psi_v, \psi_{u,v}) = \sum_{i=1}^{m} \log p(\mathbf{v}^{(i)}) \tag{4}$$
Since the gradients $\frac{\partial L}{\partial (\psi_v, \psi_{u,v})}$ can be computed efficiently, we usually maximize Eqn. 4 via first-order optimization methods such as gradient descent.
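As a rough illustration (not part of this assignment's deliverables), a gradient-ascent loop over the potentials might look like the sketch below. Here log_likelihood_and_grads is a hypothetical routine returning L from Eqn. 4 and the gradient tables, and parameterizing psi = exp(theta) is a choice made for this sketch to keep the potentials positive:

import numpy as np

def fit_potentials(theta_unary, theta_binary, data, lr=0.1, steps=100):
    """Sketch of MLE by gradient ascent on Eqn. 4.
    theta_*: dicts of unconstrained numpy arrays, one per vertex / edge;
    the potentials are psi = exp(theta), which are always positive.
    log_likelihood_and_grads is a hypothetical helper (not in the starter
    code) returning L and the dL/dpsi tables for the given data."""
    for _ in range(steps):
        psi_unary = {v: np.exp(t) for v, t in theta_unary.items()}
        psi_binary = {e: np.exp(t) for e, t in theta_binary.items()}
        L, g_unary, g_binary = log_likelihood_and_grads(psi_unary,
                                                        psi_binary, data)
        for v in theta_unary:   # chain rule: dL/dtheta = (dL/dpsi) * psi
            theta_unary[v] += lr * g_unary[v] * psi_unary[v]
        for e in theta_binary:
            theta_binary[e] += lr * g_binary[e] * psi_binary[e]
    return theta_unary, theta_binary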
In this assignment, you will be working with trees (connected and without loops). Your task is to implement probabilistic inference over the joint distribution specified by Eqn. 1, and to compute the gradients of the data log-likelihood with respect to the potentials.
1. Marginal distributions: For each variable $v_i$, compute its marginal distribution.
$$p_i(v_i) = \sum_{v_1 \in D} \cdots \sum_{v_{i-1} \in D} \sum_{v_{i+1} \in D} \cdots \sum_{v_n \in D} p(\mathbf{v}) \tag{5}$$
2. Maximum a posteriori (MAP) inference: Compute an assignment of all variables that maximizes the joint probability.
$$\mathbf{v}_{\mathrm{MAP}} = \operatorname*{argmax}_{\mathbf{v} \in D^n} p(\mathbf{v}) \tag{6}$$
3. Gradients of the log-likelihood: We only consider the simplified case when there is only one sample $\mathbf{v}^{(0)}$ in the dataset.
$$\frac{\partial L}{\partial (\psi_v, \psi_{u,v})} = \frac{\partial \log p(\mathbf{v}^{(0)})}{\partial (\psi_v, \psi_{u,v})} \tag{7}$$
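Although Eqn. 7 leaves the gradients abstract, differentiating Eqns. 1 and 2 shows that they reduce to the marginals from part 1. The derivation is worth doing yourself; we state the result only as a reference point:
$$\frac{\partial \log p(\mathbf{v}^{(0)})}{\partial \psi_v(k)} = \frac{\mathbb{1}[v^{(0)}_v = k] - p_v(k)}{\psi_v(k)}, \qquad \frac{\partial \log p(\mathbf{v}^{(0)})}{\partial \psi_{u,v}(j, k)} = \frac{\mathbb{1}[v^{(0)}_u = j,\, v^{(0)}_v = k] - p_{u,v}(j, k)}{\psi_{u,v}(j, k)},$$
where $p_v$ is the marginal of vertex $v$ and $p_{u,v}$ is the pairwise marginal of edge $(u, v)$ under the current potentials. You can check these identities against the numbers in Section 2.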
2 Example
Take $G$ to be the graph below, with vertices taking binary values ($K = 2$).
[Figure: two vertices, labeled 1 and 2, joined by a single edge.]
The potential functions are:
$$\psi_1(u) = \begin{cases} 2.0 & \text{if } u = 0 \\ 2.0 & \text{if } u = 1 \end{cases} \tag{8}$$
$$\psi_2(v) = \begin{cases} 1.0 & \text{if } v = 0 \\ 5.0 & \text{if } v = 1 \end{cases} \tag{9}$$
$$\psi_{1,2}(u, v) = \begin{cases} 4.0 & \text{if } u = 0 \text{ and } v = 0 \\ 1.0 & \text{if } u = 0 \text{ and } v = 1 \\ 1.0 & \text{if } u = 1 \text{ and } v = 0 \\ 3.0 & \text{if } u = 1 \text{ and } v = 1 \end{cases} \tag{10}$$
The joint probability of the two variables can be computed using Eqn. 1 and
Eqn. 2.
$$Z = \sum_{u=0}^{1} \sum_{v=0}^{1} \psi_1(u)\, \psi_2(v)\, \psi_{1,2}(u, v) = 8 + 10 + 2 + 30 = 50 \tag{12}$$
$$p(0, 0) = \frac{1}{Z}\, \psi_1(0)\, \psi_2(0)\, \psi_{1,2}(0, 0) = \frac{8}{50} = 0.16 \tag{13}$$
$$p(0, 1) = \frac{1}{Z}\, \psi_1(0)\, \psi_2(1)\, \psi_{1,2}(0, 1) = \frac{10}{50} = 0.20 \tag{14}$$
$$p(1, 0) = \frac{1}{Z}\, \psi_1(1)\, \psi_2(0)\, \psi_{1,2}(1, 0) = \frac{2}{50} = 0.04 \tag{15}$$
$$p(1, 1) = \frac{1}{Z}\, \psi_1(1)\, \psi_2(1)\, \psi_{1,2}(1, 1) = \frac{30}{50} = 0.60 \tag{16}$$
The marginal distributions:
$$p_0(u) = \begin{cases} 0.36 & \text{if } u = 0 \\ 0.64 & \text{if } u = 1 \end{cases} \tag{17}$$
$$p_1(v) = \begin{cases} 0.20 & \text{if } v = 0 \\ 0.80 & \text{if } v = 1 \end{cases} \tag{18}$$
The MAP inference:
$$\mathbf{v}_{\mathrm{MAP}} = (1, 1) \tag{19}$$
Given a dataset {(0, 1)} consisting of one sample, we compute the gradients in
Eqn. 7.
$$\frac{\partial L}{\partial \psi_1} = \frac{\partial \log p(0, 1)}{\partial \psi_1} = [0.32, -0.32]^T \tag{20}$$
$$\frac{\partial L}{\partial \psi_2} = \frac{\partial \log p(0, 1)}{\partial \psi_2} = [-0.20, 0.04]^T \tag{21}$$
$$\frac{\partial L}{\partial \psi_{1,2}} = \frac{\partial \log p(0, 1)}{\partial \psi_{1,2}} = \begin{bmatrix} -0.04 & 0.80 \\ -0.04 & -0.20 \end{bmatrix} \tag{22}$$
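All of these numbers can be reproduced by direct enumeration. Below is a self-contained sanity-check script, independent of the starter code (the variable names are ours), that recomputes Z, the marginals, the MAP assignment, and the gradients for this example:

import numpy as np

# Potentials from Eqns. 8-10, stored as numpy tables.
psi1 = np.array([2.0, 2.0])
psi2 = np.array([1.0, 5.0])
psi12 = np.array([[4.0, 1.0],
                  [1.0, 3.0]])

# Unnormalized joint table: joint[u, v] = psi1(u) * psi2(v) * psi12(u, v).
joint = psi1[:, None] * psi2[None, :] * psi12
Z = joint.sum()                              # 50.0, as in Eqn. 12
p = joint / Z                                # Eqns. 13-16

print(p.sum(axis=1))                         # node 0 marginal: [0.36, 0.64]
print(p.sum(axis=0))                         # node 1 marginal: [0.20, 0.80]
print(np.unravel_index(p.argmax(), p.shape)) # MAP assignment: (1, 1)

# Gradients of log p(0, 1) via the identity noted after Eqn. 7:
# d log p / d psi = (one-hot(observed) - marginal) / psi, entrywise.
ind1, ind2 = np.eye(2)[0], np.eye(2)[1]      # one-hots for the sample (0, 1)
print((ind1 - p.sum(axis=1)) / psi1)         # [0.32, -0.32], Eqn. 20
print((ind2 - p.sum(axis=0)) / psi2)         # [-0.20, 0.04], Eqn. 21
print((np.outer(ind1, ind2) - p) / psi12)    # [[-0.04, 0.80], [-0.04, -0.20]]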
3 I/O Format
You will be working with trees in this assignment. You will implement in Python the functions for MAP inference, marginal distributions, and gradients of the log-likelihood. You will complete the inference and inference_brute_force functions in pgm.py. They perform the same function, but inference should be able to handle large input graphs.
Input The input is a graph G, the potential functions, an integer K that determines the domain of the random variables, and an assignment of all variables for computing the gradients w.r.t. the potentials.
• G is a Graph object in the NetworkX library. It supports a variety of operations, including accessing nodes, edges, etc. For details, you may consult the official documentation or the starter code for this assignment.
• G.nodes[v]['unary_potential'] is a 1-d numpy array of length K denoting the unary potential function for node v.
• G.edges[u, v]['binary_potential'] is a 2-d numpy array of size K × K denoting the binary potential function for edge (u, v).
• K is given as an attribute of the graph, which can be accessed via G.graph['K'].
• The given assignment for node v is G.nodes[v]['assignment'].
For this assignment, G is always a tree (connected and without loops). K is greater than 1, but does not necessarily equal 2.
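To make these conventions concrete, here is one possible shape for the core of inference_brute_force: a sketch (the function name is ours) that enumerates all K^n assignments and evaluates Eqns. 1 and 2 with the attribute names above. It assumes each edge's potential table is oriented as psi[x_u, x_v] for the edge (u, v) as listed by G.edges:

import itertools
import numpy as np

def brute_force_table(G):
    """Sketch: enumerate all K^n assignments of G and return the list of
    assignments, their unnormalized scores (the Eqn. 1 numerators), and Z
    (Eqn. 2). Only feasible for small graphs (e.g., n <= 8, K <= 5)."""
    K = G.graph['K']
    nodes = list(G.nodes)
    index = {v: i for i, v in enumerate(nodes)}
    assignments = list(itertools.product(range(K), repeat=len(nodes)))
    scores = np.empty(len(assignments))
    for a, assign in enumerate(assignments):
        s = 1.0
        for v in nodes:                    # product of unary potentials
            s *= G.nodes[v]['unary_potential'][assign[index[v]]]
        for u, v in G.edges:               # product of binary potentials
            s *= G.edges[u, v]['binary_potential'][assign[index[u]],
                                                   assign[index[v]]]
        scores[a] = s
    return assignments, scores, scores.sum()

From this table, marginals follow by summing scores / Z over the assignments that agree with each value, and the MAP assignment is simply the one with the largest score.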
Output In the starter code, we have initialized buffers for the output. You only have to fill in their values.
• G.nodes[v]['marginal_prob'] stores the marginal distribution for node v as a 1-d numpy array of size K.
• G.graph['v_map'] stores the MAP assignment as a 1-d numpy array of size n, where n is the number of vertices in G.
• G.nodes[v]['gradient_unary_potential'] stores the gradient $\partial L / \partial \psi_v$ as a 1-d numpy array of size K.
• G.edges[u, v]['gradient_binary_potential'] stores the gradient $\partial L / \partial \psi_{u,v}$ as a 2-d numpy array of size K × K.
For the previous example, the correct output would be:
G.nodes[0]['marginal_prob'] = np.array([0.36, 0.64])
G.nodes[1]['marginal_prob'] = np.array([0.20, 0.80])
G.graph['v_map'] = np.array([1, 1])
G.nodes[0]['gradient_unary_potential'] = np.array([0.32, -0.32])
G.nodes[1]['gradient_unary_potential'] = np.array([-0.20, 0.04])
G.edges[0, 1]['gradient_binary_potential'] = np.array([[-0.04, 0.80], [-0.04, -0.20]])
You can run the code against this test case via "python3 pgm.py test_graph_00.pickle".
4 Library restrictions
In your implementation, you may not import any additional library that is not
in the Python Standard Library: https://docs.python.org/3/library/.
5 What to submit
Please submit to Blackboard the pgm.py file with inference and inference_brute_force completed.
6 Grading
• inference_brute_force (20 points): A brute-force implementation. It will be tested only on small graphs (up to 8 nodes, K ≤ 5).
• inference (80 points): An efficient implementation, e.g., using belief propagation (a sketch follows this list). It will be tested on large graphs (up to 200 nodes, with K ≤ 5).
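For reference, one standard schedule that meets these limits is two-pass sum-product message passing on the tree. The sketch below is only an illustration under the Section 3 attribute conventions, not the required solution: the function name bp_marginals is ours, and the max-product pass for v_map and the gradient computation are omitted.

import numpy as np

def bp_marginals(G):
    """Sketch: sum-product belief propagation on the tree G, using the
    attribute names from Section 3. Runs in O(n * K^2) rather than O(K^n).
    Returns a dict mapping each node to its length-K marginal."""
    # Pairwise potentials oriented both ways: pot[(a, b)][x_a, x_b].
    pot = {}
    for u, v in G.edges:
        p = G.edges[u, v]['binary_potential']
        pot[(u, v)], pot[(v, u)] = p, p.T
    # BFS order from an arbitrary root, so each parent precedes its children.
    root = next(iter(G.nodes))
    parent, order = {root: None}, [root]
    for a in order:
        for w in G.neighbors(a):
            if w not in parent:
                parent[w] = a
                order.append(w)
    msg = {}  # msg[(a, b)]: normalized length-K message from a to b

    def send(a, b):
        # m_{a->b}(x_b) = sum_{x_a} psi_a(x_a) psi_{a,b}(x_a, x_b)
        #                 * prod over neighbors c != b of m_{c->a}(x_a)
        belief = G.nodes[a]['unary_potential'].copy()
        for c in G.neighbors(a):
            if c != b:
                belief *= msg[(c, a)]
        m = belief @ pot[(a, b)]
        return m / m.sum()  # normalize for numerical stability

    for a in reversed(order):            # upward pass: leaves -> root
        if parent[a] is not None:
            msg[(a, parent[a])] = send(a, parent[a])
    for a in order:                      # downward pass: root -> leaves
        for w in G.neighbors(a):
            if parent.get(w) == a:
                msg[(a, w)] = send(a, w)

    marginals = {}
    for v in G.nodes:
        b = G.nodes[v]['unary_potential'].copy()
        for c in G.neighbors(v):
            b *= msg[(c, v)]
        marginals[v] = b / b.sum()
    return marginals

MAP inference follows from the same two passes with max in place of sum plus back-pointers, and the pairwise marginals needed for the edge gradients combine, for each edge, the messages arriving at its two endpoints (excluding the message across the edge itself) with the edge's and endpoints' potentials.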
For a single test graph, you will receive credit according to the following
criteria. A solution will be considered correct if every output value is within
0.001 of the correct value. Your final credit will be the average over 10 test
graphs.
+ 40% for correct marginal_prob
+ 40% for correct MAP
+ 10% for correct unary potential gradients
+ 10% for correct binary potential gradients
For a single input graph, both of your implementations (brute-force and efficient) have a time limit of 20 minutes on a single CPU core and a memory limit of 2 GB (time and memory are measured on an AWS EC2 A1 machine as the reference). Implementations that exceed these limits on this machine will not receive points. Your brute-force implementation will be tested on graphs of up to 8 nodes with K ≤ 5. Your efficient implementation will be tested on large graphs of up to 200 nodes with K ≤ 5. Your solutions will be evaluated on graphs with unary and binary potentials in the range [1, 5].
Your brute-force implementation does not have to be actually brute-force and can be the same as your efficient implementation, but we recommend that you implement an actual brute-force version to help debug your efficient one. We have provided 5 small test cases with ground-truth marginal probabilities and MAP estimates to help you debug. Note that passing the provided test cases does not in any way guarantee the correctness of your implementation; it is your responsibility to devise your own tests and debug your code.
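For example, one useful test is to cross-check the two implementations on random trees. The sketch below assumes both functions take the graph and fill the output buffers in place, as in the starter code; adapt it if your setup differs:

import copy
import random
import networkx as nx
import numpy as np
from pgm import inference, inference_brute_force  # your implementations

def random_tree_instance(n=6, K=3):
    """Build a random tree with potentials drawn from [1, 5], in the input
    format of Section 3. (Pre-initialize the output buffers here if your
    code expects them, as the starter code's graphs provide.)"""
    G = nx.Graph()
    G.graph['K'] = K
    for v in range(n):
        G.add_node(v, unary_potential=np.random.uniform(1, 5, size=K),
                      assignment=random.randrange(K))
        if v > 0:  # attaching each new node to an earlier one yields a tree
            G.add_edge(random.randrange(v), v,
                       binary_potential=np.random.uniform(1, 5, size=(K, K)))
    return G

def cross_check(trials=20):
    for _ in range(trials):
        G1 = random_tree_instance()
        G2 = copy.deepcopy(G1)
        inference(G1)
        inference_brute_force(G2)
        for v in G1.nodes:
            assert np.allclose(G1.nodes[v]['marginal_prob'],
                               G2.nodes[v]['marginal_prob'], atol=1e-3)
        assert np.array_equal(G1.graph['v_map'], G2.graph['v_map'])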