HOMEWORK 5
Instructions:
• Please submit your answers in a single PDF file and your code in a zip file. The PDF should preferably be made with LaTeX; there is no need to submit the LaTeX source. See Piazza post @114 for recommendations on how to write up the answers.
• Submit code for the programming exercises. Although we provide base code in Python (a Jupyter notebook), you may use any programming language you like as long as you use the same model and dataset.
• Submit all the material on time.
1 Implementation: GAN (40 pts)
In this part, you are expected to implement a GAN on the MNIST dataset. We have provided a base Jupyter notebook (gan-base.ipynb) for you to start with, which contains the model setup and training configuration for training a GAN on MNIST.
(a) Implement the training loop and report learning curves and generated images at epochs 1, 50, and 100. Note that drawing the learning curves and visualizing the images are already implemented in the provided Jupyter notebook. (20 pts)
Procedure 1 Training GAN, modified from Goodfellow et al. (2014)
Input: $m$: real data batch size, $n_z$: fake data batch size
Output: Discriminator $D$, Generator $G$
for number of training iterations do
    # Training discriminator
    Sample a minibatch of $n_z$ noise samples $\{z^{(1)}, z^{(2)}, \dots, z^{(n_z)}\}$ from the noise prior $p_g(z)$
    Sample a minibatch of $m$ real examples $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
    Update the discriminator by ascending its stochastic gradient:
    $$\nabla_{\theta_d} \left[ \frac{1}{m} \sum_{i=1}^{m} \log D(x^{(i)}) + \frac{1}{n_z} \sum_{i=1}^{n_z} \log\bigl(1 - D(G(z^{(i)}))\bigr) \right]$$
    # Training generator
    Sample a minibatch of $n_z$ noise samples $\{z^{(1)}, z^{(2)}, \dots, z^{(n_z)}\}$ from the noise prior $p_g(z)$
    Update the generator by ascending its stochastic gradient:
    $$\nabla_{\theta_g} \frac{1}{n_z} \sum_{i=1}^{n_z} \log D(G(z^{(i)}))$$
end for
# The gradient-based updates can use any standard gradient-based learning rule. In the base code, we use the Adam optimizer (Kingma and Ba, 2014).
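For part (a), the training loop follows Procedure 1 directly. Below is a minimal PyTorch sketch of one epoch; the names netD, netG, optD, optG, latent_dim, dataloader, and device are assumptions standing in for whatever the provided notebook defines (and the fake batch size is taken equal to $m$ for simplicity), so treat it as an illustration rather than the notebook's actual code.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # assumes netD ends with a sigmoid and outputs shape (m, 1)

def train_one_epoch(netD, netG, optD, optG, dataloader, latent_dim, device):
    for real, _ in dataloader:
        real = real.to(device)
        m = real.size(0)

        # Discriminator step: ascend log D(x) + log(1 - D(G(z)))
        # (equivalently, minimize the two BCE terms below).
        optD.zero_grad()
        noise = torch.randn(m, latent_dim, device=device)
        fake = netG(noise)
        loss_real = criterion(netD(real), torch.ones(m, 1, device=device))
        loss_fake = criterion(netD(fake.detach()), torch.zeros(m, 1, device=device))
        (loss_real + loss_fake).backward()
        optD.step()

        # Generator step: ascend log D(G(z)) (the non-saturating objective used in (a)).
        optG.zero_grad()
        loss_g = criterion(netD(fake), torch.ones(m, 1, device=device))
        loss_g.backward()
        optG.step()
```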
Expected results are as follows.
Figure 1: Learning curve.
Figure 2: Generated images by G at (a) epoch 1, (b) epoch 50, (c) epoch 100.
Solution goes here.
(b) Replace the generator update rule with the original one from the slides, "Update the generator by descending its stochastic gradient:
$$\nabla_{\theta_g} \frac{1}{n_z} \sum_{i=1}^{n_z} \log\bigl(1 - D(G(z^{(i)}))\bigr)$$
", and report learning curves and generated images at epochs 1, 50, and 100. Compare the result with (a). Note that it may not work; if training does not work, explain why. (10 pts)
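In code, this change only touches the generator update. Below is a sketch under the same hypothetical names as the part (a) snippet, with the loss now descending log(1 − D(G(z))) instead of ascending log D(G(z)):

```python
import torch

def generator_step_original(netD, netG, optG, latent_dim, m, device, eps=1e-8):
    """Generator update with the original (saturating) objective:
    descend (1/n_z) * sum_i log(1 - D(G(z^(i))))."""
    optG.zero_grad()
    noise = torch.randn(m, latent_dim, device=device)
    fake = netG(noise)
    # Minimize log(1 - D(G(z))) directly; eps avoids log(0) when D(G(z)) -> 1.
    loss_g = torch.log(1.0 - netD(fake) + eps).mean()
    loss_g.backward()
    optG.step()
    return loss_g.item()
```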
Solution goes here.
(c) Aside from the method we used in (a), how else can we improve GAN training? Implement one such improvement and report learning curves and generated images at epochs 1, 50, and 100. (10 pts)
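There are several standard options here (e.g., one-sided label smoothing, adding noise to the discriminator's inputs, or alternative losses such as the least-squares or Wasserstein objectives). As one illustration, here is a sketch of one-sided label smoothing on the real labels in the discriminator step, again using the hypothetical names from the part (a) snippet:

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()

def discriminator_step_smoothed(netD, netG, optD, real, latent_dim, device,
                                real_label=0.9):
    """Discriminator update with one-sided label smoothing:
    real targets are 0.9 instead of 1.0, fake targets stay at 0.0."""
    optD.zero_grad()
    m = real.size(0)
    noise = torch.randn(m, latent_dim, device=device)
    fake = netG(noise).detach()
    loss_real = criterion(netD(real), torch.full((m, 1), real_label, device=device))
    loss_fake = criterion(netD(fake), torch.zeros(m, 1, device=device))
    loss_d = loss_real + loss_fake
    loss_d.backward()
    optD.step()
    return loss_d.item()
```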
Solution goes here.
2 Ridge regression [15 pts]
Derive the closed-form solution in matrix form for the ridge regression problem:
$$\min_{\beta} \left( \frac{1}{n} \sum_{i=1}^{n} \left( z_i^{\top} \beta - y_i \right)^2 \right) + \lambda \|\beta\|_A^2$$
where $\|\beta\|_A^2 := \beta^{\top} A \beta$ and
$$A = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$
This A matrix has the effect of NOT regularizing the bias β0, which is standard practice in ridge regression. Note:
Derive the closed-form solution, do not blindly copy lecture notes.
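Whatever closed form you derive can be sanity-checked numerically: at the minimizer, the gradient of the objective must vanish. The sketch below does this with synthetic data; the candidate formula in the comment is only one possibility to test, and the shapes and λ value are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
Z = np.hstack([np.ones((n, 1)), rng.normal(size=(n, 2))])  # first column is the bias
y = rng.normal(size=n)
lam = 0.1
A = np.diag([0.0, 1.0, 1.0])  # does not penalize the bias coefficient beta_0

# Candidate closed form to verify: beta = (Z^T Z + n * lam * A)^{-1} Z^T y
beta = np.linalg.solve(Z.T @ Z + n * lam * A, Z.T @ y)

# Gradient of (1/n) * sum_i (z_i^T beta - y_i)^2 + lam * beta^T A beta
grad = (2.0 / n) * Z.T @ (Z @ beta - y) + 2.0 * lam * A @ beta
print(np.allclose(grad, 0.0))  # True if the candidate closed form is a stationary point
```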
Solution goes here.
3 Review of the change of variables in probability density functions [25 pts]
In flow-based generative models, we have seen $p_\theta(x) = p(f_\theta(x)) \left| \frac{\partial f_\theta(x)}{\partial x} \right|$. As a hands-on (fixed-parameter) example, consider the following setting.
Let $X$ and $Y$ be independent standard normal random variables. Consider the transformation $U = X + Y$ and $V = X - Y$. In the notation used above, $U = g_1(X, Y)$ where $g_1(x, y) = x + y$, and $V = g_2(X, Y)$ where $g_2(x, y) = x - y$. The joint pdf of $X$ and $Y$ is
$$f_{X,Y}(x, y) = (2\pi)^{-1} \exp(-x^2/2) \exp(-y^2/2), \quad -\infty < x < \infty,\; -\infty < y < \infty.$$
Then, $u, v$ are determined by $x, y$ via
$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$
(a) (5 pts) Compute the Jacobian matrix
$$J = \begin{pmatrix} \frac{\partial x}{\partial u} & \frac{\partial x}{\partial v} \\ \frac{\partial y}{\partial u} & \frac{\partial y}{\partial v} \end{pmatrix}.$$
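For a quick symbolic sanity check (not a substitute for the derivation), SymPy can differentiate the inverse map x = (u + v)/2, y = (u − v)/2 directly:

```python
import sympy as sp

u, v = sp.symbols('u v')
x = (u + v) / 2  # inverse of u = x + y, v = x - y
y = (u - v) / 2

J = sp.Matrix([[sp.diff(x, u), sp.diff(x, v)],
               [sp.diff(y, u), sp.diff(y, v)]])
print(J)          # Matrix([[1/2, 1/2], [1/2, -1/2]])
print(sp.det(J))  # -1/2, so |det J| = 1/2
```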
Solution goes here.
(b) (Forward) Show that the joint pdf of $U, V$ is
$$f_{U,V}(u, v) = \frac{1}{\sqrt{2\pi}\,\sqrt{2}} \exp(-u^2/4) \cdot \frac{1}{\sqrt{2\pi}\,\sqrt{2}} \exp(-v^2/4).$$
(10 pts)
(Hint: $f_{U,V}(u, v) = f_{X,Y}(?, ?)\,|J|$)
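The claimed density can also be checked numerically against the change-of-variables formula in the hint, using |det J| = 1/2 from part (a); a small sketch follows (the grid of test points is arbitrary):

```python
import numpy as np

def f_xy(x, y):
    """Joint pdf of independent standard normals X and Y."""
    return (2 * np.pi) ** -1 * np.exp(-x**2 / 2) * np.exp(-y**2 / 2)

def f_uv(u, v):
    """Claimed joint pdf of U = X + Y, V = X - Y."""
    c = 1.0 / (np.sqrt(2 * np.pi) * np.sqrt(2))
    return c * np.exp(-u**2 / 4) * c * np.exp(-v**2 / 4)

# f_{U,V}(u, v) should equal f_{X,Y}((u + v)/2, (u - v)/2) * |det J| with |det J| = 1/2
u, v = np.meshgrid(np.linspace(-3, 3, 7), np.linspace(-3, 3, 7))
print(np.allclose(f_uv(u, v), f_xy((u + v) / 2, (u - v) / 2) * 0.5))  # True
```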
Solution goes here.
(c) (Inverse) Check whether the following equation holds or not:
$$f_{X,Y}(x, y) = f_{U,V}(x + y, x - y)\,|J|^{-1}$$
(10 pts)
Solution goes here.
4 Directed Graphical Model [20 points]
Consider the directed graphical model (aka Bayesian network) in Figure 3.
Figure 3: A Bayesian Network example.
Compute P(B = t | E = f, J = t, M = t) and P(B = t | E = t, J = t, M = t). These are the conditional probabilities of a burglar in your house (yikes!) when both of your neighbors, John and Mary, call you and say they hear an alarm in your house, without or with an earthquake also going on in that area (what a busy day), respectively.
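One way to organize this computation is inference by enumeration: sum the joint distribution over the unobserved Alarm variable and renormalize over B. The sketch below assumes the usual structure B → A ← E, A → J, A → M; the numeric CPT values are placeholders only and must be replaced with the numbers given in Figure 3.

```python
from itertools import product

# Placeholder CPTs -- replace every number below with the values from Figure 3.
P_B = {True: 0.001, False: 0.999}                  # P(B)
P_E = {True: 0.002, False: 0.998}                  # P(E)
P_A = {(True, True): 0.95, (True, False): 0.94,    # P(A=t | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                    # P(J=t | A)
P_M = {True: 0.70, False: 0.01}                    # P(M=t | A)

def joint(b, e, a, j, m):
    """P(B=b, E=e, A=a, J=j, M=m) factored according to the network."""
    p = P_B[b] * P_E[e]
    pa = P_A[(b, e)]
    p *= pa if a else 1 - pa
    pj = P_J[a]
    p *= pj if j else 1 - pj
    pm = P_M[a]
    p *= pm if m else 1 - pm
    return p

def posterior_burglary(e, j=True, m=True):
    """P(B=t | E=e, J=j, M=m), obtained by summing out A and renormalizing."""
    num = sum(joint(True, e, a, j, m) for a in (True, False))
    den = sum(joint(b, e, a, j, m) for b, a in product((True, False), repeat=2))
    return num / den

print(posterior_burglary(e=False))  # P(B=t | E=f, J=t, M=t)
print(posterior_burglary(e=True))   # P(B=t | E=t, J=t, M=t)
```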
Solution goes here.
References
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y.
(2014). Generative adversarial nets. Advances in neural information processing systems, 27.
Kingma, D. P. and Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.