Homework 5: Graphical Models, MDPs
Introduction
There is a mathematical component and a programming component to this homework. Please submit your tex, PDF, and Python files to Canvas, and push all of your work to your GitHub repository. If a question requires you to make any plots, please include those in the writeup.
Bayesian Networks [7 pts]
Problem 1
In this problem we explore the conditional independence properties of a Bayesian Network. Consider the following Bayesian network representing a person’s skin condition. Each random variable is binary (true/false).
The random variables are:

• Was Outside: Was the person outside?
• Get Sunburn: Did the person get sunburn?
• Get Insect Bite: Did the person get an insect bite?
• Have Acne: Does the person have acne?
• Skin Red: Is the skin red?
• Bump on Skin: Is there a bump?

For the following questions, I(A, B) means that events A and B are independent, and I(A, B | C) means that events A and B are independent conditioned on C. Use the concept of d-separation to answer the questions and show your work (a programmatic sanity check is sketched after the question list).

1. Is I(Have Acne, Was Outside)? If NO, give intuition for why.
2. Is I(Have Acne, Was Outside | Skin Red)? If NO, give intuition for why.
3. Is I(Get Sunburn, Bump on Skin)? If NO, give intuition for why.
4. Is I(Get Sunburn, Bump on Skin | Get Insect Bite)? If NO, give intuition for why.
5. Suppose the person has taken a medicine to suppress their response to insect bites: they get red skin, but no bumps. Draw the modified network.
6. For this modified network, is I(Get Sunburn, Bump on Skin)? If NO, give intuition for why. If YES, describe what observations (if any) would cause them to no longer be independent.
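If you want to sanity-check your d-separation answers programmatically, the sketch below implements the moralized-ancestral-graph criterion in plain Python. The edge list is an assumption reconstructed from the figure (which is not reproduced in this text), so adjust it to match the actual network before trusting the output.

```python
from itertools import combinations

def d_separated(edges, a, b, observed=frozenset()):
    """True iff a and b are d-separated given `observed`, using the
    moralized-ancestral-graph criterion."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # 1. Restrict to ancestors of {a, b} and the observed set.
    anc, stack = set(), [a, b, *observed]
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents.get(n, ()))
    sub = [(u, v) for u, v in edges if u in anc and v in anc]
    # 2. Moralize: marry co-parents, then drop edge directions.
    nbrs = {n: set() for n in anc}
    copar = {}
    for u, v in sub:
        nbrs[u].add(v)
        nbrs[v].add(u)
        copar.setdefault(v, set()).add(u)
    for ps in copar.values():
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    # 3. d-separated iff `observed` blocks every undirected path from a to b.
    seen, stack = set(observed), [a]
    while stack:
        n = stack.pop()
        if n == b:
            return False
        if n not in seen:
            seen.add(n)
            stack.extend(nbrs[n])
    return True

# Hypothetical edge list -- fill this in from the figure; it is NOT given
# in the problem text above.
edges = [("Was Outside", "Get Sunburn"), ("Was Outside", "Get Insect Bite"),
         ("Get Sunburn", "Skin Red"), ("Get Insect Bite", "Skin Red"),
         ("Get Insect Bite", "Bump on Skin"), ("Have Acne", "Skin Red"),
         ("Have Acne", "Bump on Skin")]
print(d_separated(edges, "Have Acne", "Was Outside"))                 # Q1
print(d_separated(edges, "Have Acne", "Was Outside", {"Skin Red"}))   # Q2
```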
Kalman Filters [7 pts]
Problem 2

In this problem, you will implement a one-dimensional Kalman filter. Assume the following dynamical system model:

$$z_{t+1} = z_t + \epsilon_t \qquad\qquad x_t = z_t + \gamma_t$$

where the $z_t$ are the hidden variables and the $x_t$ are the observed measurements. The noise terms $\epsilon_t$ and $\gamma_t$ are drawn from the Normal distributions

$$\epsilon_t \sim N(\mu_\epsilon, \sigma_\epsilon^2) \qquad\qquad \gamma_t \sim N(\mu_\gamma, \sigma_\gamma^2)$$

where $\mu_\epsilon = 0$, $\sigma_\epsilon^2 = 0.05$, $\mu_\gamma = 0$, and $\sigma_\gamma^2 = 1.0$. You are provided with the observed data $x$ and the hidden data $z$ in kf-data.csv, and the prior on the first hidden state is $p(z_0) = N(\mu_p, \sigma_p^2)$, where $\mu_p = 5$ and $\sigma_p^2 = 1$.

(a) The distribution $p(z_t \mid x_0, \ldots, x_t)$ will be Gaussian, $N(\mu_t, \sigma_t^2)$. Derive an iterative update for the mean $\mu_t$ and variance $\sigma_t^2$ given the mean and variance at the previous time step ($\mu_{t-1}$ and $\sigma_{t-1}^2$).

(b) Implement this update and apply it to the observed data above (do not use the hidden data to find these updates). Provide a plot of $\mu_t$ over time, as well as a $2\sigma_t$ interval around $\mu_t$ (that is, $\mu_t \pm 2\sigma_t$). Does the Kalman filter "catch up" to the true hidden object? A minimal implementation sketch is given after part (d).

(c) Repeat the same process, but change the observation at time $t = 10$ to $x_{10} = 10.2$ (an extreme outlier measurement). How does the Kalman filter respond to this outlier?

(d) Comment on the misspecification of the dynamical system model for these data. Based on the previous two parts, how does this misspecification affect the predictions?
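The sketch below is a minimal filter for part (b): the standard Gaussian predict/update recursion for this random-walk model. The column name x in kf-data.csv is an assumption; match it to the actual file.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Model parameters from the problem statement.
mu_eps, var_eps = 0.0, 0.05   # transition noise epsilon_t
mu_gam, var_gam = 0.0, 1.0    # observation noise gamma_t
mu, var = 5.0, 1.0            # prior on z_0

# Column name "x" is an assumption; adjust to the actual kf-data.csv header.
x = pd.read_csv("kf-data.csv")["x"].to_numpy()
# For part (c), inject the outlier before filtering:  x[10] = 10.2

mus, sds = [], []
for t, x_t in enumerate(x):
    if t > 0:
        # Predict: push the previous posterior through z_t = z_{t-1} + eps_t.
        mu, var = mu + mu_eps, var + var_eps
    # Update: condition on the observation x_t = z_t + gamma_t.
    K = var / (var + var_gam)             # Kalman gain
    mu = mu + K * (x_t - mu - mu_gam)
    var = (1.0 - K) * var
    mus.append(mu)
    sds.append(np.sqrt(var))

mus, sds = np.array(mus), np.array(sds)
plt.plot(mus, label=r"$\mu_t$")
plt.fill_between(range(len(mus)), mus - 2 * sds, mus + 2 * sds,
                 alpha=0.3, label=r"$\mu_t \pm 2\sigma_t$")
plt.plot(x, ".", label=r"observations $x_t$")
plt.xlabel("t")
plt.legend()
plt.show()
```

For part (c), uncomment the outlier line and rerun; comparing the two plots shows how strongly the gain pulls the estimate toward a single bad measurement.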
Markov Decision Processes [7 pts]
Problem 3
In this problem we will explore the calculation of the MDP value function V in a 2D exploration setting, without time discounting and for a finite time horizon. Consider a robot navigating the following grid:
[Figure: the robot's grid, with two labeled cells, B and A.]
The robot moves in exactly one direction each turn (and must move). The robot's goal is to maximize its score in the game after T steps. The score is defined in the following way:

• If the robot's movement attempts to move it off the grid, then the robot loses a point (−1) and stays where it is.
• If the robot ends its motion on node A, it receives 10 points, and if it moves onto node B, it receives −100 points. Otherwise, it receives 0 points.

1. Model this as a Markov decision process: define the states $S$, actions $A$, reward function $r : S \times A \to \mathbb{R}$, and transition model $p(s' \mid s, a)$ for $s', s \in S$ and $a \in A$. For now, assume that the robot's actions execute perfectly: if the robot tries to move in a particular direction, it always succeeds in doing so.

2. Consider a random policy $\pi$, where in each state the robot moves uniformly at random in any of its available directions (including off the board). For every position on the grid, calculate the value function $V^\pi_t : S \to \mathbb{R}$ under this policy, for $t = 2, 1$ steps left to go. You can find LaTeX code for the tables in the solution template. Note that you should have 2 tables, one for each time horizon. (A numerical sanity check is sketched after this problem.)

3. Now assume that the robot plays an optimal policy $\pi^*_t$ (for $t$ time steps to go). Find the optimal policy in the case of a finite time horizon of $t = 1, 2$, and give the corresponding MDP value functions $V^*_t : S \to \mathbb{R}$ under this optimal policy. You can indicate the optimal policy for each time horizon on the corresponding $V^*_t$ table via arrows or words in the direction that the robot should move from that state.

4. Now consider the situation where the robot does not have complete control over its movement. In particular, when it chooses a direction, there is an 80% chance that it will go in that direction and a 10% chance each that it will go in one of the two adjacent directions (90° to the left or 90° to the right). Explain how this changes the elements $S$, $A$, $r$, and $p(s' \mid s, a)$ of the MDP model. Assume the robot uses the same policy $\pi^*_t$ from the previous question (now possibly non-optimal); write this as $\pi_t$, and tie-break in favor of N, then E, then S, then W. Give the corresponding MDP value functions $V^\pi_t : S \to \mathbb{R}$ for this policy in this partial-control world, for $t = 2, 1$ steps left to go. Is the policy still optimal?
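As a numerical sanity check for questions 2 and 3, here is a minimal finite-horizon backup. The grid dimensions and the positions of A and B are assumptions (the figure is not reproduced in this text); set them from the actual grid before comparing against your tables.

```python
# Grid layout is an assumption -- set ROWS, COLS, A_POS, B_POS from the figure.
ROWS, COLS = 3, 3
A_POS, B_POS = (2, 0), (0, 2)
MOVES = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}

def step(s, a):
    """Deterministic transition: returns (next state, reward)."""
    nr, nc = s[0] + MOVES[a][0], s[1] + MOVES[a][1]
    if not (0 <= nr < ROWS and 0 <= nc < COLS):
        return s, -1.0                       # tried to move off the grid
    if (nr, nc) == A_POS:
        return (nr, nc), 10.0
    if (nr, nc) == B_POS:
        return (nr, nc), -100.0
    return (nr, nc), 0.0

def finite_horizon_values(T, optimal=False):
    """V with T steps to go, no discounting: Bellman-optimal backup if
    `optimal`, else the uniform-random policy's expected backup."""
    V = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
    for _ in range(T):
        newV = {}
        for s in V:
            q = []
            for a in MOVES:
                s2, r = step(s, a)
                q.append(r + V[s2])
            newV[s] = max(q) if optimal else sum(q) / len(q)
        V = newV
    return V

print(finite_horizon_values(1))                  # V^pi_1, random policy
print(finite_horizon_values(2, optimal=True))    # V*_2, optimal policy
```

Question 4's noisy transitions fit the same skeleton: replace the single call to step with the three possible outcomes weighted 0.8/0.1/0.1 and take the expectation.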