
Probability Grids Homework 4

Problem 1. Hidden Markov Model

(Adapted from Russell-Norvig Problem 15.9) (30 points)

 

Consider a vacuum robot in an empty room (no obstacles) represented by a rectangular n x m grid.

 

Sensor Model. The robot does not have an accurate localization device, so it does not know exactly which cell it is in. It does, however, have a localization sensor that reports the robot’s location (cell), albeit with some error. The noisy localization information the robot receives is as follows (a brief code sketch of this model appears after the list): if the robot is at location (x, y), then

·         with probability 0.1, the sensor reports the correct location

·         with probability 0.05 each, the sensor reports the location of one of the 8-neighbors of (x, y)

·         with probability 0.025 each, the sensor reports the location of one of the 16 cells surrounding those 8 neighbors (i.e., the cells two steps away from (x, y))

·         with probability 0.1, the sensor reports “No Reading”
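
As a rough illustration, the sensor model above can be coded as a single probability lookup keyed on the ring (Chebyshev) distance between the true cell and the reported cell. This is only a sketch: the class and method names are illustrative, and how to redistribute the probability mass of neighbors that fall outside the grid (near walls the listed probabilities no longer sum to 1) is left open by the problem and not decided here.

// Sketch of the sensor CPT: P(sensor reports (rx, ry) | robot is at (x, y)).
// "No Reading" carries probability 0.1 and is handled separately by the caller.
final class SensorModelSketch {
    static double sensorProb(int x, int y, int rx, int ry) {
        int d = Math.max(Math.abs(rx - x), Math.abs(ry - y)); // ring (Chebyshev) distance
        if (d == 0) return 0.1;   // correct cell
        if (d == 1) return 0.05;  // one of the 8 immediate neighbors
        if (d == 2) return 0.025; // one of the 16 cells two rings out
        return 0.0;               // cells farther away are never reported
    }
}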

 

Transition Model. The robot’s policy is to pick a heading and follow it with probability 0.8 at each step; with probability 0.2 it switches to a randomly selected new heading (new meaning different from the robot’s heading at the previous step). If it encounters a wall, it selects a new random heading with probability 1.
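
For example, the heading-switching rule above might be coded as follows. This is a sketch under stated assumptions: four compass headings are assumed (the problem does not fix the direction set), and all names are illustrative only.

import java.util.Random;

// Sketch of the motion policy: keep the current heading with probability 0.8,
// otherwise (or whenever a wall blocks the current heading) switch to a
// different heading chosen uniformly at random.
final class MotionPolicySketch {
    static final int[][] DIRS = {{0, 1}, {1, 0}, {0, -1}, {-1, 0}}; // N, E, S, W as (dx, dy)
    static final Random RNG = new Random();

    static int nextHeading(int heading, boolean blockedByWall) {
        if (!blockedByWall && RNG.nextDouble() < 0.8) {
            return heading;                  // continue on the same heading (probability 0.8)
        }
        int h;
        do {
            h = RNG.nextInt(DIRS.length);    // candidate new heading
        } while (h == heading);              // "new" = different from the previous heading
        return h;
    }
}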

 

(a)    Formulate this problem as a temporal Bayesian network. You need to correctly identify the state and evidence variables, the causal dependencies between them, the CPT values of the transition and sensor models, and the transition and sensor matrices. Submit your answer as a text file named prob1a.txt. (5 points)

 

(b)   Use an HMM to estimate the state (location) as the filtered value of the current state. You can use any HMM implementation that is available online, as long as it is in Java, C++, or C. (25 points)

 

To test your HMM algorithm, your program should read the values of N (rows) and M (columns) of the environment. The robot’s initial location coordinates, x (start column) and y (start row), are also given as input; coordinates (0, 0) correspond to the bottom-left corner. Finally, you need to input the lag variable d for fixed-lag smoothing. Use the input format given below (a minimal input-reading sketch follows the example):

 

 

Enter N: 10

Enter M: 20

Enter start x: 5

Enter start y: 4

Enter lag (d): 5
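
A minimal way to read these prompts in Java (variable names are illustrative; any equivalent input handling is fine):

import java.util.Scanner;

// Sketch of reading the inputs in the format shown above.
final class InputSketch {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter N: ");       int n = in.nextInt();  // rows
        System.out.print("Enter M: ");       int m = in.nextInt();  // columns
        System.out.print("Enter start x: "); int startX = in.nextInt();
        System.out.print("Enter start y: "); int startY = in.nextInt();
        System.out.print("Enter lag (d): "); int d = in.nextInt();  // lag for fixed-lag smoothing
        in.close();
        // ... simulation and HMM code would follow here ...
    }
}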

 

Your program should then run. At each step it should do the following:

·         Move the robot to a new location according to the motion policy given in the problem. This is the new state variable; it is unobserved and is used in the next step to generate the evidence variable.

·         Generate the robot’s reported (noisy) location based on its actual location from the step above and the sensor (error) model given in the problem. This is the evidence variable.

·         Run the HMM on the evidence variables to calculate the new state estimate (a sketch of this filtering update follows the output format below).

·         Output the evidence state (location) and filtered state (location) in the format below, where (xe1, ye1), (xf1, yf1), etc. are coordinates calculated by your program

 

At step 1: Ev: (xe1, ye1) Fil: (xf1, yf1)

At step 2: Ev: (xe2, ye2) Fil: (xf2, yf2)

At step 3: Ev: (xe3, ye3) Fil:  (xf3, yf3)

… 
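
As a rough guide, one filtering (forward) step of the HMM can look like the sketch below. Here T is the transition matrix and obsLikelihood is the column of the sensor model for the current evidence, both over whatever state representation you chose in part (a); the names are assumptions, not a required interface. The filtered location reported at each step can then be taken as the cell with the highest probability in the updated belief.

// Sketch of one HMM filtering step:
// belief'(j) is proportional to P(evidence | j) * sum_i P(j | i) * belief(i).
final class FilterSketch {
    static double[] filterStep(double[] belief, double[][] T, double[] obsLikelihood) {
        int s = belief.length;
        double[] next = new double[s];
        double norm = 0.0;
        for (int j = 0; j < s; j++) {
            double predicted = 0.0;
            for (int i = 0; i < s; i++) {
                predicted += T[i][j] * belief[i];   // prediction through the transition model
            }
            next[j] = obsLikelihood[j] * predicted; // weight by the sensor model
            norm += next[j];
        }
        for (int j = 0; j < s; j++) next[j] /= norm; // normalize to a probability distribution
        return next;
    }
}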

                                                                                                                                                       

 

Problem 2. Markov Decision Process (MDP)

(Adapted from Russell-Norvig Problem 17.8) (30 points = 15 points each part)

 

In class, we studied that one way to solve the Bellman equations in an MDP is the value iteration algorithm (Figure 17.4 of the textbook).

 

(a)    Implement the value iteration algorithm to calculate the policy for navigating a robot (agent) with uncertain motion in a rectangular grid, similar to the situation discussed in class from Section 17.1 of the textbook (a sketch of the core update appears below).

(b)   Calculate the same robot’s policy in the same environment, this time using the policy iteration algorithm.

You can combine these two parts into the same class or program and have the user input select the appropriate algorithm.
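
For part (a), the core of value iteration is a repeated Bellman sweep over all states, as in the sketch below. The transition-probability lookup trans, the discount factor gamma, and the reward array R are illustrative assumptions about how you might structure your own code, not a required interface; the sweep is repeated until the utilities stop changing by more than a small threshold, as in Figure 17.4.

// Sketch of one sweep of the Bellman update used by value iteration:
// U'(s) = R(s) + gamma * max_a sum_s' P(s' | s, a) * U(s').
final class ValueIterationSketch {
    interface Transition { double prob(int s, int a, int s2); } // hypothetical lookup

    static double[] bellmanSweep(double[] U, double[] R, double gamma,
                                 int nActions, Transition trans) {
        int nStates = U.length;
        double[] Unew = new double[nStates];
        for (int s = 0; s < nStates; s++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < nActions; a++) {
                double expected = 0.0;
                for (int s2 = 0; s2 < nStates; s2++) {
                    expected += trans.prob(s, a, s2) * U[s2]; // expected utility of taking a in s
                }
                best = Math.max(best, expected);
            }
            Unew[s] = R[s] + gamma * best;                    // Bellman update for state s
        }
        return Unew;
    }
}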

 

Your program should create the 3 x 3 grid world given in Figure 17.14(a) of the textbook, along with the corresponding reward at each state (cell). The transition model for your agent is the same as the one given in Section 17.1 (discussed in class): 80% of the time it moves in the intended direction, and 20% of the time it moves at right angles to the intended direction (10% to each side). You should accept the following values of r as input: 100, -3, 0, and +3. The input format is below:

 

Enter r: <value of r>

Enter 1 for Value Iteration, 2 for Policy Iteration, 3 to Exit: <1, 2, or 3>

 

The output of your program should give the policy for each cell in the grid world, as calculated by your program(s). For value iteration, the policy at each state (cell) is calculated from the converged utilities using the policy equation (Equation 17.4 of the textbook). For policy iteration, the algorithm’s output is directly the policy for each state.
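
As a sketch of that extraction step for value iteration (reusing the hypothetical Transition lookup from the earlier sketch; again, the names are assumptions, not a required interface):

// Sketch of policy extraction from converged utilities:
// pi(s) = argmax_a sum_s' P(s' | s, a) * U(s').
final class PolicyExtractionSketch {
    static int[] extractPolicy(double[] U, int nActions,
                               ValueIterationSketch.Transition trans) {
        int nStates = U.length;
        int[] policy = new int[nStates];
        for (int s = 0; s < nStates; s++) {
            double best = Double.NEGATIVE_INFINITY;
            for (int a = 0; a < nActions; a++) {
                double expected = 0.0;
                for (int s2 = 0; s2 < nStates; s2++) {
                    expected += trans.prob(s, a, s2) * U[s2]; // expected utility of action a
                }
                if (expected > best) {
                    best = expected;
                    policy[s] = a;                            // keep the best action found so far
                }
            }
        }
        return policy;
    }
}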

 

Output format:

Policy table calculated:

(0, 0): <calculated policy>

(0, 1): <calculated policy>
