Assignment # 11 MDP

CS 540: Introduction to Artificial Intelligence
Homework Assignment # 11

Hand in your homework:
If a homework has programming questions, please hand in the Java program. If a homework has written
questions, please hand in a PDF file. Regardless, please zip all your files into hwX.zip where X is the
homework number. Go to UW Canvas, choose your CS540 course, choose Assignment, click on Homework
X: this is where you submit your zip file.

Question 1: MDP
Consider state space S = {s1, s2} and action space A = {left, right}.
In s1 the action “right” sends the agent to s2 and collects reward r = 1. In s2 the action “left” sends
the agent to s1 but with zero reward. Every other state-action pair leaves the agent in its current state
with zero reward. With discount factor γ, what is the value v(s2) under the optimal policy?
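
One way to check an answer to Question 1 numerically is value iteration on the two Bellman optimality equations. The sketch below is an illustration, not the expected solution: the class name TwoStateValueIteration, the array encoding (index 0 = s1, index 1 = s2; action 0 = left, action 1 = right), the tolerance, and the sample value γ = 0.5 are all assumptions, and the loop only converges for 0 ≤ γ < 1.

```java
final class TwoStateValueIteration {
    // Assumed encoding: state 0 = s1, state 1 = s2; action 0 = left, action 1 = right.
    // NEXT[s][a] is the state reached from s under a; REWARD[s][a] is the reward collected.
    static final int[][] NEXT = {{0, 1}, {0, 1}};
    static final double[][] REWARD = {{0.0, 1.0}, {0.0, 0.0}};

    // Repeats the Bellman optimality backup until no value changes by more than tol.
    static double[] valueIteration(double gamma, double tol) {
        double[] v = new double[2];
        double delta;
        do {
            delta = 0.0;
            double[] updated = new double[2];
            for (int s = 0; s < 2; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < 2; a++) {
                    best = Math.max(best, REWARD[s][a] + gamma * v[NEXT[s][a]]);
                }
                updated[s] = best;
                delta = Math.max(delta, Math.abs(best - v[s]));
            }
            v = updated;
        } while (delta > tol);
        return v;
    }

    public static void main(String[] args) {
        double gamma = 0.5; // sample value only; the question leaves gamma symbolic
        double[] v = valueIteration(gamma, 1e-12);
        System.out.printf("v(s1) = %.6f, v(s2) = %.6f%n", v[0], v[1]);
    }
}
```

Running it for a few values of γ gives numbers that a symbolic answer can be compared against.
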
Question 2: Value function
Suppose a policy π is shown by red arrows and the discount factor is γ = 0.9. Compute the value function
V^π(s) for all states s.
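
For a deterministic policy, the value function satisfies V^π(s) = r(s, π(s)) + γ V^π(s'), where s' is the state reached by following π from s. The sketch below applies that backup repeatedly. It does not encode the policy in the question: the class name PolicyEvaluation and the three-state chain in main (0 → 1 → 2, with state 2 absorbing and a reward of 1 for the step out of state 1) are hypothetical and only show the mechanics.

```java
final class PolicyEvaluation {
    // next[s]: state reached by following the policy in s.
    // reward[s]: reward collected on that step.
    // Iterates V(s) = reward[s] + gamma * V(next[s]) until changes fall below tol.
    static double[] evaluate(int[] next, double[] reward, double gamma, double tol) {
        double[] v = new double[next.length];
        double delta;
        do {
            delta = 0.0;
            double[] updated = new double[v.length];
            for (int s = 0; s < v.length; s++) {
                updated[s] = reward[s] + gamma * v[next[s]];
                delta = Math.max(delta, Math.abs(updated[s] - v[s]));
            }
            v = updated;
        } while (delta > tol);
        return v;
    }

    public static void main(String[] args) {
        // Hypothetical chain, not the MDP from the question.
        int[] next = {1, 2, 2};
        double[] reward = {0.0, 1.0, 0.0};
        double[] v = evaluate(next, reward, 0.9, 1e-12);
        for (int s = 0; s < v.length; s++) {
            System.out.printf("V(%d) = %.4f%n", s, v[s]);
        }
    }
}
```

To use it on a concrete problem, replace the two arrays with the transitions and rewards that the policy arrows imply.
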
Question 3: Q-learning
A robot initializes Q-learning by setting q(s, a) = 0 for all states s and actions a. It has a learning rate α and
a discount factor γ. The robot senses that it is in state s105 and decides to perform action a540. For this
action, the robot receives reward 100 and arrives at state s7331. What value is q(s105, a540) after this one
step of Q-learning?
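
A single Q-learning step can be sketched as below, assuming the standard tabular update q(s, a) ← q(s, a) + α [r + γ max_a' q(s', a') − q(s, a)]; the course may write the same rule as (1 − α) q(s, a) + α [r + γ max_a' q(s', a')]. The class name QLearningStep, the method signature, and the sample values α = 0.5 and γ = 0.9 are assumptions, since the question leaves α and γ symbolic.

```java
final class QLearningStep {
    // One tabular Q-learning update for a single (state, action) entry.
    // qOld: current q(s, a); maxNextQ: max over a' of q(s', a') in the arrival state.
    static double update(double qOld, double reward, double maxNextQ,
                         double alpha, double gamma) {
        double target = reward + gamma * maxNextQ;
        return qOld + alpha * (target - qOld);
    }

    public static void main(String[] args) {
        double alpha = 0.5; // sample value only
        double gamma = 0.9; // sample value only
        // Every table entry starts at 0, so both q(s105, a540) and the
        // maximum over actions in s7331 are 0 before the update.
        double q = update(0.0, 100.0, 0.0, alpha, gamma);
        System.out.printf("q(s105, a540) = %.4f%n", q);
    }
}
```

Because the whole table starts at zero, the γ term drops out of this first update and only α and the reward remain.
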