Programming Assignment 4
This assignment is meant to give you the
experience of developing both agent and
environment. Consequently it is more open-ended
than your previous assignments. As a part of this
assignment, you will implement the Windy
Gridworld task given as Example 6.5 by Sutton and
Barto (2018). You will program some agent-environment
interactions, record your results, and
present your interpretations. You can use any
programming language of your choice for this
assignment.
Tasks
1. Implement Windy Gridworld as an episodic
MDP. The core of your code will have to be a
function (or functions) to obtain next state
and reward for a given state and action. You
can use your own function names and
conventions.
2. Implement a Sarsa(0) agent as described in
the example, and obtain a baseline plot
similar to the one accompanying the example
(episodes against time steps). You can set
learning and exploration rates as you see fit
(just be sure to describe them in your report).
3. Get another plot when King's moves are
permitted (that is, 8 actions in total), as
described in Exercise 6.9.
4. Add stochasticity to the task as described in
Exercise 6.10, and again plot the resulting
performance of the Sarsa agent. Make sure
you note down your convention for modeling
corner cases.
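For Tasks 1, 3, and 4, one possible shape for the transition function is sketched below in Python. It is only an illustration under stated assumptions, not a reference solution. The grid size, wind strengths, start cell, goal cell, and reward of -1 per step follow Example 6.5 of Sutton and Barto. Everything else is an assumed convention that you may choose differently: the names step, FOUR_MOVES, and KING_MOVES, numbering rows from the top, applying the wind of the column being left, clipping at the grid boundary, and keeping zero-wind columns deterministic in the stochastic variant.

```python
import random

ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]   # upward wind strength per column
START, GOAL = (3, 0), (3, 7)            # (row, col), row 0 is the top row

# Actions are indices into a list of (row change, column change) pairs.
FOUR_MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]                   # up, down, left, right
KING_MOVES = FOUR_MOVES + [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # plus diagonals


def step(state, action, moves=FOUR_MOVES, stochastic=False, rng=random):
    """Return (next_state, reward, done) for one transition."""
    row, col = state
    d_row, d_col = moves[action]
    wind = WIND[col]  # assumption: wind of the column the agent is leaving
    if stochastic and wind > 0:
        # Assumed corner-case convention: only windy columns are perturbed,
        # by -1, 0, or +1 with equal probability.
        wind += rng.choice([-1, 0, 1])
    # Wind pushes upward, i.e. towards smaller row indices; clip to the grid.
    new_row = min(max(row + d_row - wind, 0), ROWS - 1)
    new_col = min(max(col + d_col, 0), COLS - 1)
    next_state = (new_row, new_col)
    return next_state, -1, next_state == GOAL
```

Under these conventions, step((3, 0), 3) moves right from the start cell and returns ((3, 1), -1, False). Passing moves=KING_MOVES gives the 8-action variant, and stochastic=True gives the stochastic-wind variant.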
CS 747: Programming Assignment 4 https://www.cse.iitb.ac.in/~shivaram/teaching/c...
1 of 3 04/11/18, 12:29
In all your experiments, generate at least ten
independent runs by varying the random seed. Plot
the average statistic in the graphs.
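The following Python sketch shows one way a Sarsa(0) agent and the averaging over seeded runs could fit together for the baseline task. It is a hedged illustration, not the expected solution. The step size 0.5 and exploration rate 0.1 are the values used in the textbook example, and the task is treated as undiscounted. The function names, the compact four-action environment repeated here for self-containment, the random tie-breaking among greedy actions, and the choice to record episodes completed after every time step are all assumptions.

```python
import random

ROWS, COLS = 7, 10
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]
START, GOAL = (3, 0), (3, 7)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Baseline windy transition; returns (next_state, reward)."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row = min(max(row + d_row - WIND[col], 0), ROWS - 1)
    new_col = min(max(col + d_col, 0), COLS - 1)
    return (new_row, new_col), -1


def epsilon_greedy(q, state, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(MOVES))
    best = max(q[state])
    return rng.choice([a for a, v in enumerate(q[state]) if v == best])


def run_sarsa(seed, total_steps=8000, alpha=0.5, epsilon=0.1):
    """Run Sarsa(0) for total_steps; return episodes completed after each step."""
    rng = random.Random(seed)  # one private generator per run
    q = {(r, c): [0.0] * len(MOVES) for r in range(ROWS) for c in range(COLS)}
    episodes_done, completed = [], 0
    state = START
    action = epsilon_greedy(q, state, epsilon, rng)
    for _ in range(total_steps):
        next_state, reward = step(state, action)
        if next_state == GOAL:
            # Terminal state has value 0, so the target is just the reward.
            q[state][action] += alpha * (reward - q[state][action])
            completed += 1
            state = START
            action = epsilon_greedy(q, state, epsilon, rng)
        else:
            next_action = epsilon_greedy(q, next_state, epsilon, rng)
            target = reward + q[next_state][next_action]  # undiscounted
            q[state][action] += alpha * (target - q[state][action])
            state, action = next_state, next_action
        episodes_done.append(completed)
    return episodes_done


def average_runs(seeds, **kwargs):
    """Average the episodes-versus-time-steps curves over several seeds."""
    runs = [run_sarsa(seed, **kwargs) for seed in seeds]
    return [sum(values) / len(runs) for values in zip(*runs)]
```

Calling average_runs(range(10)) would give one averaged curve over ten seeds, which can then be plotted with time steps on the horizontal axis and episodes on the vertical axis. Recording against a fixed number of time steps keeps all runs the same length, which makes the averaging straightforward.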
Submission
You must submit:
1. Your code for implementing the task and its
variants;
2. Code for your Sarsa agent;
3. A script to run your simulations and gather
data;
4. Plots of your agent's performance;
5. A README file describing how to run your
code and obtain the plots; and
6. A report presenting your observations from
these experiments (as a pdf file).
Place all these items in a directory titled [rollno]
(such as 1234567). You must then submit your
[rollno] directory, compressed as [rollno].tar.gz
(say 1234567.tar.gz). Before you upload the
submission to Moodle, make sure you can
successfully run your code on the departmental
(sl2) machines.
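If you prefer to build the archive from a script, the packaging step could look roughly like the Python sketch below. The function name make_submission is hypothetical, and the sketch assumes the [rollno] directory already contains all six items; the tarfile and pathlib modules are part of the Python standard library.

```python
import tarfile
from pathlib import Path


def make_submission(rollno_dir):
    """Compress the directory [rollno] into [rollno].tar.gz next to it."""
    src = Path(rollno_dir)
    if not src.is_dir():
        raise NotADirectoryError(str(src))
    archive = src.parent / f"{src.name}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps the top-level directory name inside the archive.
        tar.add(src, arcname=src.name)
    return archive
```

For example, make_submission("1234567") would write 1234567.tar.gz in the current directory. Extracting it should recreate the 1234567 directory with its contents.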
Convince yourself that the results obtained match
your expectations. Feel free to be creative and use
the simulation environment to test related
hypotheses you might find interesting. Your
observations (under 6) must explain the variations
observed across the three task settings, and report
any particular issues you encountered while
experimenting with this task. Don't hesitate to
include additional numbers or graphs.
Evaluation
Your marks will be divided roughly equally among
the three tasks you have to implement, in each case
determined by the plot and the accompanying
observations.
The TAs and instructor may look at your source
code and notes to corroborate the results obtained
by your program, and may also call you to a
face-to-face session to explain your code.
Deadline and Rules
Your submission is due by 11.55 p.m., Sunday,
November 11. You are advised to finish working on
your submission well in advance, keeping enough
time to test it on the sl2 machines and upload to
Moodle. Your submission will not be evaluated
(and will be given a score of zero) if it is not
received by the deadline.
Test your code on the sl2 machines even while you
are developing it: do not postpone this step to the
last minute. If your code requires any special
libraries to run, it is your responsibility to get those
libraries working on the sl2 machines (go through
the CSE bug tracking system to make a request to
the system administrators). Make sure that you
upload the intended version of your code to Moodle
(after uploading, download your submission and
test it on the sl2 machines to make sure it is the
correct version). You will not be allowed to alter
your code in any way after the submission deadline.
In short: your grade will be completely determined
by your submission on Moodle at the time of the
deadline. Play safe by having it uploaded and tested
at least a few hours in advance.
You must work alone on this assignment. Do not
share any code (whether yours or code you have
found on the Internet) with your classmates. Do not
discuss the design of your solution with anybody
else. Do not look at anybody else's code or report,
either your colleagues' or from sites on the Internet
that discuss Windy Gridworld.