HW10: Reinforcement Learning

Starting from:

$29.99

HW10: Reinforcement Learning
1 Getting started
Download the starter code from canvas. It consists of two files: Q-Learning.py and tests.py. You can
create and activate your virtual environment with the following commands:
python3 -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
Once you have sourced your environment, you can run the following commands to install the necessary
dependencies:
pip install --upgrade pip
pip install torch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0
pip install gym==0.23.1 pygame==2.1.2
You should now have a virtual environment which is fully compatible with the skeleton code. You should
set up this virtual environment on an instructional machine to do your final testing.
2 Q-Learning
For the Q-learning portion of HW10, we will be using the environment FrozenLake-v1 from OpenAI
gym. This is a discrete environment where the agent can move in the cardinal directions, but is not
guaranteed to move in the direction it chooses. The agent gets a reward of 1 when it reaches the tile
marked G, and a reward of 0 in all other settings. You can read more about FrozenLake-v1 (it is the
same as FrozenLake-v0) here: https://www.gymlibrary.dev/environments/toy_text/
frozen_lake/. You will not need to change any code outside of the area marked TODO, but you
are free to change the hyper-parameters if you want to. For each sampled tuple (s, a, r, s0
, done), the
update rule for Q-learning is:
Q(s, a) = (
(1 − α)Q(s, a) + α(r + γ maxa0∈A Q(s
0
, a0
)) if !done
(1 − α)Q(s, a) + αr if done
The agent should act according to an epsilon-greedy policy as defined in the Reinforcement Learning
1 slides. In this equation, α is the learning rate hyper-parameter, and γ is the discount factor hyperparameter.
• HINT: tests.py is worth looking at to gain an understanding of how to use the OpenAI gym env.
• Files to Submit: For this section, you should submit the files Q learning.py and Q TABLE.pkl.
1
3 OpenAI gym Environment
You will need to use several OpenAI gym functions in order to operate your gym environment for reinforcement learning. As stated in a previous hint, tests.py has a lot of the function calls you need. Several
important functions are as follows:
env.step(action)
Given that the environment is in state s, step takes an integer specifiying the chosen action, and returns a
tuple of the form (s, r, done, info). ’done’ specifies whether or not s
0
is the final state for that particular
episode, and ’info’ is unused in this assignment.
env.reset()
Resets the environment to it’s initial state, and returns that state.
env.action space.sample()
Samples an integer corresponding to a random choice of action in the environment’s action space.
env.action space.n
In the setting of the environments we will be working with for these assignments, this is an integer corresponding to the number of possible actions in the environment’s action space.
You can read more about OpenAI gym here: https://www.gymlibrary.dev.
4 Submission Format
You can test your learned policies, by calling python3 ./tests.py. Make sure to test your saved Q-tables
using tests.py on the instructional machines with a virtual environment set up as specified above. This
is the same program, which we will be using to test your Q-tables and Q-network, so you will have a
good idea about how many points you will receive for the automated tests portion of the grade. Your
submission should contain the following files:
Required: Q learning.py, Q TABLE.pkl
Please submit these files in a zipped folder title <yournetid>.zip , where ’yournetid’ is your net ID.
Please make sure that there is not a folder inside the zipped folder, and that the submitted files are at the
top level of the zipped folder.
The assignment is due Dec 13 at 23:59 central time. We are not accepting late submissions for this
assignment. Regrades will only be accepted if they are due to an error in tests.py.
2

More products

Assignment 2 -Operating Systems

$29

Add to cart

Assignment 1: Operating Systems

$29

Add to cart

Project 2: A PThread-based Network Driver

$30

Add to cart