$30
CS 747: Programming
Assignment 1
In this assignment, you will implement and
compare different algorithms for sampling the arms
of a stochastic multi-armed bandit. Each arm
provides i.i.d. rewards from a fixed distribution.
The objective is to minimise regret. The algorithms
you will implement are epsilon-greedy exploration,
UCB, KL-UCB, and Thompson Sampling.
This is a relatively straightforward assignment,
which essentially puts to practice the algorithms we
have discussed in class. The only departure from
the class discussion is in the reward distributions of
the arms. In class we assumed these distributions to
be Bernoulli: that is, only generating 0's and 1's as
rewards. In this assignment, the rewards can be
arbitrary real numbers that lie in the interval [0, 1].
This is not a very significant difference, but one that
you must handle in your algorithms anyway.
Code
You will find three directories in this code base.
The server directory comprises the code to
simulate a bandit, in other words the
"environment". The server waits for a client
("agent") to connect with it and start pulling
arms in sequence. For each pull, the server
generates a reward in [0, 1] based on the true
mean of the arm pulled (known to the server,
but unknown to the client), which is
communicated back to the agent. Each
communication from the server also contains
the total number of pulls performed so far.
Thus, each server message is of the form
CS 747: Programming Assignment 1 https://www.cse.iitb.ac.in/~shivaram/teaching/c...
1 of 6 09/08/18, 11:27
"reward,pulls". Agent-server communication
happens through a TCP connection. The
server is coded in C++, and compiles into a
binary called bandit-environment on running
make. It can be started with a shell script
called startserver.sh.
The data directory contains six bandit
instances, each with either 5 or 25 arms, and
each with the reward distributions of its arms
being bernoulli, histogram, or
betaDistribution (all arms in a bandit
instance have the same family of
distributions). The data provided in the
instance files, along with the code that
consumes them in the server, should make it
clear exactly how stochastic rewards are
generated in each of these cases. Do create
and experiment with other instances for
testing your algorithms.
The client directory is provided to you as an
example of what your submission must
achieve. The client is the agent that
implements sampling algorithms. The agent
provided to you merely samples the arms in a
round-robin fashion: you will have to
implement more efficient sampling
algorithms. Observe that the command line
parameters to the client include --algorithm
(which can take values epsilon-greedy, UCB,
KL-UCB, and Thompson-Sampling (all casesensitive)), and --epsilon (which will be a
real number in [0, 1]). The agent provided is
coded in C++: it compiles using make into a
binary called bandit-agent.
Run startexperiment.sh in the top level directory
to run an experiment. It will start the server and the
client in sequence. The server writes out a log file
based on its interaction with the client; the per-step
reward and regret are available from this log file.
CS 747: Programming Assignment 1 https://www.cse.iitb.ac.in/~shivaram/teaching/c...
2 of 6 09/08/18, 11:27
Run with different algorithms, random seeds, bandit
instances, and horizons to get familiar with the
space of experiments.
Submission
You will submit two items: (1) working code for
your agent, which implements different algorithms,
and (2) a report containing graphs of regret
measured at different horizons, as well as your
observations from the experiments.
You must submit a directory called client,
which contains all the source and executable
files of your agent. The directory must
contain a script named startclient.sh,
which must take in all the command line
arguments that are accepted by the example
client provided. You are free to build upon
the provided C++ agent, or otherwise
implement an agent in any programming
language of your choice. The hostname and
port number should suffice for setting up a
TCP connection with the server.
Your code will be tested by running an
experiment that calls startclient.sh in your
directory. Before you submit, make sure you
can successfully run startexperiment.sh on
the sl2 machines (sl2-X.cse.iitb.ac.in,
where X is the machine number, for example
88).
For each of the six bandit instances provided
in the data directory; each algorithm from
{epsilon-greedy, UCB, KL-UCB, ThompsonSampling}; each horizon in {1000, 10000},
generate at least 100 runs by varying the
random seed. Run at least 5 versions of
epsilon-greedy on each instance, with
different values of epsilon, in order to
demonstrate the effect of this parameter.
CS 747: Programming Assignment 1 https://www.cse.iitb.ac.in/~shivaram/teaching/c...
3 of 6 09/08/18, 11:27
Create a separate plot for each bandit
instance, with the number of pulls (up to the
horizon) on the x axis and the (expected
cumulative) regret on the y axis. Plot a curve
corresponding to each algorithm (thus, a total
of 8 curves). Each point must be the average
over all runs for the corresponding setting of
instance, horizon, and algorithm. epsilongreedy
Place your plots in a single pdf file called
report.pdf, and along with the plots, note
down your observations about the results
obtained. What patterns do they display? Do
your results validate your intuition about the
effect of epsilon? Did you notice anything
unexpected? Do you have any ideas to further
improve performance?. The report should
mention what changes you made (if any) to
account for non-Bernoulli rewards. It should
also include other relevant details about your
implementation of the algorithms (for
example, if you tuned any parameters).
Place all the data generated from your
experiments, as well as report.pdf, within a
directory called report, and place this
directory inside client.
If you have used any external code snippets or
libraries in your code, make sure you provide
references in your report. It is okay to use public
code for parts of your agent such as the network
communication module, or, say, libraries for
random number generation and sorting. However,
the logic used for sampling the arms must entirely
be code that you have written.
In summary: you must submit your client directory
(compressed as client.tar.gz) through Moodle.
The directory must contain startclient.sh, along
with all the sources and executables for the agent.
The client directory must also contain a directory
CS 747: Programming Assignment 1 https://www.cse.iitb.ac.in/~shivaram/teaching/c...
4 of 6 09/08/18, 11:27
called report, in which you provide all the data
generated by your experiments, and a file called
report.pdf.
Evaluation
Your algorithms will be tested on different bandit
instances and horizons to verify that they perform
as expected. Their performance on the two provided
bandit instances, as well as your accompanying
report, will carry 6 marks. The performance of the
algorithms on other unseen bandit instances will
carry 4 marks. For each such (algorithm, instance,
horizon) configuration, a large number of runs (say
100) will be conducted by varying the random seed
passed to the server and agent. The average regret
over these runs will be recorded. The number of
bandit arms will be between 2 and 50; the horizon
will be between 1 and 100,000.
The TAs and instructor may look at your source
code and notes to corroborate the results obtained
by your agent, and may also call you to a face-toface session to explain your code.
Deadline and Rules
Your submission is due by 11.55 p.m., Sunday,
August 19. Finish working on your submission well
in advance, keeping enough time to upload it to
Moodle. Your submission will not be evaluated
(and will be given a score of zero) if it is not
uploaded to Moodle by the deadline.
Before submission, make sure that your code runs
for a variety of experimental conditions. Test your
code on the sl2 machines even while you are
developing it: do not postpone this step to the last
minute. If your code requires any special libraries
to run, it is your responsibility to get those libraries
working on the sl2 machines (go through the CSE
bug tracking system to make a request to the system
CS 747: Programming Assignment 1 https://www.cse.iitb.ac.in/~shivaram/teaching/c...
5 of 6 09/08/18, 11:27
administrators). Make sure that you upload the
intended version of your code to Moodle (after
uploading, download your submission and test it on
the sl2 machines to make sure it is the correct
version). You will not be allowed to alter your code
in any way after the submission deadline. In short:
your grade will be completely determined by your
submission on Moodle at the time of the deadline.
Play safe by having it uploaded and tested at least a
few hours in advance.
You must work alone on this assignment. Do not
share any code (whether yours or code you have
found on the Internet) with your classmates. Do not
discuss the design of your agent with anybody else.
Do not see anybody else's code or report.
CS 747: Programming Assignment 1 https://www.cse.iitb.ac.in/~shivaram/teaching/c...
6 of 6 09/08/18, 11:27