Machine Learning (B555) Programming Project 2

Programming language and libraries: You can write your code in any programming language so long as
we are able to test it on SICE servers (Python should be OK; please ask about other options). We
plan to run some or all of the submitted code for further testing and validation.
You may use standard I/O, math, and plotting libraries (including numpy and matplotlib). Other than
these, please write all the code yourself without relying on special libraries or modules, i.e.,
no scikit, no pandas, no other data processing libraries, etc.
Overview: Experiments with Bayesian Linear Regression
Our goal in this assignment is to evaluate linear regression, its regularized variant and the Bayesian
model including model selection. In all your experiments you should report the performance in
terms of the mean square error
$$\mathrm{MSE} = \frac{1}{N} \sum_i \left( \phi(x_i)^T w - t_i \right)^2$$
where the number of examples in the corresponding dataset is N.
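As a minimal sketch of this error measure, assuming the features are already stacked into a design matrix whose rows are φ(xi)^T (the function name mse and the argument names are illustrative, not part of the assignment):

```python
import numpy as np

def mse(Phi, w, t):
    """Mean square error of the predictions Phi @ w against the targets t.

    Phi: (N, M) design matrix, one row phi(x_i)^T per example (assumed layout).
    w:   (M,) weight vector.
    t:   (N,) regression targets.
    """
    Phi = np.asarray(Phi, dtype=float)
    w = np.asarray(w, dtype=float).ravel()
    t = np.asarray(t, dtype=float).ravel()
    residuals = Phi @ w - t
    return float(np.mean(residuals ** 2))
```

Taking the mean over the residuals divides by N, the number of rows in whichever dataset is passed in.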
Data
Data for this assignment is provided in a zip file pp2data.zip on Canvas.
Each dataset comes in 4 files, with the training set in train-name.csv, the corresponding labels
(regression values) in trainR-name.csv, and similarly for the test set. For Task 1 we use the crime
and housing datasets. For Task 2 the files are named f3 and f5; these datasets have only one
feature, and the labels were generated from polynomial regression, using polynomials of degree 3
and 5 respectively. Note that the train/test splits are fixed and we will not change them in the
assignment.
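One possible way to read the four files of a dataset with numpy is sketched below. It rests on several assumptions that the text above does not confirm: that the files are comma-separated numbers without a header row, and that the test files follow the pattern test-name.csv and testR-name.csv. The function name load_dataset and the data_dir argument are hypothetical.

```python
import os
import numpy as np

def load_dataset(data_dir, name):
    """Load one dataset as (X_train, t_train, X_test, t_test).

    Assumptions (not stated in the handout): comma-separated numeric
    files with no header, and test files named test-<name>.csv and
    testR-<name>.csv by analogy with the training files.
    """
    def read(prefix):
        path = os.path.join(data_dir, f"{prefix}-{name}.csv")
        # ndmin=2 keeps single-column files as (N, 1) arrays.
        return np.loadtxt(path, delimiter=",", ndmin=2)

    X_train = read("train")
    X_test = read("test")
    t_train = read("trainR").ravel()  # labels as flat vectors
    t_test = read("testR").ravel()
    return X_train, t_train, X_test, t_test
```

If the actual files have a header line or a different naming pattern, the delimiter, skiprows, and prefix choices would need to change accordingly.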
Algorithms
In this assignment you should implement two machine learning algorithms. The first algorithm is
regularized linear regression, i.e., given a dataset, the solution vector w is given by equation (3.28)
of [B]. Note that plugging in λ = 0 we get the maximum likelihood solution so the same code can be
used for this case as well. We can then calculate the MSE on the test set using the w for prediction.
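A minimal sketch of the regularized solution, w = (λI + Φ^T Φ)^{-1} Φ^T t, assuming Φ is the (N, M) design matrix and t the label vector. The function name fit_regularized is illustrative.

```python
import numpy as np

def fit_regularized(Phi, t, lam=0.0):
    """Solve (lam * I + Phi^T Phi) w = Phi^T t for w.

    lam = 0 gives the maximum likelihood (least squares) solution,
    provided Phi^T Phi is invertible.
    """
    Phi = np.asarray(Phi, dtype=float)
    t = np.asarray(t, dtype=float).ravel()
    M = Phi.shape[1]
    A = lam * np.eye(M) + Phi.T @ Phi
    # Solving the linear system avoids forming an explicit inverse.
    return np.linalg.solve(A, Phi.T @ t)
```

With λ = 0 and fewer training examples than features, Φ^T Φ is singular and np.linalg.solve raises an error; how to handle that case (for example with a least-squares or pseudo-inverse solver) is a design choice this sketch leaves open.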
The second algorithm is the formulation of Bayesian linear regression with the simple prior
$w \sim \mathcal{N}(0, \frac{1}{\alpha} I)$. Recall that the evidence function (and evidence approximation) gives a method to pick
the parameters α and β. Referring to [B], the solution is given in equations (3.91), (3.92), (3.95),
where mN and SN are given in (3.53) and (3.54). These yield an iterative algorithm for selecting
α and β using the training set. This scheme is pretty stable and converges in a reasonable number
of iterations. You can initialize α, β to random values in the range [1, 10]. We can then calculate
the MSE on the test set using the MAP (mN ) for prediction. This is the same as the prediction of
the regularized algorithm with λ = α/β.
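The iterative scheme can be sketched as follows. The update formulas are the ones referenced above: γ = Σ λi / (α + λi) with λi the eigenvalues of βΦ^T Φ, α = γ / (mN^T mN), and 1/β = (1 / (N − γ)) Σ (tn − mN^T φ(xn))^2. The function names, the fixed default starting values, the relative-change stopping rule, and the iteration cap are all assumptions made for illustration; the handout only suggests random initial values in [1, 10].

```python
import numpy as np

def posterior_mean(Phi, t, alpha, beta):
    """m_N = beta * S_N * Phi^T t, with S_N^{-1} = alpha*I + beta*Phi^T Phi."""
    M = Phi.shape[1]
    S_N_inv = alpha * np.eye(M) + beta * (Phi.T @ Phi)
    return beta * np.linalg.solve(S_N_inv, Phi.T @ t)

def select_alpha_beta(Phi, t, alpha=1.0, beta=1.0, tol=1e-6, max_iter=1000):
    """Iterate the evidence-approximation updates for alpha and beta.

    Stopping rule (assumed): relative change of both parameters below tol,
    or max_iter iterations, whichever comes first.
    """
    Phi = np.asarray(Phi, dtype=float)
    t = np.asarray(t, dtype=float).ravel()
    N = Phi.shape[0]
    # Eigenvalues of Phi^T Phi; multiplied by beta inside the loop.
    eig = np.maximum(np.linalg.eigvalsh(Phi.T @ Phi), 0.0)
    for _ in range(max_iter):
        m_N = posterior_mean(Phi, t, alpha, beta)
        lam = beta * eig
        gamma = np.sum(lam / (alpha + lam))
        new_alpha = gamma / (m_N @ m_N)
        new_beta = (N - gamma) / np.sum((t - Phi @ m_N) ** 2)
        done = (abs(new_alpha - alpha) < tol * alpha
                and abs(new_beta - beta) < tol * beta)
        alpha, beta = new_alpha, new_beta
        if done:
            break
    return alpha, beta, posterior_mean(Phi, t, alpha, beta)
```

The returned mN coincides with the regularized solution for λ = α/β, since β(αI + βΦ^T Φ)^{-1} Φ^T t = ((α/β)I + Φ^T Φ)^{-1} Φ^T t.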
Task 1: Comparing the Bayesian algorithm to Linear Regression
with and without Regularization
To evaluate the algorithms we will generate learning curves. In particular for training fractions
f ∈ {0.1, 0.2, 0.3, . . . , 1.0}, train the learning algorithm using the initial fraction of the dataset of
that size, and calculate MSE on the test set. Each of the following should be done for both datasets.
(i) Run the model selection algorithm and report the values of α, β and effective λ for each train
size.
(ii) Run the maximum likelihood algorithm and the model selection algorithm. Plot their test set
MSE as a function of training set size to compare their performance. Please limit the y-axis in
plots to the range [0,1] to ensure visibility of differences (you can crop MSE values or use plotting
library limits). What can you observe w.r.t. their relative performance? Try to explain why the
results are as observed and whether they are as expected or not.
(iii) Repeat part (ii) with values of λ equal to 1.0, 33.0, 100.0, 1000.0 and discuss the results. Can
we use a single universal value for λ for different datasets? Is the Bayesian algorithm successful in
selecting a good value? How could one select it otherwise?
Note: typically, learning curve experiments are repeated multiple times with randomized data to
observe standard deviations in differences. In this assignment we skipped this to reduce the amount
of work.
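The learning-curve procedure of this task could be organized along the following lines. The sketch takes the training routine as a callable fit(Phi, t) that returns a weight vector, so the same loop can serve the maximum likelihood, fixed-λ, and Bayesian variants. The function name learning_curve and the rounding of f·N to the nearest integer are assumptions; the handout does not say how fractional sizes are rounded.

```python
import numpy as np

def learning_curve(Phi_train, t_train, Phi_test, t_test, fit, fractions=None):
    """Train on the initial fraction of the training set and record test MSE.

    fit: callable (Phi, t) -> w, any of the training routines.
    Returns (sizes, errors), one entry per training fraction.
    """
    if fractions is None:
        fractions = [k / 10 for k in range(1, 11)]  # 0.1, 0.2, ..., 1.0
    N = Phi_train.shape[0]
    sizes, errors = [], []
    for f in fractions:
        n = max(1, int(round(f * N)))  # rounding rule is an assumption
        w = fit(Phi_train[:n], t_train[:n])  # first n rows, fixed order
        errors.append(float(np.mean((Phi_test @ w - t_test) ** 2)))
        sizes.append(n)
    return sizes, errors
```

The returned lists can be passed directly to a plotting call, with the y-axis limited to [0, 1] as requested in part (ii).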
Task 2: Bayesian Model Selection for Parameters and Model Order
In this part we work with the datasets f3 and f5 whose labels were generated using polynomials.
You should run the Bayesian model selection scheme of the previous task using polynomial degrees
d in {1, 2, . . . , 10}. The files themselves only include the x values, so in order to run the regression
code you must first generate appropriate training data. For example, for degree 3, each x in the
training and test files is replaced with $1, x, x^2, x^3$.
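A short sketch of this feature expansion, using numpy's Vandermonde helper. The function name polynomial_features is illustrative.

```python
import numpy as np

def polynomial_features(x, degree):
    """Map each scalar x to the row (1, x, x^2, ..., x^degree)."""
    x = np.asarray(x, dtype=float).ravel()
    # increasing=True orders the columns from x^0 up to x^degree.
    return np.vander(x, degree + 1, increasing=True)
```

For degree 3 and inputs 2 and 3, this yields the rows (1, 2, 4, 8) and (1, 3, 9, 27). The same function should be applied to both the training and the test inputs so that the columns match.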
For each degree d, run the Bayesian Model Selection code to select α, β (and hence λ) and calculate
the log evidence (given in eq (3.86)) on the training set. Then calculate the MSE on the test set
using the MAP (mN ) for prediction. In addition, run non-regularized linear regression on the same
data and calculate the MSE on the test set.
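The log evidence can be computed as sketched below, following the form ln p(t | α, β) = (M/2) ln α + (N/2) ln β − E(mN) − (1/2) ln |A| − (N/2) ln(2π), with A = αI + βΦ^T Φ and E(mN) = (β/2) ||t − Φ mN||^2 + (α/2) mN^T mN. The function name log_evidence is illustrative, and the use of slogdet for the determinant is one implementation choice among several.

```python
import numpy as np

def log_evidence(Phi, t, alpha, beta):
    """Log marginal likelihood ln p(t | alpha, beta) of the linear model."""
    Phi = np.asarray(Phi, dtype=float)
    t = np.asarray(t, dtype=float).ravel()
    N, M = Phi.shape
    A = alpha * np.eye(M) + beta * (Phi.T @ Phi)
    m_N = beta * np.linalg.solve(A, Phi.T @ t)
    # E(m_N): data misfit plus weight penalty at the posterior mean.
    E = 0.5 * beta * np.sum((t - Phi @ m_N) ** 2) + 0.5 * alpha * (m_N @ m_N)
    # slogdet returns (sign, log|A|) and avoids overflow in det(A).
    _, logdet_A = np.linalg.slogdet(A)
    return (0.5 * M * np.log(alpha) + 0.5 * N * np.log(beta)
            - E - 0.5 * logdet_A - 0.5 * N * np.log(2.0 * np.pi))
```

In the experiment this would be evaluated on the training set for each degree d, using the α and β selected by the iterative scheme for that degree.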
For each dataset plot the log evidence and 2 MSE values (of non-regularized and Bayesian models)
as a function of d. Can the evidence be used to successfully select α, β and d for the Bayesian
method? How does the non-regularized model fare in these runs?
Note: evidence is only relevant for the Bayesian method and one would need some other method
to select d using maximum likelihood in this model.
Submission
Please submit two separate items via Canvas:
(1) A zip file pp2.zip with all your work and the report. The zip file should include: (1a) A
report on the experiments, prepared as a PDF file, that includes all plots and results and your
conclusions as requested above. (1b) Your code for the assignment, including a README file that
explains how to run it. When run, your code should produce all the results and plots as requested
above. Your code should assume that the data files will have names as specified above and will
reside in sub-directory pp1data/ of the directory where the code is executed. We will read your
code as part of the grading, so please make sure the code is well structured and easy to follow
(i.e., document it as needed). This portion can be a single file or multiple files.
(2) One PDF “printout” of all contents in 1a and 1b: call this YourName-pp2-everything.pdf. This is
a single PDF file that includes the report, a printout of the code, and the README file. We will
use this file as the primary point for reading your submission and providing feedback, so please
include anything pertinent here.
Grading
Your assignment will be graded based on (1) the clarity of the code, (2) its correctness, (3) the
presentation and discussion of the results, and (4) the README file and our ability to follow the
instructions and test the code.