Assignment 1
CPSC/AMTH/CBB 663
Please include all of your written answers and figures in a single PDF,
titled <lastname and initials>_assignment1.pdf. Put this and all other
relevant files (most notably, your code) into a folder called <lastname
and initials>_assignment1 and then zip this folder into a single zip
file. If all goes according to plan, this file should be called <lastname
and initials>_assignment1.zip (e.g. for a student named Tom Marvolo
Riddle, riddletm_assignment1.zip). Be sure to rename the folder before
zipping it, lest it revert to its previous name when uncompressed.
We’ve provided skeleton code for each major function you’ll write in
a file called ps1_functions.py. Please fill in these functions (preserving their names, arguments, and outputs) and include your completed
ps1_functions.py in your assignment zip file (this is needed by our grading
scripts). Any supplemental code you write (e.g. calling these functions to
generate plots, or trying out different parameters) can be handled however
you choose. A well-structured Jupyter notebook with neatly produced and
labelled figures is an excellent way to compile assignment reports; just be
sure to submit a PDF and the separate ps1_functions.py file alongside
the notebook. However you produce your report, ensure all your figures are
clearly labelled.
Programming assignments should use built-in functions in Python
and PyTorch. In general, you may use the scipy stack [1]; however,
exercises are designed to emphasize the nuances of machine learning and
deep learning algorithms - if a function exists that trivially solves an entire
problem, please consult with the TA before using it.
Problem 1
What are the characteristics of a machine learning algorithm and what is meant by “learning” from data?
Problem 2
Least Squares Solution: w* = (XᵀX)⁻¹Xᵀy.
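The closed-form solution above can be sketched in NumPy. This is an illustrative helper (the function name is ours, not part of the assignment skeleton), and it uses np.linalg.solve rather than an explicit matrix inverse for numerical stability:

```python
import numpy as np

def least_squares(X, y):
    """Closed-form least squares: w* = (X^T X)^{-1} X^T y.
    Solves the normal equations instead of forming the inverse explicitly."""
    return np.linalg.solve(X.T @ X, X.T @ y)

# Tiny sanity check: fit y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(x), x])  # bias column plus the feature
y = 2 * x + 1
w = least_squares(X, y)  # recovers [1, 2]
```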
Problem 3
1. Load the dataset from file assignment1.zip and normalize the features using min-max scaling so that
each feature has the same range of values.
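Min-max scaling can be sketched as follows; this is a minimal illustration (the function name is ours), mapping each feature column to [0, 1]:

```python
import numpy as np

def min_max_scale(X):
    """Scale each feature (column) of X to the range [0, 1]."""
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    rng = X.max(axis=0) - mins
    rng[rng == 0] = 1.0  # avoid division by zero for constant features
    return (X - mins) / rng

X = np.array([[1.0, 10.0],
              [3.0, 20.0],
              [5.0, 30.0]])
X_scaled = min_max_scale(X)  # each column now spans [0, 1]
```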
2. Find the optimal weights (in terms of MSE) for fitting a polynomial function to the data in all 6 cases
generated above, using polynomials of degree 1, 2, and 9. Use the least squares analytical solution
given above. Do not use built-in methods for regression. Plot the fitted curves on the same plot as the
data points (you can plot all 3 polynomial curves on the same plot). Report the fitted weights and the
MSE in tables. Qualitatively assess the fit of the curves. Does it look like any of the models overfit,
underfit, or appropriately fit the data? Explain your reasoning in one to two sentences (no calculations
necessary).
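One way to apply the analytical solution to polynomial fitting is to build a design matrix with columns [1, x, x², …] and solve the normal equations. The sketch below is illustrative (function names are ours, not the assignment skeleton's):

```python
import numpy as np

def poly_design_matrix(x, degree):
    """Design matrix with columns [1, x, x^2, ..., x^degree] for 1-D input x."""
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)

def fit_poly(x, y, degree):
    """Least-squares polynomial fit via the normal equations (no built-in regression)."""
    X = poly_design_matrix(x, degree)
    w = np.linalg.solve(X.T @ X, X.T @ y)
    mse = np.mean((X @ w - y) ** 2)
    return w, mse

# Noiseless sanity check: a degree-2 fit recovers the true coefficients.
x = np.linspace(-1, 3, 20)
y = x ** 2 - 3 * x + 1
w, mse = fit_poly(x, y, degree=2)  # w is approximately [1, -3, 1]
```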
L2 Norm: ‖w‖₂ = √(Σᵢ₌₁ᴺ wᵢ²)
2. Write a program that applies a k-nn classifier to the data with k ∈ {1, 5, 10, 15}. Calculate the test
error using both leave-one-out validation and 5-fold cross validation. Plot the test error as a function
of k. You may use the existing methods in scikit-learn or other libraries for finding the k-nearest
neighbors, but do not use any built-in k-nn classifiers. Any reasonable handling of ties in finding
k-nearest neighbors is okay. Also, do not use any existing libraries or methods for cross validation. Do
any values of k result in underfitting or overfitting?
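A hand-rolled k-nn classifier and fold-splitting helper might look like the sketch below (names and tie-handling choices are ours; leave-one-out is just the special case where the number of folds equals the number of points):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Classify each test point by majority vote among its k nearest training points."""
    preds = []
    for x in X_test:
        dists = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dists)[:k]  # distance ties broken by index order
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # vote ties go to the smaller label
    return np.array(preds)

def kfold_indices(n, n_folds, seed=0):
    """Shuffle 0..n-1 and split into n_folds roughly equal folds (no library CV)."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), n_folds)

# Two well-separated clusters: points near 0 get label 0, points near 5 get label 1.
X_train = np.array([[0.0], [0.2], [5.0], [5.2]])
y_train = np.array([0, 0, 1, 1])
preds = knn_predict(X_train, y_train, np.array([[0.1], [5.1]]), k=3)
```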
1. Write code in Python that randomly generates N points sampled uniformly in the interval x ∈ [−1, 3].
Then output the function y = x² − 3x + 1 for each of the points generated. Then write code that adds
zero-mean Gaussian noise with standard deviation σ to y. Make plots of x and y with N ∈ {15, 100}
and σ ∈ {0, 0.05, 0.2} (there should be six plots in total). Save the point sets for the following questions.
Hint: You may want to check the NumPy library for generating noise.
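The sampling step above can be sketched with NumPy's random generator (the function name and the seeding choice are ours; seeding just makes the saved point sets reproducible):

```python
import numpy as np

def generate_data(N, sigma, seed=0):
    """Sample x ~ Uniform(-1, 3), compute y = x^2 - 3x + 1, add N(0, sigma^2) noise."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 3.0, size=N)
    y = x ** 2 - 3 * x + 1
    y_noisy = y + rng.normal(0.0, sigma, size=N)
    return x, y_noisy
```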
3. Apply L2 norm regularization with a 9-degree polynomial model to the cases with σ = 0.05 and
N ∈ {15, 100}. Vary the parameter λ, and choose three values of λ that result in the following
scenarios: underfitting, overfitting, and an appropriate fit. Report the fitted weights and the MSE in
each of these scenarios. Hint: The least squares solution can also be used for polynomial regression.
Check slides of lecture 2 for details on L2 norm regularization.
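The regularized closed form is w* = (XᵀX + λI)⁻¹Xᵀy, and a minimal sketch follows. Note this version regularizes every weight including the bias term for simplicity; some treatments leave the bias unregularized, so check the lecture 2 slides for the exact convention expected:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form L2-regularized least squares: w* = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])
w0 = ridge_fit(X, y, lam=0.0)    # lam = 0 reduces to plain least squares
w10 = ridge_fit(X, y, lam=10.0)  # larger lam shrinks the weights toward zero
```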
Problem 4
[Figure 1 diagram; labels legible in the extracted residue include w1 = 0.6, w6 = 0.8, w7 = 1, w8 = 1, an output node, and biases −0.4, −0.5, −0.5.]
Figure 1: Multilayer perceptron with three inputs and one hidden layer. Numbers in circles are biases.
5. Using perceptrons with appropriate weights and biases, design an adder that does two-bit binary
addition. That is, the adder takes as input two two-bit binary numbers (i.e. 4 binary inputs) and adds
1. Suppose we take all the weights and biases in a network of perceptrons and multiply them by a positive
constant, c > 0. Show that the behavior of the network doesn’t change. (Exercise in Ch. 1 of Nielsen’s book.)
4. If we change the perceptrons in Figure 1 to sigmoid neurons, what are the outputs for the same inputs
(e.g., inputs of [0, 0, 0], [0, 0, 1], ...)?
[Further Figure 1 diagram residue: input labels x1, x2, x3 and weight labels w2 = −0.7, w3, w4, with values 0.4 and −0.6 only partially legible.]
3. For each possible input of the MLP in Figure 1, calculate the output. I.e., what is the output if
x = [0, 0, 0], x = [0, 0, 1], etc.? You should have 8 cases total.
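The mechanics of enumerating all 8 cases can be sketched as below. Important: the weights and biases here are placeholders, not the values from Figure 1 (which are only partially legible in this copy); substitute the figure's actual numbers before computing your answers. The sketch assumes the common step-activation convention of outputting 1 when w·x + b > 0:

```python
import itertools
import numpy as np

def perceptron(x, w, b):
    """Step-activation perceptron: output 1 if w . x + b > 0, else 0."""
    return 1 if np.dot(w, x) + b > 0 else 0

# PLACEHOLDER weights/biases -- replace with the values from Figure 1.
w_h1, b_h1 = np.array([0.6, -0.7, 1.0]), -0.4  # hidden neuron 1
w_h2, b_h2 = np.array([0.4, -0.6, 0.8]), -0.5  # hidden neuron 2
w_out, b_out = np.array([1.0, 1.0]), -0.5      # output neuron

for x in itertools.product([0, 1], repeat=3):  # all 8 binary inputs
    h = [perceptron(x, w_h1, b_h1), perceptron(x, w_h2, b_h2)]
    print(x, "->", perceptron(h, w_out, b_out))
```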
3. Apply two other classifiers of your choice to the same data. For these additional classifiers, you may
use existing libraries, such as scikit-learn classifiers, but for cross-validation, you should reuse your
method from 3.2 or modify it slightly. Possible algorithms include (but are not limited to) logistic
regression, QDA, naive Bayes, SVM, and decision trees. Use 5-fold cross validation to calculate the
test error. Report the training and test errors. If any tuning parameters need to be selected, use cross-validation and report the training and test error for several values of the tuning parameters. Which of
the classifiers performed best? Did any of them underfit or overfit the data? How do they compare to
the k-nn classifiers in terms of performance?
2. Given the same setup as problem 4.1 (a network of perceptrons), suppose that the overall input to
the network of perceptrons has been chosen and fixed. Suppose the weights and biases are such that
w·x + b ≠ 0 for the input x to any particular perceptron in the network. Now replace all the perceptrons
in the network by sigmoid neurons, and multiply the weights and biases by a positive constant c > 0.
Show that in the limit as c → ∞ the behavior of this network of sigmoid neurons is exactly the same as
the network of perceptrons. How can this fail when w·x + b = 0 for one of the perceptrons? (Exercise
in Ch. 1 of Nielsen’s book.)
Problem 5
Here are the experiments:
• Experiment with the optimizer and activation function of your network.
3. Print a confusion matrix showing which digits were misclassified, and what they were misclassified as.
What numbers are frequently confused with one another by your model? (You may use sklearn’s
confusion_matrix function to generate the matrix.)
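Usage of scikit-learn's confusion_matrix is sketched below; the labels are made up for illustration, whereas in the assignment they would be your MNIST test labels and your network's predictions:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative labels only -- in the assignment these come from your test set.
y_true = np.array([0, 1, 2, 2, 1, 0, 2])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2])

cm = confusion_matrix(y_true, y_pred)
# cm[i, j] counts samples with true label i predicted as label j,
# so off-diagonal entries reveal which classes get confused.
print(cm)
```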
1. The time has come to implement your first fully-connected neural network in PyTorch! For this
assignment, we’ll be training the network on the canonical MNIST dataset. After building the network,
we’ll experiment with an array of hyperparameters, tweaking the network’s width, depth, learning rate,
and more in pursuit of the highest classification accuracy we can muster. You might also choose to
match wits with your classmates by vying to get your network on the class leaderboard of MNIST
scores: https://piazza.com/class/kyoikimyzbz6xj?cid=8
them together. Don’t forget to include the carry bit. The resulting output should be the two-bit sum
and the carry bit for a total of three binary outputs.
You may find the PyTorch tutorials helpful as you complete this problem: https://pytorch.org/
tutorials/beginner/basics/intro.html. If you haven’t yet, we suggest you go through them,
especially the tutorial on the optimization loop, which you will need to build more or less from scratch.
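For orientation, the skeleton of such an optimization loop looks roughly like this. This is a toy stand-in, not the FCNN.py skeleton itself: it runs a few SGD steps on random tensors shaped like flattened 28×28 MNIST images with 10 classes, and the layer sizes and learning rate are arbitrary:

```python
import torch
import torch.nn as nn

# A two-layer fully-connected net (sizes are illustrative).
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

X = torch.randn(32, 784)         # one mini-batch of fake "images"
y = torch.randint(0, 10, (32,))  # fake labels

for step in range(5):            # the core loop you must write yourself
    optimizer.zero_grad()        # clear gradients from the previous step
    loss = loss_fn(model(X), y)  # forward pass + loss
    loss.backward()              # backpropagate
    optimizer.step()             # update the weights
```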
• Follow the TODOs in FCNN.py to build a two-layer fully-connected neural network. We’ve provided code to handle the dataset and model initialization, but you need to supply the training logic.
• Try adjusting the learning rate (by making it smaller) if your model is not converging/improving
in accuracy. You might also try increasing the number of epochs used.
• Try changing the width of the hidden layer, keeping the activation function that performs best.
Remember to add these results to your table.
• Experiment with the non-linearity used before the middle layer. Here are some activation functions
to choose from: relu, softplus, elu, tanh.
• Lastly, try adding additional layers to your network. How do 3, 4, and 5 layer networks perform?
Is there a point where accuracy stops increasing?
• Try training your network without a non-linearity between the layers (i.e. a “linear activation”).
Then try adding a sigmoid non-linearity, first directly on the input to the first layer, then on the
input to the second layer. You should experiment with these independently and in combination.
Record your test results for each in a table.
2. Create a plot of the training and test error vs the number of iterations. How many iterations are
sufficient to reach good performance?
4. What was the highest percentage of classification accuracy your fully-connected network achieved?
Briefly describe the architecture and training process that produced it. (If you like, you can take
part in our friendly class competition by posting your results, along with a short description of your
methods, to https://piazza.com/class/kyoikimyzbz6xj?cid=8.)
References
[1] “The SciPy Stack specification.” [Online]. Available: https://www.scipy.org/stackspec.html