Machine Learning
COMP 5630 / COMP 6630
Assignment #5
Linear Regression and Regularization
Submission Instructions
Please submit your solutions via Canvas (https://auburn.instructure.com/) as a PDF file. Please do not include blurry scanned or photographed equations, as they are difficult for us to grade.
Late Submission Policy
The late submission policy for assignments will be as follows unless otherwise specified:
1. 75% credit within 0-48 hours after the submission deadline.
2. 50% credit within 48-96 hours after the submission deadline.
3. 0% credit beyond 96 hours after the submission deadline.
Tasks
1 Linear Regression [30 pts]
Suppose that $y = w_0 + w_1 x_1 + w_2 x_2 + \epsilon$, where $\epsilon \sim \mathcal{N}(0, \sigma^2)$.
a) [10 pts] Write down an expression for $P(y \mid x_1, x_2)$.
b) [10 pts] Assume you are given a set of training observations $\left(x_1^{(i)}, x_2^{(i)}, y^{(i)}\right)$ for $i = 1, \ldots, n$. Write down the conditional log-likelihood of this training data. Drop any constants that do not depend on the parameters $w_0$, $w_1$, or $w_2$.
c) [10 pts] Based on your answer, show that finding the MLE of that conditional log-likelihood is equivalent to minimizing the least-squares objective
$$\frac{1}{2} \sum_{i=1}^{n} \left\{ y^{(i)} - \left( w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} \right) \right\}^2 .$$
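Hint (for reference only): the density of a univariate Gaussian with mean $\mu$ and variance $\sigma^2$ is
$$p(z) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(z - \mu)^2}{2\sigma^2} \right).$$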
2 Regularization [30 pts]
a) [15 pts] Find the partial derivatives of the regularized least-squares objective
$$\frac{1}{2} \sum_{i=1}^{n} \left\{ y^{(i)} - \left( w_0 + w_1 x_1^{(i)} + w_2 x_2^{(i)} \right) \right\}^2 + \frac{\lambda}{2} \left\lVert [w_1, w_2] \right\rVert_2^2$$
with respect to $w_0$, $w_1$, and $w_2$. Although there is a closed-form solution to this problem, there are situations in practice where we solve it via gradient descent. Write down the gradient descent update rules for $w_0$, $w_1$, and $w_2$.
b) [15 pts] Suppose that $w_1, w_2 \sim \mathcal{N}(0, \tau^2)$. Prove that the MAP estimate of $w_0$, $w_1$, and $w_2$ with this prior is equivalent to minimizing the above regularized least-squares problem with $\lambda = \frac{\sigma^2}{\tau^2}$.
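Hint (for reference only): a MAP estimate maximizes the log-posterior, which by Bayes' rule decomposes, up to an additive constant, as
$$\log P(w_0, w_1, w_2 \mid \text{data}) = \log P(\text{data} \mid w_0, w_1, w_2) + \log P(w_1, w_2) + \text{const},$$
where $w_0$ is given a flat (improper) prior since the problem places a Gaussian prior only on $w_1$ and $w_2$.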
3 Programming: Implement Linear Regression [40 pts]
In this assignment, you will implement Linear Regression for a small dataset. Note that you will implement Linear Regression from scratch; no external libraries such as TensorFlow, scikit-learn, or PyTorch may be used. The dataset (“numpydataset.csv”) has been provided on Canvas as part of this assignment. We have also provided a code template (“Linear Regression.py”) to simplify the input, output, and plot functions.
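For illustration only (the provided template may already handle this for you), one way to load a two-column CSV with NumPy is sketched below; the header row and the (feature, target) column order are assumptions, so adjust skiprows and the column indices to match the actual file.

    import numpy as np

    # Assumes one header row and columns ordered as (feature, target).
    data = np.loadtxt("numpydataset.csv", delimiter=",", skiprows=1)
    x, y = data[:, 0], data[:, 1]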
You will complete the following tasks for this programming assignment:
a) [10 pts] Implement a method MSE() that will compute the loss incurred at each epoch. We aim to minimize the distance between the data and the fitted line after each epoch. An epoch is a single pass over the dataset. The higher the cost (or error), the more the parameter values need to change in order to bring it down. The error function is
$$E = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the total number of data points (samples). Invoke the function to calculate the initial loss when $m = 0$ and $b = 0$. We use $m$ and $w$ interchangeably because both refer to the weight, i.e., the slope. (A reference sketch of such a function appears after part (c).)
b) [20 pts] Your aim in Linear Regression is to minimize the MSE by using the GRADIENT DESCENT algorithm. Extensively used in machine learning, this procedure finds a local minimum of a function. The idea behind the algorithm is, given a certain learning rate $\gamma$, to take iterative steps in the direction of $-\nabla E$ (the negative gradient) in order to reduce the error. Each iteration of gradient descent updates $w$ and $b$ (collectively denoted by $\theta$) according to
$$\theta^{t+1} = \theta^{t} - \gamma \, \frac{\partial E(X, \theta)}{\partial \theta},$$
where $\theta^{t}$ denotes the weights and the bias at iteration $t$. (A reference sketch of one such update appears after part (c).)
c) [10 pts] Using the template code, compute the MSE after 100 iterations for learning rates 0.001 and 0.01 separately. The code will also create two plots (one per learning rate) showing the fitted line along with the data points. Which learning rate do you think works better? Why? Explain your answer with relevant data and observations.
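Reference sketch for part (a), assuming NumPy arrays x and y and the slope/intercept parameterization $\hat{y} = m x + b$ used above; the actual MSE() signature expected by the template code may differ.

    import numpy as np

    def MSE(x, y, m, b):
        """Mean squared error of the line y_hat = m * x + b on the data (x, y)."""
        y_hat = m * x + b                  # prediction for every sample
        return np.mean((y - y_hat) ** 2)   # average squared residual

    # Initial loss with m = 0 and b = 0 (i.e., predicting 0 for every sample):
    # initial_loss = MSE(x, y, m=0.0, b=0.0)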
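Reference sketch for parts (b) and (c), again assuming NumPy arrays x and y; the gradients below come from differentiating the error function $E$ defined in part (a) with respect to $m$ and $b$.

    import numpy as np

    def gradient_descent_step(x, y, m, b, lr):
        """One update of m and b in the direction of the negative gradient of E."""
        n = len(x)
        y_hat = m * x + b
        dm = (-2.0 / n) * np.sum(x * (y - y_hat))  # dE/dm
        db = (-2.0 / n) * np.sum(y - y_hat)        # dE/db
        return m - lr * dm, b - lr * db

    # Example usage for part (c): 100 iterations at each learning rate.
    # for lr in (0.001, 0.01):
    #     m, b = 0.0, 0.0
    #     for _ in range(100):
    #         m, b = gradient_descent_step(x, y, m, b, lr)
    #     print(lr, MSE(x, y, m, b))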