APM 598: Homework 1
Ex 1. [Overfitting]
A dataset $\{x_i, y_i\}_{i=1\ldots N}$ has been generated from an unknown polynomial $P^*(x)$ (see Figure 1 and the code at the end of this document to load the data 'data_HW1_ex1.csv' in Python), i.e.
$$y_i = P^*(x_i) + \varepsilon_i$$
where $\varepsilon_i$ is some noise. The goal is to estimate the polynomial $P^*$ and in particular its degree $k^*$ (i.e. $k^* = \deg(P^*)$).
Figure 1: Data points $\{x_i, y_i\}_{i=1\ldots N}$ generated from a polynomial $P^*$.
a) For each $k = 0, \ldots, 12$, estimate the polynomial $\hat{P}_k$ of order $k$ that minimizes the mean-square error (use polyfit):
$$\hat{P}_k = \operatorname*{argmin}_{P,\,\deg(P)\le k} \ell(P) \quad\text{with}\quad \ell(P) = \frac{1}{N}\sum_{i=1}^{N} |y_i - P(x_i)|^2.$$
Plot the loss $\ell(\hat{P}_k)$ as a function of $k$ (i.e. plot the points $\{k, \ell(\hat{P}_k)\}_{k=0\ldots 12}$).
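A minimal sketch of this fit (assuming x and y have been loaded as in the Exercise 1 code at the end of this document):

import numpy as np
import matplotlib.pyplot as plt
losses = []
for k in range(13):
    coeffs = np.polyfit(x, y, k)             # least-squares fit of degree k
    y_hat = np.polyval(coeffs, x)            # evaluate the fitted polynomial
    losses.append(np.mean((y - y_hat)**2))   # mean-square error
plt.plot(range(13), losses, 'o-')
plt.xlabel(r'$k$'); plt.ylabel(r'$\ell(\hat{P}_k)$')
plt.show()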
b) Split the data-set $\{x_i, y_i\}_{i=1\ldots N}$ into training and test sets using (respectively) 80% and 20% of the dataset. We define two loss functions:
$$\ell_{\text{train}}(P) = \frac{1}{N_{\text{train}}}\sum_{i \,\in\, \text{train set}} |y_i - P(x_i)|^2 \qquad (1)$$
$$\ell_{\text{test}}(P) = \frac{1}{N_{\text{test}}}\sum_{i \,\in\, \text{test set}} |y_i - P(x_i)|^2 \qquad (2)$$
where $N_{\text{train}}$ and $N_{\text{test}}$ denote the number of elements in (respectively) the train and test sets ($N_{\text{train}} + N_{\text{test}} = N$).
For each $k = 0, \ldots, 12$, estimate the polynomial $\tilde{P}_k$ of order $k$ that minimizes the mean-square error over the training set:
$$\tilde{P}_k = \operatorname*{argmin}_{P,\,\deg(P)\le k} \ell_{\text{train}}(P).$$
Plot the values of both loss functions $\ell_{\text{train}}$ and $\ell_{\text{test}}$ at $\tilde{P}_k$ (i.e. plot the points $\{k, \ell_{\text{train}}(\tilde{P}_k)\}_{k=0\ldots 12}$ and $\{k, \ell_{\text{test}}(\tilde{P}_k)\}_{k=0\ldots 12}$).
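A minimal sketch of the 80%/20% split and the two losses (reusing x and y from above; the random seed is an arbitrary choice):

import numpy as np
rng = np.random.default_rng(0)               # fixed seed for reproducibility
perm = rng.permutation(len(x))
n_train = int(0.8*len(x))
train, test = perm[:n_train], perm[n_train:]
l_train, l_test = [], []
for k in range(13):
    coeffs = np.polyfit(x[train], y[train], k)   # fit on the training set only
    l_train.append(np.mean((y[train] - np.polyval(coeffs, x[train]))**2))
    l_test.append(np.mean((y[test] - np.polyval(coeffs, x[test]))**2))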
c) Use $\ell_{\text{test}}$ to guess the order $k^*$ of the unknown polynomial $P^*$ and give an estimate of its coefficients.
Ex 2. [Gradient descent]
The goal is to implement and test gradient descent methods. As a toy problem, we use linear regression on the data set $\{x_i, y_i\}_{i=1\ldots N}$ from Ex 1 and minimize the following loss function:
$$\ell(a, b) = \frac{1}{N}\sum_{i=1}^{N} |y_i - (a + bx_i)|^2.$$
a) Compute the gradient of the loss function $\nabla\ell = (\partial_a \ell, \partial_b \ell)$.
Deduce (numerically) the minimum $(a^*, b^*)$ of $\ell$.
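As a sanity check, a minimal sketch of the gradient and of the numerical minimizer obtained from the normal equations (the name grad_loss is illustrative):

import numpy as np
def grad_loss(a, b, x, y):
    r = y - (a + b*x)                        # residuals
    return np.array([-2*r.mean(), -2*(x*r).mean()])
# (a*, b*): least-squares solution of y ≈ a + b x
A = np.column_stack([np.ones_like(x), x])
a_star, b_star = np.linalg.lstsq(A, y, rcond=None)[0]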
b) Starting from $(a_0, b_0) = (1.8, 1.4)$ and using the constant learning rate $\eta = 0.05$, implement the gradient descent algorithm:
$$\begin{pmatrix} a_{n+1} \\ b_{n+1} \end{pmatrix} = \begin{pmatrix} a_n \\ b_n \end{pmatrix} - \eta\,\nabla\ell(a_n, b_n).$$
Estimate the rate of convergence (i.e. how fast does $\|(a_n, b_n) - (a^*, b^*)\|$ converge to zero?).
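A minimal sketch of the descent loop (reusing grad_loss, a_star, b_star from the sketch above; the number of iterations is arbitrary):

import numpy as np
eta = 0.05
theta = np.array([1.8, 1.4])                 # (a0, b0)
errors = []
for n in range(200):
    theta = theta - eta*grad_loss(theta[0], theta[1], x, y)
    errors.append(np.linalg.norm(theta - np.array([a_star, b_star])))
# Plot errors with plt.semilogy: a straight line on a log scale
# indicates geometric (linear-rate) convergence.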
c) Implement the momentum and Nesterov algorithms, denoting $\theta_n = (a_n, b_n)$:
$$\text{momentum:} \quad \begin{cases} v_{n+1} = \gamma v_n + \eta \nabla\ell(\theta_n) \\ \theta_{n+1} = \theta_n - v_{n+1} \end{cases}$$
$$\text{Nesterov:} \quad \begin{cases} v_{n+1} = \gamma v_n + \eta \nabla\ell(\theta_n - \gamma v_n) \\ \theta_{n+1} = \theta_n - v_{n+1} \end{cases}$$
Estimate the convergence rate for $\gamma = 0.9$ and $\eta = 0.05$.
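A minimal sketch of both updates (reusing grad_loss from above):

import numpy as np
gamma, eta = 0.9, 0.05
# momentum
theta, v = np.array([1.8, 1.4]), np.zeros(2)
for n in range(200):
    v = gamma*v + eta*grad_loss(theta[0], theta[1], x, y)
    theta = theta - v
# Nesterov: gradient evaluated at the look-ahead point theta - gamma*v
theta, v = np.array([1.8, 1.4]), np.zeros(2)
for n in range(200):
    la = theta - gamma*v
    v = gamma*v + eta*grad_loss(la[0], la[1], x, y)
    theta = theta - v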
extra) Test other gradient descent methods (e.g. Adam, AdaGrad, RMSProp) using the classes defined in either PyTorch or TensorFlow.
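For instance, a minimal sketch with torch.optim.Adam on the same loss (tensor setup assumed; the learning rate is carried over from above):

import torch
xt = torch.tensor(x, dtype=torch.float32)
yt = torch.tensor(y, dtype=torch.float32)
theta = torch.tensor([1.8, 1.4], requires_grad=True)
opt = torch.optim.Adam([theta], lr=0.05)
for n in range(200):
    opt.zero_grad()
    loss = torch.mean((yt - (theta[0] + theta[1]*xt))**2)
    loss.backward()                          # autograd computes the gradient
    opt.step()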
Ex 3. [Classification]
The goal is to implement a linear classifier for the Fashion-MNIST data-set (see also the code at the end of this document).
a) Adapt the linear classifier for the MNIST data-set to the new data-set.
b) Train the model using 40 epochs with learning rate $\eta = 10^{-4}$ and the gradient descent method of your choice.
Plot the evolution of the loss at each epoch.
c) Draw the 'template' of each class.
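A minimal sketch of such a training loop in PyTorch (the linear classifier from class may differ; batch size and optimizer are illustrative choices):

import torch
import torch.nn as nn
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
train_set = datasets.FashionMNIST('data_fashionMNIST', train=True,
                                  download=True, transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)
model = nn.Linear(28*28, 10)                 # one weight 'template' per class
criterion = nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=1e-4)
losses = []
for epoch in range(40):
    total = 0.0
    for X, y in loader:
        opt.zero_grad()
        loss = criterion(model(X.view(X.size(0), -1)), y)
        loss.backward()
        opt.step()
        total += loss.item()
    losses.append(total/len(loader))         # average loss of the epoch
# For c), the 'templates' are the rows of model.weight reshaped to 28x28.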
##############################################
## Exercise 1 (Python) ##
##############################################
import matplotlib.pyplot as plt
import numpy as np
# get the data
data = np.loadtxt('data_HW1_ex1.csv',delimiter=',')
x,y = data[:,0],data[:,1]
# plot
plt.figure(1)
plt.plot(x,y,'o')
plt.xlabel(r'$x$')
plt.ylabel(r'$y$')
plt.show()
##############################################
## Exercise 3 (Python) ##
##############################################
from torchvision import datasets
import matplotlib.pyplot as plt
# Download and load
data_collection = datasets.FashionMNIST('data_fashionMNIST', train=True, download=True)
# Illustration
label_fashion = {0:'T-shirt', 1:'trouser', 2:'pullover', 3:'dress', 4:'coat',
                 5:'sandal', 6:'shirt', 7:'sneaker', 8:'bag', 9:'boot'}
X,y = data_collection[123]   # X is a PIL image, y an integer label
plt.figure(1);plt.clf()
plt.imshow(X)
plt.title("Example of image with label "+label_fashion[y])
plt.show()