APM 598: Homework 2
1 Two-layer neural networks
Ex 1.
We consider two-layer neural networks of the form (see fig. 1):

    f(x) = b2 + W2 σ(b1 + W1 · x),    (1)

where x, b1, b2 ∈ R^2 and W1, W2 ∈ M2×2(R) are 2 × 2 matrices. The activation
function σ is the ReLU function (i.e. σ(x) = max(x, 0)). We denote by s = f(x) the
score predicted by the model, with s = (s1, s2), where s1 is the score for class 1 and s2 the
score for class 2.
Figure 1: Illustration of a two-layer neural network with the ReLU activation function.
a) Consider the points given in figure 2-left, where each color corresponds to a different
class:
class 1: x1 = (1, 0) and x2 = (−1, 0),
class 2: x3 = (0, 1) and x4 = (0, −1).
Find (numerically or analytically) some parameters b1, b2, W1 and W2 such that
the scores s satisfy:
s1 > s2 for x1 and x2,   s1 < s2 for x3 and x4.
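For the numerical route, one can check a candidate parameter set directly. The sketch below is only a verification helper: the zero parameters are placeholders to be replaced by your own choice.

import numpy as np

def scores(x, W1, b1, W2, b2):
    # Forward pass of (1): s = b2 + W2 σ(b1 + W1 x), with σ = ReLU.
    return b2 + W2 @ np.maximum(b1 + W1 @ x, 0.0)

# Placeholder parameters (all zeros): replace with your candidate choice.
W1 = np.zeros((2, 2)); b1 = np.zeros(2)
W2 = np.zeros((2, 2)); b2 = np.zeros(2)

points = {'x1': np.array([1.0, 0.0]), 'x2': np.array([-1.0, 0.0]),
          'x3': np.array([0.0, 1.0]), 'x4': np.array([0.0, -1.0])}
targets = {'x1': 1, 'x2': 1, 'x3': 2, 'x4': 2}
for name, x in points.items():
    s1, s2 = scores(x, W1, b1, W2, b2)
    ok = (s1 > s2) if targets[name] == 1 else (s1 < s2)
    print(name, (s1, s2), 'OK' if ok else 'not satisfied')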
Figure 2: Data points to classify.
b) Consider now the dataset given in figure 2-right (see the code below to load the data).
Train a two-layer neural network of the form (1) to classify the points. Provide the
accuracy of the model (percentage of correctly predicted labels).
##################################
## Exercise 1 ##
##################################
import numpy as np
import pandas as pd

# Load the dataset: one point (x1, x2) and its class label per row.
df = pd.read_csv('data_HW2_ex1.csv')
X = np.column_stack((df['x1'].values, df['x2'].values))  # features, shape (N, 2)
y = df['class'].values                                   # labels, shape (N,)
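One possible training sketch for part b), continuing from the loading code above. Using PyTorch is an assumption (any framework or hand-written gradient descent works), the labels in df['class'] are assumed to take exactly two values (re-encoded to 0/1 below), and the optimizer, learning rate and number of epochs are arbitrary choices that may need tuning.

import torch
import torch.nn as nn

# Map the two raw labels to integer classes 0 and 1 (assumption on the label format).
_, y_idx = np.unique(y, return_inverse=True)
X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y_idx, dtype=torch.long)

# Two-layer network of the form (1): 2 inputs -> 2 hidden ReLU units -> 2 scores.
model = nn.Sequential(nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X_t), y_t)
    loss.backward()
    optimizer.step()

# Accuracy: percentage of correctly predicted labels.
predictions = model(X_t).argmax(dim=1)
print('accuracy:', 100.0 * (predictions == y_t).float().mean().item(), '%')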
Ex 2.
The goal of this exercise is to show that two-layer neural networks with ReLU activation
can approximate any continuous function. To simplify, we restrict our attention
to the one-dimensional case and fix a continuous function:

    g : [0, 1] → R.

We claim that for any ε > 0, there exists a two-layer neural network fθ such that:

    max_{x∈[0,1]} |g(x) − fθ(x)| < ε.    (2)

The key idea is to show that fθ can interpolate (exactly) g at as many points as needed
(see figure 3-right).
We consider neural networks fθ of the form:

    fθ(x) = W^(2) σ(W^(1) x + b^(1)) + b^(2) = Σ_{k=1}^{m} w_k^(2) σ(w_k^(1) x + b_k^(1)) + b^(2),    (3)

where m is the size of the hidden layer. The unknown parameters θ are the two weight matrices
W^(1) = {w_k^(1)}_{k=1:m} and W^(2) = {w_k^(2)}_{k=1:m}, and the two biases b^(1) = {b_k^(1)}_{k=1:m} and b^(2). The
activation function σ is taken as the ReLU function. The hidden layer is intended to have
a large dimension, i.e. the intermediate value z ∈ R^m with m ≫ 1 (see figure 3-left).
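As an illustration of how (3) is evaluated, here is a small NumPy sketch; the parameter values and the choice m = 3 are arbitrary placeholders, not a solution to the exercise.

import numpy as np

def f_theta(x, w1, b1, w2, b2):
    # Evaluate (3): sum_k w2[k] * ReLU(w1[k] * x + b1[k]) + b2, for an array of inputs x.
    z = np.maximum(np.outer(np.atleast_1d(x), w1) + b1, 0.0)  # hidden layer, shape (n_points, m)
    return z @ w2 + b2

# Arbitrary placeholder parameters with m = 3 hidden units.
w1 = np.array([1.0, 1.0, 1.0])
b1 = np.array([0.0, -0.25, -0.5])
w2 = np.array([0.5, -1.0, 2.0])
b2 = 0.0
print(f_theta(np.linspace(0.0, 1.0, 5), w1, b1, w2, b2))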
a) Consider three points: (x0, y0), (x1, y1), (x2, y2) with x0 = 0, x1 = 1/2, x2 = 1 (the
values yi are arbitrary). Find fθ such that fθ(xi) = yi for i = 0, 1, 2.
Hint. Use m = 2 with the functions σ(x − x0) and σ(x − x1).
b) Generalize: write a program that takes as input {(xi, yi)}_{0≤i≤N} with xi < xi+1 and
returns a two-layer neural network such that fθ(xi) = yi for all i = 0, . . . , N.
Hint. Use m = N and the functions σ(x − xi).
Extra) Prove (2).
Hint: use that g is uniformly continuous on [0, 1].
Figure 3: Left: two-layer neural network used to approximate a continuous function. The
hidden layer (i.e. z = (z1, . . . , zm)) is in general quite large. Right: to approximate
the continuous function g, we interpolate some of its values (xi, yi) by a piecewise-linear
function using the functions σ(x − xi).
2 Convolutional Neural Networks
Ex 3.
Using convolutional layers, max pooling and ReLU activation functions, build a neural-network
classifier for the Fashion-MNIST database (see a sketch example in figure 4) with at most 500 parameters.
Provide the evolution of the loss for the training and test sets. Give the accuracy on
both sets after the training.
The three groups with the highest accuracy get additional points.
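To keep track of the 500-parameter budget, a small helper (assuming PyTorch) can count the trainable parameters of a candidate model. The model below is only a placeholder showing the allowed layer types for 28 × 28 Fashion-MNIST inputs; it actually exceeds the budget, so a valid design still has to be found.

import torch.nn as nn

def count_parameters(model):
    # Total number of trainable parameters (weights and biases).
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Placeholder model with the allowed ingredients: conv, ReLU, max pooling, flatten + fully connected.
# For 28x28 inputs: conv -> 26x26, pool -> 13x13, conv -> 11x11, pool -> 5x5, hence 4*5*5 features.
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(4, 4, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(4 * 5 * 5, 10),
)
print(count_parameters(model))  # compare against the 500-parameter budget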
Figure 4: Schematic representation of a neural network for image classification: input image
→ (conv + ReLU) → (conv + ReLU) → pooling → flatten + fully connected → output scores.