Assignment 3: Non-linear regression

CS480/680 
Submit an electronic copy of your assignment via LEARN. Late submissions incur a 2% penalty for every
rounded up hour past the deadline. For example, an assignment submitted 5 hours and 15 min late will receive
a penalty of ceiling(5.25) * 2% = 12%.
Be sure to include your name and student number with your assignment.
1. [20 pts] Show that the Gaussian kernel k(x, x') = exp(−‖x − x'‖²/(2σ²)) can be expressed as an inner
product in an infinite-dimensional feature space. Hint: use the following expansion and show that the middle
factor further expands as a power series:
k(x, x') = e^(−xᵀx/(2σ²)) · e^(xᵀx'/σ²) · e^(−(x')ᵀx'/(2σ²))
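One way to see why the hint works, sketched here in the scalar case (the multivariate case is analogous, using multi-indices): expand the middle factor as a power series and split each term symmetrically between x and x'.

```latex
% Scalar case: x, x' \in \mathbb{R}
e^{x x' / \sigma^2}
  = \sum_{k=0}^{\infty} \frac{1}{k!} \left( \frac{x x'}{\sigma^2} \right)^{k}
  = \sum_{k=0}^{\infty}
      \frac{x^{k}}{\sigma^{k}\sqrt{k!}} \cdot \frac{(x')^{k}}{\sigma^{k}\sqrt{k!}}
```

Folding the two outer exponential factors into each side then suggests the feature map φ_k(x) = e^(−x²/(2σ²)) x^k/(σ^k √(k!)), for which k(x, x') = Σ_k φ_k(x) φ_k(x'), an inner product of infinite-dimensional feature vectors.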
2. [80 pts] Non-linear regression techniques.
Implement the following regression algorithms. For a), b) and c), do not use any machine learning library,
but feel free to use libraries for linear algebra and feel free to verify your results with existing machine learning
libraries. For d) feel free to use a machine learning package such as Keras, TensorFlow or PyTorch to implement
your neural network. Use the dataset posted on the course web page. The input and output spaces are continuous
(i.e., x ∈ ℝ^d and y ∈ ℝ).
(a) [20 pts] Regularized generalized linear regression: perform least-squares regression with the penalty term
0.5 wᵀw. Use monomial basis functions up to degree d: {∏_i (x_i)^(n_i) | Σ_i n_i ≤ d}. A monomial of degree
less than or equal to d is a product of variables (e.g., ∏_i (x_i)^(n_i)) where the sum of their exponents is
less than or equal to d (e.g., Σ_i n_i ≤ d).
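Part (a) has a closed-form solution: with Φ the matrix of monomial features, minimizing 0.5‖Φw − y‖² + 0.5 wᵀw gives w = (ΦᵀΦ + I)⁻¹Φᵀy. A minimal NumPy sketch (function names, and the effective λ = 1 implied by pairing the 0.5 wᵀw penalty with a 0.5-scaled squared loss, are my assumptions, not part of the assignment):

```python
import numpy as np
from itertools import combinations_with_replacement

def monomial_features(X, degree):
    """All monomials of total degree <= degree, including the constant 1.

    X has shape (n_samples, n_dims)."""
    n, d = X.shape
    cols = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

def ridge_fit(X, y, degree, lam=1.0):
    """Closed-form solution of 0.5||Phi w - y||^2 + 0.5*lam*w^T w."""
    Phi = monomial_features(X, degree)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ y)

def ridge_predict(X, w, degree):
    return monomial_features(X, degree) @ w
```

The same `monomial_features` helper can be reused for part (b), since both parts share the basis.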
(b) [20 pts] Bayesian generalized linear regression: use monomial basis functions up to degree d as described
above. Assume the output noise is Gaussian with variance = 1. Start with a Gaussian prior over the weights
Pr(w) = N(0, I) with 0 mean and identity covariance matrix.
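With unit noise variance and prior N(0, I), the posterior over w is Gaussian with covariance S = (I + ΦᵀΦ)⁻¹ and mean m = SΦᵀy; note m coincides with the regularized solution of part (a), which is relevant to the similarities question below. A sketch under those assumptions (helper and function names are mine):

```python
import numpy as np
from itertools import combinations_with_replacement

def monomial_features(X, degree):
    """All monomials of total degree <= degree, including the constant 1."""
    n, d = X.shape
    cols = [np.ones(n)]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(d), deg):
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(cols)

def bayes_posterior(X, y, degree):
    """Posterior N(m, S) for prior N(0, I) and unit output-noise variance."""
    Phi = monomial_features(X, degree)
    S = np.linalg.inv(np.eye(Phi.shape[1]) + Phi.T @ Phi)  # posterior covariance
    m = S @ Phi.T @ y                                      # posterior mean
    return m, S

def bayes_predict(Xq, m, S, degree):
    """Predictive mean and variance (noise variance 1 included)."""
    Phi = monomial_features(Xq, degree)
    mean = Phi @ m
    var = 1.0 + np.einsum('ij,jk,ik->i', Phi, S, Phi)  # phi^T S phi per row
    return mean, var
```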
(c) [20 pts] Gaussian process regression: assume the output noise is Gaussian with variance = 1. Use the
following kernels:
• Identity: k(x, x') = xᵀx'
• Gaussian: k(x, x') = e^(−‖x − x'‖²/(2σ²))
• Polynomial: k(x, x') = (xᵀx' + 1)^d, where d is the degree of the polynomial
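With unit noise variance, the GP predictive mean is μ* = K*(K + I)⁻¹y, where K is the kernel matrix on the training inputs and K* the cross-kernel to the query inputs. A sketch with the three kernels above (function names are my own):

```python
import numpy as np

def identity_kernel(X1, X2):
    return X1 @ X2.T

def gaussian_kernel(X1, X2, sigma=1.0):
    # Squared distances via the expansion ||a - b||^2 = a.a + b.b - 2 a.b
    d2 = (np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-d2 / (2.0 * sigma**2))

def polynomial_kernel(X1, X2, degree=2):
    return (X1 @ X2.T + 1.0) ** degree

def gp_predict(Xtrain, y, Xtest, kernel):
    """GP regression mean with unit noise variance: K* (K + I)^{-1} y."""
    K = kernel(Xtrain, Xtrain) + np.eye(len(Xtrain))  # add noise variance 1
    Ks = kernel(Xtest, Xtrain)
    return Ks @ np.linalg.solve(K, y)
```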
(d) [20 pts] Neural network: minimize the squared loss of a two-layer neural network with a sigmoid activation
function for the hidden nodes and the identity function for the output node.
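The assignment allows Keras, TensorFlow, or PyTorch here; purely to make the gradients of the squared loss explicit, the following hand-rolled NumPy sketch trains the same architecture (sigmoid hidden layer, identity output) by batch gradient descent. All names and hyperparameters are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_nn(X, y, n_hidden=5, lr=0.1, epochs=2000, seed=0):
    """Two-layer net: sigmoid hidden layer, identity output, squared loss."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 1.0, n_hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)     # hidden activations, shape (n, n_hidden)
        err = (H @ w2 + b2) - y      # residual of the identity output
        # Gradients of 0.5 * mean(err^2)
        g2, gb2 = H.T @ err / n, err.mean()
        dZ = np.outer(err, w2) * H * (1.0 - H)   # backprop through sigmoid
        g1, gb1 = X.T @ dZ / n, dZ.mean(axis=0)
        W1 -= lr * g1; b1 -= lr * gb1
        w2 -= lr * g2; b2 -= lr * gb2
    return W1, b1, w2, b2

def nn_predict(X, params):
    W1, b1, w2, b2 = params
    return sigmoid(X @ W1 + b1) @ w2 + b2
```

A library implementation would replace the inner loop with an optimizer step on the same loss; the cross-validation harness stays identical either way.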
What to hand in:
• Your code for each algorithm.
• Regularized generalized linear regression:
– Graph that shows the mean squared error based on 10-fold cross validation for degrees 1, 2, 3 and 4
of the monomial basis functions.
– The best degree found by 10-fold cross validation and the squared error for the test set.
– How does the running time vary with the degree of the monomial basis functions?
• Bayesian generalized linear regression:
– Graph that shows the mean squared error based on 10-fold cross validation for degrees 1, 2, 3 and 4
of the monomial basis functions.
– The best degree found by 10-fold cross validation and the squared error for the test set.
– How does the running time vary with the degree of the monomial basis functions?
– What are the similarities and differences between regularized generalized linear regression and Bayesian
generalized linear regression?
• Gaussian process regression:
– The mean squared error of the test set for the identity kernel.
– Graph that shows the mean squared error based on 10-fold cross validation for the Gaussian kernel
when we vary σ from 1 to 6 in increments of 1. The mean squared error of the test set for the best σ.
– Graph that shows the mean squared error based on 10-fold cross validation for degrees 1, 2, 3 and 4
of the polynomial kernel. The mean squared error of the test set for the best polynomial degree.
– How does the running time vary across the three kernels?
• Neural network:
– Graph that shows the mean squared error based on 10-fold cross validation as we vary the number of
hidden units from 1 to 10 (in increments of 1).
– The best number of hidden units found by 10-fold cross validation and the squared error for the test
set.
– How does the running time vary with the number of hidden units?
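Every item above asks for 10-fold cross-validation, so a single reusable harness covers all four algorithms. A sketch (the `fit`/`predict` callables are placeholders for whichever algorithm and hyperparameter value is being evaluated):

```python
import numpy as np

def cv_mse(X, y, fit, predict, n_folds=10, seed=0):
    """Mean squared error averaged over n_folds cross-validation folds.

    fit(Xtr, ytr) returns a model; predict(model, Xte) returns predictions."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    errs = []
    for k in range(n_folds):
        te = folds[k]
        tr = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        model = fit(X[tr], y[tr])
        errs.append(np.mean((predict(model, X[te]) - y[te]) ** 2))
    return float(np.mean(errs))
```

Sweeping the degree, σ, or hidden-unit count then reduces to calling `cv_mse` once per candidate value and plotting the results.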