EE 660 Homework Week 2
1. This problem may be solved using MATLAB or Python; the functions/commands
stated below are for MATLAB implementations.
You are to solve a simple curve-fitting problem using 1D regression. In this
problem you must code the assigned portions of the regression yourself; using a
package's regression or curve-fit function will not suffice.
(a) Model the curve to be fit, $\hat{f}(x)$, as a $d$th-order polynomial. Write down the
mean-squared-error objective function for curve fitting, in terms of $x_i$, $y_i$, and
$w_m$, in which $i$ is the data-point index ($i = 1, 2, \ldots, N$) and $m$ is the weight
index ($m = 0, 1, \ldots, d$).
(b) Write the objective function in matrix form in terms of $\Phi$, $\mathbf{y}$, and $\mathbf{w}$. ($\Phi$ is
the basis-set expansion version of $X$; its $i$th row is $\phi^T(x_i)$.)
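For reference, the two forms asked for in parts (a) and (b) are related as follows (a notational sketch; the $1/N$ factor is one common convention):

$$J(\mathbf{w}) = \frac{1}{N}\sum_{i=1}^{N}\Bigl(y_i - \sum_{m=0}^{d} w_m x_i^m\Bigr)^{2} = \frac{1}{N}\,\lVert \mathbf{y} - \Phi\,\mathbf{w}\rVert^{2}.$$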
(c) Download the provided data from the dropbox and plot only the points of
x_train vs. y_train (use the command scatter(x, y)).
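A minimal Python sketch of the scatter plot (the equivalent of MATLAB's scatter(x, y)); how x_train and y_train are loaded depends on the provided file format:

```python
import matplotlib.pyplot as plt

def plot_training_points(x_train, y_train):
    """Scatter plot of the raw training points (MATLAB: scatter(x, y))."""
    plt.scatter(x_train, y_train)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()
```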
(d) Let the hypothesis set be polynomials in $x$ of degree [1, 2, 3, 7, 10]. Find the
curve parameters (using only data from x_train and y_train) for each
of these polynomial degrees, using the pseudo-inverse. (You can use the commands
hold on and plot(x, y) to visualize how well the curve fits the
training data, but this is not mandatory.) Show the computed weight vectors
$\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \mathbf{w}_7, \mathbf{w}_{10}$, where $\mathbf{w}_d$ denotes the weight vector for the $d$th-order
polynomial.
Hint: to set this up as a pseudo-inverse problem, use the basis function
expansion of part (b) above.
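A minimal Python sketch of the pseudo-inverse setup, assuming x_train and y_train are 1-D NumPy arrays (the function names here are illustrative, not required):

```python
import numpy as np

def poly_design_matrix(x, d):
    """Basis-set expansion Phi: row i is phi(x_i)^T = [1, x_i, x_i^2, ..., x_i^d]."""
    return np.vander(x, d + 1, increasing=True)

def fit_pinv(x_train, y_train, d):
    """Least-squares weights via the pseudo-inverse: w_d = pinv(Phi) @ y."""
    Phi = poly_design_matrix(x_train, d)
    return np.linalg.pinv(Phi) @ y_train

# Usage: one weight vector per assigned degree.
# weights = {d: fit_pinv(x_train, y_train, d) for d in [1, 2, 3, 7, 10]}
```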
(e) Compute the mean squared error (MSE) on the training set for each one, i.e.,
$$\mathrm{MSE}_d = \frac{1}{N}\sum_{i=1}^{N}\left[\,y_i - \mathbf{w}_d^T\,\phi(x_i)\right]^2 .$$
Plot error vs. polynomial degree. Which polynomial degree seems to be the
best model based on the training-sample MSE only?
(f) Using the same weights ($\mathbf{w}_1, \mathbf{w}_2, \mathbf{w}_3, \mathbf{w}_7, \mathbf{w}_{10}$), compute the MSE for the test
samples, i.e., using x_test and y_test. Plot error vs. polynomial degree
again. Which polynomial degree seems to be the best model based on the test-sample
MSE only?
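Continuing the sketch above (reusing poly_design_matrix), the same helper evaluates $\mathrm{MSE}_d$ on either split, so parts (e) and (f) differ only in the arrays passed in:

```python
import numpy as np

def mse(x, y, w):
    """Mean squared error: (1/N) * sum_i [y_i - w^T phi(x_i)]^2."""
    Phi = poly_design_matrix(x, len(w) - 1)  # degree d = len(w) - 1
    residuals = y - Phi @ w
    return np.mean(residuals ** 2)

# Usage: train curve is mse(x_train, y_train, w); test curve is mse(x_test, y_test, w).
```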
(g) Now, let’s fix the polynomial degree to 7. Solve using ridge regression with
penalty term $\lambda = [10^{-5}, 10^{-3}, 10^{-1}, 1, 10]$. Show the computed weights.
(h) Compute the train and test MSE of the fit from part (g) and plot both vs. $\log(\lambda)$.
What are your conclusions?
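A sketch of the ridge solve for parts (g) and (h), again reusing poly_design_matrix; conventions differ on whether the bias weight is penalized, so treat this as one common variant:

```python
import numpy as np

def fit_ridge(x_train, y_train, d, lam):
    """Ridge weights: w = (Phi^T Phi + lambda * I)^(-1) Phi^T y.
    This variant penalizes all weights, including the bias term."""
    Phi = poly_design_matrix(x_train, d)
    A = Phi.T @ Phi + lam * np.eye(d + 1)
    return np.linalg.solve(A, Phi.T @ y_train)

# Usage for part (g): degree fixed at 7, one fit per lambda.
# ridge_weights = {lam: fit_ridge(x_train, y_train, 7, lam)
#                  for lam in [1e-5, 1e-3, 1e-1, 1.0, 10.0]}
```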
2. Murphy Exercise 7.4. Hint: Start from Murphy Eq. (7.8), and assume $\hat{\mathbf{w}}$ is given.
Problems 3–4 below involve reading and related short exercises, in preparation for upcoming lectures.
3. Bayesian concept learning. Read Murphy 3.1 and 3.2 up to the first paragraph of 3.2.4,
inclusive. The rest of 3.2 is optional.
Key concepts (to focus on during reading):
– What learning is
– Hypothesis space
– Version space
– Strong sampling assumption
– Likelihood
– Prior
– Posterior
– Posterior predictive distribution
– How these combine to give a prediction probability
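As a compact reference while reading (standard results from Murphy §3.2, stated here as a sketch): under the strong sampling assumption, examples are drawn uniformly from the extension of a hypothesis $h$, so for a data set $D$ of $N$ examples all contained in $h$,

$$p(D \mid h) = \left[\frac{1}{|h|}\right]^{N} \quad (\text{and } p(D \mid h) = 0 \text{ if any example lies outside } h),$$

and the posterior and posterior predictive follow as

$$p(h \mid D) = \frac{p(D \mid h)\,p(h)}{\sum_{h' \in H} p(D \mid h')\,p(h')}, \qquad p(\tilde{x} \in C \mid D) = \sum_{h \in H} p(\tilde{x} \in C \mid h)\,p(h \mid D).$$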
(a) For the numbers game, take as the hypothesis space $H$:
$$H = \{h_{\mathrm{odd}}, h_{\mathrm{even}}, h_2, h_{P2}, h_5, h_{P5}, h_7, h_{P7}\}$$
in which:
– $h_{\mathrm{odd}}$ = all odd numbers
– $h_{\mathrm{even}}$ = all even numbers
– $h_2$ = all numbers ending in 2
– $h_{P2}$ = all powers of 2 (excluding $2^0$)
– $h_5$ = all numbers ending in 5
– $h_{P5}$ = all powers of 5 (excluding $5^0$)
– $h_7$ = all numbers ending in 7
– $h_{P7}$ = all powers of 7 (excluding $7^0$)
such that all hypotheses are limited to numbers between 1 and 100 (inclusive).
Suppose the training data is $D = \{5, 25\}$. What is the version space?
(b) Also for the numbers game, let the training data be $D = \{16\}$. Suppose the
hypothesis space is $H = \{h_{P2}, h_{P4}\}$, in which:
$$h_{P2} = \{2, 4, 8, 16, 32, 64\}, \qquad h_{P4} = \{4, 16, 64\}.$$
Assume the priors are $p(h_{P2}) = 0.6$ and $p(h_{P4}) = 0.4$, and use the strong sampling
assumption.
(i) Calculate the likelihood and the posterior for $h_{P2}$.
(ii) Calculate the likelihood and the posterior for $h_{P4}$.
(iii) Which posterior is larger?
4. Bayesian linear regression. Read Murphy 7.6.0, 7.6.1, and 7.6.2.
To get an overview of the algebra from Eq. (7.54) to Eq. (7.55), show that
$p(\mathbf{w} \mid \mathbf{X}, \mathbf{y}, \sigma^2)$ can be written in terms of $p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2)$ and a prior term.
Label the posterior, likelihood, and prior terms. Do not assume Gaussian
densities in this problem.
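As a notational reference only (generic Bayes' rule, assuming a prior $p(\mathbf{w})$ that does not depend on $\mathbf{X}$; the problem asks you to identify each factor):

$$\underbrace{p(\mathbf{w} \mid \mathbf{X}, \mathbf{y}, \sigma^2)}_{\text{posterior}} \;=\; \frac{\overbrace{p(\mathbf{y} \mid \mathbf{X}, \mathbf{w}, \sigma^2)}^{\text{likelihood}}\;\overbrace{p(\mathbf{w})}^{\text{prior}}}{p(\mathbf{y} \mid \mathbf{X}, \sigma^2)}.$$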