Homework 1: Linear Regression

Introduction
This homework is on different forms of linear regression and focuses on loss functions, optimizers, and regularization. Linear regression will be one of the few models that we see that has an analytical solution. These problems focus on deriving these solutions and exploring their properties. If you find that you are having trouble with the first couple of problems, we recommend going over the fundamentals of linear algebra and matrix calculus. We also encourage you to first read the Bishop textbook, particularly: Section 2.3 (Properties of Gaussian Distributions), Section 3.1 (Linear Basis Regression), and Section 3.3 (Bayesian Linear Regression). (Note that our notation is slightly different but the underlying mathematics remains the same :). Please type your solutions after the corresponding problems using this LaTeX template, and start each problem on a new page.
Problem 1 (Centering and Ridge Regression, 7pts) Consider a data set $D = \{(x_i, y_i)\}_{i=1}^n$ in which each input vector $x_i \in \mathbb{R}^m$. As we saw in lecture, this data set can be written using the design matrix $X \in \mathbb{R}^{n \times m}$ and the target vector $y \in \mathbb{R}^n$. For this problem assume that the input matrix is centered, that is, the data has been pre-processed such that $\frac{1}{n} \sum_{i=1}^n x_{ij} = 0$ for every column $j$. Additionally we will use a positive regularization constant $\lambda > 0$ to add a ridge regression term. In particular we consider a ridge regression loss function of the following form,
$$L(w, w_0) = (y - Xw - w_0 \mathbf{1})^\top (y - Xw - w_0 \mathbf{1}) + \lambda w^\top w.$$
Note that we are not incorporating the bias $w_0 \in \mathbb{R}$ into the weight parameter $w \in \mathbb{R}^m$. For this problem the notation $\mathbf{1}$ indicates a vector of all 1's, in this case implied to be in $\mathbb{R}^n$.
(a) Compute the gradient of $L(w, w_0)$ with respect to $w_0$. Simplify as much as you can for full credit.
(b) Compute the gradient of $L(w, w_0)$ with respect to $w$. Simplify as much as you can for full credit. Make sure to give your answer in vector form.
(c) Suppose that $\lambda > 0$. Knowing that $L$ is a convex function of its arguments, conclude that a global optimizer of $L(w, w_0)$ is
$$w_0 = \frac{1}{n} \sum_{i=1}^n y_i \tag{1}$$
$$w = (X^\top X + \lambda I)^{-1} X^\top y \tag{2}$$
(d) In order to take the inverse in the previous question, the matrix $(X^\top X + \lambda I)$ must be invertible. One way to ensure invertibility is by showing that a matrix is positive definite, i.e. it has all positive eigenvalues. Given that $X^\top X$ is positive semi-definite, i.e. all non-negative eigenvalues, prove that the full matrix is invertible.
(e) What difference does the last problem highlight between standard least-squares regression and ridge regression?
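Before attempting the derivations, it can help to sanity-check the optimizer stated in part (c) numerically. The sketch below is only an illustration under stated assumptions: it uses NumPy with synthetic data, and the sizes `n` and `m`, the value `lam`, and the helper name `ridge_loss` are hypothetical choices that are not part of the assignment. It centers the columns of `X`, evaluates the closed-form candidates, and checks that random perturbations never lower the loss.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 50, 3, 0.5  # illustrative sizes and regularization constant (assumptions)

# Synthetic inputs with centered columns, matching the problem's assumption.
X = rng.normal(size=(n, m))
X = X - X.mean(axis=0)
y = rng.normal(size=n) + 2.0


def ridge_loss(w, w0):
    """L(w, w0) = (y - Xw - w0*1)^T (y - Xw - w0*1) + lam * w^T w."""
    residual = y - X @ w - w0
    return residual @ residual + lam * (w @ w)


# Candidate optimizer from the problem statement, part (c).
w0_star = y.mean()
w_star = np.linalg.solve(X.T @ X + lam * np.eye(m), X.T @ y)
base_loss = ridge_loss(w_star, w0_star)

# Random perturbations of the candidate should never decrease the loss.
increases = []
for _ in range(200):
    dw = 0.1 * rng.normal(size=m)
    dw0 = 0.1 * rng.normal()
    increases.append(ridge_loss(w_star + dw, w0_star + dw0) - base_loss)

min_increase = min(increases)
print("smallest change in loss under perturbation:", min_increase)
```

A check like this does not replace the proof asked for in parts (a) through (c); it only gives some confidence that a derived expression agrees with the stated optimizer.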
Solution
Problem 2 (Priors and Regularization, 7pts) In this problem we consider a model of Bayesian linear regression. Define the prior on the parameters as
$$p(w) = \mathcal{N}(w \mid 0, \alpha^{-1} I),$$
where $\alpha$ is a scalar precision hyperparameter that controls the variance of the Gaussian prior. Define the likelihood as
$$p(y \mid X, w) = \prod_{i=1}^n \mathcal{N}(y_i \mid w^\top x_i, \beta^{-1}),$$
where $\beta$ is another fixed scalar defining the variance.
Using the fact that the posterior is the product of the prior and the likelihood (up to a normalization constant), i.e.,
$$\arg\max_w \ln p(w \mid y, X) = \arg\max_w \left[ \ln p(w) + \ln p(y \mid X, w) \right],$$
show that maximizing the log posterior is equivalent to minimizing a regularized loss function given by $L(w) + \lambda R(w)$, where
$$L(w) = \frac{1}{2} \sum_{i=1}^n (y_i - w^\top x_i)^2$$
$$R(w) = \frac{1}{2} w^\top w$$
Do this by writing $\ln p(w \mid y, X)$ as a function of $L(w)$ and $R(w)$, dropping constant terms if necessary. Conclude that maximizing this posterior is equivalent to minimizing the regularized error term given by $L(w) + \lambda R(w)$ for a $\lambda$ expressed in terms of the problem's constants.
Solution
3. Modeling Changes in Congress [10pts]
The objective of this problem is to learn about linear regression with basis functions by modeling the average age of the US Congress. The file congress-ages.csv contains the data you will use for this problem. It has two columns. The first one is an integer that indicates the Congress number. Currently, the 114th Congress is in session. The second is the average age of the members of that Congress. The data file looks like this:
congress,average_age
80,52.4959
81,52.6415
82,53.2328
83,53.1657
84,53.4142
85,54.1689
86,53.1581
87,53.5886
and you can see a plot of the data in Figure 1.
Figure 1: Average age of Congress. The horizontal axis is the Congress number, and the vertical axis is the average age of the congressmen.
Problem 3 (Modeling Changes in Congress, 10pts) Implement basis function regression with ordinary least squares with the above data. Some sample Python code is provided in linreg.py, which implements linear regression. Plot the data and regression lines for the simple linear case, and for each of the following sets of basis functions:
(a) $\phi_j(x) = x^j$ for $j = 1, \ldots, 6$
(b) $\phi_j(x) = x^j$ for $j = 1, \ldots, 4$
(c) $\phi_j(x) = \sin(x/j)$ for $j = 1, \ldots, 6$
(d) $\phi_j(x) = \sin(x/j)$ for $j = 1, \ldots, 10$
(e) $\phi_j(x) = \sin(x/j)$ for $j = 1, \ldots, 22$
In addition to the plots, provide one or two sentences for each with numerical support, explaining whether you think it is fitting well, overfitting or underfitting. If it does not fit well, provide a sentence explaining why. A good fit should capture the most important trends in the data.
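As a rough illustration of the mechanics, basis function regression with ordinary least squares can be sketched as follows. This is not the provided linreg.py, whose contents are not shown here. It assumes NumPy, uses only the eight-row excerpt of congress-ages.csv listed above rather than the full file, and the helper names `design_matrix`, `sin_basis`, and `fit_ols` are hypothetical. Three sine basis functions are used purely to keep the example small; that count is not one of the sets requested in (a) through (e).

```python
import numpy as np

# Eight-row excerpt of congress-ages.csv shown above (the full file has more rows).
congress = np.array([80, 81, 82, 83, 84, 85, 86, 87], dtype=float)
average_age = np.array([52.4959, 52.6415, 53.2328, 53.1657,
                        53.4142, 54.1689, 53.1581, 53.5886])


def sin_basis(x, j):
    """phi_j(x) = sin(x / j)."""
    return np.sin(x / j)


def design_matrix(x, basis, num_basis):
    """Stack a bias column of ones with phi_1(x), ..., phi_J(x) as columns."""
    columns = [np.ones_like(x)]
    columns += [basis(x, j) for j in range(1, num_basis + 1)]
    return np.column_stack(columns)


def fit_ols(Phi, y):
    """Least-squares weights; lstsq avoids forming an explicit matrix inverse."""
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w


Phi = design_matrix(congress, sin_basis, num_basis=3)
w = fit_ols(Phi, average_age)

# Sum of squared errors, compared against a bias-only (constant) model.
sse = float(np.sum((average_age - Phi @ w) ** 2))
sse_bias_only = float(np.sum((average_age - average_age.mean()) ** 2))
print(f"SSE with 3 sine basis functions: {sse:.4f}")
print(f"SSE with bias only:              {sse_bias_only:.4f}")
```

A sum of squared errors like `sse` is one possible form of numerical support for the fit discussion. Training error alone can only go down as basis functions are added, so it does not by itself distinguish a good fit from overfitting.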
Solution
Problem 4 (Calibration, 1pt) Approximately how long did this homework take you to complete?
Answer:
