Assignment I
Linear Regression
Generating Synthetic Data
This assignment shows how we can extend ordinary least squares regression, which uses the hypothesis class of linear regression functions, to non-linear regression functions modeled using polynomial basis functions and radial basis functions. The function we want to fit is $y_{\mathrm{true}} = f_{\mathrm{true}}(x) = 6\,(\sin(x + 2) + \sin(2x + 4))$. This is a univariate function, as it has only one input variable. First, we generate synthetic input (data) $x_i$ by sampling $n = 750$ points from a uniform distribution on the interval $[-7.5, 7.5]$.
In [ ]: # The true function
def f_true(x):
    y = 6.0 * (np.sin(x + 2) + np.sin(2*x + 4))
    return y
We can generate a synthetic data set with additive Gaussian noise.
In [ ]: import numpy as np # For all our math needs
n = 750 # Number of data points
X = np.random.uniform(-7.5, 7.5, n) # Training examples, in one dimension
e = np.random.normal(0.0, 5.0, n) # Random Gaussian noise
y = f_true(X) + e # True labels with noise
Now, we plot the raw data as well as the true function (without noise).
In [ ]: import matplotlib.pyplot as plt # For all our plotting needs
plt.figure()
# Plot the data
plt.scatter(X, y, 12, marker='o')
# Plot the true function, which is really "unknown"
x_true = np.arange(-7.5, 7.5, 0.05)
y_true = f_true(x_true)
plt.plot(x_true, y_true, marker='None', color='r')
Recall that we want to build a model that generalizes well on future data, and in order to generalize well, we need to pick a model that trades off well between fit and complexity (that is, bias and variance). We randomly split the overall data set ($D$) into three subsets:

- Training set: $D_{\mathrm{trn}}$ consists of the actual training examples that will be used to train the model;
- Validation set: $D_{\mathrm{val}}$ consists of validation examples that will be used to tune model hyperparameters (such as $\lambda > 0$ in ridge regression) in order to find the best trade-off between fit and complexity (that is, the value of $\lambda$ that produces the best model);
- Test set: $D_{\mathrm{tst}}$ consists of test examples to estimate how the model will perform on future data.

For this example, let us randomly partition the data into three non-intersecting sets: $D_{\mathrm{trn}} = 60\%$ of $D$, $D_{\mathrm{val}} = 10\%$ of $D$, and $D_{\mathrm{tst}} = 30\%$ of $D$.
In [ ]: # scikit-learn has many tools and utilities for model selection
from sklearn.model_selection import train_test_split
tst_frac = 0.3 # Fraction of examples to sample for the test set
val_frac = 0.1 # Fraction of examples to sample for the validation set
# First, we use train_test_split to partition (X, y) into training and test sets
X_trn, X_tst, y_trn, y_tst = train_test_split(X, y, test_size=tst_frac, random_state=42)
# Next, we use train_test_split to further partition (X_trn, y_trn) into training and validation sets
X_trn, X_val, y_trn, y_val = train_test_split(X_trn, y_trn, test_size=val_frac, random_state=42)
# Plot the three subsets
plt.figure()
plt.scatter(X_trn, y_trn, 12, marker='o', color='orange')
plt.scatter(X_val, y_val, 12, marker='o', color='green')
plt.scatter(X_tst, y_tst, 12, marker='o', color='blue')
1. **Regression with Polynomial Basis Functions**, 30 points.
This problem extends ordinary least squares regression, which uses the hypothesis class of linear regression functions, to non-linear regression functions modeled using polynomial basis functions. In order to learn nonlinear models using linear regression, we have to explicitly transform the data into a higher-dimensional space. The nonlinear hypothesis class we will consider is the set of $d$-degree polynomials of the form $f(x) = w_0 + w_1 x + w_2 x^2 + \ldots + w_d x^d$, or a linear combination of polynomial basis functions:

$$f(x) = [w_0, w_1, w_2, \ldots, w_d]^T \begin{bmatrix} 1 \\ x \\ x^2 \\ \vdots \\ x^d \end{bmatrix}.$$

The monomials $\{1, x, x^2, \ldots, x^d\}$ are called basis functions, and each basis function $x^k$ has a corresponding weight $w_k$ associated with it, for all $k = 1, \ldots, d$. We transform each univariate data point $x_i$ into a multivariate ($d$-dimensional) data point via $\phi(x_i) \rightarrow [1, x_i, x_i^2, \ldots, x_i^d]$. When this transformation is applied to every data point, it produces the Vandermonde matrix:

$$\Phi = \begin{bmatrix}
1 & x_1 & x_1^2 & \ldots & x_1^d \\
1 & x_2 & x_2^2 & \ldots & x_2^d \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^d
\end{bmatrix}.$$
a. (10 points)
Complete the Python function below that takes univariate data as input and computes a Vandermonde matrix of dimension $d$. This transforms one-dimensional data into $d$-dimensional data in terms of the polynomial basis and allows us to model regression using a $d$-degree polynomial.
In [ ]: # X float(n, ): univariate data
# d int: degree of polynomial
def polynomial_transform(X, d):
    #
    #
    # *** Insert your code here ***
    #
    #
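For reference, here is a minimal sketch of one way such a transform could be written; it is an illustration only, not necessarily the intended solution, and it assumes that using numpy.vander with increasing=True is acceptable:

import numpy as np

def polynomial_transform_sketch(X, d):
    # Row i is [1, x_i, x_i^2, ..., x_i^d] for data point x_i.
    X = np.asarray(X).reshape(-1)                  # ensure shape (n,)
    return np.vander(X, N=d + 1, increasing=True)  # shape (n, d+1), including the bias column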
b. (10 points)
Complete the Python function below that takes a Vandermonde matrix $\Phi$ and the labels $y$ as input and learns weights via ordinary least squares regression. Specifically, given a Vandermonde matrix $\Phi$, implement the computation of $w = (\Phi^T \Phi)^{-1} \Phi^T y$. Remember that in Python, @ performs matrix multiplication, while * performs element-wise multiplication. Alternately, numpy.dot (https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.dot.html) also performs matrix multiplication.
In [ ]: # Phi float(n, d): transformed data
# y float(n, ): labels
def train_model(Phi, y):
    #
    #
    # *** Insert your code here ***
    #
    #
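As a point of reference, one possible sketch of this computation (an illustration, not the required solution); solving the normal equations with np.linalg.solve rather than forming an explicit inverse is an editorial assumption made here for numerical stability:

import numpy as np

def train_model_sketch(Phi, y):
    # Solve (Phi^T Phi) w = Phi^T y, which is equivalent to w = (Phi^T Phi)^{-1} Phi^T y.
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)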
c. (5 points)
Complete the Python function below that takes a Vandermonde matrix $\Phi$, corresponding labels $y$, and a linear regression model $w$ as input and evaluates the model using mean squared error. That is,

$$\epsilon_{\mathrm{MSE}} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - w^T \Phi_i\right)^2.$$
In [ ]: # Phi float(n, d): transformed data
# y float(n, ): labels
# w float(d, ): linear regression model
def evaluate_model(Phi, y, w):
    #
    #
    # *** Insert your code here ***
    #
    #
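For reference, a minimal sketch of the evaluation (illustrative only; it assumes numpy arrays as above):

import numpy as np

def evaluate_model_sketch(Phi, y, w):
    # Mean squared error between the labels y and the predictions Phi @ w.
    return np.mean((y - Phi @ w) ** 2)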
d. (5 points, Discussion)
We can explore the effect of complexity by varying $d = 3, 6, 9, \cdots, 24$ to steadily increase the non-linearity of the models. For each model, we train using the transformed training data ($\Phi$, whose dimension increases), evaluate its performance on the transformed validation data, and estimate what our future accuracy will be using the test data.

From the plot of $d$ vs. validation error below, which choice of $d$ do you expect will generalize best?
In [ ]: w = {}               # Dictionary to store all the trained models
validationErr = {}   # Validation error of the models
testErr = {}         # Test error of all the models

for d in range(3, 25, 3):  # Iterate over polynomial degree
    Phi_trn = polynomial_transform(X_trn, d)                 # Transform training data into d dimensions
    w[d] = train_model(Phi_trn, y_trn)                       # Learn model on training data

    Phi_val = polynomial_transform(X_val, d)                 # Transform validation data into d dimensions
    validationErr[d] = evaluate_model(Phi_val, y_val, w[d])  # Evaluate model on validation data

    Phi_tst = polynomial_transform(X_tst, d)                 # Transform test data into d dimensions
    testErr[d] = evaluate_model(Phi_tst, y_tst, w[d])        # Evaluate model on test data

# Plot all the models
plt.figure()
plt.plot(validationErr.keys(), validationErr.values(), marker='o', linewidth=3, markersize=12)
plt.plot(testErr.keys(), testErr.values(), marker='s', linewidth=3, markersize=12)
plt.xlabel('Polynomial degree', fontsize=16)
plt.ylabel('Validation/Test error', fontsize=16)
plt.xticks(list(validationErr.keys()), fontsize=12)
plt.legend(['Validation Error', 'Test Error'], fontsize=16)
plt.axis([2, 25, 15, 60])
Finally, let's visualize each learned model.
In [ ]: plt.figure()
plt.plot(x_true, y_true, marker='None', linewidth=5, color='k')
for d in range(9, 25, 3):
    X_d = polynomial_transform(x_true, d)
    y_d = X_d @ w[d]
    plt.plot(x_true, y_d, marker='None', linewidth=2)

plt.legend(['true'] + list(range(9, 25, 3)))
plt.axis([-8, 8, -15, 15])
2. **Regression with Radial Basis Functions**, 70 points.
In the previous case, we considered a nonlinear extension to linear regression using a linear combination of polynomial basis functions, where each basis function was introduced as a feature $\phi(x) = x^k$. Now, we consider Gaussian radial basis functions of the form

$$\phi(x) = e^{-\gamma (x - \mu)^2},$$

whose shape is defined by its center $\mu$ and its width $\gamma > 0$. In the case of polynomial basis regression, the user's choice of the dimension $d$ determined the transformation and the model. For radial basis regression, we have to contend with deciding how many radial basis functions we should have, and what their center and width parameters should be. For simplicity, let's assume that $\gamma = 0.1$ is fixed. Instead of trying to identify the number of radial basis functions or their centers, we can treat **each data point as the center of a radial basis function**, which means that the model will be:

$$f(x) = [w_1, w_2, w_3, \ldots, w_n]^T \begin{bmatrix} e^{-\gamma (x - x_1)^2} \\ e^{-\gamma (x - x_2)^2} \\ \vdots \\ e^{-\gamma (x - x_n)^2} \end{bmatrix}.$$

This transformation uses $n$ radial basis functions centered around the data points, and each basis function $e^{-\gamma (x - x_i)^2}$ has a corresponding weight $w_i$ associated with it, for all $i = 1, \ldots, n$. We transform each univariate data point $x_j$ into a multivariate ($n$-dimensional) data point via $\phi(x_j) \rightarrow [\ldots, e^{-\gamma (x_j - x_i)^2}, \ldots]$. When this transformation is applied to every data point, it produces the radial-basis kernel:

$$\Phi = \begin{bmatrix}
1 & e^{-\gamma (x_1 - x_2)^2} & e^{-\gamma (x_1 - x_3)^2} & \ldots & e^{-\gamma (x_1 - x_n)^2} \\
e^{-\gamma (x_2 - x_1)^2} & 1 & e^{-\gamma (x_2 - x_3)^2} & \ldots & e^{-\gamma (x_2 - x_n)^2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
e^{-\gamma (x_n - x_1)^2} & e^{-\gamma (x_n - x_2)^2} & e^{-\gamma (x_n - x_3)^2} & \cdots & 1
\end{bmatrix}.$$
a. (15 points)
Complete the Python function below that takes univariate data as input and computes a radial-basis kernel. This transforms one-dimensional data into $n$-dimensional data in terms of Gaussian radial-basis functions centered at each data point and allows us to model nonlinear (kernel) regression.
In [ ]: # X float(n, ): univariate data
# B float(n, ): basis functions
# gamma float : standard deviation / scaling of radial basis kernel
def radial_basis_transform(X, B, gamma=0.1):
    #
    #
    # *** Insert your code here ***
    #
    #
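For reference, a minimal sketch of one way this kernel could be computed (illustrative only; the broadcasting approach is an assumption, not the required solution):

import numpy as np

def radial_basis_transform_sketch(X, B, gamma=0.1):
    # Entry (j, i) is exp(-gamma * (X[j] - B[i])^2), i.e., each point in X
    # evaluated against every basis center in B (here, the training points).
    X = np.asarray(X).reshape(-1, 1)   # shape (n, 1)
    B = np.asarray(B).reshape(1, -1)   # shape (1, m)
    return np.exp(-gamma * (X - B) ** 2)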
b. (15 points)
Complete the Python function below that takes a radial-basis kernel matrix $\Phi$, the labels $y$, and a regularization parameter $\lambda > 0$ as input and learns weights via ridge regression. Specifically, given a radial-basis kernel matrix $\Phi$, implement the computation of $w = (\Phi^T \Phi + \lambda I_n)^{-1} \Phi^T y$.
In [ ]: # Phi float(n, d): transformed data
# y float(n, ): labels
# lam float : regularization parameter
def train_ridge_model(Phi, y, lam):
    #
    #
    # *** Insert your code here ***
    #
    #
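As a point of reference, a minimal sketch (illustrative only; using np.linalg.solve in place of the explicit inverse in the formula above is an editorial choice):

import numpy as np

def train_ridge_model_sketch(Phi, y, lam):
    # Solve (Phi^T Phi + lam * I) w = Phi^T y, the ridge-regularized normal equations.
    n = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(n), Phi.T @ y)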
c. (30 points)
As before, we can explore the tradeoff between fit and complexity by varying $\lambda \in [10^{-3}, 10^{-2}, \cdots, 1, \cdots, 10^{3}]$. For each model, train using the transformed training data ($\Phi$) and evaluate its performance on the transformed validation and test data. Plot two curves: (i) $\lambda$ vs. validation error and (ii) $\lambda$ vs. test error, as above.

What are some ideal values of $\lambda$?
In [ ]: #
# *** Insert your code here ***
#
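One possible outline for this cell, mirroring the polynomial-degree loop above, is sketched here; it relies on the functions completed in parts (a) and (b), and the names lambdas, w_rbf, validationErr_rbf, and testErr_rbf are hypothetical, not a prescribed structure:

lambdas = [10.0 ** k for k in range(-3, 4)]       # 1e-3, 1e-2, ..., 1e3

Phi_trn = radial_basis_transform(X_trn, X_trn)    # kernel built on the training points
Phi_val = radial_basis_transform(X_val, X_trn)
Phi_tst = radial_basis_transform(X_tst, X_trn)

w_rbf, validationErr_rbf, testErr_rbf = {}, {}, {}
for lam in lambdas:
    w_rbf[lam] = train_ridge_model(Phi_trn, y_trn, lam)
    validationErr_rbf[lam] = evaluate_model(Phi_val, y_val, w_rbf[lam])
    testErr_rbf[lam] = evaluate_model(Phi_tst, y_tst, w_rbf[lam])

plt.figure()
plt.semilogx(lambdas, [validationErr_rbf[lam] for lam in lambdas], marker='o', linewidth=3, markersize=12)
plt.semilogx(lambdas, [testErr_rbf[lam] for lam in lambdas], marker='s', linewidth=3, markersize=12)
plt.xlabel('Regularization parameter', fontsize=16)
plt.ylabel('Validation/Test error', fontsize=16)
plt.legend(['Validation Error', 'Test Error'], fontsize=16)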
d. (10 points, Discussion)
Plot the learned models as well as the true model similar to the polynomial basis case above. How does the
linearity of the model change with λ?
In [ ]: #
# *** Insert your code here ***
#
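A minimal sketch of this visualization, again with the hypothetical names carried over from the sketch above (x_true, y_true, X_trn, w_rbf, lambdas):

plt.figure()
plt.plot(x_true, y_true, marker='None', linewidth=5, color='k')   # the true function

Phi_true = radial_basis_transform(x_true, X_trn)                  # transform the dense grid of x values
for lam in lambdas:
    y_lam = Phi_true @ w_rbf[lam]
    plt.plot(x_true, y_lam, marker='None', linewidth=2)

plt.legend(['true'] + ['lam = {0}'.format(lam) for lam in lambdas])
plt.axis([-8, 8, -15, 15])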
You have to submit a single .py file that contains all the code.