CS/ECE/ME 532
Homework 6
1. Data Fitting vs. Sparsity Tradeoff. For this problem, we’ll use the dataset BreastCancer.mat
(see the journal paper posted on Moodle for more details about the dataset). The goal is to
solve the Lasso problem
\[
\underset{\beta \in \mathbb{R}^n}{\text{minimize}} \quad \|X\beta - y\|_2^2 + \lambda \|\beta\|_1 .
\]
Here β is the weight vector applied to the expression levels of 8141 genes and y is the vector of
labels (+1 and −1) for each of the 295 patients. In this problem we will vary λ to explore the
tradeoff between data-fitting and sparsity, and we will compare the Lasso to Ridge Regression.
a) Write an implementation of ISTA (iterative soft-thresholding, i.e., proximal gradient descent), as covered in class, that solves the above Lasso problem. (A minimal sketch of the iteration appears after this problem.)
b) For each λ, use your code to find the optimal weights using only the first 100 patients (first 100 rows). Plot a trade-off curve with the residual $\|X\beta - y\|_2$ on the vertical axis and $\|\beta\|_1$ on the horizontal axis. The vertical axis should contain the residual from the training data. Explain what you see. Note: you'll have to play around with how you choose λ to see the whole curve! (One way to organize this λ sweep is sketched after this problem.)
c) With your solutions from part b), produce a new trade-off curve. This time, plot the error rate on the vertical axis versus the sparsity on the horizontal axis. Here, the error rate is the number of incorrect predictions divided by the total number of predictions. The sparsity is the number of nonzero entries in β. For this purpose, we'll say an entry $\beta_i$ is nonzero if $|\beta_i| > 10^{-6}$. Again, the vertical axis should contain the error rate from the training data.
d) Repeat parts b) and c). This time, for the vertical axis use the residual (respectively, the error rate) from the validation data. For validation, use the remaining rows of the data matrix (the ones not used for training). Again, explain what you see in both trade-off plots.
e) Compare the performance of the Lasso and Ridge Regression in this application. Do
this by the following steps. Randomly split the set of 295 patients into 10 subsets of size
29-30. Use these subsets for training, tuning, and testing. Use the data in 8 subsets to
find a solution $\hat{\beta}$ to the Lasso optimization above and to the ridge regression
\[
\underset{\beta \in \mathbb{R}^n}{\text{minimize}} \quad \|X\beta - y\|_2^2 + \lambda \|\beta\|_2^2 .
\]
Repeat this for a range of λ values, each yielding a solution $\hat{\beta}_\lambda$. Then compute the prediction error using each $\hat{\beta}_\lambda$ on one of the remaining two subsets. Select the $\hat{\beta}_\lambda$ that has the smallest prediction error (one for the Lasso and one for ridge regression). Finally, compute the test error on the final subset of data by comparing both the squared error and the count of how many labels were incorrectly predicted, i.e., $\mathrm{sign}(x_i^T \hat{\beta}_\lambda) \neq y_i$. To get a
better sense of the performances, you can repeat this entire process by taking different
subsets of 8, 1, and 1 for training, tuning, and testing, and compute the average squared errors and average number of misclassifications. (A sketch of this split-and-tune procedure appears after this problem.)
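For part a), the following is a minimal sketch of the ISTA iteration, assuming the data have already been loaded into a feature matrix X (patients × genes) and a ±1 label vector y; the names ista_lasso, soft_threshold, and num_iters are illustrative, not prescribed by the assignment.

```python
import numpy as np

def soft_threshold(z, tau):
    # Entrywise soft-thresholding: the proximal operator of tau * ||.||_1.
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def ista_lasso(X, y, lam, num_iters=500):
    # Minimize ||X b - y||_2^2 + lam * ||b||_1 by proximal gradient (ISTA).
    # Step size 1/L, where L = 2 * ||X||_2^2 is a Lipschitz constant of the
    # gradient of the smooth term ||X b - y||_2^2.
    L = 2.0 * np.linalg.norm(X, 2) ** 2
    t = 1.0 / L
    b = np.zeros(X.shape[1])
    for _ in range(num_iters):
        grad = 2.0 * X.T @ (X @ b - y)             # gradient of the smooth term
        b = soft_threshold(b - t * grad, t * lam)  # prox step on the l1 penalty
    return b
```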
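For parts b)-d), one way to organize the λ sweep is sketched below. It assumes BreastCancer.mat stores the expression matrix and labels under the variable names X and y (check the actual field names in the file), and the logarithmic λ grid is only a starting point that you will need to adjust to trace out the whole curve.

```python
import numpy as np
from scipy.io import loadmat

data = loadmat('BreastCancer.mat')
X = data['X']                          # assumed variable name; inspect data.keys()
y = data['y'].ravel().astype(float)    # assumed variable name; labels are +/-1

X_tr, y_tr = X[:100], y[:100]          # first 100 patients for training
X_va, y_va = X[100:], y[100:]          # remaining patients for validation

results = []
for lam in np.logspace(-3, 2, 30):     # widen/shift this range as needed
    b = ista_lasso(X_tr, y_tr, lam)
    results.append({
        'lambda': lam,
        'l1_norm': np.abs(b).sum(),
        'sparsity': int((np.abs(b) > 1e-6).sum()),
        'train_residual': np.linalg.norm(X_tr @ b - y_tr),
        'val_residual': np.linalg.norm(X_va @ b - y_va),
        'train_error_rate': float(np.mean(np.sign(X_tr @ b) != y_tr)),
        'val_error_rate': float(np.mean(np.sign(X_va @ b) != y_va)),
    })
# Plot residual vs. l1 norm (parts b, d) and error rate vs. sparsity (parts c, d)
# from the entries collected in `results`.
```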
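For part e), the split-and-tune procedure might be organized as follows, again assuming X, y, and ista_lasso are defined as above. The ridge solver uses the identity $(X^T X + \lambda I)^{-1} X^T y = X^T (X X^T + \lambda I)^{-1} y$, which avoids forming an 8141 × 8141 system; the subset sizes and λ grid are illustrative.

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form ridge solution, solving only a (#rows x #rows) system.
    m = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(m), y)

rng = np.random.default_rng(0)
folds = np.array_split(rng.permutation(X.shape[0]), 10)  # 10 subsets of size 29-30
train_idx = np.concatenate(folds[:8])                    # 8 subsets for training
tune_idx, test_idx = folds[8], folds[9]                  # 1 for tuning, 1 for testing

lambdas = np.logspace(-3, 2, 20)
for name, solver in [('lasso', ista_lasso), ('ridge', ridge)]:
    # Pick the lambda with the smallest prediction error on the tuning subset.
    tune_errs = [np.linalg.norm(X[tune_idx] @ solver(X[train_idx], y[train_idx], lam)
                                - y[tune_idx]) for lam in lambdas]
    lam_star = lambdas[int(np.argmin(tune_errs))]
    b = solver(X[train_idx], y[train_idx], lam_star)
    # Report both error measures on the held-out test subset.
    sq_err = np.linalg.norm(X[test_idx] @ b - y[test_idx]) ** 2
    misclass = int(np.sum(np.sign(X[test_idx] @ b) != y[test_idx]))
    print(name, lam_star, sq_err, misclass)
```

Repeating this with different choices of the 8/1/1 split and averaging the results gives the more reliable comparison described at the end of part e).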