Engineering Applications of Machine Learning and
Data Analytics
Homework #2
Instructions: There are four problems. Partial credit is given for answers that are partially
correct. No credit is given for answers that are wrong or illegible. Write neatly.
You must submit two PDFs on D2L. The first PDF has the results to the analytical questions
as well as figures that are generated
Problem 1: Problem 2:
Problem 3: Problem 4:
Total:
1 Linear Classifier with a Margin [10pts]
Show that, regardless of the dimensionality of the feature vectors, a data set that has just two
data points, one from each class, is sufficient to determine the location of the maximum-margin
hyperplane. Hint #1: Consider a data set of two data points, $x_1 \in C_1$ ($y_1 = +1$) and $x_2 \in C_2$
($y_2 = -1$), set up the minimization problem (for computing the hyperplane) with appropriate
constraints on $w^T x_1 + b$ and $w^T x_2 + b$, and solve it. Hint #2: This can be formulated as a constrained
optimization problem.
$$\arg\min_{w \in \mathbb{R}^p} \|w\|_2^2$$
Subject to: (some constraint)
What are $w$ and $b$? Hint: What are the constraints? How did we solve the constrained optimization
problem in Fisher’s linear discriminant (see Linear Models Lecture Notes or constrained optimization
from Calculus)?
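A candidate answer can be sanity-checked numerically before it is written up. The sketch below assumes the constraints take the usual hard-margin form $y_i(w^T x_i + b) \geq 1$, which is an assumption about the intended formulation rather than something stated above. The helper name `margin_report` and the example points are hypothetical.

```python
import numpy as np

def margin_report(w, b, x1, x2):
    """Evaluate a candidate hyperplane (w, b) on a two-point data set.

    Assumes x1 has label +1 and x2 has label -1, and that the constraints
    take the hard-margin form y_i * (w^T x_i + b) >= 1 (an assumption).
    """
    w, x1, x2 = (np.asarray(v, dtype=float) for v in (w, x1, x2))
    c1 = +1.0 * (w @ x1 + b)   # constraint value for the positive point
    c2 = -1.0 * (w @ x2 + b)   # constraint value for the negative point
    return {
        "constraints": (float(c1), float(c2)),
        "feasible": bool(c1 >= 1 - 1e-9 and c2 >= 1 - 1e-9),
        "objective": float(w @ w),                       # ||w||_2^2
        "margin_width": float(2.0 / np.linalg.norm(w)),  # gap between the two margin planes
    }

# Hypothetical 1-D example: points at +1 and -1, candidate w = [1], b = 0.
report = margin_report(w=[1.0], b=0.0, x1=[1.0], x2=[-1.0])
print(report)
```

Among feasible candidates, the one with the smallest `objective` is the one the minimization problem asks for, so comparing a derived $(w, b)$ against a few perturbed candidates is a quick check.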
2 Linear Regression with Regularization [10pts]
In class we derived and discussed linear regression in detail. Find the minimizer of the
sum-of-squared-errors loss when an L2 penalty on the weights is added. More
formally,
$$\arg\min_{w} \left\{ \sum_i (w^T x_i - y_i)^2 + \lambda \|w\|_2^2 \right\}$$
How does this change the solution relative to the original linear regression solution? What is the impact of
adding in this penalty?
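The effect of the penalty can be explored numerically before or after deriving it analytically. The following is a minimal sketch, not a reference solution: it minimizes the penalized objective with plain gradient descent on hypothetical synthetic data and reports how the norm of the minimizer changes with $\lambda$. The function names, the data, and the choice of $\lambda$ values are all illustrative assumptions.

```python
import numpy as np

def penalized_loss(w, X, y, lam):
    """Sum of squared errors plus lam * ||w||_2^2 (rows of X are the x_i)."""
    r = X @ w - y
    return float(r @ r + lam * (w @ w))

def minimize_by_gradient_descent(X, y, lam, n_iter=5000):
    """Minimize penalized_loss numerically with fixed-step gradient descent."""
    w = np.zeros(X.shape[1])
    # Step size 1/L, where L bounds the curvature of the objective.
    L = 2.0 * (np.linalg.eigvalsh(X.T @ X).max() + lam)
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y) + 2.0 * lam * w
        w -= grad / L
    return w

# Hypothetical synthetic data: 50 samples, 3 features, known weights plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

norms = []
for lam in [0.0, 10.0, 100.0]:
    w_hat = minimize_by_gradient_descent(X, y, lam)
    norms.append(np.linalg.norm(w_hat))
    print(f"lambda={lam:6.1f}  w={np.round(w_hat, 3)}  ||w||={norms[-1]:.3f}")
```

A closed-form answer derived by hand can be compared against `w_hat` for the same `lam`; the two should agree up to numerical tolerance.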
Write your own implementation of logistic regression and apply your model to either real-world data
(see GitHub data sets: https://github.com/gditzler/UA-ECE-523-Sp2018/tree/master/data) or synthetic data.
If you simply use Scikit-learn’s implementation of the logistic regression
classifier, then you’ll receive zero points. A full 10/10 will be awarded to those who implement
logistic regression by minimizing the cross-entropy loss with stochastic gradient descent.
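When writing the gradient step by hand, a finite-difference check is a common way to catch sign and scaling mistakes. The sketch below is only a testing aid, not an implementation of the classifier: it evaluates a mean cross-entropy loss for labels in {0, 1} and approximates its gradient numerically. The helper names, the label encoding, the use of the mean rather than the sum, and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def cross_entropy(w, X, y):
    """Mean cross-entropy of a logistic model; y holds labels in {0, 1}.

    Uses log(1 + exp(z)) - y*z with z = X @ w, evaluated through
    np.logaddexp to avoid overflow for large |z|.
    """
    z = X @ w
    return float(np.mean(np.logaddexp(0.0, z) - y * z))

def numerical_gradient(f, w, eps=1e-6):
    """Central finite-difference approximation of the gradient of f at w."""
    grad = np.zeros_like(w, dtype=float)
    for j in range(w.size):
        step = np.zeros_like(w, dtype=float)
        step[j] = eps
        grad[j] = (f(w + step) - f(w - step)) / (2.0 * eps)
    return grad

# Hypothetical synthetic data for the check.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = (rng.random(20) < 0.5).astype(float)
w0 = rng.normal(size=3)

g_num = numerical_gradient(lambda w: cross_entropy(w, X, y), w0)
print("numerical gradient at w0:", g_num)
# Compare g_num against the gradient your own derivation produces at w0.
```

If the hand-derived full-batch gradient at `w0` differs from `g_num` by much more than about `1e-6`, the derivation or its code likely has an error. A stochastic gradient computed from a single sample will not match exactly, but its average over all samples should.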
3 Density Estimation [20pts]
The ECE523 lecture notes have a function for generating a checkerboard data set. Generate checkerboard
data from two classes and use any density estimation technique we discussed to classify new
data using
$$\hat{p}_{Y|X}(y|x) = \frac{\hat{p}_{X|Y}(x|y)\,\hat{p}_Y(y)}{\hat{p}_X(x)}$$
where $\hat{p}_{Y|X}(y|x)$ is your estimate of the posterior given your estimates of $\hat{p}_{X|Y}(x|y)$ using a density
estimator and $\hat{p}_Y(y)$ using a maximum likelihood estimator. You should plot $\hat{p}_{X|Y}(x|y)$ using a
pseudo color plot (see https://goo.gl/2SDJPL). Note that you must model $\hat{p}_X(x)$, $\hat{p}_Y(y)$, and
$\hat{p}_{X|Y}(x|y)$. Note that $\hat{p}_X(x)$ can be calculated using the Law of Total Probability.
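The pieces of the posterior fit together as in the sketch below. It is one possible arrangement under stated assumptions, not the expected solution: it uses an isotropic Gaussian kernel density estimate as the density estimator, a hypothetical bandwidth `h`, and two Gaussian blobs as stand-in data, because the checkerboard generator from the lecture notes is not reproduced here. All function names are illustrative.

```python
import numpy as np

def gaussian_kde(query, samples, h):
    """Isotropic Gaussian kernel density estimate evaluated at each query point.

    query: (m, d) array, samples: (n, d) array, h: bandwidth (hypothetical value).
    """
    d = samples.shape[1]
    diff = query[:, None, :] - samples[None, :, :]   # (m, n, d) pairwise differences
    sq = np.sum(diff ** 2, axis=2)                   # squared distances
    norm = (2.0 * np.pi * h ** 2) ** (d / 2.0)       # Gaussian normalizing constant
    return np.mean(np.exp(-sq / (2.0 * h ** 2)), axis=1) / norm

def posterior(query, X, y, h=0.3):
    """Return (classes, posterior) with one posterior column per class."""
    classes = np.unique(y)
    # Maximum likelihood estimate of the prior: class frequencies.
    priors = np.array([np.mean(y == c) for c in classes])
    # Class-conditional density estimates, one column per class.
    lik = np.column_stack([gaussian_kde(query, X[y == c], h) for c in classes])
    joint = lik * priors
    # Evidence via the Law of Total Probability.
    evidence = joint.sum(axis=1, keepdims=True)
    return classes, joint / evidence

# Stand-in data: two Gaussian blobs, NOT the lecture-notes checkerboard generator.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1.0, 0.5, size=(100, 2)),
               rng.normal(1.0, 0.5, size=(100, 2))])
y = np.repeat([0, 1], 100)

classes, post = posterior(np.array([[-1.0, -1.0], [1.0, 1.0]]), X, y)
print(classes[np.argmax(post, axis=1)])
```

For the pseudo color plot, `gaussian_kde` can be evaluated on a grid built with `np.meshgrid` and the reshaped values passed to `matplotlib.pyplot.pcolormesh`. Query points far from all training data can make the evidence underflow to zero, so the division may need guarding in practice.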
4 Conceptual [5pts]
The Bayes decision rule describes the approach we take to choosing a class ω for a data point x.
This can be achieved by modeling P(ω|x) directly or by modeling P(x|ω)P(ω)/P(x). Compare and contrast these two
approaches to modeling and discuss the advantages and disadvantages. For the latter model, why
might knowing P(x) be useful?