Engineering Applications of Machine Learning and
Data Analytics
Homework #1
Instructions: There are three problems. Partial credit is given for answers that are partially correct. No credit is given for answers that are wrong or illegible. Write neatly. You must submit two PDFs on D2L. The first PDF should contain the results of the analytical questions as well as any figures that are generated.
Problem 1: ____ / 20
Problem 2: ____ / 20
Problem 3: ____ / 20
Total: ____ / 60
1 Probability and Discriminant Classifiers [20pts]
PART I: Maximum Posterior vs. Probability of Chance Let $\omega_{\max}$ be the state of nature for which $P(\omega_{\max}\mid\mathbf{x}) \geq P(\omega_i\mid\mathbf{x})$ for $i = 1, \ldots, c$, where $c$ is the number of classes. Show/explain that $P(\omega_{\max}\mid\mathbf{x}) \geq \frac{1}{c}$ when we are using the Bayes decision rule. Derive an expression for $p(\mathrm{err})$, and show that $p(\mathrm{err}) \leq (c-1)/c$ when we use the Bayes rule to make a decision. Hint: use the results from the previous questions.
PART II: Bayes Decision Rule Classifier Let the elements of the vector $\mathbf{x} = [x_1, \ldots, x_d]^T$ be binary valued. Let $P(\omega_j)$ be the prior probability of the class $\omega_j$ ($j \in [c]$), and let
\[
p_{ij} = P(x_i = 1 \mid \omega_j),
\]
with all elements of $\mathbf{x}$ being independent. If $P(\omega_1) = P(\omega_2) = \frac{1}{2}$, $p_{i1} = p > \frac{1}{2}$, and $p_{i2} = 1 - p$, show that the minimum error decision rule is
\[
\text{Choose } \omega_1 \text{ if } \sum_{i=1}^{d} x_i > \frac{d}{2}.
\]
Hint: Think back to ECE503 and the types of random variables, then start out with
\[
\text{Choose } \omega_1 \text{ if } P(\omega_1)P(\mathbf{x}\mid\omega_1) > P(\omega_2)P(\mathbf{x}\mid\omega_2).
\]
PART III: The Ditzler Household Growing Up My parents have two kids, now grown into adults. Obviously, there is me, Greg. I was born on a Wednesday. What is the probability that I have a brother? You can assume that $P(\text{boy}) = P(\text{girl}) = \frac{1}{2}$.
PART IV: Bayes Classifier Consider a Bayes classifier with $p(\mathbf{x}\mid\omega_i)$ distributed as a multivariate Gaussian with mean $\mu_i$ and covariance $\Sigma_i = \sigma^2 I$ (note that all classes share the same covariance). We choose the class that has the largest
\[
g_i(\mathbf{x}) = \log\big(p(\mathbf{x}\mid\omega_i)P(\omega_i)\big) \propto \mathbf{w}_i^T\mathbf{x} + w_{0i}.
\]
Find $\mathbf{w}_i$ and $w_{0i}$. Fact:
\[
p(\mathbf{x}\mid\omega_i) = \frac{1}{(2\pi)^{d/2}\,|\Sigma_i|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\mu_i)^T \Sigma_i^{-1} (\mathbf{x}-\mu_i)\right)
\]
2 Linear and Quadratic Classifiers – Code [20pts]
• Write a general function to generate random samples from $\mathcal{N}(\mu, \Sigma)$ in $d$ dimensions (i.e., $\mu \in \mathbb{R}^d$ and $\Sigma \in \mathbb{R}^{d \times d}$).
• Write a procedure for the discriminant of the following form:
\[
g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mu_i)^T \Sigma_i^{-1} (\mathbf{x}-\mu_i) - \frac{d}{2}\log(2\pi) - \frac{1}{2}\log(|\Sigma_i|) + \log(P(\omega_i)) \tag{1}
\]
• Generate a 2D dataset with three classes and use the quadratic classifier above to learn the parameters and make predictions. As an example, you should generate training data like that shown below to estimate the parameters of the classifier in (1), and you should test the classifier on another randomly generated dataset. It is also sufficient to show the dataset used to train your classifier and the decision boundary it produces.
[Figure: example 2D training data with three classes $\omega_1$, $\omega_2$, $\omega_3$ plotted over $x_1$ and $x_2$, each axis ranging from $-4$ to $4$.]
• Write a procedure for computing the Mahalanobis distance between a point $\mathbf{x}$ and some mean vector $\mu$, given a covariance matrix $\Sigma$.
• Implement the naïve Bayes classifier from scratch, and then compare your results to those of Python's built-in implementation. Use different means, covariance matrices, and prior probabilities (indicated by the relative data size for each class) to demonstrate that your implementation is correct. (Starting-point sketches for several of these routines follow this list.)
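As a reference point, here is a minimal sketch of what the sampling, discriminant, and Mahalanobis routines might look like. It assumes NumPy; all function and variable names are illustrative, and this is a starting point under those assumptions rather than a required implementation.

```python
import numpy as np

def sample_gaussian(mu, Sigma, n, rng=None):
    """Draw n samples from N(mu, Sigma) with mu in R^d and Sigma in R^{d x d}."""
    rng = np.random.default_rng() if rng is None else rng
    L = np.linalg.cholesky(Sigma)          # L @ L.T == Sigma
    z = rng.standard_normal((n, len(mu)))  # z ~ N(0, I)
    return np.asarray(mu) + z @ L.T        # affine transform yields N(mu, Sigma)

def discriminant(x, mu, Sigma, prior):
    """Evaluate the quadratic discriminant g_i(x) in Eq. (1)."""
    d = len(mu)
    diff = np.asarray(x) - np.asarray(mu)
    return (-0.5 * diff @ np.linalg.solve(Sigma, diff)
            - 0.5 * d * np.log(2.0 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

def mahalanobis(x, mu, Sigma):
    """Mahalanobis distance between a point x and a mean mu under covariance Sigma."""
    diff = np.asarray(x) - np.asarray(mu)
    return np.sqrt(diff @ np.linalg.solve(Sigma, diff))
```

To classify a test point, evaluate the discriminant for every class and choose the argmax; sweeping a grid of points through the same procedure traces out the decision boundary.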
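For the last bullet, a compact from-scratch Gaussian naïve Bayes and a cross-check against scikit-learn (assumed here to be the "built-in" Python implementation the problem refers to) might look like the following sketch; the data, class sizes, and names are illustrative, with the unequal class sizes standing in for unequal priors.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB  # assumed reference implementation

def fit_gnb(X, y):
    """Per-class means, per-feature variances, and priors for Gaussian naive Bayes."""
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), Xc.var(axis=0), len(Xc) / len(X))
    return params

def predict_gnb(X, params):
    """Pick the class maximizing log prior + sum of per-feature log Gaussian likelihoods."""
    classes = list(params)
    scores = np.stack([
        np.log(prior)
        - 0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)
        for mean, var, prior in (params[c] for c in classes)
    ], axis=1)
    return np.array(classes)[np.argmax(scores, axis=1)]

# Quick agreement check on synthetic data with unequal priors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 1.0, (300, 2)), rng.normal([3, 3], 0.5, (100, 2))])
y = np.array([0] * 300 + [1] * 100)
mine = predict_gnb(X, fit_gnb(X, y))
ref = GaussianNB().fit(X, y).predict(X)
print("agreement with sklearn:", np.mean(mine == ref))
```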
3 Misc Code [20pts] – Choose One
Problem I: Comparing Classifiers A text file, hw1-scores.txt, containing classifier error measurements has been uploaded to D2L. Each column represents a classifier and each row a data set that was evaluated. Are all of the classifiers performing equally? Are one or more classifiers performing better than the others? Your response should be backed by statistics.
Suggested reading:
• Janez Demšar, "Statistical Comparisons of Classifiers over Multiple Data Sets," Journal of Machine Learning Research, vol. 7, pp. 1–30, 2006.
Read the abstract to get an idea of the theme of the comparisons. Sections 3.1.3 and 3.2.2 can be used to answer the question posed here.
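One possible route, following the Friedman-test machinery discussed in the Demšar paper, is sketched below. It assumes SciPy and a whitespace-delimited score file; the file layout and variable names are assumptions, not part of the assignment.

```python
import numpy as np
from scipy.stats import friedmanchisquare, rankdata

# Rows are data sets, columns are classifiers (file assumed whitespace-delimited).
scores = np.loadtxt("hw1-scores.txt")

# Friedman test: H0 says all classifiers perform equally across the data sets.
stat, pvalue = friedmanchisquare(*scores.T)
print(f"Friedman chi-square = {stat:.3f}, p = {pvalue:.4f}")

# Average rank of each classifier (rank 1 = lowest error on a data set).
avg_ranks = np.mean([rankdata(row) for row in scores], axis=0)
print("average ranks:", avg_ranks)
```

If the null hypothesis is rejected, a post-hoc analysis such as the Nemenyi test covered in Section 3.2.2 of the paper can identify which classifiers differ.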
Problem II: Sampling from a Distribution Let $\mathcal{N} = \{1, \ldots, n, \ldots, N\}$ be a set of integers and let $\mathbf{p} = [p_1, \ldots, p_n, \ldots, p_N]$ be a probability distribution such that $p_k$ is the probability of observing $k \in \mathcal{N}$. Note that since $\mathbf{p}$ is a distribution, $\mathbf{1}^T\mathbf{p} = 1$ and $0 \leq p_k \leq 1\ \forall k$. Write a function sample(M, p) that returns $M$ indices sampled from the distribution $\mathbf{p}$. Provide evidence that your function is working as desired. Note that all sampling is assumed to be i.i.d. You must include a couple of paragraphs and documented code that discuss how you accomplished this task.
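One common way to implement sample(M, p) is inverse-CDF sampling: draw a uniform random number and find where it lands in the cumulative distribution of $\mathbf{p}$. A minimal sketch under that approach is below, assuming NumPy; the histogram check at the end is one possible form of evidence that the sampler matches $\mathbf{p}$.

```python
import numpy as np

def sample(M, p, rng=None):
    """Return M indices drawn i.i.d. from the distribution p.

    Indices are 0-based; add 1 if you want values in {1, ..., N}.
    """
    rng = np.random.default_rng() if rng is None else rng
    cdf = np.cumsum(p)   # cumulative distribution of p
    u = rng.random(M)    # M uniform draws on [0, 1)
    return np.searchsorted(cdf, u, side="right")  # invert the CDF

# Evidence: empirical frequencies should approach p as M grows.
p = np.array([0.1, 0.2, 0.3, 0.4])
idx = sample(100_000, p, rng=np.random.default_rng(1))
print(np.bincount(idx, minlength=len(p)) / len(idx))  # ~ [0.1, 0.2, 0.3, 0.4]
```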