ECE M148 Homework 4
Introduction to Data Science
You may type your homework or scan your handwritten version. Make sure all
the work is discernible.
1. Suppose we have the following confusion matrix outputted from a logistic regression
using the probability threshold $P(Y = \text{Positive}) \ge t$, i.e., we classify the sample as
Positive if $P(Y = \text{Positive})$ is greater than or equal to $t$; otherwise we classify it as Negative.
(a) Compute the false positive and false negative rates.
(b) How would you expect the confusion matrix to change if we increased t?
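For reference, here is a minimal Python sketch of how false positive and false negative rates are read off a 2x2 confusion matrix. The counts below are hypothetical placeholders (the matrix from the original figure is not reproduced in this text version).

```python
# Hypothetical 2x2 confusion matrix counts (placeholders, not the
# values from the problem's figure).
tp, fn = 40, 10   # true label Positive: predicted Positive / Negative
fp, tn = 5, 45    # true label Negative: predicted Positive / Negative

# False positive rate: fraction of true Negatives classified as Positive.
fpr = fp / (fp + tn)
# False negative rate: fraction of true Positives classified as Negative.
fnr = fn / (fn + tp)

print(f"FPR = {fpr:.3f}, FNR = {fnr:.3f}")
```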
2. Bayes Theorem. Consider that you own a small restaurant. You have a smoke detector
in your kitchen. A hazardous fire in the kitchen is pretty rare, occurring with probability 1%.
The smoke alarm is pretty accurate at detecting such a fire: when a fire occurs, it sounds the
alarm 99% of the time. However, the alarm is poorly calibrated, and it also sometimes sounds
when there is no fire, due to smoke detected from cooking. The accuracy of the smoke alarm
under the non-fire condition (i.e., the probability it stays silent when there is no fire) is 90%.
(a) What is the probability that the smoke detector sounds an alarm?
(b) Given that you heard the alarm sound, what is the probability that there was
actually a fire?
(c) Comment on how useful the smoke detector is and would you consider replacing
it?
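As a quick sanity check on this kind of calculation, here is a minimal Python sketch of the law of total probability and Bayes' rule. The probabilities and variable names below (p_fire, p_alarm_given_fire, p_silent_given_no_fire) are generic placeholders introduced for illustration, not the problem's values, so the sketch does not give away the numeric answer.

```python
# Generic Bayes' rule sketch (placeholder probabilities, not the
# problem's values).
p_fire = 0.02                  # prior P(fire), hypothetical
p_alarm_given_fire = 0.95      # P(alarm | fire), hypothetical
p_silent_given_no_fire = 0.80  # P(no alarm | no fire), hypothetical

p_alarm_given_no_fire = 1 - p_silent_given_no_fire

# Law of total probability: P(alarm).
p_alarm = (p_alarm_given_fire * p_fire
           + p_alarm_given_no_fire * (1 - p_fire))

# Bayes' rule: P(fire | alarm).
p_fire_given_alarm = p_alarm_given_fire * p_fire / p_alarm

print(f"P(alarm) = {p_alarm:.4f}")
print(f"P(fire | alarm) = {p_fire_given_alarm:.4f}")
```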
3. Logistic regression minimizes the following cross-entropy loss function:
\[
L(\beta) = -\sum_{i=1}^{n}\left[\, y_i \log\!\left(\frac{1}{1 + e^{-\beta^T x_i}}\right) + (1 - y_i)\log\!\left(1 - \frac{1}{1 + e^{-\beta^T x_i}}\right)\right]
\]
where $\beta$ is a vector of parameters, $n$ is the number of samples, $x_i$ is a $k$-dimensional
data sample, and $y_i \in \{0, 1\}$ is a binary variable that represents the class of sample $i$.
Logistic regression is generally solved using iterative methods. One such method is the
gradient descent method, where we start with random values for $\{\beta_j^1 : 1 \le j \le k\}$ and
we update them using the gradient rule
\[
\beta_j^{t+1} = \beta_j^t - \eta\,\frac{dL(\beta)}{d\beta_j^t}
\]
for all $j$ such that $1 \le j \le k$, where $\eta$ is the step size.
Prove that
\[
\frac{dL(\beta)}{d\beta_j} = \sum_{i=1}^{n}\left(\frac{1}{1 + e^{-\beta^T x_i}} - y_i\right) x_i^j
\]
where $x_i^j$ is the $j$th element of the $i$th sample.
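To make the update rule concrete, here is a minimal NumPy sketch (a generic illustration, not the required proof) that computes the gradient from the stated formula, applies one gradient-descent step, and checks the analytic gradient against a finite-difference estimate. The toy data and all names below are introduced here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))                              # n samples, k features
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)   # toy binary labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(beta):
    # Cross-entropy loss L(beta) from the problem statement.
    p = sigmoid(X @ beta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def gradient(beta):
    # Matches the formula: sum_i (sigma(beta^T x_i) - y_i) x_i^j.
    return X.T @ (sigmoid(X @ beta) - y)

beta = rng.normal(size=k)   # random initial values beta_j^1
eta = 0.01                  # step size

# One gradient-descent update: beta^{t+1} = beta^t - eta * dL/dbeta^t.
beta_next = beta - eta * gradient(beta)

# Finite-difference check of dL/dbeta_j for j = 0.
eps = 1e-6
e0 = np.zeros(k); e0[0] = eps
fd = (loss(beta + e0) - loss(beta - e0)) / (2 * eps)
print(f"analytic dL/dbeta_0 = {gradient(beta)[0]:.6f}, finite diff = {fd:.6f}")
```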
4. In your own words, explain the following types of multi-class classification methods:
(a) One vs All
(b) All vs All
Provide the advantages and disadvantages of each method.
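For context, here is a minimal scikit-learn sketch contrasting the two strategies (assuming sklearn is available; the toy data is purely illustrative): One vs All fits one binary classifier per class, while All vs All (one-vs-one) fits one per pair of classes.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# Toy 4-class problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)

ova = OneVsRestClassifier(LogisticRegression()).fit(X, y)
ava = OneVsOneClassifier(LogisticRegression()).fit(X, y)

# One vs All trains K classifiers; All vs All trains K(K-1)/2.
print(len(ova.estimators_))  # 4
print(len(ava.estimators_))  # 6
```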
5. True or False questions. For each statement, decide whether it is True or
False and provide justification (full credit requires a correct justification).
(a) For a classification model, positive predictive value is the probability that a model
classifies a sample as positive given that the true label of the sample is positive.
(b) Assume we are working with a multinomial logistic regression such that
$P(Y = i \mid X) = e^{\beta_{0,i} + \beta_{1,i}X}\, P(Y = K \mid X)$ for $1 \le i \le K - 1$. For a dataset with 1 feature
and 4 possible class labels, the number of learnable parameters $\beta_{j,i}$ is 8.
(c) If the log-odds function is modeled as a quadratic, logistic regression can provide
a non-linear decision boundary.
(d) You are building a classifier to detect fraudulent credit card transactions. Your
employer states that a 90% success in detection of fraudulent transactions is good
enough. You test your model on the next 1000 transactions and get a 97% test
accuracy. Therefore, your model is doing much better than what is required.
(e) For a very good classification model, we expect the confusion table to be dominated by diagonal entries.