CS322 Assignment 2
Written Questions:
1. 4.1
2. 4.2
3. 4.3
4. Congrats! You're the mayor of Northfield! As mayor, you are interested in mitigating damage from
potential flooding of the Cannon River. It is of the utmost importance that you begin preparations for
flood control early (e.g., ordering sandbags) if you believe that there is going to be a flood this year.
To help you make this prediction, you do the same thing you do every year: reach out to two teams of
hydrologists from Carleton (team A and team B) to get their opinions. For simplicity, assume that the
Cannon flooding in a given year is a binary outcome, and that these same teams have made predictions
for the past 20 years. From your records, you know that over the past 20 years, the Cannon has flooded
4 times. This time, team A predicts that the Cannon will not flood, and points out that they have
achieved 80% accuracy in predictions over the past 20 years. On the other hand, team B predicts that
the Cannon will flood, but, according to your records, they have only achieved 60% prediction accuracy
over the past 20 years. Which team's prediction should you base your decision on? If you can't decide,
what additional information (if any) would you like to know from the teams? Ideally, your answer
should include discussions of accuracy, precision, recall, constant prediction, etc. (a small helper sketch
follows below).
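As a starting point (an illustration only, not a required part of your answer), here is a small Python
sketch of how accuracy, precision, and recall fall out of a confusion matrix. The helper function is ours,
not part of the assignment; only the 20-year/4-flood record comes from the problem statement.

def metrics(tp, fp, fn, tn):
    """Accuracy, precision, and recall from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # Precision is undefined when there are no positive predictions.
    precision = tp / (tp + fp) if (tp + fp) > 0 else float("nan")
    recall = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    return accuracy, precision, recall

# A constant "never flood" predictor over the 20-year record (4 floods):
# tp = 0, fp = 0, fn = 4, tn = 16.
print(metrics(0, 0, 4, 16))  # accuracy 0.80, precision undefined, recall 0.0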
5. Criminal trials in the United States can be viewed as classifiers: juries attempt to make predictions,
based on evidence, about the guilt or innocence of a defendant. Even well-intentioned juries make
mistakes; the Innocence Project (https://www.innocenceproject.org/), for example, is a non-profit
organization that "is committed to exonerating wrongly convicted people through the use of DNA
testing"; they have helped to overturn over 300 wrongful convictions. Blackstone's Ratio
(https://en.wikipedia.org/wiki/Blackstone%27s_ratio), a legal philosophy credited to the jurist
William Blackstone, states:

It is better that ten guilty persons escape than that one innocent suffer.

Discuss how Blackstone's Ratio relates to precision and recall. In your discussion, you should answer the
question of whether Blackstone's Ratio advocates for high-precision juries or high-recall juries.
6. Logistic regression estimates binary probabilities given a feature vector x. The probability that y = 1
is modeled by computing the sigmoid of the dot product of w and x plus a bias, i.e.,
P(y = 1|x) = σ(w^T x + b) and P(y = 0|x) = 1 − σ(w^T x + b). For a true datapoint (x_i, y_i), we can
compute the likelihood of the logistic regression parameters w, b given y_i (the true label) and x_i (the
vector of features) as

Likelihood(w, b | x_i, y_i) = P(y = y_i | w, b, x_i) = σ(w^T x_i + b)^{y_i} · (1 − σ(w^T x_i + b))^{1−y_i}

Note that we have used y_i, the true label, as a "switch" variable. If y_i is 1, the likelihood of the
parameters is σ(w^T x_i + b): the higher w^T x_i + b is, the more likely w and b are. If y_i is 0, the
likelihood of the parameters is 1 − σ(w^T x_i + b), i.e., for datapoints with true label 0, the smaller
w^T x_i + b is, the more likely the parameters w, b are.
Binary cross-entropy is a loss function that attempts to maximize the log-likelihood of all of the
datapoints (because the datapoints are assumed to be independent draws, we can compute their joint
probability as a product, i.e., P((x_1, y_1), (x_2, y_2), (x_3, y_3), ... | w, b) is assumed to be equal to
P(x_1, y_1 | w, b) · P(x_2, y_2 | w, b) · P(x_3, y_3 | w, b) ...). Maximizing log-likelihood is equivalent
to minimizing negative log-likelihood. Letting z_i = w^T x_i + b, and using the identity
1 − σ(z) = σ(−z), the loss function (which we will minimize; lower loss is good!) for logistic regression
with binary cross-entropy loss is:

−log(Likelihood(w, b | {x_i, y_i})) = −log(∏_i P(y = y_i | w, b, x_i))
                                    = −∑_i log(P(y = y_i | w, b, x_i))
                                    = −∑_i log(σ(z_i)^{y_i} · (1 − σ(z_i))^{1−y_i})
                                    = −∑_i [y_i · log σ(z_i) + (1 − y_i) · log σ(−z_i)]

L({x_i, y_i}, w, b) = ∑_i [−y_i · log σ(z_i) − (1 − y_i) · log σ(−z_i)]
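As a quick sanity check (our sketch, not part of the assignment), the sum-of-logs form above matches
the negative log of the product of per-datapoint likelihoods; the z and y values below are made up:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up values of z_i = w^T x_i + b and true labels y_i.
z = np.array([2.0, -1.0, 0.5])
y = np.array([1.0, 0.0, 1.0])

# Per-datapoint loss L_i = -y_i * log sigma(z_i) - (1 - y_i) * log sigma(-z_i).
L = -y * np.log(sigmoid(z)) - (1 - y) * np.log(sigmoid(-z))

# Negative log of the product of per-datapoint likelihoods.
likelihoods = sigmoid(z) ** y * (1 - sigmoid(z)) ** (1 - y)
assert np.isclose(L.sum(), -np.log(likelihoods.prod()))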
Let the loss for datapoint i be defined as the term in the sum corresponding to datapoint i, i.e.,
L_i = −y_i · log σ(z_i) − (1 − y_i) · log σ(−z_i), where z_i = w^T x_i + b.
(a) When y_i = 1, what is the algebraic expression for the loss L_i incurred by a given setting of w, b
(in terms of w, b, x_i)?
(b) Say w, b were set such that w^T x_i + b was 3 for this example. What is P(y_i = 1|x_i)? What is
the raw value of the loss? Is this a good setting of the parameters for this example, or a bad setting?
(c) Say instead that w, b were set such that w^T x_i + b was −5 for this example. What is
P(y_i = 1|x_i)? What is the raw value of the loss? Is this a good setting of the parameters for this
example, or a bad setting?
(d) Say instead that w^T x_i + b was still −5, but y_i was actually 0. What is P(y_i = 0|x_i)? What is
the algebraic expression for the loss incurred on this example for a given setting of w, b? What is the
raw value of the loss?
(e) To minimize the negative log-likelihood with respect to the vector w and the bias value b for a
datapoint (x_i, y_i), we may use a gradient-based method like gradient descent. Thus, we need to
compute the gradient of the loss function with respect to each parameter. We will drop the subscript
i in this question for notational ease...
i. What is dL/dz = d/dz [−y · log σ(z) − (1 − y) · log σ(−z)]? Potentially helpful facts:
d/dz σ(z) = σ(z) · (1 − σ(z)) and d/dz log(z) = 1/z.
ii. What is dz/db = d/db (w^T x + b)?
iii. What is dz/dw_i = d/dw_i (w^T x + b) = d/dw_i (∑_j w_j x_j + b)?
iv. Wrapping up: what is the gradient of the loss L with respect to the parameter b? (Note that
the chain rule states: dL/db = (dL/dz) · (dz/db).)
v. Wrapping up: what is the gradient of the loss L with respect to the parameter w_i? (Note that
the chain rule states: dL/dw_i = (dL/dz) · (dz/dw_i).)
Now that you've derived the gradient of the loss with respect to each parameter, you could write your
own gradient descent loop, i.e., with learning rate α, update w_i := w_i − α · dL/dw_i for each w_i,
and b := b − α · dL/db. You aren't required to do so, but you did do the hardest parts!
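For illustration only, here is a minimal numpy sketch of such a loop on made-up toy data. It is ours,
not part of the assignment, and it plugs in the standard closed form for dL/dz, which part (e)i asks
you to derive, so treat it accordingly:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up toy data: 4 datapoints with 2 features each.
X = np.array([[0.5, 1.2], [1.0, -0.3], [-1.5, 0.8], [-0.7, -1.1]])
y = np.array([1.0, 1.0, 0.0, 0.0])

w = np.zeros(2)
b = 0.0
alpha = 0.1  # learning rate

for _ in range(1000):
    z = X @ w + b               # z_i = w^T x_i + b for every datapoint
    dL_dz = sigmoid(z) - y      # standard result for dL/dz (part (e)i)
    w -= alpha * (X.T @ dL_dz)  # w_i := w_i - alpha * dL/dw_i, summed over datapoints
    b -= alpha * dL_dz.sum()    # b := b - alpha * dL/db, summed over datapoints

print(sigmoid(X @ w + b))  # near 1 for label-1 datapoints, near 0 otherwise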
7. Please complete intro.py which contains some numpy introduction exercises. Insert your answers in the
code (see the instructions at the top of the file).
