CS559-B HW1

Problem 1 (5pt): Provide an intuitive example to show that P(A|B) and P(B|A) are in general
not the same.
Problem 2 (10pt): Independence and un-correlation
(1) (5pt) Suppose X and Y are two continuous random variables. Show that if X and Y are
independent, then they are uncorrelated.
(2) (5pt) Suppose X and Y are uncorrelated. Can we conclude that X and Y are independent? If so,
prove it; otherwise, give a counterexample. (Hint: consider X ∼ Uniform[−1, 1] and Y = X².)
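A quick numerical illustration of the hint (a Monte Carlo check, not a proof; the sample size and
seed are arbitrary choices):

# Illustrative check for Problem 2(2): X ~ Uniform[-1, 1] and Y = X^2 have
# (near-)zero sample covariance, yet Y is a deterministic function of X.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=1_000_000)
y = x ** 2

print("sample Cov(X, Y):", np.cov(x, y)[0, 1])                      # close to 0
print("E[Y] vs E[Y | |X| > 0.9]:", y.mean(), y[np.abs(x) > 0.9].mean())  # ~1/3 vs ~0.9

The conditional mean of Y changes drastically once we condition on X, which is exactly the kind of
dependence that zero correlation fails to rule out.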
Problem 3 (15pt): [Minimum Probability of Error, Discriminant Function] Let the components
of the vector x = [x1, ..., xd]^T be binary valued (0 or 1), and let P(ωj) be the prior probability for
the state of nature ωj, j = 1, ..., c. We define

pij = P(xi = 1 | ωj), i = 1, ..., d, j = 1, ..., c,

with the components xi being statistically independent for all x in ωj. Show that the minimum
probability of error is achieved by the following decision rule: decide ωk if gk(x) ≥ gj(x) for all j,
where

gj(x) = Σ_{i=1}^{d} xi ln( pij / (1 − pij) ) + Σ_{i=1}^{d} ln(1 − pij) + ln P(ωj).
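For intuition only (not part of the problem), the discriminant gj(x) can be evaluated directly once
the pij and the priors are fixed; a minimal sketch with made-up placeholder numbers:

# Sketch: evaluate g_j(x) = sum_i x_i*ln(p_ij/(1-p_ij)) + sum_i ln(1-p_ij) + ln P(w_j).
# The probabilities and priors below are illustrative, not given in the problem.
import numpy as np

p = np.array([[0.8, 0.3],
              [0.6, 0.4],
              [0.1, 0.7]])          # p[i, j] = P(x_i = 1 | w_j); d = 3 features, c = 2 classes
prior = np.array([0.5, 0.5])        # P(w_j)

def g(x):
    """Return the vector of discriminants g_j(x) for a binary feature vector x."""
    x = np.asarray(x, dtype=float)
    return x @ np.log(p / (1 - p)) + np.log(1 - p).sum(axis=0) + np.log(prior)

x = [1, 0, 1]
print(g(x), "-> decide class", int(np.argmax(g(x))))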
Problem 4 (10pt): [Likelihood Ratio] Consider a two-category classification problem in which the
class conditionals are assumed to be Gaussian, i.e., p(x|ω1) = N(4, 1) and p(x|ω2) = N(8, 1). Based
on prior knowledge, P(ω2) = 1/4. We do not penalize correct classification; for misclassification,
we assign a penalty of 1 unit for misclassifying ω1 as ω2 and 3 units for misclassifying ω2 as ω1.
Derive the Bayesian decision rule using the likelihood ratio.
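A numerical sanity check one could run against the derived rule (illustrative only: scipy is an
assumed dependency, the grid range and resolution are arbitrary, and the losses and priors are
exactly those stated in the problem):

# Compare the two conditional risks on a grid of x and locate where they cross.
import numpy as np
from scipy.stats import norm

prior = {1: 3/4, 2: 1/4}
loss = {(1, 2): 3.0, (2, 1): 1.0}       # loss[(i, j)] = cost of deciding w_i when the truth is w_j

xs = np.linspace(0, 12, 1201)
p1 = norm.pdf(xs, loc=4, scale=1) * prior[1]    # p(x|w_1) P(w_1)
p2 = norm.pdf(xs, loc=8, scale=1) * prior[2]    # p(x|w_2) P(w_2)

risk_decide_1 = loss[(1, 2)] * p2       # proportional to R(alpha_1 | x)
risk_decide_2 = loss[(2, 1)] * p1       # proportional to R(alpha_2 | x)

boundary = xs[np.argmin(np.abs(risk_decide_1 - risk_decide_2))]
print("decision boundary is approximately x =", round(boundary, 2))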
Problem 5 (15pt): [Minimum Risk, Reject Option] In many machine learning applications, one
has the option either to assign the pattern to one of c classes, or to reject it as being unrecognizable.
If the cost of rejection is not too high, rejection may be a desirable action. Let
λ(αi|ωj) =
    0,   if i = j, for i, j = 1, ..., c
    λr,  if i = c + 1
    λs,  otherwise
where λr is the loss incurred for choosing the (c+ 1)-th action, rejection, and λs is the loss incurred
for making any substitution error.
(1) (5pt) Derive the decision rule with minimum risk.
(2) (5pt) What happens if λr = 0?
(3) (5pt) What happens if λr > λs?
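To make the loss structure concrete, the (c+1)-action loss table can be written out as a matrix;
a minimal sketch for c = 3 with illustrative values λr = 0.2 and λs = 1.0 (not specified in the
problem):

# Rows: actions a_1..a_c plus the reject action a_{c+1}; columns: true class w_1..w_c.
import numpy as np

c, lam_r, lam_s = 3, 0.2, 1.0
L = np.full((c + 1, c), lam_s)       # every substitution error costs lambda_s
np.fill_diagonal(L[:c, :], 0.0)      # correct classification costs nothing
L[c, :] = lam_r                      # rejecting costs lambda_r regardless of the true class
print(L)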
Problem 6 (25pt): [Maximum Likelihood Estimation (MLE)] A general representation of an
exponential family is given by the following probability density:

p(x|η) = h(x) exp{η^T T(x) − A(η)}

• η is the natural parameter.
• h(x) is the base density, which ensures x is in the right space.
• T(x) is the sufficient statistic.
• A(η) is the log normalizer, which is determined by T(x) and h(x).
• exp(·) represents the exponential function.
(1) (5pt) Write down the expression of A(η) in terms of T(x) and h(x).
(2) (10pt) Show that ∂A(η)/∂η = Eη[T(x)], where Eη(·) is the expectation w.r.t. p(x|η).
(3) (10pt) Suppose we have n i.i.d. samples x1, x2, . . . , xn; derive the maximum likelihood estimator
for η. (You may use the result from part (2) to obtain your final answer.)
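As a concrete illustration of the notation (not required by the problem), the Bernoulli distribution
with mean μ is a member of this family:

p(x|μ) = μ^x (1 − μ)^(1−x) = exp{ x ln( μ / (1 − μ) ) + ln(1 − μ) },  x ∈ {0, 1},

so h(x) = 1, T(x) = x, η = ln( μ / (1 − μ) ), and A(η) = −ln(1 − μ) = ln(1 + e^η). Differentiating
gives ∂A(η)/∂η = e^η / (1 + e^η) = μ = Eη[T(x)], consistent with the identity in part (2).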
Problem 7 (20pt): [Logistic Regression, MLE] In this problem, you need to use MLE to derive
and build a logistic regression classifier (suppose the target/response y ∈ {0, 1}):
(1) (5pt) Suppose the classifier is built on the linear score x^T θ, where θ contains the weight as
well as the bias parameters. If the log-likelihood function is LL(θ), what is ∂LL(θ)/∂θ?
(2) (15pt) Write code to build and train the classifier on the Iris plant dataset
(https://archive.ics.uci.edu/ml/datasets/iris). The Iris dataset contains 150 samples with 4 features
for 3 classes. To simplify the problem, we only consider: (a) two classes, i.e., virginica and
non-virginica; (b) the first 2 features for training, i.e., sepal length and sepal width. Based
on these simplified settings, train the model using gradient descent and show the classification
results. (Note that (1) you could split the Iris dataset into a train/test set; (2) you could visualize
the results by showing the trained classifier overlaid on the train/test data; (3) you could tune
several hyperparameters, e.g., learning rate, weight initialization method, etc., to see their effects;
(4) you may use sklearn or other packages to load and process the data, but you cannot use such
packages to train the model.)
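A minimal starting-point sketch of the setup described above (assumptions: sklearn is used only to
load and split the data, as the problem allows; the learning rate, iteration count, zero
initialization, and 70/30 split are illustrative hyperparameter choices, and visualization is
omitted):

# Logistic regression on the first two Iris features (sepal length/width),
# virginica vs. non-virginica, trained with plain gradient descent.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X = iris.data[:, :2]                           # sepal length, sepal width
y = (iris.target == 2).astype(float)           # 1 = virginica, 0 = non-virginica
X = np.hstack([X, np.ones((X.shape[0], 1))])   # append a column of ones for the bias term

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

theta = np.zeros(X.shape[1])                   # weights + bias, initialized to zero
lr, n_iters = 0.05, 5000
for _ in range(n_iters):
    grad = X_tr.T @ (y_tr - sigmoid(X_tr @ theta))   # gradient of the log-likelihood
    theta += lr * grad / len(y_tr)                   # ascend the log-likelihood

pred = (sigmoid(X_te @ theta) >= 0.5).astype(float)
print("theta:", theta)
print("test accuracy:", (pred == y_te).mean())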