CSCE 633

Homework IV

Problem 1: Convex Functions (40 points)
Prove that the following functions are convex. (A quick numerical sanity check for all three is sketched after the list.)
• (10’) the log-sum-exponential function: $f(x) = \log \sum_{i=1}^{n} \exp(x_i)$ is a convex function in terms of $x = (x_1, \dots, x_n)^\top$. (Hint: use the second-order condition of convexity.)
• (15’) the objective of logistic regression for binary classification:
$$f(w) = \frac{1}{n} \sum_{i=1}^{n} \log\big(1 + \exp(-y_i w^\top x_i)\big) + \frac{\lambda}{2} \|w\|_2^2,$$
where $x_i \in \mathbb{R}^d$, $y_i \in \{1, -1\}$. (Hint: first prove the convexity of the function $h(s) = \log(1 + \exp(s))$ and then use the rules that preserve convexity.)
• (15’) the objective of the support vector machine:
$$f(w) = \frac{1}{n} \sum_{i=1}^{n} \max(0, 1 - y_i w^\top x_i) + \frac{\lambda}{2} \|w\|_2^2,$$
where $x_i \in \mathbb{R}^d$, $y_i \in \{1, -1\}$. (Hint: similar to the above.)
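The proofs must be analytic, but a quick numerical check can catch algebra mistakes early. The sketch below, in Python/NumPy, samples random pairs of points and verifies the midpoint inequality $f\big(\tfrac{u+v}{2}\big) \le \tfrac{1}{2}(f(u) + f(v))$ for all three objectives; it is a sanity check under randomly generated data (all names and data sizes are my own assumptions), not a substitute for the proofs.

import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
X = rng.normal(size=(n, d))          # rows are the x_i
y = rng.choice([-1.0, 1.0], size=n)  # labels in {1, -1}

def log_sum_exp(x):
    # f(x) = log sum_i exp(x_i), computed in a numerically stable way
    m = x.max()
    return m + np.log(np.exp(x - m).sum())

def logistic_obj(w):
    # (1/n) sum_i log(1 + exp(-y_i w^T x_i)) + (lam/2) ||w||_2^2
    margins = y * (X @ w)
    return np.mean(np.logaddexp(0.0, -margins)) + 0.5 * lam * (w @ w)

def svm_obj(w):
    # (1/n) sum_i max(0, 1 - y_i w^T x_i) + (lam/2) ||w||_2^2
    margins = y * (X @ w)
    return np.mean(np.maximum(0.0, 1.0 - margins)) + 0.5 * lam * (w @ w)

for f, dim in [(log_sum_exp, n), (logistic_obj, d), (svm_obj, d)]:
    for _ in range(1000):
        u, v = rng.normal(size=dim), rng.normal(size=dim)
        assert f(0.5 * (u + v)) <= 0.5 * (f(u) + f(v)) + 1e-9, f.__name__
print("midpoint convexity held on all sampled pairs")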
Problem 2: The Constrained Version of Ridge Regression (30 points)
In class, we have mentioned the constrained version of ridge regression:
$$\min_{w \in \mathbb{R}^d} \ \|\Phi w - y\|_2^2 \quad \text{s.t.} \quad \|w\|_2 \le s,$$
where $\Phi \in \mathbb{R}^{n \times d}$ and $y \in \mathbb{R}^n$. Answer the following questions:
• (20’) Does strong duality hold? If yes, derive the KKT conditions characterizing the optimal solution $w^*$ of the above problem.
• (optional 10’) Does a closed-form solution exist? If yes, derive the closed-form solution. If not, can you propose an algorithm for computing the optimal solution (describe the key steps of your algorithm)? One standard candidate is sketched after this list.
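Independently of your answer, here is a minimal sketch of one standard algorithm for this kind of constrained problem: projected gradient descent, which alternates a gradient step on $\|\Phi w - y\|_2^2$ with a Euclidean projection onto the ball $\{w : \|w\|_2 \le s\}$. The code is my own Python/NumPy illustration (function names, step size, and iteration count are assumptions), not necessarily the intended answer.

import numpy as np

def project_l2_ball(w, s):
    # Euclidean projection onto {w : ||w||_2 <= s}: rescale w if it lies outside the ball.
    norm = np.linalg.norm(w)
    return w if norm <= s else (s / norm) * w

def constrained_ridge(Phi, y, s, steps=5000):
    # Projected gradient descent on ||Phi w - y||_2^2 subject to ||w||_2 <= s.
    # Step size 1/L, where L = 2 * lambda_max(Phi^T Phi) is the gradient's Lipschitz constant.
    L = 2.0 * np.linalg.norm(Phi, 2) ** 2
    w = np.zeros(Phi.shape[1])
    for _ in range(steps):
        grad = 2.0 * Phi.T @ (Phi @ w - y)
        w = project_l2_ball(w - grad / L, s)
    return w

# Example on random data where the constraint is active: the true weights have
# norm sqrt(10) > 2, so the returned solution sits on the boundary ||w||_2 = s.
Phi = np.random.default_rng(1).normal(size=(100, 10))
y = Phi @ np.ones(10)
print(np.linalg.norm(constrained_ridge(Phi, y, s=2.0)))  # ~2.0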
Problem 3: The Equivalence between the Maximum Entropy Model and Logistic Regression (40 points)
In class, we have talked about the maximum entropy model. For learning the posterior probabilities $\Pr(y|x) = p(y|x)$ for $y = 1, \dots, K$ given a set of training examples $(x_i, y_i)$, $i = 1, \dots, n$, we can maximize the entropy of the posterior probabilities subject to a set of constraints, i.e.,
$$\max_{p(y|x_i)} \ -\sum_{i=1}^{n} \sum_{y=1}^{K} p(y|x_i) \ln p(y|x_i)$$
$$\text{s.t.} \quad \sum_{y=1}^{K} p(y|x_i) = 1, \quad i = 1, \dots, n,$$
$$\sum_{i=1}^{n} \frac{\delta(y, y_i)}{n} f_j(x_i) = \sum_{i=1}^{n} \frac{p(y|x_i)}{n} f_j(x_i), \quad j = 1, \dots, d, \ y = 1, \dots, K,$$
where $\delta(y, y_i)$ equals 1 if $y_i = y$ and 0 otherwise, and $f_j(x_i)$ is a feature function. Let us consider $f_j(x_i) = [x_i]_j$, i.e., the $j$-th coordinate of $x_i$. Please show that the above maximum entropy model is equivalent to the multi-class logistic regression model (without regularization). (Hint: use the Lagrangian dual theory; the first step is sketched below.)
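As a starting point only (a sketch of the hint, not the solution), one can attach multipliers $\mu_i$ to the normalization constraints and $w_{jy}$ to the moment-matching constraints (the multiplier names are my own choice) and form the Lagrangian:
$$\mathcal{L} = -\sum_{i=1}^{n} \sum_{y=1}^{K} p(y|x_i) \ln p(y|x_i) + \sum_{i=1}^{n} \mu_i \Big( \sum_{y=1}^{K} p(y|x_i) - 1 \Big) + \sum_{j=1}^{d} \sum_{y=1}^{K} w_{jy} \sum_{i=1}^{n} \frac{p(y|x_i) - \delta(y, y_i)}{n} f_j(x_i).$$
Setting $\partial \mathcal{L} / \partial p(y|x_i) = 0$ and eliminating $\mu_i$ via the normalization constraint should produce $p(y|x_i)$ in a softmax form whose parameters $w_{jy}$ play the role of the logistic regression weights; carrying out this computation is the substance of the problem.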