Data Mining 625.740
Homework for Module on Linear Discrimination
1. (a) Show that the distance from the hyperplane $g(x) = w^T x + w_0 = 0$ to the point $x$ is $|g(x)|/\|w\|$ by minimizing $\|x - x_q\|^2$ subject to the constraint $g(x_q) = 0$.
   (b) Show that the projection of $x$ onto the hyperplane is given by
   \[ x_p = x - \frac{g(x)}{\|w\|^2}\, w. \]
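As an informal sanity check of both claims, here is a minimal NumPy sketch with arbitrarily chosen $w$, $w_0$, and $x$ (all values are illustrative placeholders, not part of the problem): the claimed distance should match the distance from $x$ to the claimed projection $x_p$, and $x_p$ should satisfy $g(x_p) = 0$.

```python
import numpy as np

# Illustrative check of 1(a) and 1(b) with arbitrarily chosen values.
rng = np.random.default_rng(0)
w = rng.standard_normal(3)          # hyperplane normal vector
w0 = rng.standard_normal()          # hyperplane offset
x = rng.standard_normal(3)          # query point

g = w @ x + w0                      # g(x) = w^T x + w0
xp = x - (g / (w @ w)) * w          # claimed projection of x onto the hyperplane

print(abs(g) / np.linalg.norm(w))   # claimed distance |g(x)| / ||w||
print(np.linalg.norm(x - xp))       # distance to the projected point (should match)
print(w @ xp + w0)                  # g(xp) should be ~0, i.e. xp lies on the hyperplane
```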
2. Let $x_1, \ldots, x_n$ be $n$ $q$-dimensional samples and $Q$ be any nonsingular positive definite $q \times q$ matrix. Show that the vector $x$ that minimizes
   \[ \sum_{k=1}^{n} (x_k - x)^T Q^{-1} (x_k - x) \]
   is the sample mean, $\bar{x} = \frac{1}{n}\sum_{k=1}^{n} x_k$.
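For intuition, a small NumPy sketch (random data and a random positive definite $Q$, all chosen purely for illustration) that evaluates the criterion at the sample mean and at nearby perturbed points; the mean should never be beaten.

```python
import numpy as np

# Numerical illustration of Problem 2 with arbitrary data and an arbitrary
# positive definite Q (all values are placeholders).
rng = np.random.default_rng(1)
n, q = 50, 4
X = rng.standard_normal((n, q))                 # n samples x_k, each q-dimensional
A = rng.standard_normal((q, q))
Q = A @ A.T + q * np.eye(q)                     # nonsingular positive definite Q
Qinv = np.linalg.inv(Q)

def criterion(x):
    d = X - x                                   # rows are (x_k - x)
    return np.einsum('ki,ij,kj->', d, Qinv, d)  # sum_k (x_k - x)^T Q^{-1} (x_k - x)

xbar = X.mean(axis=0)                           # sample mean
J_mean = criterion(xbar)
# The criterion should increase for any perturbation away from the mean.
for _ in range(5):
    print(J_mean <= criterion(xbar + 0.1 * rng.standard_normal(q)))
```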
3. Consider a linear classifier with discriminant functions $g_i(x) = w_i^T x + w_{i0}$, $i = 1, \ldots, c$. Show that the decision regions are convex by showing that if $x_1 \in \mathcal{R}_i$ and $x_2 \in \mathcal{R}_i$, then $\lambda x_1 + (1 - \lambda) x_2 \in \mathcal{R}_i$ for $0 \le \lambda \le 1$.
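An empirical illustration (not a proof) of the convexity claim, using randomly generated discriminants and the usual assign-to-the-largest-$g_i$ decision rule; all constants are arbitrary.

```python
import numpy as np

# Empirical check of Problem 3 with randomly chosen linear discriminants.
rng = np.random.default_rng(2)
c, d = 4, 2
W = rng.standard_normal((c, d))                 # row i is w_i
w0 = rng.standard_normal(c)                     # offsets w_{i0}

def region(x):
    return np.argmax(W @ x + w0)                # index of the largest g_i(x)

# Sample pairs of points that fall in the same decision region and verify
# that a random convex combination of them stays in that region.
for _ in range(1000):
    x1, x2 = rng.standard_normal(d), rng.standard_normal(d)
    if region(x1) != region(x2):
        continue
    lam = rng.uniform()
    assert region(lam * x1 + (1 - lam) * x2) == region(x1)
print("no convexity violations found")
```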
4. In the gradient descent algorithm, $a_{k+1}$ is obtained from $a_k$ by
   \[ a_{k+1} = a_k - \rho_k \nabla J(a_k), \]
   where $\rho_k$ is a positive scale factor that sets the step size. Consider the criterion function
   \[ J_q(a) = \sum_{y \in \mathcal{Y}(a)} (a^T y - b)^2, \]
   where $\mathcal{Y}(a)$ is the set of samples for which $a^T y \le b$. Suppose that $y_1$ is the only sample in $\mathcal{Y}(a_k)$. Show that $\nabla J_q(a_k) = 2(a_k^T y_1 - b)\, y_1$ and that the matrix of second partial derivatives is given by $D = 2 y_1 y_1^T$. Use this to show that when the optimal $\rho_k$ is used in the gradient descent algorithm,
   \[ a_{k+1} = a_k + \frac{b - a_k^T y_1}{\|y_1\|^2}\, y_1. \]
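The algebra can be checked numerically: with the claimed gradient and Hessian, the optimal step for a quadratic is $\rho_k = \nabla J^T \nabla J / (\nabla J^T D \nabla J)$, and the resulting update should coincide with the closed-form expression above (and put $a_{k+1}$ exactly on the margin $a^T y_1 = b$). A minimal NumPy sketch with an arbitrary $a_k$, $y_1$, and $b$:

```python
import numpy as np

# Numerical illustration of Problem 4 with arbitrary a_k and y_1.
rng = np.random.default_rng(3)
a_k = rng.standard_normal(5)
y1 = rng.standard_normal(5)
b = float(a_k @ y1) + 1.0                       # choose b so a_k^T y1 <= b, i.e. y1 is in Y(a_k)

grad = 2 * (a_k @ y1 - b) * y1                  # claimed gradient of J_q at a_k
D = 2 * np.outer(y1, y1)                        # claimed matrix of second partial derivatives
rho = (grad @ grad) / (grad @ D @ grad)         # optimal step size for a quadratic criterion

a_next = a_k - rho * grad                       # gradient descent step with optimal rho
a_claim = a_k + (b - a_k @ y1) / (y1 @ y1) * y1 # claimed closed-form update
print(np.allclose(a_next, a_claim))             # the two updates should coincide
print(np.isclose(a_next @ y1, b))               # after the step, a^T y_1 = b exactly
```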
5. Show that the partial derivatives of the functions $y_i = \exp(a_i)/\sum_j \exp(a_j)$ used in multiple-class logistic discrimination are given by
   \[ \frac{\partial y_i}{\partial a_j} = y_i(\delta_{ij} - y_j), \]
   where $\delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & \text{otherwise.} \end{cases}$
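A finite-difference check of the claimed Jacobian, as a minimal NumPy sketch with an arbitrarily chosen input vector $a$:

```python
import numpy as np

# Finite-difference check of the softmax Jacobian in Problem 5 (arbitrary a).
rng = np.random.default_rng(4)
a = rng.standard_normal(4)

def softmax(a):
    e = np.exp(a - a.max())                     # shifted for numerical stability
    return e / e.sum()

y = softmax(a)
J_claim = np.diag(y) - np.outer(y, y)           # claimed Jacobian: y_i (delta_ij - y_j)

# Numerical Jacobian by central differences.
eps = 1e-6
J_num = np.empty((4, 4))
for j in range(4):
    e_j = np.zeros(4); e_j[j] = eps
    J_num[:, j] = (softmax(a + e_j) - softmax(a - e_j)) / (2 * eps)

print(np.allclose(J_claim, J_num, atol=1e-8))   # the two Jacobians should agree
```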