Data Mining: 625.740
Homework for Module 9
1. Fisher's linear discriminant is
\[
\hat{w}^* = \arg\max_{\hat{w}} J(\hat{w}) = \arg\max_{\hat{w}} \frac{\hat{w}^T S_b \hat{w}}{\hat{w}^T S_w \hat{w}},
\]
where $S_b = (m_1 - m_2)(m_1 - m_2)^T$ and $S_w = \sum_j \sum_\alpha (x_\alpha - m_j)(x_\alpha - m_j)^T$.
(a) By setting $\partial J / \partial \hat{w} = 0$, show that $S_w^{-1} S_b \hat{w} = J(\hat{w})\,\hat{w}$.
(b) Explain why $\hat{w}^*$ is the eigenvector of $S_w^{-1} S_b$ for which the eigenvalue $J(\hat{w})$ is maximal.
(c) Explain why $S_b \hat{w}$ is always in the direction of $m_1 - m_2$, and thus show that $\hat{w}^* = \mathrm{const} \cdot S_w^{-1}(m_1 - m_2)$.
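As a numerical sanity check on parts (a) and (c), and not part of the required derivation, the sketch below builds $S_b$ and $S_w$ from two synthetic Gaussian classes (all data here is illustrative), forms $\hat{w} = S_w^{-1}(m_1 - m_2)$, and verifies both the eigenvalue identity and that no random direction scores a higher $J$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic classes; any means and covariances would do.
X1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X2 = rng.normal(loc=[3.0, 1.0], scale=1.0, size=(100, 2))

m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
Sb = np.outer(m1 - m2, m1 - m2)
Sw = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)

# Closed form from part (c): w* is proportional to Sw^{-1} (m1 - m2).
w = np.linalg.solve(Sw, m1 - m2)

def J(v):
    return (v @ Sb @ v) / (v @ Sw @ v)

# Part (a): Sw^{-1} Sb w = J(w) w holds at the optimum.
lhs = np.linalg.solve(Sw, Sb @ w)
print(np.allclose(lhs, J(w) * w))  # True

# No random direction does better than w*.
for _ in range(1000):
    v = rng.normal(size=2)
    assert J(v) <= J(w) + 1e-9
```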
2. Another way to optimize Fisher’s linear discriminant (suggested by Barry Fridling):
(a) Show that for any two real vectors $x$ and $y$, $(x^T y)^2 \le (x^T x)(y^T y)$ (Cauchy–Schwarz).
(b) Show that if the $\lambda_k$ are positive,
\[
\left( \sum_{k=1}^{N} x_k y_k \right)^2 \le \left( \sum_{k=1}^{N} \lambda_k x_k^2 \right) \left( \sum_{k=1}^{N} y_k^2 / \lambda_k \right).
\]
(c) Thus, show that for $A$ positive definite, $(x^T y)^2 \le (x^T A x)(y^T A^{-1} y)$.
(d) By letting $A = S_w$ in the expression above, and writing
\[
J(\hat{w}) = \frac{|\hat{w}^T (m_1 - m_2)|^2}{\hat{w}^T S_w \hat{w}},
\]
show again that $\hat{w}^* = \mathrm{const} \cdot S_w^{-1}(m_1 - m_2)$.
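The inequality in part (c) is easy to probe numerically before proving it. This sketch (illustrative, with an arbitrary random positive-definite $A$) checks the bound on random vectors and confirms that equality holds when $x \propto A^{-1} y$, which is exactly the choice that makes part (d) work:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

# A random symmetric positive-definite matrix: B B^T plus a small ridge.
B = rng.normal(size=(N, N))
A = B @ B.T + 0.1 * np.eye(N)

# (x^T y)^2 <= (x^T A x)(y^T A^{-1} y) on random vectors.
for _ in range(1000):
    x, y = rng.normal(size=N), rng.normal(size=N)
    assert (x @ y) ** 2 <= (x @ A @ x) * (y @ np.linalg.solve(A, y)) + 1e-9

# Equality is attained when x is proportional to A^{-1} y.
y = rng.normal(size=N)
x = np.linalg.solve(A, y)
lhs = (x @ y) ** 2
rhs = (x @ A @ x) * (y @ np.linalg.solve(A, y))
print(np.isclose(lhs, rhs))  # True
```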
3. Using the Optdigits dataset from the UCI repository, implement PCA. Reconstruct the digit images and calculate the reconstruction error $E(n) = \sum_j \|\hat{x}_j - x_j\|^2$ for various values of $n$, the number of eigenvectors. Plot $E(n)$ versus $n$.
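One way to structure this exercise is sketched below with NumPy. A random matrix stands in for the data so the snippet runs on its own; for the assignment, `X` would instead hold the Optdigits rows, one flattened digit image per row:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in data matrix; replace with the Optdigits images (one row each).
X = rng.normal(size=(500, 64))

mu = X.mean(axis=0)
Xc = X - mu

# Eigenvectors of the sample covariance, sorted by decreasing eigenvalue.
cov = Xc.T @ Xc / len(X)
eigvals, eigvecs = np.linalg.eigh(cov)
eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

def reconstruction_error(n):
    """E(n) = sum_j ||xhat_j - x_j||^2 using the top-n eigenvectors."""
    W = eigvecs[:, :n]            # d x n projection matrix
    Xhat = mu + (Xc @ W) @ W.T    # project down, then reconstruct
    return np.sum((Xhat - X) ** 2)

errors = [reconstruction_error(n) for n in range(1, X.shape[1] + 1)]

# E(n) decreases monotonically and reaches ~0 with all eigenvectors kept.
assert all(a >= b - 1e-6 for a, b in zip(errors, errors[1:]))
print(round(errors[-1], 6))  # 0.0 (up to floating-point error)
```

For the required plot, `matplotlib.pyplot.plot(range(1, 65), errors)` against $n$ gives the $E(n)$ curve.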
