Machine Learning for Signal Processing
(ENGR-E 511; CSCI-B 590)
Homework 5
Instructions
• Submission format: Jupyter Notebook + HTML
– Your notebook should be a comprehensive report, not just a code snippet. Markdown cells are
mandatory for answering the homework questions. Use LaTeX equations in the
markdown when asked.
– Google Colab is the best place to begin if this is your first time using an IPython
notebook. No need to use GPUs.
– Download your notebook as an .html version and submit it as well, so that the AIs can
check out the plots and audio. Here is how to convert to html in Google Colab.
– This means you need to embed an audio player in the notebook whenever you're asked to
submit an audio file.
• Avoid using toolboxes.
P1: Probabilistic Latent Semantic Indexing (PLSI) for Speech Denoising
[3 points]
1. Convert the two training signals trs.wav and trn.wav using the same STFT setup as in
Homework #3 P2.
2. Like you did in Homework #3 P2, build a speech denoising system, but using the PLSI
algorithm rather than NMF; see the sketch after this list.
3. Report the SNR value of the separation results for the test signal tex.wav.
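Below is a minimal sketch of the PLSI EM updates, written as multiplicative updates on column-stochastic matrices, plus an outline of how they would plug into the denoising pipeline. The `stft`/`istft` helpers are assumed to come from your Homework #3 solution, and the basis counts `K_s`/`K_n` are placeholders, not values prescribed by the assignment.

```python
import numpy as np

def plsi(X, K, n_iter=300, B=None, update_B=True, seed=0):
    """PLSI via EM on a nonnegative matrix X (F x T).

    Returns column-stochastic B (F x K) and Theta (K x T).
    If B is given and update_B=False, only Theta is learned.
    """
    rng = np.random.default_rng(seed)
    F, T = X.shape
    eps = 1e-12
    if B is None:
        B = rng.random((F, K))
        B /= B.sum(axis=0, keepdims=True)
    Theta = rng.random((K, T))
    Theta /= Theta.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # E-step posterior folded into the M-step: R[f,t] = X[f,t] / (B @ Theta)[f,t]
        R = X / (B @ Theta + eps)
        B_new = B * (R @ Theta.T) if update_B else B
        Theta *= B.T @ R                     # uses the pre-update B, as EM requires
        Theta /= Theta.sum(axis=0, keepdims=True) + eps
        if update_B:
            B = B_new / (B_new.sum(axis=0, keepdims=True) + eps)
    return B, Theta

# Denoising outline (hypothetical variable names; reuse your HW3 STFT code):
#   B_s, _ = plsi(np.abs(S_trs), K_s)        # speech bases from trs.wav
#   B_n, _ = plsi(np.abs(S_trn), K_n)        # noise bases from trn.wav
#   B_all = np.concatenate([B_s, B_n], axis=1)
#   _, Th = plsi(np.abs(S_tex), K_s + K_n, B=B_all, update_B=False)
#   M = (B_s @ Th[:K_s]) / (B_all @ Th)      # speech mask on the magnitudes
#   s_hat = istft(M * S_tex)                 # mask the complex STFT, then invert
# SNR (in dB) against the ground-truth clean speech s:
#   10 * np.log10(np.sum(s**2) / np.sum((s - s_hat)**2))
```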
P2: Optimal K for PLSI [4 points]
1. It is known that there are different ways to express one’s feelings using emoticons depending on
the culture. For example, in Korea, where I’m from, we use double circumflexes to represent
a smiley face, e.g., (ˆˆ). On the other hand, an angry face can be represented by (`´). Note
that lips are not necessary.
2. I found, though, that in Western culture people recognize someone's feelings via the lip shapes,
e.g., :) or :(. I have no problem with these emoticons except that they are rotated by 90
degrees (kinda weird to me). After realizing this difference, in my real life, I'm trying to smile
by moving my lips instead of using my eyes, to better communicate with my friends from
Western cultures.
3. faces.npy contains eight different human faces, each of which is a vectorized 2D array. If
you reshape one of the 441-dimensional vectors into a 21 × 21 2D array and then display it
(e.g., using the imshow function), you will see a picture. Draw all eight faces in this way and
include them in your solution.
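A minimal sketch for this step, assuming `faces.npy` stores one 441-dimensional face per column (the transpose check is a guard in case the file is stored the other way around):

```python
import numpy as np
import matplotlib.pyplot as plt

X = np.load('faces.npy').astype(float)
if X.shape[0] != 441:            # assumption: we want a 441 x 8 matrix
    X = X.T

fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for t, ax in enumerate(axes):
    ax.imshow(X[:, t].reshape(21, 21), cmap='gray')
    ax.set_title(f'face {t + 1}')
    ax.axis('off')
plt.show()
```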
4. While there are eight faces in this dataset, you will see that there are a smaller number of
latent variables that are combined to “make up” one’s face. For example, I wouldn’t say the
eyes and nose are effective latent variables, because all faces in this dataset share the same
eyes and nose; there is no variation in them.
5. To identify those more effective latent variables instead, you may want to figure out which
combination of different face parts reconstructs a given face, and examine all the faces. What
are the unique components that effectively make up all the human faces? This will define the
number of latent variables K.
6. Based on your guess of the number of latent variables K, train a topic model. Don't worry
about using LDA; plain PLSI should work well. I recommend the first set of EM equations
in M11-S18. PLSI will give you two matrices $\mathbf{B} \in \mathbb{R}^{441 \times K}$ and
$\boldsymbol{\Theta} \in \mathbb{R}^{K \times 8}$. Since they come from “probabilistic” topic
modeling, $\sum_{f=1}^{441} B_{f,k} = 1$ for any choice of $k$ and $\sum_{k=1}^{K} \Theta_{k,t} = 1$
for any choice of $t$.
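With the `plsi` routine sketched in P1, training the topic model is one call; the value of `K` below is a placeholder for whatever your analysis in step 5 suggests:

```python
K = 4                                # placeholder: substitute your own guess
B, Theta = plsi(X, K, n_iter=1000)

# Sanity check of the probabilistic constraints from the problem statement
assert np.allclose(B.sum(axis=0), 1.0)
assert np.allclose(Theta.sum(axis=0), 1.0)
```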
7. Draw your K basis images. Reshape each column $\mathbf{B}_{:,k}$ back into a $21 \times 21$ 2D array and show it
as an image. Repeat this K times. Those K images will show what the underlying face
components are that make up the database, if your K is correct.
8. Draw $\boldsymbol{\Theta}$ as an image. Its column vector $\boldsymbol{\Theta}_{:,t}$ will tell you how much each of the K basis
images contributes to reconstructing the t-th face.
9. Draw your reconstructed facial images, i.e., $\mathbf{X} \approx \hat{\mathbf{X}} = \mathbf{B}\boldsymbol{\Theta}$. Again, reshape each $\hat{\mathbf{X}}_{:,t}$ back
into a $21 \times 21$ 2D array and show it as an image. Repeat this 8 times. These eight reconstructed
images should be near-identical to those from $\mathbf{X}$.
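A combined sketch for steps 7 through 9, continuing from the `B` and `Theta` learned above. Note that `B @ Theta` reconstructs the column-normalized data (PLSI models probabilities), so the sketch rescales by the column sums of `X` before comparing with the originals:

```python
# Step 7: the K basis images
fig, axes = plt.subplots(1, K, figsize=(2 * K, 2))
for k, ax in enumerate(np.atleast_1d(axes)):
    ax.imshow(B[:, k].reshape(21, 21), cmap='gray')
    ax.set_title(f'basis {k + 1}')
    ax.axis('off')
plt.show()

# Step 8: Theta as an image (each column sums to one)
plt.imshow(Theta, cmap='gray', aspect='auto')
plt.xlabel('face index t'); plt.ylabel('basis index k'); plt.colorbar()
plt.show()

# Step 9: reconstructions, rescaled back to the original column sums
X_hat = (B @ Theta) * X.sum(axis=0, keepdims=True)
fig, axes = plt.subplots(1, 8, figsize=(16, 2))
for t, ax in enumerate(axes):
    ax.imshow(X_hat[:, t].reshape(21, 21), cmap='gray')
    ax.set_title(f'recon {t + 1}')
    ax.axis('off')
plt.show()
```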
10. Note that the order of basis images can be shuffled, although it won’t affect your solution.
The resulting images should be correct to receive full points.
P3: PLSI for Analyzing Twitter Stream [4 points]
1. twitter.mat holds two Term-Frequency (TF) matrices Xtr and Xte. It also contains YtrMat
and YteMat, the target variables in the one-hot vector format.
2. Each column of the TF matrix Xtr can be either “positive”, “negative”, or “neutral”, which
are represented numerically as 1, 2, and 3 in YtrMat. These are the sentiment classes of the
original tweets.
3. Learn 50 PLSI topics $\mathbf{B} \in \mathbb{R}^{891 \times 50}$ and their weights $\boldsymbol{\Theta}^{tr} \in \mathbb{R}^{50 \times 773}$ from the training data
Xtr, using the ordinary PLSI update rules.
4. Reduce the dimension of Xte down to 50 by learning the weight matrix $\boldsymbol{\Theta}^{te} \in \mathbb{R}^{50 \times 193}$. This
can be done by running another PLSI on the test data Xte, but this time reusing the topic
matrix $\mathbf{B}$ you learned from the training set. So you skip the update rule for $\mathbf{B}$ and only
update $\boldsymbol{\Theta}^{te}$.
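A sketch of steps 3 and 4, reusing the `plsi` routine from P1. The variable names inside `twitter.mat` are taken from the problem statement; verify the exact keys with `loadmat('twitter.mat').keys()` before relying on them:

```python
from scipy.io import loadmat

data = loadmat('twitter.mat')
Xtr, Xte = data['Xtr'].astype(float), data['Xte'].astype(float)
YtrMat, YteMat = data['YtrMat'], data['YteMat']

# Step 3: learn the 50 topics and the training weights jointly
B, Theta_tr = plsi(Xtr, 50, n_iter=500)

# Step 4: freeze B and learn only the test weights
_, Theta_te = plsi(Xte, 50, n_iter=500, B=B, update_B=False)
```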
5. Define a perceptron layer for the softmax classification. This part is similar to the kernel
PCA with a perceptron case from Homework #4 Problem 3. Instead of the kernel PCA results
as the input to the perceptron, you use $\boldsymbol{\Theta}^{tr}$ for training and $\boldsymbol{\Theta}^{te}$ for testing. This time
the number of output units is 3, as there are three classes; that's why each target vector in
YtrMat has three elements. See M6 S37-39 for a review of softmax. A sketch follows this list.
6. Report your classification accuracy.
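A minimal softmax perceptron for steps 5 and 6, trained with batch gradient descent on the cross-entropy loss. It assumes the one-hot targets are stored as 3 x N columns (transpose them if the orientation differs); the learning rate and iteration count are arbitrary placeholders:

```python
def softmax(Z):
    Z = Z - Z.max(axis=0, keepdims=True)      # for numerical stability
    E = np.exp(Z)
    return E / E.sum(axis=0, keepdims=True)

def train_softmax(X, Y, lr=0.1, n_iter=2000, seed=0):
    """X: (K x N) inputs, Y: (C x N) one-hot targets. Returns (W, b)."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((Y.shape[0], X.shape[0]))
    b = np.zeros((Y.shape[0], 1))
    N = X.shape[1]
    for _ in range(n_iter):
        P = softmax(W @ X + b)                # class probabilities, (C x N)
        G = (P - Y) / N                       # cross-entropy gradient w.r.t. logits
        W -= lr * (G @ X.T)
        b -= lr * G.sum(axis=1, keepdims=True)
    return W, b

W, b = train_softmax(Theta_tr, YtrMat)
pred = (W @ Theta_te + b).argmax(axis=0)      # argmax of logits == argmax of softmax
accuracy = np.mean(pred == YteMat.argmax(axis=0))
print(f'test accuracy: {accuracy:.3f}')
```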
P4: Rock or Metal [5 points]
1. trX.mat contains a matrix of size 2 × 160. Each of the column vectors holds “loudness” and
“noisiness” features that describe a song. If the song is louder and noisier, it belongs to the
“metal” class, and vice versa. trY.mat holds the labeling information of the songs: -1 for
“rock”, +1 for “metal”.
2. Implement your own AdaBoost training algorithm. Train your model by adding weak learners.
For your m-th weak learner, train a perceptron (no hidden layer) with the weighted error
function:
$$E(y_t \,\|\, \hat{y}_t) = w_t\,(y_t - \hat{y}_t)^2, \qquad (1)$$
where $w_t$ is the weight applied to the t-th example after the (m-1)-th step. Note that $\hat{y}_t$ is the
output of your perceptron, whose activation function is tanh.
3. Implementation note: make sure that the m-th weak learner $\phi_m(\mathbf{x})$ is the sign of the
perceptron output, i.e., $\mathrm{sgn}(\hat{y}_t)$. What that means is: during training of the m-th perceptron,
you use $\hat{y}_t$ as the output to calculate the backpropagation error, but once the perceptron
training is done, $\phi_m(\mathbf{x}_t) = \mathrm{sgn}(\hat{y}_t)$, not $\phi_m(\mathbf{x}_t) = \hat{y}_t$.
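A sketch of the AdaBoost loop with the weighted tanh perceptron as its weak learner. The number of rounds `M`, the learning rate, and the epoch count are placeholders; the gradient follows Eq. (1) directly:

```python
def train_weak(X, y, w, lr=0.05, n_epochs=500, seed=0):
    """Linear perceptron with tanh output minimizing E = sum_t w_t (y_t - yhat_t)^2."""
    rng = np.random.default_rng(seed)
    theta = 0.01 * rng.standard_normal(X.shape[0] + 1)   # weights plus a bias term
    Xb = np.vstack([X, np.ones(X.shape[1])])             # append a constant row
    for _ in range(n_epochs):
        yhat = np.tanh(theta @ Xb)
        # dE/dtheta = sum_t w_t * 2 (yhat_t - y_t) * (1 - yhat_t^2) * x_t
        theta -= lr * (Xb @ (2 * w * (yhat - y) * (1 - yhat**2)))
    return theta

def adaboost(X, y, M=20):
    N = X.shape[1]
    w = np.ones(N) / N                        # start from uniform example weights
    Xb = np.vstack([X, np.ones(N)])
    learners, alphas = [], []
    for m in range(M):
        theta = train_weak(X, y, w, seed=m)
        phi = np.sign(np.tanh(theta @ Xb))    # the weak learner outputs the SIGN
        err = np.sum(w * (phi != y))          # weighted error rate
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * phi)         # boost the misclassified examples
        w /= w.sum()
        learners.append(theta)
        alphas.append(alpha)
    return learners, alphas, w

def predict(learners, alphas, X):
    Xb = np.vstack([X, np.ones(X.shape[1])])
    F = sum(a * np.sign(np.tanh(th @ Xb)) for a, th in zip(alphas, learners))
    return np.sign(F)
```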
4. Don't worry about testing the model on a test set. Instead, report a figure that shows the
final weights over the examples (by changing the size of the markers), as well as the predictions
of the model (giving different colors to the areas). I'm expecting something similar to the
ones in M12 S26; see the plotting sketch after this list.
5. Report your classification accuracy on the training samples, too.
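A plotting sketch for steps 4 and 5, continuing from the functions above. The `.mat` key names, the marker-size scaling, and the grid resolution are assumptions:

```python
import matplotlib.pyplot as plt
from scipy.io import loadmat

trX = loadmat('trX.mat')['trX'].astype(float)           # assumed key name, 2 x 160
trY = loadmat('trY.mat')['trY'].ravel().astype(float)   # assumed key name, +-1 labels

learners, alphas, w = adaboost(trX, trY, M=20)

# Step 5: training accuracy
train_pred = predict(learners, alphas, trX)
print('training accuracy:', np.mean(train_pred == trY))

# Step 4: decision regions colored by the ensemble prediction
x1, x2 = np.meshgrid(np.linspace(trX[0].min() - 1, trX[0].max() + 1, 300),
                     np.linspace(trX[1].min() - 1, trX[1].max() + 1, 300))
grid = np.vstack([x1.ravel(), x2.ravel()])
Z = predict(learners, alphas, grid).reshape(x1.shape)
plt.contourf(x1, x2, Z, levels=[-1.5, 0.0, 1.5], alpha=0.3)
# Final example weights shown via marker size
plt.scatter(trX[0], trX[1], c=trY, s=5000 * w, edgecolors='k')
plt.xlabel('loudness'); plt.ylabel('noisiness')
plt.show()
```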