Machine Learning for Signal Processing
(ENGR-E 511; CSCI-B 590)
Homework 4
Instructions
• Submission format: Jupyter Notebook (+ HTML)
– Your notebook should be a comprehensive report, not just a code snippet. Mark-ups are
mandatory to answer the homework questions. You need to use LaTeX equations in the
markup if you’re asked.
– Google Colab is the best place to begin with if this is your first time using an IPython
notebook. No need to use GPUs.
– Download your notebook as an .html version and submit it as well, so that the AIs can
check out the plots and audio. Here is how to convert to html in Google Colab.
– This means you need to embed an audio player in the notebook if you’re asked to submit an
audio file.
• Avoid using toolboxes.
P1: When to applaud? [5 points]
1. Piano Clap.wav is an audio signal simulating a music concert, particularly the end of a
song. As some of the audience haven’t heard the song before, they start to applaud
before the song ends (at around 1.5 seconds). Check out the audio.
2. Since I’m so kind, I did the MFCC conversion for you, which you can find in mfcc.mat. If you
load it, you’ll see a matrix X, which holds 962 MFCC vectors, each of which has 12 coefficients.
3. You can find the mean vectors and covariance matrices in MuSigma.mat that I kindly provide,
too. Once you load it, you’ll see the matrix mX, which has two column vectors as the mean
vectors. The first column vector of mX is a 12-dimensional mean vector of MFCCs of the
piano-only frames. The second vector holds another 12-dimensional mean vector of MFCCs
of the claps. In addition to that, Sigma is a 3D array (12×12×2), whose first 2D slice
(12×12) is the covariance matrix of the piano part, and the second 2D slice is for the clap
sound.
4. Since you have all the parameters you need, you can calculate the p.d.f. of an MFCC vector
in X for the two multivariate normal (Gaussian) distributions you can define from the two sets
of means and covariance matrices. Go ahead and calculate them. Put them in a 2 × 962 matrix:
$$P = \begin{bmatrix} P(X_1|C_1) & P(X_2|C_1) & \cdots & P(X_{962}|C_1) \\ P(X_1|C_2) & P(X_2|C_2) & \cdots & P(X_{962}|C_2) \end{bmatrix}. \qquad (1)$$
Note that $C_1$ is for the piano frames while $C_2$ is for the applause. Normalize this matrix,
so that you can recover the posterior probabilities of belonging to the two classes given an
MFCC frame:
$$\tilde{P}_{c,t} = \frac{P(X_t|C_c)}{\sum_{c'=1}^{2} P(X_t|C_{c'})} \qquad (2)$$
Plot this 2 × 962 matrix as an image (e.g. using the imagesc function in MATLAB). This
is your detection result. If you see a frame at $t'$ where $\tilde{P}_{1,t'} < \tilde{P}_{2,t'}$, it could be the right
moment to start clapping.
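A minimal NumPy sketch of this step, for reference only. The orientation of X (assumed 12×962) and the use of scipy.io.loadmat purely for file I/O are assumptions; transpose arrays or rename keys as your .mat files require.

import numpy as np
from scipy.io import loadmat            # used only to read the .mat files
import matplotlib.pyplot as plt

X = loadmat('mfcc.mat')['X']            # assumed 12 x 962 (transpose if stored the other way)
ms = loadmat('MuSigma.mat')
mX, Sigma = ms['mX'], ms['Sigma']       # 12 x 2 means, 12 x 12 x 2 covariances

def gaussian_pdf(X, mu, cov):
    """Multivariate normal density of every column of X (D x T)."""
    D = X.shape[0]
    diff = X - mu.reshape(-1, 1)
    maha = np.sum(diff * (np.linalg.inv(cov) @ diff), axis=0)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.linalg.slogdet(cov)[1])
    return np.exp(log_norm - 0.5 * maha)

# Eq. (1): class-conditional likelihoods; Eq. (2): column-wise normalization
P = np.vstack([gaussian_pdf(X, mX[:, c], Sigma[:, :, c]) for c in range(2)])
P_tilde = P / P.sum(axis=0, keepdims=True)

plt.imshow(P_tilde, aspect='auto')      # Python counterpart of MATLAB's imagesc
plt.xlabel('frame'); plt.ylabel('class'); plt.show()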
5. You might not like this result, because it is sensitive to the wrong claps in the middle.
You want to smooth them out. For this, you may want to come up with a transition matrix
with some dominant diagonal elements:
$$T = \begin{bmatrix} 0.9 & 0.1 \\ 0 & 1 \end{bmatrix}. \qquad (3)$$
What it means is that if you see a $C_1$ frame, you’ll want to stay in that state (no clap) in the
next frame with a probability of 0.9, while you want to transition to $C_2$ (clap) with a probability
of 0.1. On the other hand, you absolutely want to stay at $C_2$ once you observe a frame with
that label (you don’t want to go back once you start clapping).
Apply this matrix to your $\tilde{P}$ matrix in a recursive way:
$$\bar{P}_{:,1} = \tilde{P}_{:,1} \qquad (4)$$
$$b = \arg\max_c \bar{P}_{c,t} \qquad (5)$$
$$\bar{P}_{:,t+1} = T_{b,:}^\top \odot \tilde{P}_{:,t+1} \qquad (6)$$
First, you initialize the first column vector of your new posterior prob matrix $\bar{P}$ (you have
no previous frame to work with at that moment). For a given time frame of interest, $t+1$,
you first need to see its previous frame to find which class that frame belongs to (by using
the simple max operation on the posterior probabilities at $t$). This class index $b$ is then
used to pick up the corresponding transition probabilities (one of the row vectors of the
$T$ matrix). Then, the transition probabilities are multiplied element-wise with your existing posterior
probabilities at $t+1$.
In the end, you may want to normalize them so that they can serve as the posterior probabilities:
$$\bar{P}_{1,t+1} = \bar{P}_{1,t+1}\big/\left(\bar{P}_{1,t+1} + \bar{P}_{2,t+1}\right), \qquad \bar{P}_{2,t+1} = \bar{P}_{2,t+1}\big/\left(\bar{P}_{1,t+1} + \bar{P}_{2,t+1}\right). \qquad (8)$$
Repeat this procedure for all the 962 frames.
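A short sketch of this recursion, continuing from the earlier snippet (P_tilde is the normalized 2×962 posterior matrix computed above):

# Eqs. (3)-(8): naive smoothing with a dominant-diagonal transition matrix
T_mat = np.array([[0.9, 0.1],
                  [0.0, 1.0]])

P_bar = np.zeros_like(P_tilde)
P_bar[:, 0] = P_tilde[:, 0]                            # Eq. (4): initialization
for t in range(P_tilde.shape[1] - 1):
    b = np.argmax(P_bar[:, t])                         # Eq. (5): most likely class at frame t
    P_bar[:, t + 1] = T_mat[b, :] * P_tilde[:, t + 1]  # Eq. (6): row b of T, element-wise product
    P_bar[:, t + 1] /= P_bar[:, t + 1].sum()           # Eq. (8): renormalize to posteriors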
6. Plot your new smoothed post prob matrix $\bar{P}$. Do you like it?
7. If you don’t like the naïve smoothing method, there is another option, called the Viterbi
algorithm. This time, you’ll calculate your smoothed post prob matrix in this way:
$$\bar{P}_{:,1} = \tilde{P}_{:,1} \qquad (9)$$
First, initialize the first frame’s post prob as usual. Then, for the $(t+1)$-th frame, we will construct
$\bar{P}_{:,t+1}$. For this, we need to look at the $t$-th frame, but this time we check all paths to pick up the
best one by calculating the probabilities of all state transitions from $c$ at $t$ to $c'$ at $t+1$. For
example, from $(c, t)$ to $(c'=1, t+1)$:
$$b = \arg\max_c T_{c,1}\,\bar{P}_{c,t}. \qquad (10)$$
This will tell you which of the two previous states $c=1$ and $c=2$ at $t$ was the best transition
to the current state $C_1$ at $t+1$ (we’re considering only $C_1$ for now). If $b=1$, it’s more probable
that the previous state at $t$ was $C_1$ rather than $C_2$. We keep a record of this best choice in our $B$ matrix:
$$B_{1,t+1} = b \qquad (11)$$
Eventually, we use this best transition probability to construct the post prob of $C_1$ at $t+1$:
$$\bar{P}_{1,t+1} = T_{b,1}\,\bar{P}_{b,t}\,P_{1,t+1} \qquad (12)$$
In other words, the prior probability (the transition probability and the accumulated post
prob in the $t$-th frame), $T_{b,1}\bar{P}_{b,t}$, is multiplied with the likelihood $P_{1,t+1}$ to form the new post
prob.
We keep these best previous state sequences. For example, $B_{1,102}$ will hold the more likely
previous (101st-frame) state if the 102nd frame’s state is $C_1$, while $B_{2,102}$ records the corresponding
101st-frame state that could have transitioned to $C_2$.
8. Don’t forget to repeat this for $\bar{P}_{2,t+1}$.
9. Don’t forget to normalize $\bar{P}$ as in (8).
10. Once you construct your new post prob matrix from the first frame to the 962nd frame, you
can backtrack. Choose the larger of $\bar{P}_{1,962}$ and $\bar{P}_{2,962}$. Let’s say $\bar{P}_{2,962}$ is larger.
Then, refer to $B_{2,962}$ to decide the state of the 961st frame, and so on.
11. Report this series of states as your backtracking result (e.g. as a plot). This will be the hidden
state sequence you wanted to know. Is the start of the applause making more sense to you?
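One possible sketch of the Viterbi-style pass and the backtracking, continuing from the snippets above (P_tilde and T_mat as before):

n_frames = P_tilde.shape[1]
P_bar = np.zeros_like(P_tilde)
B = np.zeros((2, n_frames), dtype=int)               # back-pointer matrix
P_bar[:, 0] = P_tilde[:, 0]                          # Eq. (9)

for t in range(n_frames - 1):
    for c_next in range(2):
        scores = T_mat[:, c_next] * P_bar[:, t]      # all transitions into class c_next
        b = np.argmax(scores)                        # Eq. (10): best previous state
        B[c_next, t + 1] = b                         # Eq. (11)
        # Eq. (12); using the normalized P_tilde instead of the raw likelihood
        # gives the same result after the renormalization below
        P_bar[c_next, t + 1] = scores[b] * P_tilde[c_next, t + 1]
    P_bar[:, t + 1] /= P_bar[:, t + 1].sum()         # normalize as in Eq. (8)

# Backtracking: start from the more probable state at the last frame
states = np.zeros(n_frames, dtype=int)
states[-1] = np.argmax(P_bar[:, -1])
for t in range(n_frames - 2, -1, -1):
    states[t] = B[states[t + 1], t + 1]

plt.plot(states)
plt.xlabel('frame'); plt.ylabel('state (0: piano, 1: applause)'); plt.show()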
P2: Multidimensional Scaling [3 points]
1. MDS pdist.mat holds a 4808×4808 square matrix L of squared pairwise Euclidean
distances that I constructed from a set of data points. Of course I won’t share the original
data samples with you. Instead, you’ll write code for multidimensional scaling so that you
can recover the original map out of L. If your MDS routine is successful, you should be able
to see what the original map was. The original map was in 2D space, so you may want
to look at the two largest eigenvectors. Report the map you recovered in the form of a scatter plot
(dots represent the locations of the samples).
2. Note that the MDS result can be rotated and shifted.
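A minimal classical-MDS sketch, under the assumption that the matrix inside MDS pdist.mat is stored under the key 'L'; check the actual file and key names with loadmat before running.

import numpy as np
from scipy.io import loadmat
import matplotlib.pyplot as plt

L = loadmat('MDS_pdist.mat')['L']           # 4808 x 4808 squared distances; adjust file/key names
N = L.shape[0]
J = np.eye(N) - np.ones((N, N)) / N         # centering matrix
G = -0.5 * J @ L @ J                        # double centering recovers the Gram matrix
eigval, eigvec = np.linalg.eigh(G)          # eigenvalues in ascending order
idx = np.argsort(eigval)[::-1][:2]          # two largest eigenvalues
Z = eigvec[:, idx] * np.sqrt(eigval[idx])   # 2D coordinates, up to rotation/reflection/shift

plt.scatter(Z[:, 0], Z[:, 1], s=1)
plt.axis('equal'); plt.show()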
P3: Kernel PCA [4 points]
1. concetric.mat contains 152 data points each of which is a 2-dimensional column vector. If
you scatter plot them on a 2D space, you’ll see that they form the concentric circles you saw
in class.
2. Do kernel PCA on this data set to transform them into a new 3-dimensional feature space,
i.e. $X \in \mathbb{R}^{2\times 152} \Rightarrow$ Kernel PCA $\Rightarrow Z \in \mathbb{R}^{3\times 152}$.
3. In theory, you have to do the centering and normalization procedures in the feature space,
but for this particular data set, you can forget about it.
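A hedged kernel PCA sketch follows; the RBF (Gaussian) kernel and its width are assumptions made here for illustration, so swap in whatever kernel was shown in class.

import numpy as np
from scipy.io import loadmat

X = loadmat('concetric.mat')['X']           # assumed 2 x 152; check the variable name and shape
N = X.shape[1]

# RBF Gram matrix; sigma = 0.5 is a guess you may need to tune
sq = np.sum(X**2, axis=0)
sq_dist = sq[:, None] + sq[None, :] - 2 * X.T @ X
K = np.exp(-sq_dist / (2 * 0.5**2))

# Skipping feature-space centering, as the problem statement allows for this data set
eigval, eigvec = np.linalg.eigh(K)
idx = np.argsort(eigval)[::-1][:3]               # three largest eigenvalues
alphas = eigvec[:, idx] / np.sqrt(eigval[idx])   # scale eigenvectors for proper projections
Z = (K @ alphas).T                               # 3 x 152 transformed features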
4. You remember what the transformed 3D features look like, right? Now your data set
is linearly separable. Train a perceptron that does this linear classification. In other words,
your perceptron will take the transformed version of your data (a 3-dimensional vector) and
do a linear combination with three weights $w = [w_1, w_2, w_3]^\top$ plus the bias term $b$:
$$y_i = \sigma\left(w^\top Z_{:,i} + b\right), \qquad (14)$$
where $y_i$ is the scalar output of your perceptron and $\sigma$ is the activation function you define. If
you choose the logistic function as your activation, then you’d better label your samples with 0
or 1. You know, the first 51 samples are for the inner circle, so their label will be 0, while for
the other outer ring samples, I’d label them with 1.
You’ll need to write a backpropagation algorithm to estimate your parameters $w$ and $b$, which
minimize the error between $y_i$ and $t_i$. There are a lot of choices, but you can use the sum of
the squared error: $\sum_i \frac{1}{2}(y_i - t_i)^2$. Note that this dummy perceptron doesn’t have a hidden
layer. Note that you may want to initialize your parameters with some small (uniform or
Gaussian) random numbers centered around zero.
5. Your training procedure will be sensitive to the choice of the learning rate and the initialization
scheme (the range of your random numbers). So, you may want to try out a few different
choices. Good luck!
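One possible backpropagation sketch for this single-layer case, continuing from the kernel PCA snippet above (Z is 3×152); the learning rate, epoch count, and random seed are arbitrary choices to experiment with.

t = np.concatenate([np.zeros(51), np.ones(N - 51)])   # labels: first 51 samples = inner circle

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.1, size=3)            # small random initialization around zero
b = 0.0
lr = 0.1                                    # learning rate: tune this

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

for epoch in range(2000):
    y = sigmoid(w @ Z + b)                  # forward pass, Eq. (14), over all samples
    grad_a = (y - t) * y * (1 - y)          # d(sum of squared error)/d(pre-activation)
    w -= lr * (Z @ grad_a)                  # gradient step for the weights
    b -= lr * grad_a.sum()                  # gradient step for the bias

print('training accuracy:', np.mean((y > 0.5) == t))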
P4: Stereo Matching (revisited) [4 points]
1. You remember you did the GMM-based stereo matching, right? If you forgot, revisit Homework 2 Problem 3 and Problem 4.
2. Extend your implementation with the MRF’s smoothing priors using an eight-neighborhood
system, e.g. $\mathcal{N}_{i,j} = \{(i-1,j-1), (i-1,j), (i-1,j+1), (i,j-1), (i,j+1), (i+1,j-1), (i+1,j), (i+1,j+1)\}$.
Feel free to choose either ICM or Gibbs sampling. Show me the smoothed
results. You can use the Gaussian-kernel-looking prior probability equations discussed in class
(L14 S8).
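A rough ICM skeleton for the eight-neighborhood smoothing, just to show the shape of the loop: here `likelihood` stands in for the per-pixel, per-disparity likelihoods you already compute with your HW2 GMM, and the Gaussian-kernel prior with width `sigma` is written generically rather than exactly as on the class slide (L14 S8).

import numpy as np

def icm(likelihood, sigma=1.0, n_iters=10):
    """likelihood: (H, W, K) per-pixel likelihoods of the K disparity levels."""
    H, W, K = likelihood.shape
    labels = likelihood.argmax(axis=2)              # start from the maximum-likelihood labels
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]     # the eight neighbors
    cand = np.arange(K)
    for _ in range(n_iters):
        for i in range(H):
            for j in range(W):
                log_prior = np.zeros(K)
                for di, dj in offsets:
                    ni, nj = i + di, j + dj
                    if 0 <= ni < H and 0 <= nj < W:
                        # Gaussian-kernel-style preference for agreeing with the neighbor's label
                        log_prior += -(cand - labels[ni, nj]) ** 2 / (2 * sigma ** 2)
                labels[i, j] = np.argmax(np.log(likelihood[i, j] + 1e-12) + log_prior)
    return labels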