Problem Set 2: Graph Clustering and Signal Processing
Unsupervised Learning for Big Data
CPSC 453 / CBB 555 / CPSC 553 / GENE 555
1 Introduction
In this assignment we will explore graph clustering and signal processing, two common paradigms for analyzing data that can be represented as a graph. In our lectures we learned that we can induce a graph
from any high dimensional dataset using a Gaussian kernel; thus, these graph based methods are generally
applicable to any high dimensional dataset.
Clustering is a key method to understand and unravel heterogeneity in high dimensional data. In general,
cluster analysis must start with a set of assumptions on the data. For instance, with k-means, the assumption
is that our clusters are spherical and roughly the same size. On the other hand, agglomerative methods like
Louvain do not have such restrictions on geometry, but assume that we wish to approximately maximize some
target function (such as modularity). Louvain is a graph-based agglomerative method; spectral clustering,
on the other hand, is related to k-means in the sense that it is a partition based method. However, unlike
k-means, spectral clustering uses the graph Laplacian eigenvectors as a set of coordinates. This means that
clusters are formed using graph geometry. Clustering has been used in a variety of biological applications
including gene clustering, cell subtype finding, patient type identification, etc.
Separately, graph signal processing has emerged in recent years as a tool for manipulating data that rests
on a graph. A major goal of this field is to take some of the major advances in classical signal processing
and apply them to graphs. In lecture, we learned about the Graph Fourier Transform, which allows us to
study the behavior of data in the frequency space, which encodes geometric information and the behavior of
neighborhoods. In this assignment, we’ll see that these ideas are intricately linked to the ideas of spectral
clustering.
2 Preliminaries
In order to understand spectral clustering, it is useful to first understand some key features of graph Laplacian
spectra. In class, we discussed the combinatorial graph Laplacian
L^c = D − W,    (1)
and the normalized graph Laplacian
L^s = D^{-1/2} L^c D^{-1/2},    (2)

where D is the degree matrix for a graph on N vertices with weights W, i.e. D = diag(d(i)), d(i) = Σ_{j=1}^{N} W_{ij}.
In the following discussion, I will specify general Laplacians as L whereas vectors and matrices related to a
specific normalization will feature a superscript.
We refer to the eigenbasis of the Laplacian L = ΨΛΨ^T as the Fourier basis. The eigenvectors form a
basis for frequency analysis.
2.1 Ordering the Laplacian Basis
Abusing notation, let's write the set of eigenpairs as B = (Ψ, Λ), where Ψ = {ψ_i}_{i=1}^{N} and Λ = {0 = λ_1 ≤ λ_2 ≤ · · · ≤ λ_N}. Note that our eigenvalues are ordered in reverse from the eigenvalues of the diffusion map M_s = ΦχΦ^T, χ = {1 = µ_1 ≥ µ_2 ≥ · · · ≥ µ_N}. This ordering allows us to retain classical notions of "high" and "low" frequency. The smoothness of our Fourier basis functions corresponds to their eigenvalue. The ordering holds for both Laplacian normalizations.
2.2 Graph Fourier Transform
We will compute the orthonormal expansion of signals in the Fourier basis of our graph. The Graph Fourier
Transform gives us this representation. For a signal s ∈ ℝ^N defined on the vertices of a graph, we have

ŝ(λ_l) = ⟨s, ψ_l⟩,    (3)

so

ŝ = Ψ^T s.    (4)

As in the classical Fourier transform, we have a unitary transformation here, so we can perform the Inverse Graph Fourier Transform as

s = Ψ ŝ.    (5)
Throughout this problem set we’ll examine this frequency-based representation of input signals such as
cluster labels. A cluster label signal for cluster Ci has a 1 on each node that is a member of Ci and 0
otherwise.
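To make equations (3)–(5) concrete, here is a minimal NumPy sketch on a toy graph (the variable names are illustrative and not part of the assignment skeleton):

```python
import numpy as np
from scipy.linalg import eigh

# Toy graph: a 3-vertex path
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = np.diag(W.sum(axis=1)) - W   # combinatorial Laplacian, eq. (1)
lam, Psi = eigh(L)               # eigenvalues in ascending order; columns of Psi are psi_i

s = np.array([1., 0., -1.])      # a signal on the vertices
s_hat = Psi.T @ s                # GFT, eq. (4)
s_back = Psi @ s_hat             # inverse GFT, eq. (5); recovers s up to numerical error
```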
2.3 Laplacian Spectra
We should observe a few things about the eigensystems of L. We may refer to the distribution of the
graph Laplacian eigenvalues as the spectrum of a graph, in the same way we refer to the distribution of
frequencies as a spectrum. The meaning of the eigenvalues of L (i.e., how they relate to graph properties
like connectivity) is the key subject of the field of Spectral Graph Theory. Many of the results in Spectral
Graph Theory are related to very specific graph constructions, so in general it is very hard to describe the
spectrum of an arbitrary graph like the ones we consider. That said, some applicable key results revolve around
the magnitude of λ_2 and its relationship to connectivity and mixing times. Others exist, but we will not
cover them in this course. It is worth noting that many of the design choices for our algorithms revolve
around constructing a graph that has certain properties (adaptive bandwidth, positive semi-definite kernels)
or manipulating a graph spectrum in specific ways. Let's make a few easy derivations that are good to know for all graphs.
The normalized Laplacian L^s and the combinatorial Laplacian L^c do not share eigenvectors or eigenvalues.
Let's focus on the combinatorial Laplacian first. The easiest observation to make is that the first eigenvector
of L^c is the all-ones vector, ψ^c_1 = 1. This eigenvector has eigenvalue λ^c_1 = 0. To derive this, note that the row
sums of L^c are 0 and multiply L^c 1. Beyond λ^c_1, the magnitude of the eigenvalues of the combinatorial Laplacian
may scale with degree, though many other properties change the way they distribute.
For the normalized Laplacian, we have the first eigenpair {λ^s_1, ψ^s_1} = {0, d^{1/2}/‖d^{1/2}‖}. Because the normalized
Laplacian has a degree normalization (the sandwiched inverse square roots), its eigenvalues fall in the interval
0 ≤ λ_i ≤ 2 for all i, with λ_N = 2 if and only if the graph has a bipartite connected component. Earlier we mentioned the Diffusion
Map. It turns out that these matrices share eigenvectors and their eigenvalues are easily related. Let

M_s = D^{-1/2} W D^{-1/2};    (6)

then we have

L^s = I − D^{-1/2} W D^{-1/2}    (7)
    = I − M_s.    (8)

So λ^s_i = 1 − µ_i, and φ_i = ψ_i. If we make such a normalization or choose W to be positive semi-definite, then it is
clear that 0 ≤ λ^s_i ≤ 1.
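If you want to convince yourself of equations (6)–(8) numerically, an optional quick check along these lines works (the random weight matrix is just a placeholder):

```python
import numpy as np
from scipy.linalg import eigh

W = np.random.rand(20, 20)
W = (W + W.T) / 2                      # symmetric nonnegative weights
np.fill_diagonal(W, 0)
d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))

Ms = d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]   # eq. (6)
Ls = np.eye(20) - Ms                                  # eqs. (7)-(8)

lam_s = eigh(Ls, eigvals_only=True)
mu = eigh(Ms, eigvals_only=True)
# lambda^s_i = 1 - mu_i once the two orderings are matched up
assert np.allclose(np.sort(lam_s), np.sort(1 - mu))
```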
Finally, I want to note that the number of zero eigenvalues for any L encodes the number of connected
components, i.e. a disconnected graph of k components has k = |{λ_i : λ_i = 0}|. The corresponding
eigenvectors serve as indicators for each disconnected component. To see this, note that the columns and
rows of any disconnected Laplacian can be sorted to yield a block diagonal matrix. Thus, the full Laplacian
eigensystem is the union of the block Laplacian eigensystems. We will use this in spectral clustering. A
fuzzy but useful observation from this is that graphs that are nearly disconnected will have low frequency
eigenvectors that have indicator-like properties. This should lead you to some clues about the Fourier
coefficients of cluster labels.
To begin, we'll code up a platform for our synthetic example.
3 Filtering Signals on the Stochastic Block Model
1. One of the most common synthetic examples for evaluating clustering algorithms is the Stochastic
Block Model (SBM). This model is entirely parameterized by
(a) The number of points N
(b) The number of clusters and cluster size
(c) The ratio of edge probabilities between and within each cluster, p_ij / p_ii.
Begin by generating the synthetic SBM dataset with the following steps:
• Fill in the function
sbm(N, k, pij, pii,sigma):
...
return A, gt, coords
• The function will take as input the number of points N, the number of clusters k, and two floats
that encode the probability of forming an edge from one cluster to another pij, and the probability
of forming an edge within a cluster pii.
• To make the adjacency matrix, generate an N × N matrix and loop through each entry. If
the entry connects two points within the same cluster, sample from a uniform distribution and
compare to pii to generate binary values determining whether an edge should exist (and likewise
if they are from different clusters). You might find some clever ways to speed up this process (e.g.
by exploiting A’s symmetry; or by building a matrix of probabilities, another of uniform samples,
and using whole matrix operations).
• For ground truth gt, you’ll want to uniformly partition your N points such that you have an N ×1
vector that encodes which cluster they belong to. If the number of clusters does not evenly divide
the number of points, distribute the remaining points in order to the lowest numbered clusters
(i.e. if there are 5 leftover points and 19 clusters, give one point each to clusters 1 through 5).
• To generate coordinates for each cluster, use 2-dimensional normal distributions with means
equally spaced around the unit circle. Use the variance encoded by sigma.
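Below is a rough sketch of one possible way to implement the sbm function just described; it is not the only valid approach, and it treats sigma as a standard deviation (adjust if you interpret it as a variance):

```python
import numpy as np

def sbm(N, k, pij, pii, sigma):
    """One possible SBM generator: adjacency, ground-truth labels, and 2D coordinates."""
    # Ground truth: split N points as evenly as possible, extras to the lowest-numbered clusters
    sizes = [N // k + (1 if i < N % k else 0) for i in range(k)]
    gt = np.repeat(np.arange(k), sizes)

    # Edge probabilities: pii within a cluster, pij between clusters
    probs = np.where(gt[:, None] == gt[None, :], pii, pij)
    upper = np.triu(np.random.rand(N, N) < probs, 1)   # Bernoulli edges, upper triangle only
    A = (upper | upper.T).astype(float)                # symmetrize; no self-loops

    # Coordinates: 2D Gaussians with means equally spaced on the unit circle
    angles = 2 * np.pi * np.arange(k) / k
    means = np.column_stack([np.cos(angles), np.sin(angles)])
    coords = means[gt] + sigma * np.random.randn(N, 2)
    return A, gt, coords
```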
2. Build a Laplacian function
L(A, normalization):
...
return L
• The function will take as input a graph adjacency matrix A and return a Laplacian matrix L.
• If normalization = False then use equation 1. Otherwise, use equation 2.
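A minimal sketch of this function, assuming normalization is a boolean and the graph has no isolated vertices:

```python
import numpy as np

def L(A, normalization=False):
    """Combinatorial (eq. 1) or symmetric normalized (eq. 2) graph Laplacian."""
    d = A.sum(axis=1)
    Lc = np.diag(d) - A
    if not normalization:
        return Lc
    d_inv_sqrt = 1.0 / np.sqrt(d)                 # assumes every vertex has nonzero degree
    return d_inv_sqrt[:, None] * Lc * d_inv_sqrt[None, :]
```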
3. Build a function for computing the Fourier basis
compute_fourier_basis(L):
...
return e,psi
• The function will take as input a graph Laplacian matrix L and return its Fourier basis e, psi.
• You can use eigh to obtain this eigensystem.
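This one is essentially a thin wrapper around scipy.linalg.eigh, which already returns eigenvalues in ascending order (matching the ordering from Section 2.1):

```python
from scipy.linalg import eigh

def compute_fourier_basis(L):
    """Eigenvalues e (ascending) and eigenvectors psi (as columns) of the Laplacian."""
    e, psi = eigh(L)
    return e, psi
```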
4. Build a function for computing the graph Fourier transform (GFT)
gft(s, psi):
...
return s_hat
• The function will take as input a graph signal s and a Laplacian Fourier basis psi. It will return
the GFT of the signal s hat.
• To compute this, take the inner product ŝ(i) = ⟨s, ψ_i⟩; a short sketch follows.
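Since psi is orthonormal, the whole transform is a single matrix product:

```python
def gft(s, psi):
    """Graph Fourier transform: inner products of s with each eigenvector (eq. 4)."""
    return psi.T @ s
```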
5. Build a function for filtering signals:
(a) We’ll assume that we are working with exact filters. This means that we already have a precomputed Fourier basis, and we don’t need to form an approximation. It turns out that in practice
this is slow, and we can actually do approximations to avoid the diagonalization of the Laplacian.
However, it’s conceptually easiest to build exact filters.
(b) Define a graph filter as some function h of the Laplacian eigenvalues Λ, such that h(Λ) = diag(h(λ_1), · · · , h(λ_N)) and h(λ_i) ∈ [0, 1]. This is simply a function (often a polynomial) that we plug our eigenvalues into, with the restriction that the values range from 0 to 1. Then we can define the vertex-domain filter matrix as

H := Ψ h(Λ) Ψ^T.    (9)

If we want to filter some signal s, we write Hs as the filtered version. Let's interpret this in terms of the GFT by writing

Hs = Ψ h(Λ) Ψ^T s = Ψ h(Λ) ŝ.

We're weighting ŝ by h(Λ), then taking the inverse Fourier transform (i.e. for some frequency-domain function f̂, f = Ψ f̂).
filterbank_matrix(psi, e, h):
...
return H
• The function will take as input a Laplacian Fourier basis psi, e and a filter h. h must be a
function that accepts eigenvalues and returns a value in the range of 0 to 1. The function will
return the filter bank matrix from equation 9.
• First, evaluate the filter at each eigenvalue. Then take the product from equation 9.
• If you want to filter a signal s, simply take Hs (see the sketch below).
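A sketch of the exact filter construction of equation (9); h is any callable mapping an array of eigenvalues to responses in [0, 1]:

```python
import numpy as np

def filterbank_matrix(psi, e, h):
    """Vertex-domain filter matrix H = Psi h(Lambda) Psi^T (eq. 9)."""
    return psi @ np.diag(h(e)) @ psi.T
```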
In class, we discussed the notion that the eigenfunctions of the Laplacian capture the smoothness of
signals. Let’s take a look at the cluster indicators from our SBM.
• Generate a set of SBMs with various values of pij . Start with pij = 0 and work in small increments
until the graph is highly connected between clusters. Use 500 points and 8 clusters.
1. Use plt.imshow to visualize the adjacency matrix of your SBMs
2. Plot the absolute value of the Fourier transform of your ground truth cluster labels for each SBM
realization.
To do this, you should one-hot encode your cluster signals into a matrix of dimension N × K, where the i-th column of the matrix has ones in those entries which are part of the i-th cluster, and zeros elsewhere. (For example, if your ground truth labels are [0,0,1,1,1], the one-hot encoding should be [[1,0],[1,0],[0,1],[0,1],[0,1]].) (There is a clever way to produce this encoding with a single line of code!) You can then take the GFT of the one-hot encoded cluster labels and can plot each column using plt.stem. Make sure that your x-values correspond to the graph frequencies (i.e. eigenvalues).
3. At each realization, plot the coordinates of the SBM colored by the second eigenvector of the
graph Laplacian.
4. Use your filter function to filter random Gaussian noise. Try low-pass filters which are "ideal", that is, if λ < c, the filter evaluates to 1; otherwise, the filter evaluates to 0. Plot the Fourier spectra of a few realizations of Gaussian noise, and plot the filtered noise on the data coordinates.
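For step 4, one way to wire the pieces together is sketched below; the cutoff c is an illustrative choice, and psi, e, coords are assumed to come from your sbm, L, and compute_fourier_basis functions:

```python
import numpy as np

c = 0.5                                          # cutoff; try a few values
h_ideal = lambda lam: (lam < c).astype(float)    # "ideal" low-pass filter
H = filterbank_matrix(psi, e, h_ideal)

noise = np.random.randn(psi.shape[0])            # one realization of Gaussian noise
noise_hat = gft(noise, psi)                      # its spectrum (plot with plt.stem against e)
noise_filtered = H @ noise                       # low-passed noise (plot on coords)
```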
Question 3.1. How does the spectrum of the cluster labels change at various levels of connectedness?
What kind of frequency content does a stable clustering have?
Question 3.2. What do the first few Laplacian eigenvectors represent in the SBM? Why?
Question 3.3. What does the filtered noise look like on average? What happens when you vary the parameter
c?
4 Filtering Signals on the Swiss Roll
Now, let’s generate some signals on the Swiss roll and filter those.
• Using the Swiss roll from Assignment 1, build a graph using the Gaussian kernel parameters that
produce a reasonable diffusion map (i.e. it looks like a plane). Use this diffusion map to visualize the
signals you generate below.
1. Use the coloring from the previous problem set to generate 3 signals: one low frequency, one
medium frequency, one high frequency. Plot these signals and show their graph Fourier transform.
Report the function that you used to generate them. Hint: Trigonometric functions work well
here.
2. Try shifting the phase of your signals. Does it work? You could do this by shifting your trig
functions along the color vector.
3. Use your filter function to filter these signals. Try low-pass filters which are "ideal", that is, if λ < c, the filter evaluates to 1; otherwise, the filter evaluates to 0. Similarly try "ideal" high-pass filters. Plot the result of two of your favorite filters for your three signals.
4. Craft a band-pass filter using a Gaussian function, like e^{−(λ−µ)²/(2σ²)}. You can change the mean µ of the Gaussian to a desired target λ_k in the middle of the band. You can tune the width by changing σ.
Now, define the Kronecker delta as

δ_i(j) = 1 if j = i, and 0 otherwise.

The Kronecker delta is an n × 1 vector filled with zeros, but with a one in the i-th entry. For example, if we set i = 3 with n = 5, then δ_3 = (0, 0, 1, 0, 0).
Figure 1: A band pass filter in the frequency domain, centered at 0.3.
Use the Kronecker delta as a signal to translate your Gaussian to a certain point on the graph. You can do this by taking Hδ_i (see the sketch below). Changing the value of i should move the Gaussian to different parts of the graph. Translate it over the Swiss roll and note what changing λ_k does to the output.
Note: In the classical setting, we might call this a Gabor filter.
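A sketch of the Gaussian band-pass filter and its translation by a Kronecker delta; psi and e are assumed to come from compute_fourier_basis on the Swiss roll graph, and the values of mu, sigma, and the vertex index are placeholders to experiment with:

```python
import numpy as np

def gaussian_bandpass(mu, sigma):
    """Band-pass filter response centered at eigenvalue mu with width sigma."""
    return lambda lam: np.exp(-((lam - mu) ** 2) / (2 * sigma ** 2))

H_band = filterbank_matrix(psi, e, gaussian_bandpass(mu=0.3, sigma=0.05))

delta = np.zeros(psi.shape[0])
delta[100] = 1.0               # Kronecker delta at vertex i = 100
bump = H_band @ delta          # a localized "bump" centered at that vertex
```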
5. Plot all of your filters by evaluating them over the interval [0, λ_N]. You can use
np.linspace(0,lmax,1000) where lmax is the largest eigenvalue of the Laplacian.
Question 4.1. What does smoothness mean in terms of graph signals and their frequency spectrum?
Question 4.2. Band-limiting is a nice trait if one wants to design an algorithm. It means that there is
no frequency content above a certain frequency, i.e. the band limit. For what values of pij are SBM cluster
labels band-limited?
Question 4.3. Under what scenarios would you want a band pass or high pass filter? How do these filters
work and what do they do to the spectrum of the signals?
Question 4.4. (Bonus) Are there any similarities between the band pass experiment and classical translation
and modulation?
I highly recommend the works of [1, 2, 3] for more insights into what we’re doing here.
5 k-means Clustering
In the following, you will implement k-means using the k-means++ initialization.
1. Fill in the k-means function
kmeans(X, k, nrep=5, itermax=300):
init = kmeans_plusplus(X,k)
...
return labels
• K-means works by iteratively assigning each point to the cluster whose centroid is closest.
• That is, for each point i, select the nearest centroid (use squared Euclidean distance).
• After you have reassigned all points based on their nearest centroid, you then update all the
centroids based on their new members.
• Repeat the process until the cluster assignments stop changing, within some small tolerance.
• If the algorithm doesn’t converge, you’ll want to terminate it when you reach itermax iterations.
• To gain accuracy, you’ll want to take nrep repetitions and choose the cluster assignments that
have the smallest within cluster distance to centroid.
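A rough sketch of one way to fill in kmeans, assuming the kmeans_plusplus seeding described in the next item:

```python
import numpy as np

def kmeans(X, k, nrep=5, itermax=300):
    """Sketch of Lloyd's algorithm with k-means++ initialization."""
    best_labels, best_inertia = None, np.inf
    for _ in range(nrep):
        centroids = kmeans_plusplus(X, k)
        for _ in range(itermax):
            # Assign each point to its nearest centroid (squared Euclidean distance)
            d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Recompute centroids from the new assignments (keep old one if a cluster empties)
            new_centroids = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                      else centroids[j] for j in range(k)])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        inertia = d2[np.arange(len(X)), labels].sum()   # within-cluster distance to centroids
        if inertia < best_inertia:
            best_inertia, best_labels = inertia, labels
    return best_labels
```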
2. To initialize k-means, you’ll use the k-means++ initialization. Fill in this function
kmeans_plusplus(X, k):
...
return centroids
• This algorithm has been shown to be a relatively good way to initialize k-means
• To start, choose a single initial point at random to create the first centroid. Let’s call this c1.
• For each remaining point, compute the squared distance to the nearest centroid and make it a probability distribution p(x_j) = min_{c_i} D²(c_i, x_j) / Σ_k min_{c_i} D²(c_i, x_k), where the probability of choosing point x_j is the squared distance from x_j to its nearest centroid, divided by the total of these squared distances over all points. For the first iteration, this will be the distance from each point to the first centroid c_1, divided by the total distance from each point to c_1.
• Sample an xj with probability proportional to p(xj ). You can use the np.random.choice function,
using your vector of probabilities as the weights. The weighted selection will be the next centroid,
c2.
• Repeat this process until you have k initial centroids.
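A sketch of the seeding procedure described above, using np.random.choice with the probability vector p:

```python
import numpy as np

def kmeans_plusplus(X, k):
    """k-means++ seeding: each new centroid is sampled proportionally to squared distance."""
    n = X.shape[0]
    centroids = [X[np.random.randint(n)]]                 # first centroid chosen uniformly
    for _ in range(k - 1):
        C = np.array(centroids)
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        p = d2 / d2.sum()                                  # p(x_j) from the formula above
        centroids.append(X[np.random.choice(n, p=p)])
    return np.array(centroids)
```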
You will first test this on a Gaussian mixture model (GMM) with K components. Points here are distributed about K means with spread controlled by sigma. To generate this GMM, we'll just use your sbm function to generate coordinates, discarding the adjacency matrix returned by sbm.
1. Run k-means on your GMM with various values of sigma.
2. Next, generate a concentric spherical dataset by sampling 1000 points from a 3D unit normal distribution. Normalize each point to unit Euclidean norm; this should form a hollow sphere. Multiply the first 500 points by 10. This should create concentric spheres. Each sphere will be a separate cluster. Run k-means on this (one way to generate the data is sketched below).
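A sketch of the concentric-spheres construction, with illustrative variable names:

```python
import numpy as np

X = np.random.randn(1000, 3)                        # 3D standard normal samples
X = X / np.linalg.norm(X, axis=1, keepdims=True)    # project each point onto the unit sphere
X[:500] *= 10                                       # inflate the first 500 points
spheres_gt = np.repeat([0, 1], 500)                 # one "cluster" per sphere
```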
Question 5.1. Compare the labels you obtain on the SBM to the ground truth clusters. At what values of
sigma does k-means clustering start to fail?
Question 5.2. Does k-means converge to a reasonable clustering on the concentric spheres? Why?
6 Spectral Clustering
Next we will compare k-means to spectral clustering. Spectral clustering works by treating the eigenvectors
of L as coordinates for your points. We will use the method presented in [4]; a useful tutorial is [5].
Spectral clustering proceeds as follows:
1. Build or obtain a graph
2. Construct the normalized graph Laplacian
3. Take the first k eigenvectors
4. Normalize the rows of these eigenvectors using the ℓ_2 norm
5. Run k-means on this normalized output
Fill in the function
SC(L, k, psi=None, nrep=5, itermax=300, sklearn=False):
...
return labels
Your code will take in a normalized graph Laplacian L, the desired number of clusters k, and optionally
the Laplacian eigenbasis psi. If psi is not provided, you will compute the first k eigenvectors of the graph
Laplacian. You can use scipy.linalg.eigh for this.
As with k-means before, we will repeat the clustering nrep times to get a more robust clustering. You
can just pass this parameter and itermax through to kmeans. The sklearn flag can be used to check your
clustering in the absence of your k-means implementation. Consider it a nice performance metric and a
sanity check.
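A sketch of SC along the lines of [4], leaning on your kmeans implementation (or scikit-learn's KMeans when sklearn=True):

```python
import numpy as np
from scipy.linalg import eigh

def SC(L, k, psi=None, nrep=5, itermax=300, sklearn=False):
    """Spectral clustering: k-means on the row-normalized first k Laplacian eigenvectors."""
    if psi is None:
        _, psi = eigh(L, subset_by_index=[0, k - 1])   # first k eigenvectors only
    U = psi[:, :k]
    U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row-wise l2 normalization
    if sklearn:
        from sklearn.cluster import KMeans
        return KMeans(n_clusters=k, n_init=nrep, max_iter=itermax).fit_predict(U)
    return kmeans(U, k, nrep=nrep, itermax=itermax)
```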
Next, let’s repeat our previous k-means experiment but using spectral clustering:
1. Run spectral clustering on your SBM with various values of sigma and pij .
2. Next, generate a concentric spherical dataset by sampling 1000 points from a 3D unit normal distribution. Normalize each point to unit Euclidean norm; this should form a hollow sphere. Multiply the first 500 points by 10. This should create concentric spheres. Each sphere will be a separate cluster. Use a Gaussian kernel to generate a graph for these spheres. Run spectral clustering on this.
3. Generate multiple instantiations of Gaussian noise on the concentric spheres (vary the variance of the
noise) and filter them using an ideal low-pass filter.
4. Finally, use your SBM to generate coordinates (as in regular k-means), but generate an adjacency
matrix for these points using the Gaussian kernel (This concoction is called a Gaussian Mixture Model).
Perform spectral clustering on this at various bandwidths.
5. Compare these results to the ones you found in our initial k-means experiment.
Question 6.1. Compare the labels you obtain on the SBM to the ground truth clusters. At what values of
sigma and pij does spectral clustering start to fail?
Question 6.2. How does spectral clustering compare to k-means on the SBM? How does using the SBM
adjacency matrix compare to creating one from a Gaussian kernel?
Question 6.3. How does spectral clustering compare to k-means on the concentric spheres?
Question 6.4. How does the kernel width affect the output of the clustering?
Question 6.5. Compare the cluster labels of spectral clustering to the filtered noise on the concentric spheres.
What is the frequency content of the label vector? Discuss the connection between Spectral clustering and the
Graph Fourier Transform.
Question 6.6. In the case that you don’t have some pathological geometry (i.e. the concentric spheres),
when would you use spectral clustering?
7 Using Louvain, and trying out PHATE and tSNE
We’ve now performed several clustering methods, but only in the convenient case in which our data has
a dimensionality low enough to be visualized. In this section, we’ll experiment with tSNE and PHATE,
methods that can embed any dataset into a nice visualization. To test these, we’ll try them on the datasets
we’ve been using – whose real visualizations we know. We’ll also experiment with one last clustering method
— Louvain — and will use tSNE and PHATE visualizations to directly observe the effects of tweaking
Louvain’s hyperparameters.
1. First, run Louvain on each of our past examples. Try it on the SBM graphs for a few values of pij .
Experiment with Louvain on the GMM, with different sigmas. Lastly, try Louvain on the concentric
spheres.
You can use the python-louvain package available from PyPI (e.g. with pip install python-louvain).
The documentation is available here:
https://python-louvain.readthedocs.io/en/latest/api.html.
Note that to use this package, you must convert your numpy adjacency matrix into a NetworkX graph
(see https://networkx.org/documentation) For a theoretical overview of Louvain, you can read the
paper which introduced it [6], available at https://arxiv.org/abs/0803.0476.
2. Visualize your Louvain clusterings by coloring scatterplots of each model with Louvain’s cluster labels,
as previously. Try varying Louvain’s resolution parameter, while observing the effects on your colored
scatterplots.
3. Using tSNE (see [7] for a detailed introduction), visualize the best clustering you found with Louvain. You can use the scikit-learn
implementation of tSNE. Create at least six visualizations by varying the parameters of tSNE to find
the optimal representation of Louvain’s clusters. You can find a nice explanation of tSNE’s parameters
here:
https://distill.pub/2016/misread-tsne/.
4. Now, visualize the Louvain clusters using PHATE (see [8] for its theory and background). The PHATE package is also available on PyPI (use
pip install phate), and its usage instructions are here: https://phate.readthedocs.io/en/stable/.
As before, create at least six visualizations of PHATE with different parameters.
Question 7.1. How does Louvain compare to the output of the two previous clustering methods we tried
(spectral and regular k-means)?
Question 7.2. What is the goal of Louvain?
Question 7.3. How does Louvain perform on SBM at various pij and pii?
Question 7.4. What effects did you observe when varying the parameters of tSNE? Why do you think these
changes occurred? Did you find an optimal set of parameters for visualization, or did this depend on the
specific clusters?
Question 7.5. What effects did you observe when varying the parameters of PHATE? Why do you think
these changes occurred? Was there an optimal set of parameters for all visualizations?
Question 7.6. What differences did you notice between tSNE and PHATE? Theoretically, what do you think
accounts for them?
Question 7.7. Why is the Gaussian mixture model a good data set to use to test the clustering algorithm?
Question 7.8. What modifications to the mixture model would you make to further test the clustering
algorithms?
Question 7.9. What are other synthetic data sets you would use to test clustering algorithms?
8 Retinal Bipolar dataset
In this final installment, we’ll turn away from synthetic datasets and enter the much messier world of
single-cell RNA sequencing. Our subject is the Retinal Bipolar dataset of Shekhar et al. (2016) [9], which
includes the markers of 21,000 bipolar cells of the mouse retina. These retinal bipolar cells receive input
from the rod and cone photoreceptors of the retina, process it, and pass it onward to the brain. These
neurons have been categorized according to their location (whether they receive input from rods or cones),
their function (whether their activation increases or decreases with increasing light), and their molecular
properties, but these distinctions do not always agree with each other. Indeed, significant debate exists in the
biological community over what qualifies as a class — which provides a perfect entrance for unsupervised
learning.
Using a variant of the Louvain method combined with prior knowledge of known clusters, Shekhar et
al. identified 15 different classes of retinal bipolar cells [9]. In this section, you’ll apply your newly honed
clustering skills to verify this classification.
(More about the Retinal Bipolar dataset may be found in Shekhar’s paper at
https://www.cell.com/cell/fulltext/S0092-8674(16)31007-8)
1. Load the Retinal Bipolar dataset and its metadata from the corresponding files retinal-bipolar-data.pickle and retinal-bipolar-metadata.pickle in the data folder. You should import the pandas data storage library to do this, as you can then use the command pandas.read_pickle('file-path'). (Note that the metadata actually contains 26 cluster labels, though Shekhar only found 15 of them to be biologically significant.)
2. The unaltered dataset has over 15,000 columns corresponding to 15,000 markers for each of the 21,000
cells. To make the computations feasible on your laptop, we’ll apply PCA to the columns of the dataset,
and project each cell onto just the first 100 PCA components. You may wish to use the command
scprep.reduce.pca from the scprep library.
3. 21,000 cells is still quite a few, so we'll subsample down to 3000 cells. You can use the function scprep.select.subsample to randomly select 3000 cells and their corresponding markers from the dataset and its metadata.
4. Build a graph from the dimensionally-reduced data using an adaptive Gaussian kernel with k = 10.
5. Cluster the data using k-means, Spectral Clustering and Louvain.
6. Visualize the dimensionally-reduced data using PHATE, coloring the points according to each of the
above clusterings. Produce similar visualizations using tSNE. What differences do you notice between
the visualization techniques with this dataset?
7. Try varying the kernel parameter k to obtain different cluster assignments. Additionally vary the
Louvain parameters. Visualize each of these new clusterings with a recolored PHATE plot.
8. Plot the second and third eigenvectors of the Graph Laplacian. Color the plot by your cluster assignments.
9. Try coloring the PHATE plot using different channels in the data (e.g. different columns of the
metadata matrix). Can you find any channels that seem to correspond to your cluster assignments?
10. Apply an ideal low-pass filter to some of the channels you found above, treating the gene expression
as a signal over the graph. Rerun your visualizations using these new, filtered channels to color the
PHATE plot. How much has changed? Do any of the denoised channels better represent the true
clustering?
11. Binarize the kernel such that it is no longer a weighted graph (i.e., if a value is greater than some
threshold, it is 1, otherwise, it is 0). Try rerunning Louvain with this binarized kernel.
12. Finally, compare your clusters to those of Shekhar et al. by coloring your PHATE visualizations with
the cluster number provided in the metadata file. Describe any relationships you notice between these
clusters and the clusters you’ve obtained with k-means, Louvain, and Spectral Clustering.
Question 8.1. How many clusters did Louvain produce? How many clusters did PHATE and tSNE suggest?
To what extent did different clustering methods parameters affect this number?
Question 8.2. In a biological setting, how would you interpret the different clusters? More specifically, how
would you interpret any variation you noticed from the 15 clusters described by Shekhar et al?
Question 8.3. What did you notice when you plotted the eigenvectors of the Graph Laplacian?
Question 8.4. What is the effect of low-pass filtering a feature on the graph? In what context could this be
useful for data analysis?
Question 8.5. How does binarization of the kernel affect your clustering? Why might this be the case?
9 Grading Rubric
9.1 Implementation – 20 points
Your implementation will be graded by running your submitted ps2_functions.py on several test cases. It
is very important that you do not change the function input / outputs provided in the skeleton file. If
you do, then your functions will not properly handle the test cases and you may receive no credit for the
implementation portion of the assignment. Please adequately comment your code so that partial credit may
be assigned in the event that your code does not pass all test cases.
9.2 Report – 80 points
You must write a detailed report about the experiments that you have run for this homework, including
all plots and visualizations. Your report should address the questions raised here, but can include extra
discussion beyond the scope of these questions. Feel free to add any interesting insights that you had or
extra experiments / validations you ran.
10 Rules and Guidelines
10.1 Allowable code libraries
You are encouraged to use routines available in numpy, scipy and matplotlib for computation and plotting.
You are required to use scikit-learn for tSNE and for the sklearn=True implementation of k-means,
and both networkx and python-louvain for Louvain clustering. You may also use the scprep library for
data preprocessing. Please do not use scikit-learn outside of the specific use cases denoted above. Use of
other external libraries is prohibited unless otherwise stated, though if there is a library you would like to
use you should ask.
10.2 Extensions
Extensions for graded work will not be granted any later than 24 hours before the assigned deadline. Any
unexcused late submission will be penalized 10% per day or partial day, with the penalty incrementing at
12:00 midnight each day after the deadline.
10.3 Contesting a grade
Students wishing to contest a grade should submit a request in writing no less than 24 hours after the grade
is submitted. Students should first request an explanation of the grade on a particular question from the
TF. If, after explanation, a student still wishes to contest, the question will be regraded by the primary
instructor.
10.4 Academic Integrity
Please familiarize yourself with the “Definitions of Plagiarism, Cheating and Documentation of Sources”
section of the Yale College Undergraduate Regulations. Plagiarism of any kind will not be tolerated in this
course. Students are encouraged to discuss general ideas with each other, but must write code and carry
out experiments individually. However, the course staff (professor, TA, etc.) is happy to provide assistance
in all aspects of the assignment, from coding difficulties to interpretation of experiments. Any plagiarism –
whether copying code or sharing experimental analyses – will not be tolerated.
11 Submission Instructions
Each student should submit a zip file titled [last name]_[first name]_ps2.zip to Canvas by Friday,
November 5th, 11:59pm containing
1. A detailed write-up in PDF form, titled [last name]_[first name]_ps2_report.pdf, containing figures, responses to questions, and optionally the code used to produce the figures.
2. A subdirectory titled code containing your complete ps2_functions.py, and optionally any other code you used in the assignment.
References
[1] David I Shuman, Sunil K Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The
emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and
other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.
[2] Nathanael Perraudin, Benjamin Ricaud, David Shuman, and Pierre Vandergheynst. Global and local
uncertainty principles for signals on graphs. arXiv preprint arXiv:1603.03030, 2016.
[3] David I Shuman, Benjamin Ricaud, and Pierre Vandergheynst. Vertex-frequency analysis on graphs.
Applied and Computational Harmonic Analysis, 40(2):260–291, 2016.
[4] Andrew Y Ng, Michael I Jordan, and Yair Weiss. On spectral clustering: Analysis and an algorithm. In
Advances in neural information processing systems, pages 849–856, 2002.
[5] Ulrike Von Luxburg. A tutorial on spectral clustering. Statistics and computing, 17(4):395–416, 2007.
[6] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of
communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10):P10008,
2008.
[7] Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning
research, 9(Nov):2579–2605, 2008.
[8] Kevin R Moon, David van Dijk, Zheng Wang, William Chen, Matthew J Hirn, Ronald R Coifman,
Natalia B Ivanova, Guy Wolf, and Smita Krishnaswamy. Phate: a dimensionality reduction method for
visualizing trajectory structures in high-dimensional biological data. BioRxiv, page 120378, 2017.
[9] Karthik Shekhar, Sylvain W Lapan, Irene E Whitney, Nicholas M Tran, Evan Z Macosko, Monika
Kowalczyk, Xian Adiconis, Joshua Z Levin, James Nemesh, Melissa Goldman, et al. Comprehensive
classification of retinal bipolar neurons by single-cell transcriptomics. Cell, 166(5):1308–1323, 2016.