$30
CptS 591: Elements of Network Science
Assignment 2: Centrality and Ranking.
There are three problems in this assignment.
For Problem 1, you will work with four real-world networks. Three of these are networks
selected from Mark Newman’s collection of Network Data (http://www-personal.umich.
edu/~mejn/netdata/). The networks in the collection are given certain descriptive names.
The three networks chosen for this assignment are called: (1) Political blogs, (2) Neural
network, and (3) Internet. Read the brief descriptions there so you get an idea for what the
networks are modeling. The data sets are in GML format, which is a format the package
igraph understands.
You are free to choose the fourth real-world network for Problem 1, but it is recommended
to be a social or information network. Examples include a social network in a certain
community/context (school, workplace, prison, etc ), a Facebook network, a twitter network,
a who-calls-whom network, who-trusts-whom network, a citation network, etc.
For Problems 1 and 2, you are expected to use the software tool igraph. For some of the
calculations and plots you may need to use Numpy (which is a package for scientific computing
with Python), MatLab or a similar environment. Problem 3 does not require any software
tool. It is an essay-type question based on reading a paper.
Your submission: will be one PDF file consisting of your solutions and an appendix
describing your procedures (e.g. listing of Python/R/Matlab codes).
Problem 1 (45%). First, briefly describe the fourth real-world network you chose to
work with for this problem and provide a URL to where it is located. For each of the four
real-world networks you study, identify the node(s) in the network with the highest scores in
terms of the following centrality measures: (i) Degree, (ii) Eccentricity, (iii) Closeness, (iv)
Betweenness, (v) Katz index, (vi) PageRank, (vii) Kleinberg’s Authority score, and (viii)
Kleinberg’s Hub score.
Try to find out in what way the nodes identified by these measures as the most important
might be important in the network. Are there cases in which the different centrality measures
identify the same node(s) as the most important? Discuss your results.
Problem 2 (45%). Generate two types of random graphs using igraph’s methods for the
Erdos-Renyi and Barabasi-Albert (preferential attachment) models. (The Barabasi-Albert
model is a simple stochastic algorithm that generates a scale-free graph, a graph with a
Power-Law degree distribution.) Let the number of nodes in each case be 20. Choose the
parameters in the two models such that the two graphs you generate have roughly the same
number of edges. In the same fashion, generate another set of two graphs, this time with
the number of nodes in each case being 40, instead of 20.
1. For each of the four graphs you generated, compute all eigenvalues and corresponding
eigenvectors of the Laplacian of the graph (e.g. using MatLab’s (or Numpy’s) eig
1
function). Do not include the results of this computation in your submission, but you
will need the result for the subsequent subproblems.
2. Complete the following table for each graph. (In case the graph you generated consists
of more than one component, in populating the table, consider the largest connected
component.)
Graph n m dmin dmax l D ccg λ2 λn
where n is the number of nodes; m is the number of edges; dmin is the minimum degree
in the graph; dmax is the maximum degree in the graph; l is the average path length;
D is the diameter; ccg is the global clustering coefficient; λ2 is the second smallest
eigenvalue (the algebraic connectivity); and λn is the largest eigenvalue. Discuss the
observations you make out of these data.
3. For this subproblem you will focus on the 20-nodes graphs in each of the two models.
You will generate two figures, one for each model. In each figure, plot two quantities
together: the eigenvector corresponding to the second-smallest eigenvalue (λ2) and the
eigenvector corresponding to the largest eigenvalue (λn). In the plots, the x-axis would
show vertex ids and the y-axis shows value of the eigenvector. Compare and contrast
the plots of the two graphs, and discuss any observations you make.
Problem 3 (10%). Read the first four sections of the SIAM Review article “PageRank
Beyond the Web” by David Gleich (note a copy of the article is posted under the “Papers”
page on the course’s website). From the very wide range of applications of PageRank discussed in section 4 of the article, pick three that you personally found most fascinating (or
interesting) and tell me why. For each application you picked, a couple of sentences stating
your reasons is adequate, but you are welcome to write a few more sentences, if you wish.
2