$30
Assignment 7 - Graph drawing and network analysis
An important step to gain insight from data is to recognize existing, known structure and incorporate it into your analysis. Graph structures as a mathematical representation of connectedness can be found in a wide variety of practical settings.
In this assignment we hope to enable you to:
Understand what graphs are and how to recognize graph-type relationships in your data
Use standard graph analysis techniques and tools to get answers for data analysis questions
Work with large graphs on a desktop machine using the graph-tool python library
Background on graphs and a tool tutorial
If you require an introduction to basic graph notions, see Maël Fabien 's blog post sections I and II. While graph theory and algorithms can involve deep mathematical connections, the basic definitions and much practical use of graph data is not difficult to get started with.
For further background on graph-tool refer to the graph tutorial slides and the tutorial notebook. See the lab setup on how you can use graph tool on the lab machines.
Running Evironment
This assignment use Google Colab as the running environment. The first two blocks are used for graph_tools import and google drive mount. For future reference, these two blocks are intentionally kept here.
!echo "deb http://downloads.skewed.de/apt focal main" >> /etc/apt/sources.list
!apt-key adv --keyserver keyserver.ubuntu.com --recv-key 612DEFB798507F25
!apt-get update
!apt-get install python3-graph-tool python3-matplotlib python3-cairo
Executing: /tmp/apt-key-gpghome.2Hgbsin4NZ/gpg.1.sh --keyserver keyserver.ubuntu.com --recv-key 612DEFB798507F25
gpg: key 612DEFB798507F25: "Tiago de Paula Peixoto <tiago@skewed.de>" not changed
gpg: Total number processed: 1
gpg: unchanged: 1
Get:1 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
Hit:2 https://cloud.r-project.org/bin/linux/ubuntu focal-cran40/ InRelease
Ign:3 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 InRelease
Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64 InRelease
Hit:5 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64 Release
Hit:6 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:7 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Get:8 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
Hit:9 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu focal InRelease
Hit:10 http://downloads.skewed.de/apt focal InRelease
Hit:12 http://ppa.launchpad.net/cran/libgit2/ubuntu focal InRelease
Hit:13 http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal InRelease
Hit:14 http://ppa.launchpad.net/graphics-drivers/ppa/ubuntu focal InRelease
Hit:15 http://ppa.launchpad.net/ubuntugis/ppa/ubuntu focal InRelease
Fetched 222 kB in 2s (125 kB/s)
Reading package lists... Done
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:55
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:55
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:56
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:56
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:57
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:57
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:58
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:58
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:59
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:59
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:55
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:55
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:56
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:56
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:57
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:57
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:58
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:58
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:59
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:59
Reading package lists... Done
Building dependency tree
Reading state information... Done
python3-cairo is already the newest version (1.16.2-2ubuntu2).
python3-matplotlib is already the newest version (3.1.2-1ubuntu4).
python3-graph-tool is already the newest version (2.46).
0 upgraded, 0 newly installed, 0 to remove and 22 not upgraded.
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:55
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:55
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:56
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:56
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:57
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:57
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:58
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:58
W: Target Packages (main/binary-amd64/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:59
W: Target Packages (main/binary-all/Packages) is configured multiple times in /etc/apt/sources.list:54 and /etc/apt/sources.list:59
# mount driver
from google.colab import drive
drive.mount('/content/drive')
import os
path="/content/drive/MyDrive/cmpt733/assignment7"
os.chdir(path)
os.listdir(path)
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
['facebook_combined.txt', 'gt_utils.py', '__pycache__', 'A7.ipynb']
# Use this code for basic setup
import matplotlib as mpl
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from IPython.display import display, Markdown
%matplotlib inline
import graph_tool.all as gt
print("graph-tool version: {}".format(gt.__version__.split(' ')[0]))
import gt_utils as au
graph-tool version: 2.46
Problem 1: Power grid analysis
g = gt.collection.data['power']
display(Markdown(gt.collection.descriptions['power']))
Power grid: An undirected, unweighted network representing the topology of the Western States Power Grid of the United States. Data compiled by D. Watts and S. Strogatz and made available on the web here <http://cdg.columbia.edu/cdg/datasets>. Please cite D. J. Watts and S. H. Strogatz, Nature 393, 440-442 (1998). Retrieved from Mark Newman's website <http://www-personal.umich.edu/~mejn/netdata/>.
In this graph, an edge represents a power supply line. A node is either a generator, a transformator, or a substation.
Task 1a: Create a drawing of this graph that emphasizes nodes that have more than 10 incident (or incoming) power supply lines. Set the size of all other nodes to 0, but retain visibility of the power lines.
# TODO
vprop_size = g.new_vertex_property("int")
#notice that for NDAG the incoming degree is all zero
for index, v in enumerate(g.vertices()):
if(v.out_degree()>10):
vprop_size[v] = v.out_degree()
else:
vprop_size[v] = 0
gt.graph_draw(g, vertex_size = vprop_size, edge_pen_width = 1)
<VertexPropertyMap object with value type 'vector<double>', for Graph 0x7f556434d280, at 0x7f5519b1e340>
Task 1b: Identify one of the centrality measures that can be used to indicate powerlines that act as a bridge between different parts of the network. Use this to emphasize structurally important nodes and powerlines.
# TODO
vb, eb = gt.betweenness(g)
gt.graph_draw(g,
vertex_fill_color = vb,
vertex_size = gt.prop_to_size(vb, mi = 0.5,ma = 15),
edge_pen_width = gt.prop_to_size(eb, mi = 1, ma = 5),
vcmap=mpl.cm.gist_heat
)
<VertexPropertyMap object with value type 'vector<double>', for Graph 0x7f556434d280, at 0x7f5519c6d370>
Problem 2: Small social graph visualization
X_knows = {
'Mary': ['Peter', 'Albert', 'DavidF', 'Peter'],
'Judy': ['Bob', 'Alan'],
'Peter': ['Mary', 'DavidF', 'Jon'],
'DavidF': ['Albert', 'Joseph', 'Peter', 'Mary'],
'Jon': ['Peter', 'Joseph', 'DavidE'],
'DavidE': ['Jon', 'Joseph', 'Albert'],
'Joseph': ['DavidE', 'Jon', 'DavidF'],
'Bob': ['Judy', 'Alan'],
'Alan': ['Bob', 'Mary', 'Judy'],
'Albert': ['DavidF', 'Mary', 'DavidE'],
}
Task: Create an undirected graph that represents the personal network above, remove parallel edges, and draw using a layout that resembles the tidy example as shown further below.
from re import X
g = gt.Graph(directed=False)
vprop_size = g.new_vertex_property("int")
#remove duplicated edges
new_knows = {}
for i in X_knows:
new_knows[i] = []
for i in X_knows:
for j in X_knows[i]:
if(i in new_knows[j] or j in new_knows[i]):
continue
new_knows[i].append(j)
X_knows = new_knows
#add list to graph
v_name = g.add_edge_list(((n,k) for n in X_knows for k in X_knows[n]),
hashed=True)
#set vprops
for index, v in enumerate(g.vertices()):
vprop_size[v] = v.out_degree()
#ser graph layout
pos = gt.arf_layout(g)
gt.graph_draw(
g,
vertex_text = v_name,
vertex_font_size=10,
vertext_size = 100,
output_size=(600,600), # adjust to fit screen
fit_view=.9,
fmt="svg"
)
/usr/lib/python3/dist-packages/graph_tool/draw/cairo_draw.py:634: UserWarning: Unknown parameter: vertext_size
warnings.warn("Unknown parameter: " + k, UserWarning)
<VertexPropertyMap object with value type 'vector<double>', for Graph 0x7f55117b4280, at 0x7f5511749bb0>
After adjusting the code above you should be able to obtain a 'tidy' result like this.
Problem 3: Facebook graph analysis
For this question we will work with a real social graph of facebook friendship connections. Please download facebook_combined.txt from SNAP, the Stanford Large Network Dataset Collection and create a Graph object with graph-tool. The dataset contains the ego networks of 10 facebook users, i.e. it contains the ego nodes, the friends of each of these facebook users, and the connections among those friends.
Goal below is to determine influencers among the users, based on a measure of centrality and not including the ego nodes themselves.
Task 3a: Load the dataset and create a drawing of the graph.
# TODO
g = gt.load_graph_from_csv('facebook_combined.txt', directed=False, csv_options={'delimiter': ' '})
# TODO
gt.graph_draw(g)