Assignment 3:
Instructions
Complete the following problem set, showing your work. Problems may be worked out "by hand", in Python, or with the
assistance of other analytical software (e.g., Mathematica, MATLAB). You may use ChatGPT to assist in coding.
Solutions must be typewritten (e.g., in Jupyter, Markdown, or LaTeX). Upload your PDF solutions, question by question, to Crowdmark (a link will be
emailed to you) by 11:59pm on the Sunday of the corresponding week (see syllabus). If you have issues with your Crowdmark
submission, please email your solutions to Rebekah Hall (rah11@sfu.ca).
All problems are equally weighted within an assignment. Students in 468 may choose to attempt the challenge
question for bonus points. Students in 795 are required to complete the challenge question.
Problem Set
1. Museum Collections
Museum collections are extraordinarily valuable in the study of ecology and evolution as they give us access to rare long-term
(longitudinal) data that couldn't be collected in the 4 years of a typical PhD project. Consider the number of specimens of
European goldenrod (Solidago virgaurea) present in a herbarium (herbarium: museum for plants). These collections are made by
many different researchers, often with years in between. However, given that a researcher is collecting data on goldenrods, they
are likely to submit more than one accession (accession: submission to a biological database or museum) at the same time. Hence
we can model the accumulation of accessions of S. virgaurea using a compound Poisson process.
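As a rough illustration of what a compound Poisson process looks like in code, the sketch below simulates total accessions over a fixed window and compares the simulated mean and variance with the standard compound Poisson identities E[N] = λ t E[X] and Var[N] = λ t E[X^2]. It uses the values given for this problem (λ = 0.61 per year, r = 8, p = 0.75). The function names are hypothetical, and the negative binomial is assumed to count failures before the r-th success with success probability p; if the course uses a different parameterization, the per-study moments change accordingly.

```python
import random
import statistics

def accessions_per_study(r, p, rng):
    """One negative binomial draw: failures before the r-th success (assumed convention)."""
    successes = failures = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def total_accessions(lam, r, p, t, rng):
    """Total accessions in [0, t] when studies arrive as a Poisson process with rate lam."""
    total = 0
    clock = rng.expovariate(lam)        # time of the first study
    while clock <= t:
        total += accessions_per_study(r, p, rng)
        clock += rng.expovariate(lam)   # exponential gap to the next study
    return total

lam, r, p, t = 0.61, 8, 0.75, 10.0
rng = random.Random(1)
totals = [total_accessions(lam, r, p, t, rng) for _ in range(20_000)]
sim_mean = statistics.mean(totals)
sim_var = statistics.pvariance(totals)

# Per-study moments under the assumed parameterization.
mean_x = r * (1 - p) / p
var_x = r * (1 - p) / p**2
theory_mean = lam * t * mean_x
theory_var = lam * t * (var_x + mean_x**2)   # lam * t * E[X^2]

print(f"mean: simulated {sim_mean:.2f}, theory {theory_mean:.2f}")
print(f"variance: simulated {sim_var:.2f}, theory {theory_var:.2f}")
```

Drawing exponential gaps between studies, rather than a Poisson count per window, keeps the sketch within the standard library and makes the waiting-time interpretation of the rate explicit.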
Suppose that research studies on goldenrod occur at a constant rate λ = 0.61 1/year and that the number of accessions
submitted per study is distributed according to a negative binomial distribution with parameters r = 8, p = 0.75.
Part A: How many independent research studies are expected to occur during a PhD (e.g., 4 years)? What is the expected time
between research studies? How many accessions are submitted by the average research study? Plot the distribution of
accessions/study. How many accessions would constitute a 'large' study?
Part B: What is the expected number of accessions over a 10-year period by all researchers?
Part C: What is the variance in the number of accessions submitted over this 10-year period?
2. Stochastic SI Epidemic Dynamics
Consider a simplified epidemiological model for the spread of an infectious disease in a small population. The model includes two
compartments: susceptible individuals (S) and infected individuals (I). The disease spreads through a single type of interaction with
a given infection rate.
Susceptible individuals become infected at a per-capita rate βI (total rate βSI).
Infected individuals recover (immediately becoming susceptible again) at a per-capita rate γ (total rate γI).
Part A: Write a system of ODEs describing the dynamics in this system. Solve them numerically for S(0) = 99, I(0) = 1,
β = 0.001, and γ = 0.05.
Part B: For this model, the value R0 = Nβ/γ (with N = S + I the total population size) is the number of secondary infections that result from a single initial infection in an
otherwise susceptible population. If R0 < 1 then the disease is guaranteed to go extinct; if R0 > 1 the disease will spread in the
deterministic model. What are four parameter combinations that have R0 values of 0.5, 0.9, 1.1, and 1.5, respectively? Plot the
dynamics of I(t) in the deterministic model for each of these parameter sets.
Part C: Write a Gillespie algorithm to simulate the dynamics of a corresponding stochastic epidemic where R0 = 1.5. Describe
the possible events, their rates, and their effects on the state space.
Part D: Simulate 50 trajectories for each of the four parameter sets you chose in B. How do the stochastic dynamics compare to
the deterministic dynamics? Are you surprised by any of the results given R0?
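For Problem 2, one possible structure for the deterministic and stochastic versions of the model is sketched below. It assumes infection events occur at total rate βSI and recovery events at total rate γI, and it uses the Part A values (β = 0.001, γ = 0.05, S(0) = 99, I(0) = 1), for which Nβ/γ = 2. Forward Euler stands in here for a proper ODE solver, and the function names are hypothetical; this is a starting point under those assumptions, not a reference solution.

```python
import random

def sis_deterministic(beta, gamma, s0, i0, t_max, dt=0.01):
    """Forward-Euler solution of dS/dt = -beta*S*I + gamma*I, dI/dt = beta*S*I - gamma*I."""
    s, i = float(s0), float(i0)
    times, infected = [0.0], [i]
    n_steps = int(round(t_max / dt))
    for step in range(1, n_steps + 1):
        net_infections = (beta * s * i - gamma * i) * dt
        s -= net_infections
        i += net_infections
        times.append(step * dt)
        infected.append(i)
    return times, infected

def sis_gillespie(beta, gamma, s0, i0, t_max, rng):
    """One stochastic trajectory; returns event times and I after each event."""
    s, i, t = s0, i0, 0.0
    times, infected = [0.0], [i0]
    while i > 0:                          # I = 0 is absorbing (extinction)
        rate_infection = beta * s * i     # event: S -> S - 1, I -> I + 1
        rate_recovery = gamma * i         # event: S -> S + 1, I -> I - 1
        total_rate = rate_infection + rate_recovery
        t += rng.expovariate(total_rate)  # exponential waiting time to next event
        if t > t_max:
            break
        if rng.random() * total_rate < rate_infection:
            s, i = s - 1, i + 1
        else:
            s, i = s + 1, i - 1
        times.append(t)
        infected.append(i)
    return times, infected

beta, gamma = 0.001, 0.05
_, i_det = sis_deterministic(beta, gamma, 99, 1, t_max=400.0)
rng = random.Random(42)
runs = [sis_gillespie(beta, gamma, 99, 1, 200.0, rng) for _ in range(200)]
n_extinct = sum(1 for _, traj in runs if traj[-1] == 0)
print(f"deterministic I(400) = {i_det[-1]:.2f}")
print(f"{n_extinct} of {len(runs)} stochastic runs went extinct by t = 200")
```

Counting the runs that end at I = 0 is one simple way to see how stochastic trajectories can differ from the deterministic curve even when the deterministic model predicts spread.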
3. Bog Bodies and Ancestral DNA (Skip This Problem)
The chemical conditions of peat bogs are ideal for the natural preservation of human bodies, making them a rich source of
ancient cadavers known as 'bog bodies'. These bodies range in age but are often from the Iron Age (1300 B.C.E. to 800
C.E.). Over this time period, the effective human population size was approximately 5000.
Part A: Consider a bog body that is 3000 years old. Draw the topology of a gene genealogy between yourself and this bog body.
Part B: Assuming that the human generation time is 20 years, how long ago did you and the bog body share a common ancestor?
Give the full distribution of times to common ancestry, the expected time, and the variance in times, and make sure to note the
units of time in which each of these answers is measured. What is the expected time to your common ancestor in units of years,
generations, and coalescent time units?
Part C: Assuming an infinite sites model, what is the expected number of pairwise differences between your genome and that of
the bog body? Assume a mutation rate of θ = 0.8. How many segregating sites are there in this sample of two genomes (you
and the bog body)?
Part D (Challenge Part for 795): Now consider a sample with three genomes: your genome, the 3000-year-old bog body, and a
second 2000-year-old bog body. Draw the two possible genealogical topologies between the three samples. Do these two
topologies occur with equal probability? If not, what is the probability that each occurs?
[Hint: What is the probability that you and the 2000-year-old body coalesce before 3000 years ago?]
Part E (Challenge Part for 795): What are the expected times to the common ancestor of a) you and the 2000-year-old body, b)
you and the 3000-year-old body, and c) the 2000-year-old body and the 3000-year-old body?
4. Genetic diversity
Consider the following sample of 5 genome sequences. There are several different formats in which DNA sequences can be
reported. The following is in the style of a VCF "Variant Call Format" file that reports only sites at which two or more nucleotides
are present.
seq:   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22
1 :    A  T  T  A  A  C  G  G  A  G  G  G  C  G  C  G  T  G  T  A  T  T
2 :    A  A  T  A  A  C  G  G  A  G  G  G  A  G  C  G  T  G  A  A  T  T
3 :    T  T  A  A  A  C  G  G  A  G  G  A  A  C  C  G  T  G  T  G  A  C
4 :    A  T  A  G  G  T  C  G  C  C  C  G  A  G  G  A  A  G  T  A  A  T
5 :    A  T  A  G  G  T  C  A  C  C  G  G  A  G  G  A  T  A  T  A  A  T
Part A: Calculate the number of segregating sites, S, in the sample.
Part B: Calculate the number of pairwise differences between each pair of sequences. What is the average number of pairwise
differences in this sample?
Part C: Calculate the observed site frequency spectrum, ξ.
Part D (Challenge Part for 795): Assuming genetic diversity evolves according to the infinite sites model, propose a hypothetical
genealogy of this sample. What is the likely topology of this genealogy and what are the likely coalescent times?
5. Time to the most recent common ancestor
Consider a population from which you have sampled 4 haploid individuals.
Part A: Draw and label a coalescent history in which the coalescent times shown are proportional to the expected values.
Part B: Draw a ±1SD error bar at each internal node indicating the variance in the TOTAL time to that coalescent event.
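For Parts A and B of Problem 5, the quantities to draw can be tabulated with a short sketch like the one below. It assumes the standard Kingman coalescent measured in coalescent time units, so that the waiting time T_k while k lineages remain is exponential with rate C(k, 2) and the successive waiting times are independent. The function name is hypothetical.

```python
from fractions import Fraction
from math import comb, sqrt

def coalescent_node_summary(n):
    """Mean and variance of the TOTAL time to each coalescent event for a sample of n."""
    rows = []
    total_mean = Fraction(0)
    total_var = Fraction(0)
    for k in range(n, 1, -1):                 # k lineages remain before this event
        rate = comb(k, 2)
        total_mean += Fraction(1, rate)       # E[T_k] = 1 / C(k, 2)
        total_var += Fraction(1, rate ** 2)   # Var[T_k] = 1 / C(k, 2)^2
        rows.append((k, total_mean, total_var))
    return rows

rows = coalescent_node_summary(4)
for k, mean, var in rows:
    print(f"{k} -> {k - 1} lineages: total time mean {float(mean):.3f}, SD {sqrt(var):.3f}")
```

Because the waiting times are independent, the variance of the total time to a node is the running sum of the individual variances, which is what the error bars in Part B summarize.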
Part C: What is the distribution of times until there are exactly 2 lineages in the population, Pr(T4 + T3)? Plot this distribution to
double check your answers above.
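For Part C of Problem 5, one way to double check a derived distribution is to compare it against simulation. The sketch below assumes the standard Kingman coalescent in coalescent time units, so that T4 and T3 are independent exponentials with rates C(4, 2) = 6 and C(3, 2) = 3, and their sum follows a two-rate hypoexponential distribution. The function names are hypothetical, and the closed forms shown are the ones implied by those assumptions.

```python
import math
import random

def cdf_t4_plus_t3(t):
    """CDF of T4 + T3 for independent exponentials with rates 6 and 3."""
    return (1.0 - math.exp(-3.0 * t)) ** 2

def pdf_t4_plus_t3(t):
    """Density of T4 + T3, the derivative of the CDF above."""
    return 6.0 * (math.exp(-3.0 * t) - math.exp(-6.0 * t))

rng = random.Random(7)
n_reps = 50_000
# Draw T4 ~ Exp(6) and T3 ~ Exp(3) independently and add them.
samples = [rng.expovariate(6.0) + rng.expovariate(3.0) for _ in range(n_reps)]
sample_mean = sum(samples) / n_reps

for t in (0.25, 0.5, 1.0):
    empirical = sum(x <= t for x in samples) / n_reps
    print(f"t = {t}: simulated CDF {empirical:.3f}, analytic CDF {cdf_t4_plus_t3(t):.3f}")
print(f"simulated mean {sample_mean:.3f}, expected 1/6 + 1/3 = 0.5")
```

A histogram of `samples` overlaid with `pdf_t4_plus_t3` would give the plot the question asks for; the numerical comparison here avoids depending on a plotting library.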