There are 75 regular and 20 extra credit points available on this problem set.
1. (45 pts) Recall that the string alignment problem takes as input two strings x and y,
composed of symbols xi; yj 2 Σ, for a fixed symbol set Σ, and returns a minimal-cost
set of edit operations for transforming the string x into string y.
Let x contain nx symbols, let y contain ny symbols, and let the set of edit operations be
those defined in the lecture notes (substitution, insertion, deletion, and transposition).
Let the cost of indel be 1, the cost of swap be 10 (plus the cost of the two sub ops), and
the cost of sub be 10, except when xi = yj, which is a \no-op" and has cost 0.
(a) From scratch, implement the functions alignStrings, extractAlignment, and
commonSubstrings. You may not use any library functions that make their implementation trivial. Within your implementation of extractAlignment, ties must
be broken uniformly at random.
Submit (i) a paragraph for each function that explains how you implemented it
(describe how it works and how it uses its data structures), and (ii) your code
implementation, with code comments
(b) Using asymptotic analysis, determine the running time of the call
commonSubstrings(x, L, extractAlignment( alignStrings(x,y), x,y ) )
Justify your answer
(c) (15 pts extra credit) Describe an algorithm for counting the number of optimal
alignments, given an optimal cost matrix S. Prove that your algorithm is correct,
and give is asymptotic running time.
(d) String alignment algorithms can be used to detect changes between different versions of the same document (as in version control systems) or to detect verbatim
copying between different documents (as in plagiarism detection systems).
The two data string files for PS6 (see class Moodle) contain actual documents
recently released by two independent organizations. Use your functions from (??)
to align the text of these two documents. Present the results of your analysis, including a reporting of all the substrings in x of length L = 10 or more that could
have been taken from y, and briefly comment on whether these documents could
be reasonably considered original works, under CU’s academic honesty policy.
2. (10 pts) Ginerva Weasley is playing with the network given below. Help her calculate
the number of paths from node 1 to node 14.
3. (20 pts) Ron and Hermione are having a competition to see who can compute the nth
Lucas number Ln more quickly, without resorting to magic. Recall that the nth Lucas
number is defined as Ln = Ln−1 + Ln−2 for n 1 with base cases L0 = 2 and L1 = 1.
Ron opens with the classic recursive algorithm:
Luc(n) :
if n == 0 { return 2 }
else if n == 1 { return 1 }
else { return Luc(n-1) + Luc(n-2) }
which he claims takes R(n) = R(n − 1) + R(n − 2) + c = O(φn) time.
(a) Hermione counters with a dynamic programming approach that \memoizes" (a.k.a.
memorizes) the intermediate Lucas numbers by storing them in an array L[n].
She claims this allows an algorithm to compute larger Lucas numbers more quickly,
and writes down the following algorithm.
MemLuc(n) {
if n == 0 { return 2 } else if n == 1 { return 1 }
else {
if (L[n] == undefined) { L[n] = MemLuc(n-1) + MemLuc(n-2) }
return L[n]
}
}
i. Describe the behavior of MemLuc(n) in terms of a traversal of a computation
tree. Describe how the array L is filled
ii. Determine the asymptotic running time of MemLuc. Prove your claim is correct by induction on the contents of the array
(b) Ron then claims that he can beat Hermione’s dynamic programming algorithm in
both time and space with another dynamic programming algorithm, which eliminates the recursion completely and instead builds up directly to the final solution
by filling the L array in order. Ron’s new algorithm is
DynLuc(n) :
L[0] = 2, L[1] = 1
for i = 2 to n { L[i] = L[i-1] + L[i-2] }
return L[n]
Determine the time and space usage of DynLuc(n). Justify your answers and
compare them to the answers in part (??)
(c) With a gleam in her eye, Hermione tells Ron that she can do everything he can
do better: she can compute the nth Lucas number even faster because intermediate
results do not need to be stored. Over Ron’s pathetic cries, Hermione says
FasterLuc(n) :
a = 2, b = 1
for i = 2 to n
c = a + b
a = b
b = c
end
return a
Ron giggles and says that Hermione has a bug in her algorithm. Determine
the error, give its correction, and then determine the time and space usage of
FasterLuc(n). Justify your claims
(d) In a table, list each of the four algorithms as rows and for each give its asymptotic
time and space requirements, along with the implied or explicit data structures that
each requires. Briefly discuss how these different approaches compare, and where
the improvements come from
(e) (5 pts extra credit) Implement FasterLuc and then compute Ln where n is the
four-digit number representing your MMDD birthday, and report the first five digits of Ln. Now, assuming that it takes one nanosecond per operation, estimate the
number of years required to compute Ln using Ron’s classic recursive algorithm
and compare that to the clock time required to compute Ln using FasterLuc.