Starting from:

$30

Computer Science CSC263H Homework Assignment #3

Computer Science CSC263H
Homework Assignment #3

• You must submit your assignment as a PDF file, named a3.pdf, of a typed (not handwritten)
document through the MarkUs system by logging in with your CDF account at:
https://markus.teach.cs.toronto.edu/csc263-2019-01
To work with one or two partners, you and your partner(s) must form a group on MarkUs.
• The a3.pdf PDF file that you submit must be clearly legible. To this end, we encourage you to
learn and use the LATEX typesetting system, which is designed to produce high-quality documents
that contain mathematical notation. You can use other typesetting systems if you prefer, but
handwritten documents are not accepted.
• If this assignment is submitted by a group of two or three students, the a3.pdf PDF file that you
submit should contain for each assignment question:
1. The name(s) of the student(s) who wrote the solution to this question, and
2. The name(s) of the student(s) who read this solution to verify its clarity and correctness.
• By virtue of submitting this assignment you (and your partners, if you have any) acknowledge that
you are aware of the homework collaboration policy that is stated in the csc263 course web page:
http://www.cs.toronto.edu/~sam/teaching/263/#HomeworkCollaboration.
• For any question, you may use data structures and algorithms previously described in class, or in
prerequisites of this course, without describing them. You may also use any result that we covered
in class, or is in the assigned sections of the official course textbook, by referring to it.
• Unless we explicitly state otherwise, you should justify your answers. Your paper will be marked
based on the correctness and completeness of your answers, and the clarity, precision, and conciseness of your presentation.
• Your a3.pdf submission should be no more than 6 pages long in a 10pt font.
1
Question 1. (1 marks) In this question, you must use the insertion and deletion algorithms as described
in the “Balanced Search Trees: AVL trees” handout posted on the course web site.
a. Insert keys 0, 25, 19, 5, -2, 28, 13, -5, 2, 6, 14, 7 (in this order) into an initially empty AVL tree,
and show the resulting AVL tree T, including the balance factor of each node (only the final tree should
be shown).
b. Delete key 2 from the above AVL tree T, and show the resulting AVL tree, including the balance
factor of each node.
In each of the above questions, only the final tree should be shown: intermediate trees will be disregarded,
and not given partial credit.
Question 2. (1 marks) You are tasked with designing a data structure that maintains a set of books for
an online bookstore. A book is defined as a 3-tuple (identif ier, price, rating), where
• identif ier is a unique positive integer which identifies the book.
• price is a positive real number which represents the price of the book in dollars.
• rating is a real number in the range [0, 5], which represents how good the book is.
Note that you can not assume that prices and ratings are unique: there may be several books which have
the same price, and there may be several books which have the same rating.
a. Name a standard data structure D that can store the set of books such that each of the following
operations can be done in O(log n) time in the worst-case (where n is the number of books in D):
AddBook(D, x): Insert x which is a new book of the form (identif ier, price, rating) into D.
SearchBook(D, id): Return the price and rating of the book in D whose identif ier is id. If there is
no book in D whose identif ier is id, return nil.
Briefly describe what each node u of your data structure contains, and what is the key that you are using
when adding a new book. Which standard operations of your data structure you are using to implement
AddBook and SearchBook?
In the Parts b, c, d, e below:
• For each tree that you define, describe what each node of your tree contains, and what is the key
that you are using when inserting a new item into this tree.
• For each operation, you should give a clear and concise English description of the algorithm
implementing the operation.
• Also give brief explanations of why each of your algorithms achieve the worst-case time complexity
specified in the subquestion (where n is the number of books in the data structure).
b. Modify D to support the following operation, in addition to the operations of Parts a:
BestBookRating(D, p): Let r be the maximum rating among all books in D whose price is at most p.
Then return r. If no book has price at most p, then return −1.
In addition to the English description of the algorithm implementing the above operation, you must
also give the corresponding pseudocode. Also describe in English any changes you make to the
implementation of the previously defined operations. The worst-case time complexity of each operation
should be O(log n).
Hint: Use an augmented AVL tree, in addition to the data structure used to implement Part a.
2
c. Modify D to support the following operation, in addition to the operations of Parts a, b:
AllBestBooks(D, p): Let r be the maximum rating among all books in D whose price is at most p.
Then return a pointer to a list of all books in D whose rating is r. If no book has price at most p,
then return nil.
Also describe in English any changes you make to the implementation of the previously defined operations
(don’t use pseudo-code). The worst-case time complexity of each operation should be O(log n).
Hint: Use another AVL tree, in addition to the data structures used to implement Parts a, b.
d. Modify D to support the following operation, in addition to the operations of Parts a, b, c:
IncreasePrice(D, p): Increase the price of every book in D by p dollars. Assume that p is a positive
real number.
Also describe in English any changes you make to the implementation of the previously defined operations
(don’t use pseudo-code). The worst-case time complexity of the IncreasePrice operation should be O(1).
The worst-case time complexity of the previously defined operations should remain O(log n).
e. Modify D to support the following operation, in addition to the operations of Parts a, b, c, d:
DeleteBook(D, id): Remove from D the book whose identif ier is id.
Also describe in English any changes you make to the implementation of the previously defined operations (don’t use pseudo-code). The worst-case time complexity of the DeleteBook operation should be
O(log n), where n is the number of books in D. The worst-case time complexity of the previously defined
operations should remain unchanged.
Question 3. (1 marks)
Give an efficient algorithm which, given two sets A and B, outputs every element of A that is not
in B. More precisely, the algorithm’s input are two sets A and B given as unsorted arrays A[1..n]
and B[1..n], each containing n distinct integers. The algorithm must print the elements of the set
A − B = {e | e ∈ A and e /∈ B}.
The expected time complexity of your algorithm should be O(n), under some assumptions that we discussed
in class.
Hint: What data structures or algorithms that we learned in class involve reasoning about probability?
Note: Do not assume that the numbers in the array are small, or they have a small range, e.g., they
could be arbitrary integers. So a solution based on a direct access table or on linear time sorting is not
acceptable.
a. Describe your algorithm clearly and concisely in English, and then give the pseudo-code.
b. Explain why the expected running time of your algorithm is O(n). Explicitly state every assumption
that you need for your complexity analysis.
c. What is the worst-case running time of your algorithm? Use the Θ notation. Justify your answer.
3
[The questions below will not be corrected/graded. They are given here as interesting problems
that use material that you learned in class.]
Question 4. (0 marks) Consider an abstract data type that consists of a set S of integers on which the
following operations can be performed:
Add(i): Adds the integer i to S. If this integer already is in S, then S does not change
Average(t): Returns the average of all elements of S that are less than or equal to the integer t. If all
the elements of S are greater than t, then return 0.
Describe how to implement this abstract data type using an augmented AVL tree T. Each operation should
run in O(log n) worst-case time, where n = |S|. Since this implementation is based on a data structure
and algorithms described in class and in the AVL handout, you should focus on describing the extensions
and modifications needed here.
a. Give a precise and full description of your data structure. In particular, specify what data is associated
with each node, specify what the key is at each node, and specify what the auxiliary information is at
each node. In particular, what is (are) the augmented field(s), and what identity should this (these) fields
satisfy. Illustrate this data structure by giving an example of it (with a small set of your own choice).
b. Describe the algorithm that implements each one of the two operations above, and explain why each
one takes O(log n) time in the worst-case. Your description and explanation should be in clear and concise
English. For the operation Average, you should also give the algorithm’s high-level pseudocode.
Question 5. (0 marks)
The task in this question is to compute the medians of all prefixes of an array. As input we are given the
array A[1 . . n] of distinct integers. Using a heap data structure, design an algorithm that outputs another
array M[1 . . n], so that M[i] is equal to the median of the numbers in the subarray A[1 . . i]. Recall that
when i is odd, the median of A[1 . . i] is the element of rank (i+1)/2 in the subarray, and when i is even, the
median is the average of the elements with ranks i/2 and i/2 + 1. Your algorithm should run in worst-case
time O(n log n).
a. Describe your algorithm in clear and concise English, and also provide the corresponding pseudocode.
Argue that your algorithm is correct.
b. Justify why your algorithm runs in time O(n log n).
Question 6. (0 marks)
Dynamic storage allocation is a resource management task that is required at several levels of computer
systems: for example, an operating system must allocate chunks of main memory to processes, and a file
system must allocate chunks of disk space to files. We can describe the problem of storage management
in general terms as follows. There is a store S of N equal-size cells, indexed as S[i] for i ∈ {1, 2, . . . , N}.
In practice, the cells of the store are individually addressable units of some type of memory. For example,
they may be bytes, words or pages of main memory; or they may be blocks or sectors of disk space.
The storage manager maintains at all times a pool of available fragments. Intuitively, a fragment
is a maximal sequence of consecutive cells that are available to be allocated to a process that needs them.
When a process requests a fragment of size `, the storage manager determines if the pool of available
fragments contains a fragment of size k ≥ `. If it does not, the request is rejected. If the pool contains a
fragment of size k ≥ `, the storage manager removes it from the pool, splits it into one fragment of size
` and one fragment of size k − `, allocates the former to the requesting process, and returns the latter (if
k − ` 6= 0) to the pool. When a process is done with a fragment that it was previously allocated, it returns
4
that fragment to the storage manager. The storage manager returns the fragment to the storage pool after
coalescing it with its adjacent fragments, if such fragments are presently in the pool.
More precisely, a fragment is a pair (a, k), where a, k are positive integers such that a + k − 1 ≤ N; a
is the fragment’s start address (the index of the first cell in the fragment) and k is its size (the number
of cells in the fragment). Thus, fragment (a, k) corresponds to the subarray S[a..a + k − 1] of the store S.
Two fragments F = (a, k) and F
0 = (a
0
, k0
) are adjacent if the start address of one is immediately after
the last cell of the other (i.e., a = a
0 + k
0 or a
0 = a + k); F and F
0 are overlapping if some cell belongs
to both of them (i.e., the intervals [a, a + k − 1] and [a
0
, a0 + k
0 − 1] are not disjoint). The pool of available
fragments is a set of fragments that does not contain any adjacent or overlapping fragments. Initially, the
pool of available fragments consists of a single fragment, namely (1, N).
We can formulate storage management as an abstract data type. The object manipulated by this ADT
is the pool of available fragments, and there are two operations:
• Allocate(P, `): Intuitively, this is a request to allocate a fragment of size `.
P is a pool of available fragments, and ` is an integer such that 1 ≤ ` ≤ N. If P contains no fragment
whose size is at least `, the operation returns “reject”. If P contains some fragment, say (a, k), such
that k ≥ `, the operation returns (a, `), and adjusts P by replacing (a, k) with (a + `, k − `), if k `
— or replacing it with nothing, if k = `.
• Release(P,(a, `)): Intuitively, this returns a previously allocated fragment to the pool.
P is a pool of available fragments, and (a, `) is a fragment that does not overlap any fragment in P.
The operation adjusts P by returning the fragment (a, `) to it, after coalescing it with any adjacent
fragments.
When the operation Allocate(P, `) is invoked and the pool P contains several fragments of size ` or
more, there are various strategies that the storage manager could use to decide which fragment to use.
Following are three common strategies:
• First-fit: choose the fragment of adequate size that has the smallest possible start address.
• Best-fit: choose a smallest fragment of adequate size.
• Worst-fit: choose a largest fragment of adequate size.
In this question, you must describe how to implement the storage management ADT described above
using an augmented AVL tree (which represents the pool P of currently available fragments). Your implementation should have the property that the Allocate operation uses the first-fit strategy. Furthermore, each operation Allocate and Release should take O(log n) time in the worst-case, where n is the
number of fragments currently in the pool P.
Since this implementation of P is based on a data structure and algorithms described in class and in
the AVL handout, you should focus on describing the extensions and modifications needed here.
1. Give a precise and full description of your data structure. In particular, specify what data is associated
with each node, specify what the key is at each node, and specify what the auxiliary information is
at each node. Illustrate this data structure by giving an example of it (on a small pool P of available
fragments of your own choice).
2. Describe the algorithms for Allocate and Release, and explain why each one takes O(log n) time
in the worst-case.
5
Question 7. (0 marks) You are given a list L of n, not necessarily distinct, integers. Your task is to devise
an algorithm that outputs a list of the distinct integers in L, in order of non-increasing frequency (the
frequency of an element in L is the number of times it occurs in L). For example, if L is 2, 5, 4, 4, 2, 3, 4, 3,
then a suitable output is 4, 3, 2, 5; the only other suitable output for this list is 4, 2, 3, 5. In this example,
L contains only small integers, but this is not always the case. You cannot make any assumption on the
integers in L (e.g., their maximum size, or distribution). In particular, some integers of L can very large,
and so it is not feasible to use tables of size proportional to these integers.
a. Give a Hash Table-based algorithm for this problem whose expected time complexity is O(n), under
some Hash Table assumptions that you must clearly state. Explain why, your algorithm achieves this time
complexity under these assumptions.
What is the worst-case time complexity of your algorithm? Use the Θ notation and justify your answer.
Hint: As part of your algorithm, you may find useful to discover a way to sort n numbers in the range
0 to n in O(n) worst-case time.
b. Give an algorithm for this problem whose worst-case time complexity is O(n log n). Explain why
your algorithm achieves this time complexity.
Question 8. (0 marks)
In this question you will explore different algorithms to perform the join of two relations, a very important
operation in relational databases. We consider a simplified setting where we wish to compute the join of
relation R with attributes A and C, and relation S with attributes B and C. Notice that the two relations
have a common attribute, C. Each relation is a set of pairs. For each pair (a, c) of R, a is a value of
attribute A and c is a value of attribute C. Similarly, for each pair (b, c) of S, b is a value of attribute
B and c is a value of attribute C. It may be useful to think of R as a two-dimensional array, with two
columns, the first corresponding to A and the other corresponding to C: each row of the two-dimensional
array corresponds to a pair (a, c) of R; so the number of rows of the array is equal to the number of pairs
in R. Similar comments apply to S.
The join of R and S, denoted R ./ S, is a relation with attributes A, B and C (i.e., the union of the
attributes of R and S). It consists of the set of all triples (a, b, c) such that (a, c) is a pair in R and (b, c)
is a pair in S. Notice that the two pairs that are being joined to produce the triple (a, b, c) of R ./ S must
have the same value in the common attribute C.
Another way of thinking about R ./ S is as follows: We consider every possible combination of a pair
(a, c) from R and a pair (b, c0
) from S. If c = c
0 — i.e., if the two pairs have the same value in the common
attribute — then this combination contributes the triple (a, b, c) to the join. If c 6= c
0
then this combination
contributes nothing to the join.
For example, suppose that R is the relation Teaches with attributes ProfName and CourseId; and
S is the relation Takes with attributes StudName and CourseId. A pair (p, c) is in Teaches if and only
if professor p teaches course c. A pair (s, c) is in Takes if and only if student s takes course c. The
join Teaches ./ Takes is the relation with attributes ProfName, StudName, CourseId; it contains a triple
(p, s, c) if and only if professor p teaches student s in course c.
The obvious way to compute R ./ S is to consider each combination of pairs from R and S and create
a triple from each such combination where the two pairs have the same value in the common attribute.
Obviously this algorithm takes time Θ(mn), where m is the number of pairs in R and n is the number of
pairs in S.
In a way, this is the best we can hope for: If all pairs in R and all pairs in S have the same value
in the common attribute C, then there will be mn triples in R ./ S, and we can’t create a set of mn
triples faster than in Ω(mn) time! In practice, however, it is very unlikely that every pair in R joins with
every pair in S; in most cases, the size of R ./ S is much smaller than mn. For instance, in our example
of the Teaches and Takes relations, if these happened to describe, respectively, the courses taught by U
6
of T faculty and the courses taken by U of T students in a particular semester, the first relation would
contain about 4,000 pairs (approximately 2,000 professors each teaching 2 courses) and the second relation
would contain about 250,000 pairs (approximately 50,000 students, each taking 5 courses). In this case the
product mn is approximately one billion, while the actual size of Teaches ./ Takes will be approximately
250,000 — more than three orders of magnitude smaller! (There could be more records in Teaches ./ Takes
than in Takes, in case a course is co-taught by several professors, but such instances are rare, so typically
the size of Teaches ./ Takes will be only a little bit larger than that of Takes.)
In view of this example, we would like to have an algorithm that computes R ./ S in time Θ(k+f(m, n)),
where k is the actual size of R ./ S and f(m, n) is as small a function as possible.
In the following two questions you are asked to devise and describe such algorithms. You should
describe each algorithm in clear and precise English, and illustrate the data structure(s) that you use with
simple example(s); no pseudocode is necessary (but you can use pseudo-code as an additional explanation).
To the extent your algorithm makes use of algorithms or data structures discussed in this course or in its
prerequisites, you can simply state what algorithm(s) and data structure(s) you use without describing
them in detail.
In the following, assume that each of R and S is given as a two dimensional array, where each column
corresponds to an attribute and each row corresponds to a pair of the relation. You can not make any
assumption on the size or distribution of values contained in R or S (i.e., the values associated with
attributes A, B, or C). In particular, some of these values may be very large (e.g., they may be 128-bit
integers), and so it is not feasible to use arrays of size proportional to these values.
Let k be the size of the join R ./ S, and N be the size of the larger of the two given relations R and
S, i.e., N = max(n, m).
a. Describe an algorithm that computes R ./ S in worst-case time O(k + N log N). In other words, the
worst-case running time of your algorithm, with respect to all the inputs R and S that have N = max(n, m)
and whose join is of size k, is O(k + N log N). Your should describe your algorithm as we explained above
(start with a few English sentences explaining the high-level idea of your algorithm).
Explain why the worst-case time complexity of your algorithm is O(k + N log N).
b. Describe an algorithm that computes R ./ S in expected time Θ(k + N) under some reasonable
assumptions that you should state. Your should describe your algorithm as we explained above (start with
a few English sentences explaining the high-level idea of your algorithm).
Explain why the expected running time of your algorithm is Θ(k + N).
What is the worst-case time complexity of this algorithm? Justify your answer.
Hint: Note that R or S may have many pairs with the same value c in attribute C. You should think
about this case carefully when you formulate your assumptions, and when you analyse the time complexity
of your algorithm.
7

More products