$30
Comp 251: Assignment 1
General instructions (Read carefully!)
• Your solution must be submitted electronically on MyCourses.
• You are provided some starter code that you should fill in as requested. Add your code only where
you are instructed to do so. You can add some helper methods. Do not modify the code in any
other way and in particular, do not change the methods or constructors that are already given to
you, do not import extra code and do not touch the method headers. The format that you see on
the provided code is the only format accepted for programming questions. Any failure to comply
with these rules will give you an automatic 0.
• The starter code includes a tester class. If your code fails those tests, it means that there is a mistake
somewhere. Even if your code passes those tests, it may still contain some errors. We will grade
your code with a more challenging set of examples. We therefore highly encourage you to modify
that tester class, expand it and share it with other students on the myCourses discussion board. Do
not include it in your submission.
• Your code should be properly commented and indented.
• Do not change or alter the name of one of the files you must submit. Files with the wrong name
will not be graded. Make sure you are not changing file names by duplicating them. For example,
main (2).java will not be graded. Make sure to double-check your zip file.
• Do not submit individual files. Include all your files into a .zip file and, when appropriate, answer
the complementary quiz online on MyCourses.
• You will automatically get 0 if the files you submitted on MyCourses do not compile.
Questions
Exercise 1 (80 points) We want to compare the performance of hash tables implemented using chaining
and open addressing. In this assignment, we will consider hash tables implemented using the multiplication and linear probing methods. We will (respectively) call the hash functions h and g and describe
them below. Note that we are using the hash function h to define g.
Collisions solved by chaining (multiplication method): h(k) = ((A · k) mod 2w) (w − r)
Open addressing (linear probing): g(k, i) = (h(k) + i) mod 2r
In the formula above, r and w are two integers such that w r, and A is a random number such that
2
w−1 < A < 2
w. In addition, let n be the number of keys inserted, and m the number of slots in the hash
tables. Here, we set m = 2r
and r = dw/2e. The load factor α is equal to n
m
.
We want to estimate the number of collisions in random sequences of insertions and deletions of keys
with respect to the choice of values for w and α. By default we will set
We provide you a set of three template files within COMP251HW1.zip that you will complete. This
file contains three classes, a main class and one for each hash function. Those contain several helper
functions, namely generateRandom that enables you to generate a random number within a specified
range. Details on which functions are included, how to use them, and where to add in your code can be
found as comments in the java files. Please read them with attention. In addition, we provide you a jar
file to visualize your results named JavaPlotBuilder.jar.
Your first task is to complete the two java methods Open_Addressing.probe and Chaining.chain.
These methods must implement the hash functions for (respectively) the linear probing and multiplication methods. They take as input a key k, as well as an integer 0 ≤ i < m for the linear probing method,
and return a hash value in [0, m[. Note that the value of A must be updated when you change w.
Next, you will implement the method insertKey in both classes, which inserts a key k into the
hash table and returns the number of collisions encountered before insertion. Note that for this exercise
2
as well as for the rest of the homework, we define the number of collisions as the number of keys encountered, or "jumped over" before inserting or removing a key. You can assume the key is not negative.
You will also implement a method removeKey, this one only in Open_Addressing. This
method should take as input a key k, and remove it from the hash table while visiting the minimum
number of slots possible. Like insertKey, it should output the number of collisions. If the key is not
in the hash table, the method should simply not change the hash table, and output the number of slots
visited. You will notice from the code and comments that empty slots are given a value of −1. If applicable, you are allowed to use a different notation of your choice for slots containing a deleted element.
Finally, you will complete the method main.main, which calls the previous functions from the
main class. There are three tasks to complete within this main method.
Task 1
First, you will test the effect of increasing the number of keys on the average number of collisions for
each hash function. You will be given an array of keys to insert, keysToInsert, and a list of values
of n to test, nList inside the main method. Random seeds can be used in java in order to make random results reproducible (and eventually evaluate assignments). You will find the hash tables already
initialized with such seed. For each value of n, insert the n first elements of keysToInsert into each
hash table, and store the α value, as well as the average number of collisions for that value of α into the
appropriate provided list. The program is already set up to output a CSV file for you to visualize.
Task 2
Your second task is to test the removeKey method on the Open_Addressing table from task 1
with n=16. Initialize a new Open_Addressing hash table with the same seed as in Task 1, and insert the first 16 elements of keysToInsert. You will be given an array of keys to remove named
keysToRemove. Call removeKey with each of these keys. Store the number of collisions associated
with each removal operation in the arraylist removeCollisions, as well as the index of the key you just
attempted to remove in removeIndex. Finally, use the provided method to output a CSV file.
Task 3
Your third task is to evaluate the effect of varying w on the number of collisions for each method. For
this exercise, the keys you insert will be generated randomly with generateRandom. Each key can
be inserted only once (i.e. The random sequence of keys must have no duplicates). For this exercise,
you will not be using a specific seed, which you can do by calling functions that require a seed argument
with a seed of −1. Because your experiments will now have some variance, you will need to execute 10
simulations for each value of w to obtain representative averages. You will choose appropriate values of
w, use the provided function to output a CSV file, and visualize your results. You will submit a pdf file,
called Conclusions.pdf, including plots of your results, as well as a short explanation of what you
observe, and an explanation for it.
To plot your results, you can use JavaPlotBuilder.jar to generate a plot with α on the x-axis
and the average number of collision on the y-axis. JavaPlotBuilder.jar is run from command
line as follows:
java -jar JavaPlotBuilder.jar filename.csv
3
where filename.csv is the name of the file that is generated by the main method.
For this assignment, you will need to submit a zip file containing the completed version of the three
provided java files, n_comparison.csv, remove_collisions.csv, and w_comparison.csv
(the three CSV files generated by the main method), as well as Conclusions.pdf, the PDF file with
your observations and explanations.
Once you have submitted your files, you can proceed to the second part of the assignment.
Exercise 2 (20 points) This section is answerable through MyCourses. Note that you MUST use your
own results to answer those questions. Answers to this quiz that would not match the results presented
in your pdf file will be considered plagiarism (refer to course outline).
4