$30
Frequency Table Using a Hash Table with Quadratic Probing
CS 223
1 Introduction
For this project you will implement the Frequency Table ADT using a Hash Table that resolves
collisions using quadratic probing.
2 Frequency Table
The primary operations of a Frequency table are click and count. Click increments the integer
counter associated with the key if the key is already in the table; if the key is not in table it is
inserted and given an initial count of 1. Count returns the value of the counter associated with
the key – if the key is not in the table, 0 is returned. Your class should implement the following
interface where K is any (class) type:
public interface FrequencyTable<K {
void click(K key);
int count(K key);
}
For hashing we rely on the methods java.Object.hashCode and java.Object.equals being properly overridden by class K. Since FrequencyTable is a Collection type is should also implement the
java.lang.Iterable<K interface so the client can iterate through all the keys stored in the table.
3 Hashing with quadratic probing
3.1 Quadratic probing
We resolve collisions in the hash table using a quadratic probing scheme that will (hopefully) avoid
clustering. Given a hash value of h :
int h = key.hashCode() & (N - 1); // assumes N is a power of 2
the ith probe (i = 0, 1, . . .) generates the hash table index k as follows:
k = (h + i(i + 1)/2) & (N − 1) (1)
This can (eventually) examine every slot in the table as long as the table size N is a power
of two. Table 1 demonstrates how every index is covered exactly once for N = 8. Note that
(a mod N) = (a & (N − 1)) when a is nonnegative and N is a power of two; this also generates a
proper index even when a is negative (which can happen).
1
i i(i + 1)/2 mod N
0 0
1 1
2 3
3 6
4 2
5 7
6 5
7 4
Table 1: The quadratic probing function i(i + 1)/2 mod N covers all possible values when the size
of the table is a power of two (N = 8 in this example).
3.2 Using an ArrayList for a table
Unfortunately we can not use generic types with Java arrays; i.e., we would like to declare an array
of Entry’s as follows
Entry[] table;
where Entry is composed of a parameterized type K :
private class Entry {
public K key;
public int count;
public Entry(K k) {key=k; count=1;}
}
Anyway, the cool kids use either Vector or ArrayList from the java.util package nowadays.
The big difference is that Vector’s are synchronized whereas ArrayList’s are not. Therefore, since
there is less overhead with ArrayList’s, we’ll use these to store our hash table:
ArrayList<Entry table;
When we create the table we specify its capacity (which we round up to the next power of two)
and then fill it with null references:
int sz = nextPowerOfTwo(initialCapacity);
table = new ArrayList<Entry(sz);
for (int i = 0; i < sz; i++)
table.add(null);
Given n, we can find the next power of two 2e ≥ n with the following function:
private static int nextPowerOfTwo(int n) {
int e = 1;
while ((1 << e) < n)
e++;
return 1 << e;
}
2
3.3 Rehashing
The client will specify the initial minimum capacity and the maximum load factor when the table
is created. You will need to track the number of entries in the table, so you can detect when the
load factor exceeds this maximum; when this happens, you will double the size of the table (thus
maintaining its power of two size) and rehash all the entries into the new table.
4 HashFrequencyTable
4.1 Constructing a New Table
You class should be named HashFrequencyTable; the client specifies the minimum initial capacity
and the maximum load factor when constructing a new table. The initial capacity is rounded to
the next power of two (e.g.; if a capacity of 100 is specified, a size of 128 is used):
public class HashFrequencyTable<K implements FrequencyTable<K, Iterable<K {
...
public HashFrequencyTable(int initialCapacity, float maxLoadFactor) {...}
...
}
4.2 Iterators
To implement the Iterable<K interface, define an inner class that handles the details of iterating
though the non-null elements of the table:
private class TableIterator implements Iterator<K {
private int i;
public TableIterator() {i = 0;}
public boolean hasNext() {
while (i < table.size() && table.get(i) == null)
i++;
return i < table.size();
}
public K next() {
return table.get(i++).key;
}
public void remove() {
throw new UnsupportedOperationException("Remove not supported");
}
}
The iterator() method simply provides and instance of this class:
public Iterator<K iterator() {
return new TableIterator();
}
3
4.3 Unit Test
For diagnostic purposes, allow the client to dump the contents of the table:
public void dump(PrintStream str) { ... }
Place the following main method in your HashFrequencyTable implementation which constructs a
small table and dumps the contents:
public static void main(String[] args) {
String hamlet =
"To be or not to be that is the question " +
"Whether ’tis nobler in the mind to suffer " +
"The slings and arrows of outrageous fortune ";
String words[] = hamlet.split("\\s+");
HashFrequencyTable<String table = new HashFrequencyTable<String(10, 0.95F);
for (int i = 0; i < words.length; i++)
if (words[i].length() 0)
table.click(words[i]);
table.dump(System.out);
}
I’ll post the output from my implementation to compare with.
5 Word Frequency
I will provide you with an application SortedWordCount.java that uses your HashFrequencyTable
to track the frequency words in a file specified on the command line:
$ java SortedWordCount mobydick-words.txt 10
14175 the
6469 of
6325 and
4629 a
4539 to
4077 in
3041 that
2495 his
2494 it
1980 i
6 What to submit
You will archive all your source code, a README text file, and any other supporting files into a jar file
and submit your solution electronically via the class web site (do not include .class files). Make
sure you test your solution thoroughly and implement the interface described in this document using
proper encapsulation. The README file should contain your name and email, a brief description of
the project, and an explanation of how to build and run the test applications.
4