Starting from:

$30

Project 1 : Randomized Binary Search Trees

Project 1 : Randomized Binary Search Trees
CS 223

1 Introduction
If keys are inserted into an Binary Search Tree (BST) in random order, then the resulting tree will
be sufficiently balanced with high probability. According to Proposition C (p. 403) in the textbook,
searching a BST built from N random keys requires about 1.39 log2 N compares. Unfortunately, in
practice, key insertions are often ordered in some manner which leads to horribly unbalanced trees.
To remedy this, we can use a random number generator (RNG) in our insertion routine to create a
tree that has the same shape as a tree created with keys inserted in random order. When inserting
a node we make it the root (using the root insertion algorithm described in class) with probability
1/(N + 1) where N is the size of the tree; this is carried out recursively for each subtree.
For this project you will implement the SymbolTable Abstract Data Type (ADT) by using
a BST with randomized insertion (randomized deletion is a bonus). You will name your class
RandomBSTSymbolTable and also implement the serialize method used in class demos for “pretty
printing” BST’s. This will also allow me to analyze the structure of the trees you create. I will
provide you with a battery of test cases to exercise your solution.
Section 2 describes necessary details for implementing randomized BST’s. Other details concerning encapsulation and serialization are specified in Section 3. Submission details are outlined
in Section 4. More details will be given later along with test cases for you to compare with.
2 Randomized BST’s
2.1 Caching tree size N in each node
The algorithm described here requires us to know the size (number of nodes) of each subtree. In
order to avoid the expensive operation of computing the size of a BST we store the tree size N in
each node and update it as necessary (K and V are parameterized types for key and value) :
class Node {
K key; V val;
Node left, right;
int N; // N = cached size of tree (1 for leaf)
...
}
To avoid having to repeatedly check for null references, the following method is useful for fetching
the size of a tree:
int size(Node tree) {return (tree == null) ? 0 : tree.N;}
1
2.2 Updating N in rotation methods
?

?
?
?


? ?
right rotate
A.N = 1 + size(Y) + size(Z);
B.N = 1 + size(X) + A.N;
Figure 1: Fixing tree size N after a right rotation.
We use the left and right tree rotation operations described in class, but they need to account
for the change in size. Figure 1 shows the structural change due to a right rotation; Note that
the size of the (possibly null) subtrees labeled X, Y, and Z are unaltered, but the size of the trees
rooted at A and B need to be recalculated.
2.3 Randomized Insert
We create a RNG that is shared for all instances and seed it with the same value when the class is
loaded (it helps during development to be able to produce the same trees on each run):
static Random rng = new Random(1234);
Node insertRandom(Node tree, K key, V val) {
if (tree == null) return new Node(key, val); // N = 1 (set in constructor)
if (rng.nextDouble()*(tree.N+1) < 1.0) // with probability 1/(N+1)
return insertRoot(tree, key, val); // insert new node at root
int cmp = key.compareTo(tree.key); // else recurse down tree
if (cmp == 0) {
tree.key = key; tree.val = val; // replace payload (N unchanged)
} else if (cmp < 0) {
tree.left = insertRandom(tree.left, key, val); // insert into left tree
tree.N = 1 + tree.left.N + size(tree.right); // update N
} else {
tree.right = insertRandom(tree.right, key, val); // insert into right tree
tree.N = 1 + size(tree.left) + tree.right.N; // update N
}
return tree;
}
Figure 2: Randomized insertion algorithm.
2
Figure 2 lists our randomized insertion algorithm. As we traverse down the tree we consider
the subtree rooted at each node on the search path. We generate a random number r and, if
r < 1/(N + 1) we perform a root insertion; Otherwise, we continue down the tree. The root
insertion method needs to be modified so that N is updated at each node similarly to the code in
Figure 2 (remember to update N before performing a rotation).
2.4 Randomized Remove
As we discussed in class, deleting a BST node with two non-empty children is the tricky case. The
approach we use here, illustrated in Figure 3, is to use a randomized join operation as listed in
Figure 4. This operation relies on the fact that the key’s in X are less than the keys in Y which
will always be true for siblings in a BST.
2.4.1 Randomized Join
?
? ?
? ??
???
delete n
tree tree
Figure 3: Deleting node n and joining its two subtrees.
Node join(Node X, Node Y) {
if (X == null) return Y;
if (Y == null) return X;
if (rng.nextDouble()*(X.N + Y.N) < X.N) { // flip a weighted coin
X.right = join(X.right, Y);
X.N = 1 + size(X.left) + size(X.right);
return X;
} else {
Y.left = join(X, Y.left);
Y.N = 1 + size(Y.left) + size(Y.right);
return Y;
}
}
Figure 4: Randomized join method.
If either subtree X or Y is null, then we have our simple recursive base cases. Otherwise, we
either join X’s right subtree with Y or join X with Y ’s left subtree recursively and then re-attach
3
the resulting subtree appropriately. We choose the first option with probability X.N/(X.N +Y.N);
This prefers to attach the smaller subtree to a child of the larger subtree.
2.4.2 Remove
The randomized remove function is straight forward. When (or if) we find the node to delete, we
simply replace it with the randomized join of its two subtrees. Take care to update the cached N
whenever there is a potential change in size of a tree.
3 Implementation Details
Create the class RandomBSTSymbolTable which implements the SymbolTable interface:
public class RandomBSTSymbolTable<K extends Comparable<K, V
implements SymbolTable<K, V {
...
}
This class should also provide the serialize method (see Section 3.2 below). The remove operation
is a bonus – simply have it return null when its not implemented. The Node subclass will have
an added N field to track the size of thee tree; take care to properly update this value as described
above.
3.1 Encapsulation
Only the following methods should be publicly visible:
1. RandomBSTSymbolTable constructor
2. insert
3. search
4. remove
5. serialize
All other methods and fields should be private. Methods that do not access instance variables
(root in this case) should be declared static. Since the RNG variable is shared by all instances of
the class (i.e.; it is a class variable), it should also be declared static. The nested Node class should
also be static since it does not reference any instance variables.
3.2 Serializing the BST
In addition to implementing the SymbolTable operations, your class should support the serialize()
method that we use for printing our trees (via the TreePrinter class). This allows me to write
test cases that examine the internal structure of your trees. To serialize the tree, we perform an
pre-order traversal of the nodes and stores them sequentially in a Vector as listed in Figure 5;
null is used as a sentinel to mark empty trees. A string is used to mark the content of each node
(typically laid out for printing).
4
private static void serializeAux(Node tree, Vector<String vec) {
if (tree == null)
vec.addElement(null);
else {
vec.addElement(tree.key.toString() + ":black");
serializeAux(tree.left, vec);
serializeAux(tree.right, vec);
}
}
public Vector<String serialize() {
Vector<String vec = new Vector<String();
serializeAux(root, vec);
return vec;
}
Figure 5: Serializing a BST using a pre-order traversal and encoding each node in a vector.
4 Submitting your solution
You will archive all of your source code and documentation in a Java Archive (JAR) file and submit
it via the course electronic submission portal. You can create a JAR file from the command line
as follows:
jar cvf mysolution.jar RandomSymbolTable.java README.txt
You can also have Eclipse create the JAR file for you:
1. Menu: File → Export
2. Expand Java and select ’JAR file’ and click ’Next’
3. Check project source and README.txt
4. Check Export Java source files and resources (leave others unchecked)
5. Name destination JAR file (e.g., mysolution.jar) and click Finish.
I will provide you with a battery of tests that will test a variety of aspects of your code. Please
run these tests before you submit your final solution (you can submit multiple times – the latest
submission will be graded).
I will print BST’s that I use for my test cases. If you use the same RNG seed mentioned in
Section 2.3 (and use the same algorithm) you should get the same trees as me.
4.1 README
You will include in your submission a text file named README.txt (or README.mkd if using Markdown syntax) containing the following information:
5
1. Your name, SID, and email address.
2. List of files in submitted archive.
3. A brief description of the contents of your submission. This is meant to get the uninitiated
an overview of your project and should include...
• A one sentence title / description.
• A short paragraph describing what the project is.
• Instructions on how to build/run/test your code.
The README should not contain implementation details or any information that the client
doesn’t need to know to use your solution.
6

More products