Starting from:

$30

Project 5 (KdTrees)

Project 5 (KdTrees)
Exercises
Exercise 1. (Array-based Symbol Table) Implement a data type called ArrayST that uses an unordered array as the underlying
data structure to implement the basic symbol table API.
& ~/workspace/project5
$ java ArrayST < data / tinyST . txt
S 0
E 12
A 8
R 3
C 4
H 5
X 7
M 9
P 10
L 11
Exercise 2. (Spell Checker ) Implement a program called Spell that accepts f ilename (String) as command-line argument,
which is the name of a file containing common misspellings (a line-oriented file with each comma-separated line containing
a misspelled word and the correct spelling); reads text from standard input; and writes to standard output the misspelled
words in the text, the line numbers where they occurred, and their corrections.
& ~/workspace/project5
$ java Spell data / misspellings . txt < data / war_and_peace . txt
wont :5370 -> won ’t
unconciousness :16122 -> unconsciousness
accidently :18948 -> accidentally
leaded :21907 -> led
wont :22062 -> won ’ t
aquaintance :30601 -> acquaintance
wont :39087 -> won ’ t
wont :50591 -> won ’ t
planed :53591 -> planned
wont :53960 -> won ’ t
Ukranian :58064 -> Ukrainian
wont :59650 -> won ’ t
conciousness :59835 -> consciousness
occuring :59928 -> occurring
1 / 9
Project 5 (KdTrees)
Problems
The purpose of this project is to create a symbol table data type whose keys are two-dimensional points. We’ll use a 2dTree
to support efficient range search (find all the points contained in a query rectangle) and k-nearest neighbor search (find k
points that are closest to a query point). 2dTrees have numerous applications, ranging from classifying astronomical objects
to computer animation to speeding up neural networks to mining data to image retrieval.
Geometric Primitives We will use the data types dsa.Point2D and dsa.RectHV to represent points and axis-aligned rectangles
in the plane.
Symbol Table API Here is a Java interface PointST<Value> specifying the API for a symbol table data type whose keys are
Point2D objects and values are generic objects:
² PointST<Value>
boolean isEmpty() returns true if this symbol table is empty, and false otherwise
int size() returns the number of key-value pairs in this symbol table
void put(Point2D p, Value value) inserts the given point and value into this symbol table
Value get(Point2D p) returns the value associated with the given point in this symbol table, or null
boolean contains(Point2D p) returns true if this symbol table contains the given point, and false otherwise
Iterable<Point2D> points() returns all the points in this symbol table
Iterable<Point2D> range(RectHV rect) returns all the points in this symbol table that are inside the given rectangle
Point2D nearest(Point2D p) returns the point in this symbol table that is different from and closest to the
given point, or null
Iterable<Point2D> nearest(Point2D p, int k) returns up to k points from this symbol table that are different from and closest
to the given point
Problem 1. (Brute-force Implementation) Develop a data type called BrutePointST that implements the above API using a
red-black BST (RedBlackBST) as the underlying data structure.
2 / 9
Project 5 (KdTrees)
² BrutePointST<Value> implements PointST<Value>
BrutePointST() constructs an empty symbol table
Corner Cases
ˆ The put() method should throw a NullPointerException() with the message "p is null" if p is null and the message "value is null"
if value is null.
ˆ The get(), contains(), and nearest() methods should throw a NullPointerException() with the message "p is null" if p is null.
ˆ The rect() method should throw a NullPointerException() with the message "rect is null" if rect is null.
Performance Requirements
ˆ The isEmpty() and size() methods should run in time T(n) ∼ 1, where n is the number of key-value pairs in the symbol
table.
ˆ The put(), get(), and contains() methods should run in time T(n) ∼ log n.
ˆ The points(), range(), and nearest() methods should run in time T(n) ∼ n.
& ~/workspace/project5
$ java BrutePointST 0.975528 0.345492 5 < data / circle10 . txt
st . size () = 10
st . contains ((0.975528 , 0.345492))? true
st . range ([ -1.0 , -1.0] x [1.0 , 1.0]):
(0.5 , 0.0)
(0.206107 , 0.095492)
(0.793893 , 0.095492)
(0.024472 , 0.345492)
(0.975528 , 0.345492)
(0.024472 , 0.654508)
(0.975528 , 0.654508)
(0.206107 , 0.904508)
(0.793893 , 0.904508)
(0.5 , 1.0)
st . nearest ((0.975528 , 0.345492)) = (0.975528 , 0.654508)
st . nearest ((0.975528 , 0.345492) , 5):
(0.975528 , 0.654508)
(0.793893 , 0.095492)
(0.793893 , 0.904508)
(0.5 , 0.0)
(0.5 , 1.0)
Directions:
ˆ Instance variable:
– An underlying data structure to store the 2d points (keys) and their corresponding values, RedBlackBST<Point2D, Value> bst.
ˆ BrutePointST()
– Initialize the instance variable bst appropriately.
ˆ int size()
– Return the size of bst.
ˆ boolean isEmpty()
– Return whether bst is empty.
ˆ void put(Point2D p, Value value)
– Insert the given point and value into bst.
3 / 9
Project 5 (KdTrees)
ˆ Value get(Point2D p)
– Return the value associated with the given point in bst, or null.
ˆ boolean contains(Point2D p)
– Return whether bst contains the given point.
ˆ Iterable<Point2D> points()
– Return an iterable object containing all the points in bst.
ˆ Iterable<Point2D> range(RectHV rect)
– Return an iterable object containing all the points in bst that are inside the given rectangle.
ˆ Point2D nearest(Point2D p)
– Return a point from bst that is different from and closest to the given point, or null.
ˆ Iterable<Point2D> nearest(Point2D p, int k)
– Return up to k points from bst that are different from and closest to the given point.
Problem 2. (2dTree Implementation) Develop a data type called KdTreePointST that uses a 2dTree to implement the above
symbol table API.
² KdTreePointST<Value> implements PointST<Value>
KdTreePointST() constructs an empty symbol table
A 2dTree is a generalization of a BST to two-dimensional keys. The idea is to build a BST with points in the nodes, using
the x- and y-coordinates of the points as keys in strictly alternating sequence, starting with the x-coordinates.
ˆ Search and insert. The algorithms for search and insert are similar to those for BSTs, but at the root we use the
x-coordinate (if the point to be inserted has a smaller x-coordinate than the point at the root, go left; otherwise go
right); then at the next level, we use the y-coordinate (if the point to be inserted has a smaller y-coordinate than the
point in the node, go left; otherwise go right); then at the next level the x-coordinate, and so forth.
ˆ Level-order traversal. The points() method should return the points in level-order: first the root, then all children of
the root (from left/bottom to right/top), then all grandchildren of the root (from left to right), and so forth. The
level-order traversal of the 2dTree above is (.7, .2), (.5, .4), (.9, .6), (.2, .3), (.4, .7).
4 / 9
Project 5 (KdTrees)
The prime advantage of a 2dTree over a BST is that it supports efficient implementation of range search, nearest neighbor,
and k-nearest neighbor search. Each node corresponds to an axis-aligned rectangle, which encloses all of the points in its
subtree. The root corresponds to the infinitely large square from [(−∞, −∞),(+∞, +∞)]; the left and right children of the
root correspond to the two rectangles split by the x-coordinate of the point at the root; and so forth.
ˆ Range search. To find all points contained in a given query rectangle, start at the root and recursively search for points
in both subtrees using the following pruning rule: if the query rectangle does not intersect the rectangle corresponding
to a node, there is no need to explore that node (or its subtrees). That is, you should search a subtree only if it might
contain a point contained in the query rectangle.
ˆ Nearest neighbor search. To find a closest point to a given query point, start at the root and recursively search in both
subtrees using the following pruning rule: if the closest point discovered so far is closer than the distance between the
query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is,
you should search a node only if it might contain a point that is closer than the best one found so far. The effectiveness
of the pruning rule depends on quickly finding a nearby point. To do this, organize your recursive method so that when
there are two possible subtrees to go down, you choose first the subtree that is on the same side of the splitting line as
the query point; the closest point found while exploring the first subtree may enable pruning of the second subtree.
ˆ k-nearest neighbor search. Use the nearest neighbor search described above.
Corner Cases
ˆ The put() method should throw a NullPointerException() with the message "p is null" if p is null and the message "value is null"
if value is null.
ˆ The get(), contains(), and nearest() methods should throw a NullPointerException() with the message "p is null" if p is null.
ˆ The rect() method should throw a NullPointerException() with the message "rect is null" if rect is null.
Performance Requirements
ˆ The isEmpty() and size() methods should run in time T(n) ∼ 1, where n is the number of key-value pairs in the symbol
table.
ˆ The put(), get(), contains(), range(), and nearest() methods should run in time T(n) ∼ log n.
ˆ The points() method should run in time T(n) ∼ n.
& ~/workspace/project5
$ java KdTreePointST 0.975528 0.345492 5 < data / circle10 . txt
st . empty ()? false
st . size () = 10
st . contains ((0.975528 , 0.345492))? true
st . range ([ -1.0 , -1.0] x [1.0 , 1.0]):
(0.206107 , 0.095492)
(0.024472 , 0.345492)
(0.024472 , 0.654508)
(0.975528 , 0.654508)
(0.793893 , 0.095492)
(0.5 , 0.0)
(0.975528 , 0.345492)
(0.793893 , 0.904508)
(0.206107 , 0.904508)
(0.5 , 1.0)
st . nearest ((0.975528 , 0.345492)) = (0.975528 , 0.654508)
st . nearest ((0.975528 , 0.345492) , 5):
(0.5 , 1.0)
(0.5 , 0.0)
(0.793893 , 0.904508)
(0.793893 , 0.095492)
(0.975528 , 0.654508)
Directions:
ˆ Instance variables:
5 / 9
Project 5 (KdTrees)
– Reference to the root of a 2dTree, Node root.
– Number of nodes in the tree, int n.
ˆ KdTreePointST()
– Initialize instance variables root and n appropriately.
ˆ int size()
– Return the number of nodes in the 2dTree.
ˆ boolean isEmpty()
– Return whether the 2dTree is empty.
ˆ void put(Point2D p, Value value)
– Call the private put() method with appropriate arguments to insert the given point and value into the 2dTree; the
parameter lr in this and other helper methods represents if the currrent node is x-aligned (lr = true) or y-aligned
(lr = false).
ˆ Node put(Node x, Point2D p, Value value, RectHV rect, boolean lr)
– If x = null, return a new Node object built appropriately.
– If the point in x is the same as the given point, update the value in x to the given value.
– Otherwise, make a recursive call to put() with appropriate arguments to insert the given point and value into the
left subtree x.lb or the right subtree x.rt depending on how x.p and p compare (use lr to decide which coordinate
to consider).
– Return x.
ˆ Value get(Point2D p)
– Call the private get() method with appropriate arguments to find the value corresponding to the given point.
ˆ Value get(Node x, Point2D p, boolean lr)
– If x = null, return null.
– If the point in x is the same as the given point, return the value in x.
– Make a recursive call to get() with appropriate arguments to find the value corresponding to the given point in the
left subtree x.lb or the right subtree x.rt depending on how x.p and p compare.
ˆ boolean contains(Point2D p)
– Return whether the given point is in the 2dTree.
ˆ Iterable<Point2D> points()
– Return all the points in the 2dTree, collected using a level-order traversal of the tree; use two queues, one to aid
in the traversal and the other to collect the points.
ˆ Iterable<Point2D> range(RectHV rect)
– Call the private range() method with appropriate arguments, the last one being an empty queue of Point2D objects,
and return the queue.
ˆ void range(Node x, RectHV rect, Queue<Point2D> q)
– If x = null, simply return.
– If rect contains the point in x, enqueue the point into q
6 / 9
Project 5 (KdTrees)
– Make recursive calls to range() on the left subtree x.lb and on the right subtree x.rt.
– Incorporate the range search pruning rule mentioned in the project writeup.
ˆ Point2D nearest(Point2D p)
– Return a point from the 2dTree that is different from and closest to the given point by calling the private method
nearest() with appropriate arguments.
ˆ Point2D nearest(Node x, Point2D p, Point2D nearest, boolean lr)
– If x = null, return nearest.
– If the point x.p is different from the given point p and the squared distance between the two is smaller than the
squared distance between nearest and p, update nearest to x.p.
– Make a recursive call to nearest() on the left subtree x.lb.
– Make a recursive call to nearest() on the right subtree x.rt, using the value returned by the first call in an appropriate
manner.
– Incorporate the nearest neighbor pruning rules mentioned in the project writeup.
ˆ Iterable<Point2D> nearest(Point2D p, int k)
– Call the private nearest() method passing it an empty maxPQ of Point2D objects (built with a suitable comparator
from Point2D) as one of the arguments, and return the PQ.
ˆ void nearest(Node x, Point2D p, int k, MaxPQ<Point2D> pq, boolean lr)
– If x = null or if the size of pq is greater than k, simply return.
– If the point in x is different from the given point, insert it into pq.
– If the size of pq exceeds k, remove the maximum point from the pq.
– Make recursive calls to nearest() on the left subtree x.lb and on the right subtree x.rt.
– Incorporate the nearest neighbor pruning rules mentioned in the project writeup.
Data The data directory contains a number of sample input files, each with n points (x, y), where x, y ∈ (0, 1); for example
& ~/workspace/project5
$ cat data / input100 . txt
0.042191 0.783317
0.390296 0.499816
0.666260 0.752352
...
0.263965 0.906869
0.564479 0.679364
0.772950 0.196867
Visualization Programs The program RangeSearchVisualizer reads a sequence of points from a file (specified as a command-line
argument), inserts those points into BrutePointST (red) and KdTreePointST (blue) based symbol tables, performs range searches on
the axis-aligned rectangles dragged by the user, and displays the points obtained from the symbol tables in red and blue.
& ~/workspace/project5
$ java RangeSearchVisualizer data / input100 . txt
7 / 9
Project 5 (KdTrees)
The program NearestNeighborVisualizer reads a sequence of points from a file (specified as the first command-line argument),
inserts those points into BrutePointST (red) and KdTreeSPointT (blue) based symbol tables, performs k (specified as the second
command-line argument) nearest-neighbor queries on the point corresponding to the location of the mouse, and displays the
neighbors obtained from the symbol tables in red and blue.
& ~/workspace/project5
$ java NearestNeighborVisualizer data / input100 . txt 5
The program BoidSimulator W simulates the flocking behavior of birds, using a BrutePointST or KdTreePointST data type; the first
command-line argument specifies which data type to use (brute or kdtree), the second argument specifies the number of boids,
and the third argument specifies the number of friends each boid has.
& ~/workspace/project5
$ java BoidSimulator kdtree 1000 10
8 / 9
Project 5 (KdTrees)
Acknowledgements This project is an adaptation of the KdTrees assignment developed at Princeton University by Kevin
Wayne, with boid simulation by Josh Hug.
Files to Submit
1. ArrayST.java
2. Spell.java
3. BrutePointST.java
4. KdTreePointST.java
5. report.txt
Before you submit your files, make sure:
ˆ Your programs meet the style requirements by running the following command in the terminal.
& ~/workspace/project5
$ check_style src /*. java
ˆ Your code is adequately commented, follows good programming principles, and meets any specific requirements
such as corner cases and running times.
ˆ You use the template file report.txt for your report.
ˆ Your report meets the prescribed guidelines.
9 / 9

More products