CS210 Project 5 (Kd-trees)
This document only contains the description of the project and the project problems. For the programming exercises on
concepts needed for the project, please refer to the project checklist.
The purpose of this assignment is to create a symbol table data type whose keys are two-dimensional points. We’ll use a
2d-tree to support efficient range search (find all the points contained in a query rectangle) and k-nearest neighbor search
(find k points that are closest to a query point). 2d-trees have numerous applications, ranging from classifying astronomical
objects to computer animation to speeding up neural networks to mining data to image retrieval.
Geometric Primitives To get started, use the following geometric primitives for points and axis-aligned rectangles in the
plane.
Use the immutable data type edu.princeton.cs.algs4.Point2D for points in the plane. Here is the subset of its API that you
may use:
method                                    description
Point2D(double x, double y)               construct the point (x, y)
double x()                                x-coordinate
double y()                                y-coordinate
double distanceSquaredTo(Point2D that)    square of Euclidean distance between this point and that
Comparator<Point2D> distanceToOrder()     a comparator that compares two points by their distance to this point
boolean equals(Point2D that)              does this point equal that?
String toString()                         a string representation of this point
Use the immutable data type edu.princeton.cs.algs4.RectHV for axis-aligned rectangles. Here is the subset of its API that you
may use:
CS210 Project 5 (Kd-trees) Swami Iyer
method                                                        description
RectHV(double xmin, double ymin, double xmax, double ymax)    construct the rectangle [xmin, xmax] × [ymin, ymax]
double xmin()                                                 minimum x-coordinate of rectangle
double xmax()                                                 maximum x-coordinate of rectangle
double ymin()                                                 minimum y-coordinate of rectangle
double ymax()                                                 maximum y-coordinate of rectangle
boolean contains(Point2D p)                                   does this rectangle contain the point p (either inside or on boundary)?
boolean intersects(RectHV that)                               does this rectangle intersect that rectangle (at one or more points)?
double distanceSquaredTo(Point2D p)                           square of Euclidean distance from point p to closest point in rectangle
boolean equals(RectHV that)                                   does this rectangle equal that?
String toString()                                             a string representation of this rectangle
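To make the semantics of contains(), intersects(), and the two distanceSquaredTo() methods concrete, here is a minimal sketch. The records Pt and Rect are hypothetical stand-ins written only for illustration; they are not the algs4 classes, and your solution should use Point2D and RectHV themselves.

```java
// Hypothetical stand-ins that mirror the documented behavior of Point2D and RectHV;
// they are NOT the algs4 classes.
record Pt(double x, double y) {
    // Square of the Euclidean distance between this point and that.
    double distanceSquaredTo(Pt that) {
        double dx = x - that.x, dy = y - that.y;
        return dx * dx + dy * dy;
    }
}

record Rect(double xmin, double ymin, double xmax, double ymax) {
    // A point on the boundary counts as contained.
    boolean contains(Pt p) {
        return p.x() >= xmin && p.x() <= xmax && p.y() >= ymin && p.y() <= ymax;
    }

    // Rectangles that merely touch at an edge or a corner still intersect.
    boolean intersects(Rect that) {
        return xmax >= that.xmin && that.xmax >= xmin
            && ymax >= that.ymin && that.ymax >= ymin;
    }

    // Squared distance from p to the closest point of the rectangle; 0 if p is inside.
    double distanceSquaredTo(Pt p) {
        double dx = Math.max(0.0, Math.max(xmin - p.x(), p.x() - xmax));
        double dy = Math.max(0.0, Math.max(ymin - p.y(), p.y() - ymax));
        return dx * dx + dy * dy;
    }
}
```

Working with squared distances avoids a square root and preserves the ordering of distances, which is all that range and nearest-neighbor search need.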
Symbol Table API Here is a Java interface PointST<Value> specifying the API for a symbol table data type whose keys are
two-dimensional points represented as Point2D objects:
method                                       description
boolean isEmpty()                            is the symbol table empty?
int size()                                   number of points in the symbol table
void put(Point2D p, Value val)               associate the value val with point p
Value get(Point2D p)                         value associated with point p
boolean contains(Point2D p)                  does the symbol table contain the point p?
Iterable<Point2D> points()                   all points in the symbol table
Iterable<Point2D> range(RectHV rect)         all points in the symbol table that are inside the rectangle rect
Point2D nearest(Point2D p)                   a nearest neighbor to point p; null if the symbol table is empty
Iterable<Point2D> nearest(Point2D p, int k)  k points that are closest to point p
Problem 1. (Brute-force Implementation) Write a mutable data type BrutePointST that implements the above API using a
red-black BST (edu.princeton.cs.algs4.RedBlackBST).
Corner cases. Throw a java.lang.NullPointerException if any argument is null.
Performance requirements. Your implementation should support put(), get() and contains() in time proportional to the
logarithm of the number of points in the symbol table in the worst case; it should support points(), range(), and nearest() in time
proportional to the number of points in the symbol table.
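The brute-force idea can be sketched as follows. This is an illustration under stated assumptions, not a reference solution: java.util.TreeMap (itself a red-black tree) stands in for RedBlackBST, the record Pt stands in for Point2D, and the class name BruteSketch is hypothetical. The sample run for this problem reports a nearest neighbor different from the query point even though the table contains that point, so the sketch skips a key equal to the query; treat that as an inference from the sample output rather than a stated requirement.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical stand-in for Point2D.
record Pt(double x, double y) {
    double distanceSquaredTo(Pt that) {
        double dx = x - that.x, dy = y - that.y;
        return dx * dx + dy * dy;
    }
}

// Brute-force sketch; TreeMap is an assumed substitute for RedBlackBST.
final class BruteSketch<V> {
    private final TreeMap<Pt, V> bst =
        new TreeMap<>(Comparator.comparingDouble(Pt::y).thenComparingDouble(Pt::x));

    void put(Pt p, V val) { bst.put(p, val); }    // logarithmic in the number of points
    V get(Pt p)           { return bst.get(p); }  // logarithmic in the number of points

    // Linear scan: test every key against the rectangle [xmin, xmax] x [ymin, ymax].
    List<Pt> range(double xmin, double ymin, double xmax, double ymax) {
        List<Pt> hits = new ArrayList<>();
        for (Pt q : bst.keySet())
            if (q.x() >= xmin && q.x() <= xmax && q.y() >= ymin && q.y() <= ymax) hits.add(q);
        return hits;
    }

    // Linear scan: keep the key with the smallest squared distance to p; null if none.
    Pt nearest(Pt p) {
        Pt best = null;
        for (Pt q : bst.keySet()) {
            if (q.equals(p)) continue;  // assumption inferred from the sample run
            if (best == null || q.distanceSquaredTo(p) < best.distanceSquaredTo(p)) best = q;
        }
        return best;
    }

    // k closest keys, nearest first, using a max-heap that never holds more than k + 1 keys.
    List<Pt> nearest(Pt p, int k) {
        PriorityQueue<Pt> heap = new PriorityQueue<>(
            Comparator.comparingDouble((Pt q) -> q.distanceSquaredTo(p)).reversed());
        for (Pt q : bst.keySet()) {
            if (q.equals(p)) continue;          // same assumption as above
            heap.add(q);
            if (heap.size() > k) heap.poll();   // discard the farthest candidate
        }
        LinkedList<Pt> out = new LinkedList<>();
        while (!heap.isEmpty()) out.addFirst(heap.poll());  // farthest comes off the heap first
        return out;
    }
}
```

The bounded heap keeps the k-nearest scan close to linear when k is small; sorting all the keys by distance would also work but costs more than a single pass.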
$ java BrutePointST 0.661633 0.287141 0.65 0.68 0.28 0.29 5 < data/input10K.txt
st.empty()? false
st.size() = 10000
First 5 values:
3380
1585
8903
4168
5971
7265
st.contains((0.661633, 0.287141))? true
st.range([0.65, 0.68] x [0.28, 0.29]):
(0.663908, 0.285337)
(0.661633, 0.287141)
(0.671793, 0.288608)
st.nearest((0.661633, 0.287141)) = (0.663908, 0.285337)
st.nearest((0.661633, 0.287141), 5):
(0.663908, 0.285337)
(0.658329, 0.290039)
(0.671793, 0.288608)
(0.65471, 0.276885)
(0.668229, 0.276482)
Problem 2. (2d-tree Implementation) Write a mutable data type KdTreePointST that uses a 2d-tree to implement the above
symbol table API. A 2d-tree is a generalization of a BST to two-dimensional keys. The idea is to build a BST with points in
the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence, starting with the x-coordinates.
• Search and insert. The algorithms for search and insert are similar to those for BSTs, but at the root we use the
x-coordinate (if the point to be inserted has a smaller x-coordinate than the point at the root, go left; otherwise go
right); then at the next level, we use the y-coordinate (if the point to be inserted has a smaller y-coordinate than the
point in the node, go left; otherwise go right); then at the next level the x-coordinate, and so forth.
• Level-order traversal. The points() method should return the points in level-order: first the root, then all children of
the root (from left/bottom to right/top), then all grandchildren of the root (from left to right), and so forth. The
level-order traversal of the 2d-tree above is (0.7, 0.2), (0.5, 0.4), (0.9, 0.6), (0.2, 0.3), (0.4, 0.7).
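The search, insert, and level-order rules above can be sketched as follows. The names TwoDTreeSketch, Pt, lb, rt, and useX are hypothetical, and Pt stands in for Point2D. The insertion order used in main is an assumption chosen for illustration; with that order the sketch yields the level-order sequence quoted above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for Point2D.
record Pt(double x, double y) {}

final class TwoDTreeSketch<V> {
    private final class Node {
        final Pt p;
        V val;
        Node lb, rt;  // lb = left/bottom subtree, rt = right/top subtree
        Node(Pt p, V val) { this.p = p; this.val = val; }
    }

    private Node root;
    private int n;

    int size() { return n; }

    void put(Pt p, V val) { root = put(root, p, val, true); }

    // useX is true on levels that compare x-coordinates; it flips at every level.
    private Node put(Node h, Pt p, V val, boolean useX) {
        if (h == null) { n++; return new Node(p, val); }
        if (h.p.equals(p)) { h.val = val; return h; }  // existing key: overwrite the value
        boolean goLeft = useX ? p.x() < h.p.x() : p.y() < h.p.y();
        if (goLeft) h.lb = put(h.lb, p, val, !useX);
        else        h.rt = put(h.rt, p, val, !useX);
        return h;
    }

    // Search follows the same alternating comparisons as insert.
    V get(Pt p) {
        Node h = root;
        boolean useX = true;
        while (h != null) {
            if (h.p.equals(p)) return h.val;
            boolean goLeft = useX ? p.x() < h.p.x() : p.y() < h.p.y();
            h = goLeft ? h.lb : h.rt;
            useX = !useX;
        }
        return null;
    }

    // Level-order traversal with a FIFO queue: root, then children, then grandchildren.
    List<Pt> points() {
        List<Pt> out = new ArrayList<>();
        Queue<Node> queue = new ArrayDeque<>();
        if (root != null) queue.add(root);
        while (!queue.isEmpty()) {
            Node h = queue.remove();
            out.add(h.p);
            if (h.lb != null) queue.add(h.lb);
            if (h.rt != null) queue.add(h.rt);
        }
        return out;
    }

    public static void main(String[] args) {
        TwoDTreeSketch<Integer> st = new TwoDTreeSketch<>();
        double[][] pts = {{0.7, 0.2}, {0.5, 0.4}, {0.2, 0.3}, {0.4, 0.7}, {0.9, 0.6}};
        for (int i = 0; i < pts.length; i++) st.put(new Pt(pts[i][0], pts[i][1]), i);
        System.out.println(st.points());  // points in level-order
    }
}
```

Passing the orientation down as a boolean is one option; storing it in each node or deriving it from the depth would work equally well.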
The prime advantage of a 2d-tree over a BST is that it supports efficient implementation of range search, nearest neighbor,
and k-nearest neighbor search. Each node corresponds to an axis-aligned rectangle, which encloses all of the points in its
subtree. The root corresponds to the infinitely large square [(−∞, −∞), (+∞, +∞)]; the left and right children of the
root correspond to the two rectangles split by the x-coordinate of the point at the root; and so forth.
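As a sketch of how the rectangle of a child might be derived from the rectangle of its parent: the parent's rectangle is cut along the parent's splitting coordinate, and each child keeps one half. The records Pt and Rect and the method names plane(), leftOrBottom(), and rightOrTop() are hypothetical, introduced only for this illustration; they are not part of RectHV.

```java
// Hypothetical stand-ins for Point2D and RectHV.
record Pt(double x, double y) {}

record Rect(double xmin, double ymin, double xmax, double ymax) {
    // Rectangle of the root: the whole plane.
    static Rect plane() {
        return new Rect(Double.NEGATIVE_INFINITY, Double.NEGATIVE_INFINITY,
                        Double.POSITIVE_INFINITY, Double.POSITIVE_INFINITY);
    }

    // Rectangle of the left (x-split) or bottom (y-split) child of a node holding p.
    Rect leftOrBottom(Pt p, boolean useX) {
        return useX ? new Rect(xmin, ymin, p.x(), ymax)
                    : new Rect(xmin, ymin, xmax, p.y());
    }

    // Rectangle of the right (x-split) or top (y-split) child of a node holding p.
    Rect rightOrTop(Pt p, boolean useX) {
        return useX ? new Rect(p.x(), ymin, xmax, ymax)
                    : new Rect(xmin, p.y(), xmax, ymax);
    }
}
```

For example, if the root holds (0.7, 0.2), its left child gets the rectangle with xmax = 0.7 and its right child the rectangle with xmin = 0.7; the y-bounds stay infinite until a y-split occurs further down.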
• Range search. To find all points contained in a given query rectangle, start at the root and recursively search for points
in both subtrees using the following pruning rule: if the query rectangle does not intersect the rectangle corresponding
to a node, there is no need to explore that node (or its subtrees). That is, you should search a subtree only if it might
contain a point contained in the query rectangle.
• Nearest neighbor search. To find a closest point to a given query point, start at the root and recursively search in both
subtrees using the following pruning rule: if the closest point discovered so far is closer than the distance between the
query point and the rectangle corresponding to a node, there is no need to explore that node (or its subtrees). That is,
you should search a node only if it might contain a point that is closer than the best one found so far. The effectiveness
of the pruning rule depends on quickly finding a nearby point. To do this, organize your recursive method so that when
there are two possible subtrees to go down, you choose first the subtree that is on the same side of the splitting line as
the query point; the closest point found while exploring the first subtree may enable pruning of the second subtree.
• k-nearest neighbor search. Use the technique from kd-tree nearest neighbor search described above.
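The two pruning rules above can be sketched as follows. All names here (KdSearchSketch, Pt, Rect, lb, rt, useX) are hypothetical stand-ins, and values are omitted to keep the focus on the search. As in the sample runs, where the reported nearest neighbor differs from a query point that is itself in the table, this sketch skips a point equal to the query; that behavior is inferred from the sample output, not stated as a requirement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for Point2D and RectHV.
record Pt(double x, double y) {
    double distanceSquaredTo(Pt q) {
        double dx = x - q.x, dy = y - q.y;
        return dx * dx + dy * dy;
    }
}

record Rect(double xmin, double ymin, double xmax, double ymax) {
    boolean contains(Pt p) {
        return p.x() >= xmin && p.x() <= xmax && p.y() >= ymin && p.y() <= ymax;
    }
    boolean intersects(Rect r) {
        return xmax >= r.xmin && r.xmax >= xmin && ymax >= r.ymin && r.ymax >= ymin;
    }
    double distanceSquaredTo(Pt p) {
        double dx = Math.max(0.0, Math.max(xmin - p.x(), p.x() - xmax));
        double dy = Math.max(0.0, Math.max(ymin - p.y(), p.y() - ymax));
        return dx * dx + dy * dy;
    }
}

final class KdSearchSketch {
    private static final class Node {
        final Pt p;
        final Rect rect;  // rectangle enclosing every point in this subtree
        Node lb, rt;
        Node(Pt p, Rect rect) { this.p = p; this.rect = rect; }
    }

    private Node root;

    void insert(Pt p) {
        double inf = Double.POSITIVE_INFINITY;
        root = insert(root, p, new Rect(-inf, -inf, inf, inf), true);
    }

    // rect is the rectangle a new node would receive at this position.
    private Node insert(Node h, Pt p, Rect rect, boolean useX) {
        if (h == null) return new Node(p, rect);
        if (h.p.equals(p)) return h;
        Rect r = h.rect;
        if (useX ? p.x() < h.p.x() : p.y() < h.p.y()) {
            Rect child = useX ? new Rect(r.xmin(), r.ymin(), h.p.x(), r.ymax())
                              : new Rect(r.xmin(), r.ymin(), r.xmax(), h.p.y());
            h.lb = insert(h.lb, p, child, !useX);
        } else {
            Rect child = useX ? new Rect(h.p.x(), r.ymin(), r.xmax(), r.ymax())
                              : new Rect(r.xmin(), h.p.y(), r.xmax(), r.ymax());
            h.rt = insert(h.rt, p, child, !useX);
        }
        return h;
    }

    List<Pt> range(Rect query) {
        List<Pt> out = new ArrayList<>();
        range(root, query, out);
        return out;
    }

    private void range(Node h, Rect query, List<Pt> out) {
        if (h == null || !query.intersects(h.rect)) return;  // prune: subtree cannot contain a hit
        if (query.contains(h.p)) out.add(h.p);
        range(h.lb, query, out);
        range(h.rt, query, out);
    }

    Pt nearest(Pt q) { return nearest(root, q, null, true); }

    private Pt nearest(Node h, Pt q, Pt best, boolean useX) {
        if (h == null) return best;
        // Prune: the best point so far is closer than anything this rectangle could hold.
        if (best != null && best.distanceSquaredTo(q) < h.rect.distanceSquaredTo(q)) return best;
        if (!h.p.equals(q)  // assumption inferred from the sample run: skip the query point itself
                && (best == null || h.p.distanceSquaredTo(q) < best.distanceSquaredTo(q))) best = h.p;
        // Visit the subtree on the query point's side of the splitting line first.
        boolean lbFirst = useX ? q.x() < h.p.x() : q.y() < h.p.y();
        best = nearest(lbFirst ? h.lb : h.rt, q, best, !useX);
        return nearest(lbFirst ? h.rt : h.lb, q, best, !useX);
    }
}
```

For k-nearest neighbor search, one possible adaptation (again an assumption, not something this handout prescribes) is to replace the single best point with a collection of at most k candidates and to prune against the farthest candidate once k have been collected.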
Corner cases. Throw a java.lang.NullPointerException if any argument is null.
$ java KdTreePointST 0.661633 0.287141 0.65 0.68 0.28 0.29 5 < data/input10K.txt
st.empty()? false
st.size() = 10000
First 5 values:
0
2
1
4
3
62
st.contains((0.661633, 0.287141))? true
st.range([0.65, 0.68] x [0.28, 0.29]):
(0.671793, 0.288608)
(0.663908, 0.285337)
(0.661633, 0.287141)
st.nearest((0.661633, 0.287141)) = (0.663908, 0.285337)
st.nearest((0.661633, 0.287141), 5):
(0.668229, 0.276482)
(0.65471, 0.276885)
(0.671793, 0.288608)
(0.658329, 0.290039)
(0.663908, 0.285337)
Acknowledgements This project is an adaptation of the Kd-Trees assignment developed at Princeton University by Kevin
Wayne, with boid simulation by Josh Hug.