CptS 223 PA #1 - Binary Search Tree Heights
For this project, we will empirically evaluate the heights of a binary search tree (BST) and compare
them to our own calculated analysis. To do this, we will feed the BST different orderings of input
values and chart how its height responds. We will test the BST with sorted data, perfectly balanced
data, and randomly ordered data.
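As a possible reference point for the calculated analysis, these are the standard textbook results the charts can be checked against. They assume height counts the nodes on the longest root-to-leaf path (an empty tree has height 0); if you count edges instead, subtract 1. The random-order result is only asymptotic (for a uniformly random insertion order), so expect rough rather than exact agreement at the N used here.

```latex
\begin{aligned}
h_{\text{sorted}}(N)   &= N \\
h_{\text{balanced}}(N) &= \lfloor \log_2 N \rfloor + 1 = \lceil \log_2 (N+1) \rceil \quad \text{(complete tree, } N \ge 1\text{)} \\
\mathbb{E}\left[h_{\text{random}}(N)\right] &\sim \alpha \ln N \approx 2.99 \log_2 N, \qquad \alpha \approx 4.311
\end{aligned}
```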
Your implementation needs to conform to a few standards. Most notably, the BST MUST use templates to
define the data type it holds. The code must also use a Makefile to build, test, run, and clean the
program. I have provided a testing object that supplies the different data sets we’ll be testing with,
a working Makefile, and starter code that should compile (but doesn’t do anything useful yet).
During this course, I will give you less and less initial help with the programming assignments.
This one provides a strong framework to start from, but the next one will rely more on what you’ve
learned from this project. Make sure to read through the code to see a few key items:
1) How templating works in header files
2) How command-line options are passed to the program
3) How I wrote the Makefile, so you can add to or edit it for your own uses
Binary Search Tree (BST)
The code for this project is written in C++11 and tested using g++ on the EECS SSH servers. I have
provided the interface for the BST, but very little of it is implemented. The code compiles, but it
produces no output.
You are to implement the BST. The most important features will be:
1) void add( value ) - inserts a value (an integer, for this assignment’s data sets) into the tree
2) int height() - calculates the height of the tree
3) void printPreOrder() - prints the tree in pre-order to STDOUT for inspection
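Here is a minimal sketch of one possible templated BST that provides these three functions. It is not the provided starter interface, which may differ. Assumptions: duplicate values are ignored, and height counts nodes (empty tree = 0, single node = 1). preOrder() is a hypothetical extra helper that returns the traversal so it can be checked, and printPreOrder() prints it. Insertion, traversal, and teardown are iterative because sorted input degenerates the tree into a chain N nodes deep, which recursion could overflow the call stack on.

```cpp
#include <cstddef>
#include <iostream>
#include <stack>
#include <vector>

// Sketch of a templated BST. Assumptions: duplicates are ignored; height
// counts nodes on the longest root-to-leaf path (empty = 0, one node = 1).
template <typename T>
class BST {
private:
    struct Node {
        T value;
        Node* left;
        Node* right;
        explicit Node(const T& v) : value(v), left(nullptr), right(nullptr) {}
    };

    Node* root_ = nullptr;
    int height_ = 0;  // depth of the deepest node inserted so far

public:
    BST() = default;
    BST(const BST&) = delete;             // copying is not needed in this sketch
    BST& operator=(const BST&) = delete;
    ~BST() { clear(); }

    // Iterative insert; tracks the depth at which the new node lands.
    void add(const T& value) {
        if (root_ == nullptr) {
            root_ = new Node(value);
            height_ = 1;
            return;
        }
        Node* cur = root_;
        int depth = 1;
        while (true) {
            ++depth;  // depth the new node would have as a child of cur
            if (value < cur->value) {
                if (cur->left == nullptr) { cur->left = new Node(value); break; }
                cur = cur->left;
            } else if (cur->value < value) {
                if (cur->right == nullptr) { cur->right = new Node(value); break; }
                cur = cur->right;
            } else {
                return;  // duplicate value: ignored (an assumption)
            }
        }
        if (depth > height_) height_ = depth;
    }

    // Nodes are never removed, so the height is the deepest insertion depth.
    int height() const { return height_; }

    // Pre-order (node, left, right) using an explicit stack.
    std::vector<T> preOrder() const {
        std::vector<T> out;
        std::stack<const Node*> pending;
        if (root_ != nullptr) pending.push(root_);
        while (!pending.empty()) {
            const Node* n = pending.top();
            pending.pop();
            out.push_back(n->value);
            if (n->right != nullptr) pending.push(n->right);  // pushed first,
            if (n->left != nullptr) pending.push(n->left);    // so left pops first
        }
        return out;
    }

    // Prints the pre-order traversal on one line, space-separated.
    void printPreOrder() const {
        const std::vector<T> values = preOrder();
        for (std::size_t i = 0; i < values.size(); ++i) {
            if (i > 0) std::cout << ' ';
            std::cout << values[i];
        }
        std::cout << '\n';
    }

    // Iterative teardown, for the same stack-depth reason as add().
    void clear() {
        std::stack<Node*> pending;
        if (root_ != nullptr) pending.push(root_);
        while (!pending.empty()) {
            Node* n = pending.top();
            pending.pop();
            if (n->left != nullptr) pending.push(n->left);
            if (n->right != nullptr) pending.push(n->right);
            delete n;
        }
        root_ = nullptr;
        height_ = 0;
    }
};
```

For example, adding 4, 2, 6, 1, 3, 5, 7 gives height() == 3 and printPreOrder() prints 4 2 1 3 6 5 7; adding 0 through 999 in order gives height() == 1000. Caching the height is a design choice: because this tree only grows, height() becomes O(1) per add instead of a full traversal after every insert.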
There might be other functions or pieces you’ll need to build, but that’s up to your approach to the
implementation.
Statistics
Once you have a working BST implementation, you need to run it with several data sets and output
the data to a CSV file. If you’re not familiar with CSV files, please look up the file format online; it’s a
very simple one. CSV stands for Comma-Separated Values. Each line in the file is like a row in a
spreadsheet, with each cell separated from the next by a comma. A properly built CSV file
can be loaded into a spreadsheet program (LibreOffice Calc, MS Excel, Google Sheets) and viewed as a
spreadsheet, which is probably the easiest way to generate the charts for this assignment.
To get your data, you will need to make 7 BSTs, each holding a different data set. These data sets
are available from the provided TestData object in TestData.h. The BSTs will hold:
1) Sorted data (in order from 0..N)
2) Perfectly balanced data
3) Scrambled (random order) data, in five separate trees
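TestData already supplies these orderings, so you do not need to generate them yourself. Purely to show what "perfectly balanced" and "scrambled" insertion orders can look like, here is a sketch. The helper names, the level-by-level midpoint strategy, and the use of std::mt19937 are assumptions and may not match how TestData.h builds its data.

```cpp
#include <algorithm>
#include <queue>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: an insertion order for 0..n-1 that builds the tree
// level by level. Each range contributes its midpoint, then its two halves
// are queued, so a whole level is inserted before the next one starts.
std::vector<int> balancedOrder(int n) {
    std::vector<int> order;
    std::queue<std::pair<int, int>> ranges;  // inclusive [lo, hi] ranges still to visit
    if (n > 0) ranges.push(std::make_pair(0, n - 1));
    while (!ranges.empty()) {
        const std::pair<int, int> r = ranges.front();
        ranges.pop();
        if (r.first > r.second) continue;  // empty range: nothing to insert
        const int mid = r.first + (r.second - r.first) / 2;
        order.push_back(mid);
        ranges.push(std::make_pair(r.first, mid - 1));   // left half
        ranges.push(std::make_pair(mid + 1, r.second));  // right half
    }
    return order;
}

// Hypothetical helper: a reproducible random permutation of 0..n-1.
std::vector<int> scrambledOrder(int n, unsigned seed) {
    std::vector<int> order;
    for (int i = 0; i < n; ++i) order.push_back(i);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    return order;
}
```

For instance, balancedOrder(7) yields 3 1 5 0 2 4 6. Inserting those values into an empty BST gives a perfect tree of height 3 (counting nodes), whereas inserting 0..6 in sorted order gives height 7.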
The TestData object has interfaces to get the data:
int get_next_sorted()
int get_next_balanced()
int get_next_scrambled( setNumber )
These all return -1 when they’re out of numbers. I suggest looking at the testing mode code to get an
idea of how to use the interfaces.
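The -1 sentinel suggests a simple loop. Here is a sketch of a hypothetical drain helper (not part of TestData.h) that works with any of the three getters through a lambda.

```cpp
#include <functional>
#include <vector>

// Hypothetical helper: calls `next` until it returns the -1 sentinel and
// collects every value produced before that.
std::vector<int> drain(const std::function<int()>& next) {
    std::vector<int> values;
    for (int v = next(); v != -1; v = next()) {
        values.push_back(v);
    }
    return values;
}
```

With a TestData instance named data, a call might look like drain([&] { return data.get_next_scrambled(2); }). Each CSV row needs all seven heights at the same N. One approach, assuming the seven data sets have the same length, is to drain all seven sources first, then add the i-th value of each to its tree before recording row i.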
The statistics you’ll generate in the CSV file are the heights of the trees after every add (insert)
into the tree. Each line in the CSV file will have 9 columns:
1) N: The number of elements in the tree
2) log(N): log_2(N), the base-2 logarithm of N
3) Height of sorted tree
4) Height of balanced tree
5) Height of scrambled #0
6) Height of scrambled #1
7) Height of scrambled #2
8) Height of scrambled #3
9) Height of scrambled #4
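Here is one way to assemble each line, as a sketch under some assumptions. The helper names statsHeader and statsRow and the exact header labels are hypothetical, since the assignment fixes only the column order. The seven heights are passed in the order listed above. std::log2 comes from <cmath> in C++11. Rows should start at N = 1 (after the first add), since log_2(0) is undefined.

```cpp
#include <cmath>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical header row; the labels are only an example of the 9 columns.
std::string statsHeader() {
    return "N,log2(N),Sorted,Balanced,"
           "Scrambled #0,Scrambled #1,Scrambled #2,Scrambled #3,Scrambled #4";
}

// Builds one row: N, log2(N), then the tree heights in the order
// sorted, balanced, scrambled #0..#4 (expects 7 heights, N >= 1).
std::string statsRow(int n, const std::vector<int>& heights) {
    std::ostringstream row;  // default stream precision: 6 significant digits
    row << n << ',' << std::log2(static_cast<double>(n));
    for (int h : heights) row << ',' << h;
    return row.str();
}
```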
The first line of the CSV file is a header row naming the columns. Here is a quick snapshot of how the
CSV file should look when opened in Google Sheets:
[Figure: snapshot of the CSV file opened in Google Sheets]
Expected Output
Your program should generate one CSV file of gathered statistics, called OutputData-BST.csv,
which contains the tree heights (and log_2(N)) for each N value and data ordering.
Based on that CSV file, which you can open in MS Excel, LibreOffice Calc, or Google Sheets,
create 2 charts plotting the tree heights against the input size N. The first chart will have
all 8 data columns (log_2(N) and all 7 tree heights, including the sorted tree); the second chart will
have everything except the sorted tree data, since the sorted tree’s height grows linearly and makes
everything else look tiny. Save the charts as:
● AllTreeHeights.png
● NoSortedTreeHeights.png
Given how much data there will be (over 200k data points), you should probably use very small plot
markers. Spreadsheet defaults are often large diamonds, x’s, or similar shapes; you just need the line
to show the growth of the tree height as N increases.
Deliverables
You must upload your program through Blackboard no later than midnight on Friday, February 10,
2017. The program will be uploaded as a zip file containing:
● C++ source code
● The charts as images (AllTreeHeights.png, NoSortedTreeHeights.png)
● The CSV file your program output (OutputData-BST.csv)
Grading Criteria
Your assignment will be judged by the following criteria:
● [70] Code operational success. Your code compiles, executes, and generates the CSV file.
● [10] Your code is well documented and generally easy to read.
● [10] Your program intelligently uses classes where appropriate and generally conforms to good
OOP design (i.e., not everything is slapped into main).
● [10] The CSV and charts are in the correct formats, with labeled axes, title, and legend (as
appropriate).