Asymptotic Analysis Project

Starting from:

$30

Asymptotic Analysis Project
1 Overview
This project asks you to implement four sorting algorithms in C++ or Java. You will then run your algorithm data of different size and type, and analyze these results to see how their run times relate to their theoretical asymptotic analysis. 2 Algorithms Implement the following sorting algorithms in the file Sorting.hpp or Sorting.java. For full credit, each function must: • Implement the algorithms as described Section 2.4, using the given function signatures. • Be generic (i.e., a function that takes as input an array of any type of item). You may assume that the item type overloaded the comparison operators (i.e., you may use the < and operators on the array of items). • Compile with no errors. If your code fails to compile on the C4 Linux Lab machines, you will receive major penalties on other sections of this project. • Correctly sort the given input data. • Be efficient (i.e., implementations that take too long to solve sample problems will be assessed a penalty) 1 • Be readable and easy to understand. You should include comments to explain when needed, but you should not include excessive comments that makes the code difficult to read. • (C++ only) Have no memory leaks. • (Java only) All of your functions should be defined as static member functions of the class Sorting. Your final submission should not include a main function. However, you may implement additional helper functions as needed. Helper functions must be located in the same file. You will be provided with source files that include a main function, as well as several helper functions that you may use. You should not include either of these files in your code submission, though you may invoke their functions (and #include them). 2.1 C++ method signatures void selectionsort(T* data, int size) void insertionsort(T* data, int size) void mergesort(T* data, int size, T* temp) void quicksort(T* data, int size) 2.2 Java method signatures public static void selectionsort(T[] data) public static void insertionsort(T[] data) public static void mergesort(T[] data, int left, int right, T[] temp) public static void quicksort(T[] data, int left, int right) 2.3 Arguments • data: the list of elements to sort; must be comparable • size: the number of elements in items • left: the index of the first element in the array (0 initially) 2 • right: the index of the last element in the array (n − 1 initially) 2.4 Pseudocode Pseudocode for the four algorithms appears below. Note, array indices are given with base of 0 to reduce confusion in implementation. Your implementation of QuickSort will use the median-of-three strategy for selecting the pivot element, the value used to partition the array. Other strategies for pivot selection, such as choosing the first, the last, or a random element, have different advantages and disadvantages, though we will not compare the various pivot strategies in this project. Input: data: the items to sort (must be comparable) Input: n: the number of elements in items Output: permutation of items such that data[1] ≤ . . . ≤ data[n] 1 Algorithm: SelectionSort 2 for i = 0 to n − 1 do 3 Let m be the location of the min value in the array data[i..n − 1]; 4 Swap data[i] and data[m]; 5 end 6 return data; Input: data: the data to sort (must be comparable) Input: n: the number of elements in data Output: permutation of data such that data[0] ≤ . . . ≤ data[n − 1] 1 Algorithm: InsertionSort 2 for i = 1 to n − 1 do 3 Let j = location of predecessor of data[i] in data[0..i − 1], or -1 if data[i] is min; 4 Shift data[j + 1..i − 1] to the right one space; 5 data[j + 1] = ins; 6 end 7 return data; 3 Input: data: the data to sort (must be comparable) Input: n: the number of elements in data Input: temp: temporary array of size n for use during MergeSort Output: a permutation of data such that data[1] ≤ . . . ≤ data[n] 1 Algorithm: MergeSort 2 if n 1 then 3 mid =floor((n + 1)/2); 4 lef t =MergeSort(data[0..mid − 1], mid, temp[0..mid − 1]); 5 right =MergeSort(data[mid..n − 1], n − mid, temp[mid..n − 1]); 6 ` = r = s = 0; 7 while ` < mid and r < n − mid do 8 if lef t[`] < right[r] then 9 temp[s] = lef t[`]; 10 ` = ` + 1; 11 else 12 temp[s] = right[r]; 13 r = r + 1; 14 end 15 s = s + 1; 16 end 17 Copy lef t[`..mid − 1] to temp[s..s + mid − `]; 18 s = s + mid − `; 19 Copy right[r..n − mid − 1] to temp[s..s + n − mid − 1 − r]; 20 Copy temp[0..n − 1] to data[0..n − 1]; 21 end 22 return data; 4 Input: data: the data to sort (must be comparable) Input: lef t: the first element of data to sort Input: right: the element after the last element of data to sort Input: temp: temporary array of size n for use during MergeSort Output: a permutation of data such that data[lef t] ≤ . . . ≤ data[right − 1] 1 Algorithm: MergeSortJ 2 if n 1 then 3 mid =floor((lef t + right)/2); 4 Call MergeSortJ(data, lef t, mid, temp); 5 Call MergeSortJ(data, mid, right, temp); 6 ` = lef t; 7 r = mid; 8 s = 0; 9 while ` < mid and r < right do 10 if data[`] < data[r] then 11 temp[s] = data[`]; 12 ` = ` + 1; 13 else 14 temp[s] = data[r]; 15 r = r + 1; 16 end 17 s = s + 1; 18 end 19 Copy lef t[`..mid − 1] to temp[s..s + mid − `]; 20 s = s + mid − `; 21 Copy right[r..n − mid − 1] to temp[s..s + n − mid − r]; 22 Copy temp[0..right − lef t − 1] to data[lef t..right − 1]; 23 end 24 return data; 5 Input: data: the data to sort (must be comparable) Input: n: the number of elements in data Output: permutation of data such that data[1] ≤ . . . ≤ data[n] 1 Algorithm: QuickSort 2 if n ≤ 1 then 3 return data; 4 mid =floor((n + 1)/2); 5 Let pivot be such that data[pivot] is the median of data[0], data[mid], and data[n − 1]; 6 Swap data[pivot] and data[0]; 7 lef t = 0; 8 right = n − 1; 9 repeat 10 while lef t < right and data[lef t] ≤ data[0] do 11 lef t = lef t + 1; 12 end 13 while lef t < right and data[right] data[0] do 14 right = right − 1; 15 end 16 Swap data[lef t] and data[right]; 17 until lef t ≥ right; 18 if data[lef t] data[0] then 19 lef t = lef t − 1; 20 end 21 Swap data[0] and data[lef t]; 22 Call QuickSort on data[0..lef t − 1]; 23 Call QuickSort on data[lef t + 1..n − 1]; 24 return data; 6 Input: data: the data to sort (must be comparable) Input: lef t: the first element in data to sort Input: right: the index after the last element of data to sort Output: permutation of data such that data[lef t] ≤ . . . ≤ data[right − 1] 1 Algorithm: QuickSortJ 2 if right − lef t ≤ 1 then 3 return data; 4 mid =floor((lef t + right)/2); 5 Let pivot be such that data[pivot] is the median of data[lef t], data[mid], and data[right − 1]; 6 Swap data[pivot] and data[lef t]; 7 ` = lef t; 8 r = right − 1; 9 repeat 10 while ` < r and data[`] ≤ data[lef t] do 11 ` = ` + 1; 12 end 13 while ` < r and data[r] data[lef t] do 14 r = r − 1; 15 end 16 Swap data[`] and data[r]; 17 until ` ≥ r; 18 if data[`] data[lef t] then 19 ` = ` − 1; 20 end 21 Swap data[lef t] and data[`]; 22 Call QuickSortJ(data, lef t, `); 23 Call QuickSortJ(data, ` + 1, right); 24 return data; 7 3 Project report Your project report should be divided into two parts, Results and Analysis. For the Results section, you will need to prepare a data file describing the performance of your algorithms, and you will need to prepare 16 equations that model the performance of your algorithms, as well as a response to these results, in the Analysis portion. 3.1 Results Test each of the four sorting algorithms on increasing, decreasing, random, and constant arrays of sizes 10,000–100,000, in multiples of 10,000. Note, you may need to ensure that you have sufficient stack space before testing QuickSort to ensure that you do not run out (“stack overflow”). If you are using C++, you can run ulimit -s unlimited in Linux before executing the algorithm to increase the available stack space. For Java, you can pass the -Xss[size] argument to define a new stack size (e.g., -Xss1024m gives 1024 MB of stack space). You should enter your results into a comma-separated value (CSV) file. The CSV file should contain 17 columns, with 11 rows. Your first column should list the data sizes for your experiment in columns 2–11, while the first row should label the 16 different experiments in columns 2–17. Your column labels should include the algorithm name (SelectionSort, InsertionSort, MergeSort, or Quicksort) and input type (Increasing, Decreasing, Random, or Constant). You may abbreviate these labels as S, I, M, Q, and I, D, R, C. (For example, MC represents your MergeSort result on a constant array.) Each cell in the table should represent your median-of-three result for the given experiment. An example table appears below. You may use Excel (or any other software) to prepare your data. SI SD SR SC II ID IR IC MI MD MR MC QI QD QR QC 10000 20000 30000 40000 50000 60000 70000 80000 90000 100000 8 3.2 Analysis For this portion of the report, you will want to use a software package capable of manipulating matrices. You may use MATLAB, which is installed on the C4 Lab machines, or you download R (for free) at: http://www.r-project. org. In this section, you will estimate the complexity of the four algorithms using ordinary least-squares (OLS) regression. Specifically, you will estimate the median runtime for each algorithm and array combination as a linear combination of four functions: n 2 , n lg n, n, and 1, where n is the input size. If you have forgotten (or never learned) how to do this, please see the appendix for instructions. Your results should be in the following form: Algorithm: SelectionSort Increasing: S1(n) ≈ a1n 2 + b1n lg n + c1n + d1 Decreasing: S2(n) ≈ a2n 2 + b2n lg n + c2n + d2 Constant: S3(n) ≈ a3n 2 + b3n lg n + c3n + d3 Random: S4(n) ≈ a4n 2 + b4n lg n + c4n + d4 You should then write a summary of (1) how your results compare to the theoretical analysis for the four algorithms (below), and (2) why your results make sense or are surprising. You should spend more time explaining your results when they are unusual or unexpected. Best-case Average-case Worst-case complexity complexity complexity SelectionSort Θ(n 2 ) Θ(n 2 ) Θ(n 2 ) InsertionSort Θ(n) O(n 2 ) O(n 2 ) MergeSort Θ(n lg n) Θ(n lg n) Θ(n lg n) QuickSort Θ(n lg n) Θ(n lg n) O(n 2 ) 4 Submission For this project, you should submit a zip archive containing (1) your code (in Sorting.hpp or Sorting.java) containing the four sorting functions, (2) a CSV file containing your results (described in Section 3.1), and (3) your analysis (described in Section 3.2), in PDF format. Note: This is an individual project. You are not allowed to submit work that has been pulled from the Internet, nor work that has been done by your peers. Your submitted materials will be analyzed for plagiarism. 9 5 Grading Algorithm implementations 15 points, each Data file containing results 10 points Regression equations 10 points Analysis 20 points Requirements for each portion of the grade are described in Sections 2, 3.1, and 3.2. Appendix: Least-squares regression Input: set of k points P : {(x1, y1),(x2, y2), . . . ,(xk, yk)}, set of m functions F : {f1(x), f2(x), . . . , fm(x)} Output: Find a set of coefficients βˆ = (β1, β2, . . . , βm) T such that ˆy = β1f1(x) + β2f2(x) + . . . + βmfm(x) best approximates y. In other words, find βˆ such that ||y − yˆ||2 is minimized. Algorithm 1. Form the observation vector ~y = (y1, y2, . . . , yk) T as a column vector, based on the input. 2. Form the k × m design matrix X such that the entry in row i and column j of X is fj (xi), using the functions and data points given as input. In other words, each function fj should become a column of X, and each xi should become a row. 3. Calculate βˆ = (XTX) −1XT~y. This value is the orthogonal projection of ~y into the column space of X, so it will minimize the squared error ||~y − Xβˆ||2 . 4. The best-fit equation for ~y is ˆy = β1f1(x) + β2f2(x) + . . . + βmfm(x), where βi is the i th entry of βˆ. (The order of the coefficients in βˆ will match the order of the functions in the design matrix X.) 5. (Optional) Calculate the error term ||?||2 = ||Xβˆ − ~y||2 , where ||?||2 = ? · ?. This value is also known as the R2 (“R squared”) statistic and measures the goodness-of-fit. 6. (Optional) Plot the original data series (~x, ~y) against (~x, yˆ = Xβˆ) to examine the fit. 10

More products