Project 2: Hybrid Sorting

Starting from:

$35

Project 2: Hybrid Sorting

A sorting algorithm is an algorithm that puts elements in a certain order. Such algorithms are often used to organize an array or list in numerical or lexicographical order, however their use is not limited in scope to such simple orderings, or to such simple structures - a fact that will be demonstrated in this project.

Throughout the 20th century, as the domain of problems to which computers were applied grew, so to did the size of data sets that required sorting. This resulted in the rapid development of sorting algorithms. Clearly O(n2) algorithms, such an selection and bubble sort, were inferior to faster O(nlog(n)) algorithms, such as quick or merge sort, but by the 1970s even these weren't achieving speeds desired. This led to the development of ultra-optimized hybrid sorting algorithms, which tie together multiple sorting procedures recursively so as to apply the optimal sort to each chunk of data based on its size.

This project focuses on the Insertion Sort and the Merge Sort algorithms, and on a hybrid of the two. This hybridization is not merely a toy exercise - in fact, Python sorts its built in lists through a hybrid of merge and insertion sort.

Insertion Sort


Insertion sort is an in place comparison based sorting algorithm. It builds a final sorted array by comparing one element at a time and inserting it into it's appropriate location.

The worst case runtime is O(n^2). The best case runtime is linear - in the case that the array is already sorted. The space complexity is O(1) for an in place implementation.

Merge Sort


Merge sort is one of the most efficient sorting algorithms. It works on the principle of Divide and Conquer. Merge sort repeatedly breaks down a list into several sublists until each sublist consists of a single element and merges those sublists in a manner that results in a sorted list.

The runtime is O(nlog(n)) and Theta(nlog(n)). The space complexity is O(n) - new arrays are created everytime you divide. (Why isn't the space O(nlogn)?)

Hybrid Sort
While Merge Sort has a faster average runtime than Insertion Sort, there are instances that an Insertion Sort is more efficient. Due to the overhead of recursively splitting containers with a Merge Sort, Insertion Sort can have a faster performance with small sets of data. Thus, the two algorithms are often combined, with recursive calls to merge sort ceasing once the data is subdivided into small enough arrays.

Project Details
Overview
You will be implementing the Insertion Sort Algorithm and the Merge Sort Algorithm. You will develop your Merge Sort such that it can be used as a Hybrid Sort when given a threshold value. The Hybrid Sort will rely on Merge Sort until the partitioned lists are less than or equal to a given threshold, at which point you will switch to Insertion Sort.

These sorting algorithms will have an optional parameter that accepts a callable - an object that can accept its own arguments and can even return its own object. Callable is the alias used by python's typing module for a Function- in Python, functions are objects, and may be passed as arguments to other functions. For this project, you will be creating simple functions to pass in as callables, which should determine how your sorting algorithms will sort. The default callable argument to our functions is a lambda function that is intended to sort your objects in the usual increasing order. For the application problem, you may need to create more complex functions that cannot be written as a lambda.



You'll want to use the above example, comparator to compare each element in the list you are sorting . comparator will only return true if the first parameter passed in, x, is less than or equal to the second parameter passed in, y. To use the comparator shown in the image above in your code, you'd write

if comparator(x, y): # does the same thing as if x <= y in this case
In addition to these sorting algorithms, you will be implementing an algorithm to determine the inversion count of a list of elements. This algorithm will be integrated into your merge_sort function. You will only calculate the inversion count when your function is not being used as a Hybrid Sort, that is, when the threshold is 0.

The inversion count is how far away a list of elements is from being sorted. The inversion count of a sorted array is 0. You can think of the number of inversions as the number of pairs of elements that are out of order.

Two elements a[i] and a[j] form an inversion if a[i] > a[j] and i < j

Examples:
data = [3,2,9,0]

data has 4 inversions:(3, 2): 3 > 2 but 3 comes before 2
(3, 0): 3 > 0 but 3 comes before 0
(2, 0): 2 > 0 but 2 comes before 0
(9, 0): 9 > 0 but 9 comes before 0
data = [1, 2, 3, 4, 5]

data has 0 inversions
Note: Although you can swap the elements of the inversions to form a sorted array, the inversion count is not the same as the minimum number of swaps to sort the array.

For instance, data = [10, 1, 2, 0].

You could sort this in 1 swap (10, 0), but there are 5 inversions (10, 1), (10, 2), (10, 0), (1, 0), (2, 0).

Assignment Specs
You are given one file, HybridSort.py. You must complete and implement the following functions. Take note of the specified return values and input parameters. Do not change the function signatures.

You must adhere to the required time & space complexities.

HybridSort.py:

insertion_sort(data: List[T], comparator: Callable[[T,T], bool]) -> None:
Given a list of values and comparator, perform an insertion sort on the list using the comparator.
param data: List of items to be sorted
param comparator: A function, which takes an argument two objects of the same type as those in the list data and returns `True` if the first argument is less than or equal to the second according to some ordering, else `False`.
return: None
Time Complexity: O(n^2)
Space Complexity: O(1]
merge_sort(data, threshold: int = 0, comparator: Callable[[T,T], bool]) -> int:
Given a list of values, perform a merge sort to sort the list and calculate the inversion count. When a threshold is provided, use a merge sort algorithm until the partitioned lists are smaller than or equal to the threshold - then use insertion sort. Be sure to use the comparator.
param data: List of items to be sorted.
param threshold: int representing the size of the data at which insertion sort should be used. Defaults to 0.
param comparator: A function, which takes an argument two objects of like type to those in the list data and returns `True` if the first argument is less than or equal to the second according to some ordering, else `False`.
return: int representing inversion count, else 0 if threshold > 0.
NOTE: The inversion count will be calculated when only a merge_sort() algorithm is used! (when threshold is 0) return 0 otherwise.
Time Complexity: O(nlog(n))
Space Complexity: O(n)
hybrid_sort(data: List[T], threshold: int, comparator: Callable[[T,T], boo]) -> None:
Wrapper function to call merge_sort() as a Hybrid Sorting Algorithm. Should call merge_sort() with provided threshold, and comparator function.
param data: List of items to be sorted.
param threshold: int representing the size of the data at which insertion sort should be used.
param comparator: A function, which takes an argument two objects of like type to those in the list data and returns `True` if the first argument is less than or equal to the second according to some ordering, else `False`.
return: None
NOTE: If this is more than one line, you're probably doing something wrong.
Time Complexity: O(nlog(n))
Space Complexity: O(n)
inversions_count(data: List[T]) -> int:
Wrapper function to call merge_sort() on a copy of data to retrieve the inversion count. Should call merge_sort() with no threshold, and the default comparator.
This function should copy the original data - we want to determine the number of inversions in its current ordering, not necessarily sort it.
param data: List of items to determine the inversion count of.
return: int representing inversion count.
Time Complexity: O(nlog(n))
Space Complexity: O(n)
reverse_sort(data: List[T], threshold: int) -> None:
Wrapper function to use merge_sort() to sort the data in reverse. Should call merge_sort() with provided threshold, and a comparator you define.
param data: List of items to be sorted.
param threshold: int represnting the size of the data at which insertion sort should be used.
return: None
Time Complexity: O(nlog(n))
Space Complexity: O(n)
Application: Navigation Troubles

Congratulations, you've been hired by the United States Navy to diagnose some serious problems in their fleet's sonar technology! It turns out that the sonar employed by Navy ships is unable to accurately perceive distance and makes frequent mistakes; often it perceives an object to be further away than it truly is, or closer.

This causes a conundrum. The Navy cannot retire its entire fleet, or attempt to repair it all at once -- the resources to do so simply do not exist. Instead, your colleagues have devised a test to determine which ships are in most urgent need of repair and which can continue service in their current state.

The entire navy fleet has been assembled in an empty patch of the Pacific Ocean and arranged in a way such that each ship can be located via satellite and assigned exact coordinates relative to one another. Each ship sends out a sonar pulse, locating all the other navy ships. If this ship's sonar was functioning correctly, it would first perceive the ship closest to it, and then second closest, and so on. Instead, it perceives nearby ships in seemingly random order (regardless of their physical coordinates). If a further ship is perceived before a closer ship, it is considered a mistake. Your job is to rank ships based on which make the least mistakes. IF TWO SHIPS MAKE THE SAME NUMBER OF MISTAKES, SORT THOSE SHIPS IN ALPHABETICAL ORDER BASED ON THEIR NAME.

Let's look at an example...

Here we see four ships plotted, the USS Ian Barber, the USS Andrew, the USS Bank, and the UHH H. These are coordinates (-1,3), (-2,0), (0,6), and (-4,-4) respectively.

Let's suppose that while using its sonar, the UHH H perceives the other ships in this order: USS Bank, USS Andrew, and then the USS Ian Barber. Clearly, mistakes have been made.

First, let's look at the distances, as defined by the standard euclidean metric:

D(H, Andrew) = sqrt( (x_1 - x_2)^2 + (y_1 - y_2)^2 ) = sqrt( (-4 - -2)^2 + (-4 - 0)^2 ) = sqrt(20) = 4.47...

D(H, Ian) = 7.62...

D(H, Bank) = 10.77...

The correct order should have been the USS Andrew, USS Ian Barber, and then the USS Bank.

The USS Bank was perceived before both the USS Andrew and the USS Ian. This constitutes two mistakes because we have to swap the USS Bank with the USS Ian Barber and then the USS Bank with the USS Andrew afterward to fix this.

The USS Andrew and the USS Ian were perceived in correct order relative to one another (the USS Andrew came before the USS Ian as it should have), so this doesn't contribute any other mistakes to the count.

Continue this process for each ship, determining the number of mistakes made. From this, you will sort ships by least to most mistakes made.

Suppose the UHH H and the USS Ian both had equally faulty sonar, and thus both made two mistakes. Now, suppose the USS Andrew made only one mistake, and the USS Bank made zero mistakes.

In this case, we would first order based on mistakes made, giving a result of:

[Bank, Andrew, Ian, H]

However, since alphabetically the UHH H comes before the USS Ian, our final list would be:

[Bank, Andrew, H, Ian].

class Ship:
DO NOT MODIFY the following attributes/functions

Attributes:x: int: The x (longitudinal) coordinate of the ship
y: int: The y (latitudinal) coordinate of the ship
name: String: The name of the ship
__init__(self, name: str, x: int, y: int) -> None:Constructs a ship object
x: int: The x (longitudinal) coordinate of the ship
y: int: The y (latitudinal) coordinate of the ship
name: String: The name of the ship
__str__(self) -> str and __repr__(self) -> str:
represent ships as strings, for the sake of debugging
__eq__(self, other: Ship) -> bool:returns True iff the two ships are equivalent (same data members)
__hash__(self) -> int:Used for hashing Ship objects for use as keys in dictionaries
euclidean_distance(self, other: Ship) -> float:Computes the euclidean distance (via the formula shown) between two ships
other: Ship: The other ship involved in the computation
taxicab_distance(self, other: Ship) -> float:Computes the taxicab distance (via the formula shown) between two ships
other: Ship: The other ship involved in the computation
Implement the following function:

navigation_test(ships: Dict[Ship, List[Ship]], euclidean: bool) -> List[Ship]:

This function takes in a dictionary, which maps each ship to a list of other ships in the order in which they are perceived via sonar, and a boolean indicating which distance metric to use.
ships: Dict[Ship, List[Ship]]: A dictionary, where each key is a ship object, and each key is a list of other ships.
euclidean: bool: This boolean indicator tells you which distance metric to use. If True, use euclidean_distance to determine the number of sonar mistakes made, else use taxicab_distance.
Returns: A python list of ship objects, ranked in order of the number of mistakes made, from least to most
Time Complexity: O(n^2*log(n)) - where `n` is the number of ships
Space Complexity: O(n)

Submission
Deliverables
Be sure to upload the following deliverables in a .zip folder to Mimir by 8:00p Eastern Time on Thursday, 10/21/21.

Your .zip folder can contain other files (for example, description.md and tests.py), but must include (at least) the following:

|- Project2.zip
|— Project2/
|— README.xml (for project feedback)
|— __init__.py (for proper Mimir testcase loading)
|— HybridSort.py (contains your solution source code)

Grading
The following 100-point rubric will be used to determine your grade on Project2:

NOTE THAT THE SPEED TESTS ARE WORTH ZERO POINTS, AS THEY ARE ONLY THERE TO HELP YOU CHECK IF YOU ARE OVER TIME COMPLEXITY.

Tests (70)00 - Coding Standard: __/5
01 - Insertion Sort Basic: __/4
02 - Insertion Sort Comparator: __/4
03 - Insertion Sort Comprehensive: __/5
04 - Merge Sort Basic: __/2
05 - Merge Sort Threshold: __/3
06 - Merge Sort Comparator: __/4
07 - Merge Sort Comprehensive: __/5
08 - Hybrid Sort Basic: __/2
09 - Hybrid Sort Comprehensive: __/5
10 - Hybrid Sort Speed: __/0
11 - Reverse Sort Basic: __/3
12 - Reverse Sort Speed: __/0
13 - Inversion Count: __/10
14 - Application Small: __/5
15 - Application Large: __/10
16 - README.xml validity: __/3
Manual (30)Time and space complexity points are all-or-nothing for each function. If you fail to meet time or space complexity in a given function, you do not receive manual points for that function.
Insertion Sort Complexities: __/4
Merge Sort Complexities: __/8
Hybrid Sort Complexities: __/4
Reverse Sort Complexities: __/4
Application Complexities: __/10
Tips, Tricks & Notes
You must fill out function docstrings to receive Coding Standard points.
You may not use Python's built in sort, or any imported sorting methods
There are different ways to implement merge sort, but make sure you are aiming for a solution that will fit the time complexity! If your recursive calls are some form of merge_sort(start + 1: end), this will not be O(nlog(n)).
Your wrapper functions (hybrid_sort(), inversions_count(), reverse_sort()) should be simple functions containing one call to merge_sort(). This may seem useless, but see here for the significance of wrapper functions.
Test cases will only test the functions specified.Note: The Merge Sort testcases will not test for inversion count
Try these web applications to visualize sorting algorithms:https://visualgo.net/bn/sorting (includes inversion count - "inversion index")
https://opendsa-server.cs.vt.edu/embed/mergesortAV (good merge sort visualization)
https://www.cs.usfca.edu/~galles/visualization/ComparisonSort.html
The file plot.py has been provided for comparing your sorting function's runtime.Run this file to see a graphical representation of your functions' performances.
You may need to install matplotlib and numpy.If you are not familiar with the terminal instructions are here: https://www.jetbrains.com/help/pycharm/installing-uninstalling-and-upgrading-packages.html
Otherwise use these commands:pip install matplotlib
pip install numpy
You do not have to use this.
You should already have a function handy to help you count sonar mistakes in the application problem.
This project was created by Andrew Haas, Elizabeth Deback, and Adam Kasumovic.

More products

Unix systems calls

$10

Add to cart

standard I/O library functions