Starting from:

$35

Project 6: Hash Tables

Project 6: Hash Tables


Assignment Overview
Hash Tables are a very powerful data structure that are best known for their ability to insert, delete, and lookup in O(1) time. This allows them to be very powerful in storing data that needs to be accessed quickly. Other data structures we have explored, such as Linked Lists (O(n) lookup and deletion) and AVL Trees (log(n) lookup, insertion, and deletion) lack that O(1) ability accross the board. 

 

A lot of you may already be familiar with the concept of hash tables, as they are implemented in Python as dictionaries and C++ as unordered maps.

In this project, you will be implementing a Hash Table from scratch in Python and applying it to an application problem.

Assignment Notes
The use of a Python dictionary results in a grade of 0 for the project! 
In addition, the only python container/collection type you can use is the built in list class (no sets, linked lists, queues, etc.)
We are going to have you use many of pythons built in "magic methods" in this project. A magic method is one that has the two underscores on the front and the back, such as __len__. In this project, these "magic methods" won't be doing much, they will call the other protected methods that you write!
So, what are "protected methods"? Protected methods are methods prefaced with a single underscore, such as a function called "_insert". Protected methods are meant to only be called inside other functions in the class. This is Pythons way of implementing the C++ equivilant of "public" and "private" - protected methods meant to be treated as private!
Building on the above point, all attributes/functions that are protected (that is, leading with an underscore) should not be called outside of your class, which means they should not be accessed in your application problem!
Use of _insert(), _delete(), _get(), and _grow() is STRICTLY FORBIDDEN in the application!!!
We have very small test cases for the _insert(), _get(), and _delete() functions. The purpose is to make sure you split the work between the magic and hidden methods appropriately. The majority of the testing will take place in the magic method implementations!
A few guarentees:Capacity will not grow past ~1000
All keys will be of type string
Here is an table that shows how private methods and magic methods relate to each other:

 

Assignment Specifications
class HashNode:
DO NOT MODIFY the following attributes/functions

Attributeskey: str: The key of the hash node (this is what is used in hashing)
value: T: Value being held in the node. Note that this may be any type, such as a str, int, float, dict, or a more complex object.
deleted: bool: Whether or not the node has been deleted.
__init__(self, key: str, value: T, deleted: bool = False) -> NoneConstructs a hash node.
key: str: The key of the hash node.
value: T: Value being held in the node.
deleted: bool: Whether or not the node has been deleted. Defaults to false.
Returns: None.
__str__(self) -> str and __repr__(self) -> strRepresents the Node as a string.
Returns: str representation of node
__eq__(self, other: HashNode) -> boolCompares to see if two hash nodes are equal
other: HashNode: The HashNode we are comparing against
Returns: bool stating whether or not they are equal
__iadd__(self, other: T) -> boolAdds to the value of the current HashNode
other: T: The value we are adding to our current value
Returns: None
class HashTable:
DO NOT MODIFY the following attributes/functions

Attributescapacity: int: Capacity of the hash table.
size: int: Current number of nodes in the hash table.
table: List: This is where the actual data for our hash table is stored
prime_index: int: Current index of the prime numbers we are using in _hash_2()
primesThis is a list of all the prime numbers, from 2 until 1000, used for _hash_2(). This is a class attribute, so it is accesed by HashTable.primes, NOT self.primes()!
__init__(self, capacity: int = 8) -> NoneConstruct an empty hash table, with the capacity as specified in the input
capacity: int: 
Returns: None.
__str__(self) -> str and __repr__(self) -> strRepresents the HashTable as a string.
Returns: str.
__eq__(self, other: HashTable) -> boolChecks if two HashTables are equal
other: HashTable: the hashtable we are comparing against
Returns: bool stating whether or not they are equal
_hash_1(self, key: str) -> intThe first of the two hash functions used to turn a key into a bin number
key: str: key we are hashing
Returns: int that is the bin number
_hash_2(self, key: str) -> intThe second of the two hash functions used to turn a key into a bin number. This hash function acts as the tie breaker.
key: str: key we are hashing
Returns: int that is the bin number
IMPLEMENT the following functions

__len__(self) -> intGetter for the size (that, is, the number of elements) in the HashTable
Time Complexity: O(1)
Space Complexity: O(1)
Returns: int that is size of hash table
__setitem__(self, key: str, value: T) -> NoneSets the value with an associated key in the HashTableThis should be a short, ~1 line function - the majority of the work should be done in the _insert() method!
Time Complexity: O(1)*
Space Complexity: O(1)*
key: str: The key we are hashing
value: T: The associated value we are storing
Returns: None
__getitem__(self, key: str) -> TLooks up the value with an associated key in the HashTableIf the key does not exist in the table, raise a KeyError 
This should be a short, ~3 line function - the majority of the work should be done in the _get() method!
Time Complexity: O(1)*
Space Complexity: O(1)
key: str: The key we are seraching for the associated value of
Returns: The value with an associated Key
__delitem__(self, key: str) -> NoneDeletes the value with an associated key in the HashTableIf the key does not exist in the table, raise a KeyError 
This should be a short, ~3 line function - the majority of the work should be done in the _get() and _delete() methods!
Time Complexity: O(1)*
Space Complexity: O(1)
key: str: The key we are deleting the associated value of
Returns: None
__contains__(self, key: str) -> boolDetermines if a node with the key denoted by the parameter exists in the tableThis should be a short, ~3 line function - the majority of the work should be done in the _get() method!
Time Complexity: O(1)*
Space Complexity: O(1)
key: str: The key we are checking to be a part of the hash table
Returns: None
hash(self, key: str, inserting: bool = False) -> int
Given a key string return an index in the hash table.
Should implement double hashing.
If the key exists in the hash table, return the index of the existing HashNode
If the key does not exist in the hash table, return the index of the next available empty position in the hash table.Collision resolution should implement double hashing with hash1 as the initial hash and hash2 as the step size
Note - There are 2 possibilities when hashing for an index:When inserting a node into the hash table we want to insert into the next available bin.

When performing a lookup/deletion in the hash table we want to continue until we either find the proper HashNode or until we reach a bin that has never held a value. This is to preserve the collison resolution methodology.
The inserting parameter should be used to differentiate between these two cases.
Time Complexity: O(1)*
Space Complexity: O(1)
key: str: The key being used in our hash function
inserting: bool: Whether or not we are doing an insertion. Important for the reasons described above.
Returns: int that is the bin we hashed into
_insert(self, key: str, value: T) -> NoneUse the key and value parameters to add a HashNode to the hash table.
If the key exists, overwrite the existing value
In the event that inserting causes the table to have a load factor of 0.5 or greater you must grow the table to double the existing capacity.
Time Complexity: O(1)*
Space Complexity: O(1)*
key: str: The key associated with the value we are storing
value: T: The associated value we are storing
Returns: None
_get(self, key: str) -> HashNodeFind the HashNode with the given key in the hash table.
If the element does not exist, return None
Time Complexity: O(1)*
Space Complexity: O(1)
key: str: The key we looking up
Returns: HashNode with the key we looked up
_delete(self, key: str) -> NoneRemoves the HashNode with the given key from the hash table .If the node is found assign its key and value to None, and set the deleted flag to True
Time Complexity: O(1)*
Space Complexity: O(1)
key: str: The key of the Node we are looking to delete
Returns: None
_grow(self) -> NoneDouble the capacity of the existing hash table.Do NOT rehash deleted HashNodes
Must update self.prime_index, the value of self.prime_index should be the index of the largest prime smaller than self.capacity in the HashTable.primes tuple.

Time Complexity: O(N)
Space Complexity: O(N)
Returns: None
update(self, pairs: List[Tuple[str, T]] = []) -> NoneUpdates the hash table using an iterable of key value pairsIf the value already exists, update it, otherwise enter it into the table

Time Complexity: O(M)*, where M is length of pairs
Space Complexity: O(M)
pairs: List[Tuple[str, T]]: list of tuples (key, value) being updated
Returns: None
keys(self) -> List[str]Makes a list that contains all of the keys in the tableOrder does not matter!
Time Complexity: O(N)*
Space Complexity: O(N)
Returns: List of the keys
values(self) -> List[T]Makes a list that contains all of the values in the tableOrder does not matter!
Time Complexity: O(N)*
Space Complexity: O(N)
Returns: List of the values
items(self) -> List[Tuple[str,T]]Makes a list that contains all of the keys in the tableOrder does not matter!
Time Complexity: O(N)*
Space Complexity: O(N)
Returns: List of Tuples of the form (key, value)
clear(self) -> NoneShould clear the table of HashNodes completely, in essence a reset of the tableShould not modify capacity
Notice the O(1) space complexity - this must be done in place!
Time Complexity: O(N)
Space Complexity: O(1)
Returns: None
*Amortized Complexity

Application: CATA Bus Data
The CATA schedulers have noticed that the busses are becoming less efficient.  To combat this problem, they've tasked you with tracking users in the CATA system and their commute times.  CATA would like to be able to log the enter and exit times at bus stops around campus and get the average travel time between any two stops.

Your job is to create a data structure that can be queried to get the average travel time between two stations. 

Your class will be instantiated with the following call:

cata_data = CataData()

Example 1

cata_data.enter("Ian", "Wilson", 1)
cata_data.enter("Max", "Wilson", 1)
cata_data.exit("Ian", "Akers", 4)
cata_data.exit("Max", "Akers", 6)
 
After this series of function calls, querying the CataData object with cata_data.get_average("Wilson", "Akers") should return 4.  There are two trips from Wilson to Akers in the system.  It took Ian a time of 3 and it took Max a time of 5.  (3 + 5) / 2 = 4
 
Example 2
 
cata_data.enter("Ian", "Engineering", 0)
cata_data.enter("Max", "Chemistry", 7)
cata_data.exit("Ian", "Chemistry", 1)
cata_data.enter("Ian", "Chemistry", 4)
cata_data.exit("Ian", "Wells", 6)
cata_data.enter("Ian", "Wells", 8)
cata_data.exit("Ian", "Wilson", 10)
cata_data.exit("Max", "Wells", 12)
 
cata_data.get_average("Chemistry", "Wells") = 3.5
cata_data.get_average("Engineering", "Chemistry") = 1
 
Notes:
If an id that was never in the system exits the system, nothing should be tracked (think of this as people who sneak onto the bus.  There is no scan in, but they are scanned out.  Since there is no record of their origin, we cannot obtain any meaningful data from this)
If the system is queried for two stations for which there isn't a trip yet, the get_average function should return 0.0
If an id that is currently in the system enters again, use the new origin station (think of this as a miss of the user's last exit scan)
 
class CataData: 

__init__(self):Design your data structure here
enter(self, id, origin, time) -> NoneNotes that a rider, identified by "id", is on a bus.  This rider is starting from station "origin" at a time of "time"
Time complexity: O(1)
Return: None
exit(self, id, dest, time) -> None
Notes that a rider, identified by "id", is no longer on a bus.  This rider has gotten off at station "dest" at a time of "time"
Time complexity: O(1)
Return: None
get_average(self, origin, dest) -> float
Gets the average travel time of users riding CATA busses from origin to dest.
Time complexity: O(1)
Return: None
Every operation must be in constant time: O(1)
The space of the data structure must be: O(n^2) where n is the amount of bus stops in the system.
You must use at least one HashTable(), but you can use as many as you want provided it's O(n^2) space.
REMEMBER THAT HASHTABLE VALUES CAN BE ANY TYPE
Use of _insert(), _delete(), _get(), and _grow() is STRICTLY FORBIDDEN in the application!!!

Submission
Deliverables
Be sure to upload the following deliverables in a .zip folder to Mimir by 11:59p ET on Friday, March 19th.

Project6.zip
    |— Project6/
        |— README.xml      (for project feedback)
        |— __init__.py     (for proper Mimir testcase loading)
        |— hashtable.py    (contains your solution source code)

Grading
Tests (60)Coding Standard: __/2
hashtable: __/43hash: __/7
_insert: __/1
_get: __/1
_delete: __/1
__len__: __/1
__setitem__: __/4
__getitem__: __/4
__delitem__: __/4
__contains__: __/2
update: __/3
keys/values/items: __/5
clear: __/2
comprehensive: __/8
CataData: __/15application: __/5
application larger: __/10
Manual (40)README.xml is completely filled out with (1) NetID, (2) Feedback, (3) Time to Completion and (4) Citations: __/2
hashtable time/space: __/30hash: __/4
__len__: __/1
__setitem__: __/4
__getitem__: __/4
__delitem__: __/4
__contains__: __/3
_grow: __/4
update: __/2
keys/values/items: __/2
clear: __/2
CataData time/space: __/8
Appendix

More products