Starting from:

$30

HW #2. Improve code Efficiency: Sort First!

5012 HW #2. Improve code Efficiency: Sort
First!
Scenario.
In a two class, classification problem, it is common to use a classifier that outputs
confidences (rather than simply class labels). A good example of this is a Support Vector
Machine. A pro for using such a classifier is that you gain more information -- specifically
the confidence in the classification result. A con is that in order to make a final classification
decision, a threshold value must be determined. For example, if a threshold of 0.75 is
chosen, the class label 1 would be assigned for confidences greater than 0.75 and for
confidences less than 0.75 a class label of 0 would be assigned. However, this begs the
question: how is the threshold chosen?
Many data scientists will choose a threshold based on the experimental results and/or
operational constraints. In this code example, we assume that we have confidences and true
labels for a large data set. To determine a good threshold we will compute the true positive
rates (TPRs) and false positive rates (FPRs) at all relevant thresholds. The relevant thresholds
are considered those that would change the TPRs and FPRs.
In the code below, a function is defined to compute the TPR and FPR at all relevant
thresholds. However, the code is not very efficient and can be improved. (Note there are tips
and hints found in the comments.)
Your task is the following:
Question 1
40 POINTS
Assess the time complexity of the method computeAllTPRs(...). Provide a line-by-line
assessment in comments identifying the proportional number of steps (bounding notation
is sufficient) per line: eg, O(1), O(log n), O(n), etc. Also, derive a time step function T(n) for
the entire method (where n is the size of input true_label).
Question 2
30 POINTS
Implement a new function computeAllTPRs_improved(...) which performs the same task as
computeAllTPRs but has a significantly reduced time complexity. Also provide a line-by-line
assessment in comments identifying the proportional number of steps per line, and derive a
time step function T(n) for the entire method (where n is the size of input true_label).
Question 3
22. 9. 8. 오후 10:30 5012_HW2_sortFirst
file:///C:/Users/peren/Downloads/5012_HW2_sortFirst.html 2/6
30 POINTS
Compare the theoretical time complexities of both methods and predict which is more
efficient. Next, test your prediction by timing both methods on sample inputs of varying
sizes. Create a plot of inputSize vs runtime (as done in similar class examples).
NOTE: Do not include runtimes for graphing
TOTAL POINTS: 100 

More products