EECS 445 Introduction to Machine Learning

Project 1 - Supervised Verification of Movies

1 Introduction
Your favourite GSI, Junghwan, loves watching movies. In fact, he watches one every evening before going to
bed. Unfortunately, being a very critical movie connoisseur, he spends hours reading reviews before making
his choice, so he tends to fall asleep before the end of the movie. To rub more salt into the wound, he forgot
to cancel his Amazon Prime student subscription last month, so now he has no choice but to watch all the
interesting movies there. Thankfully, Junghwan now has an army of EECS 445 students at his disposal, who
have become well-versed in solving complex supervised Machine Learning problems. He plans to solve the
task of finding the most critically-acclaimed movies by training a model to deduce the sentiment of their
Amazon reviews (i.e. determine if the review author thinks the movie is worth watching).
In this project, we have given you review data from Amazon Video’s large catalogue of movies. The Amazon
dataset contains thousands of reviews and ratings from different users. You will work with this dataset to
train various Support Vector Machines (SVMs) to classify the sentiment of a review. That way, Junghwan can automate the process of choosing a movie and will be able to finish watching it before falling asleep. Along the way, we will also explore some very useful scikit-learn packages and data science techniques.
1.1 Requirements:
1. Updated version of Anaconda (https://docs.continuum.io/anaconda/install/), with
a Python 3.7 environment.
2. Updated version of scikit-learn (0.21.2) in your Anaconda install: https://scikit-learn.
org/stable/index.html
3. Updated version of numpy (1.17) in your Anaconda install: http://www.numpy.org/
4. Updated version of pandas (0.25.1) in your Anaconda install: https://pandas.pydata.
org/
5. Updated version of matplotlib (3.1.1) in your Anaconda install: https://matplotlib.
org/
• You can verify your version of Anaconda-managed packages by running conda list
• If needed, you can update a package by running conda update [package name]
• You may create an Anaconda environment for the project with only the packages you will need,
or you may simply use the Anaconda instance of Python which will make packages managed by
Anaconda available for your use. Please reference the Anaconda documentation.
1.2 Getting Started
To get started, download Project1 from Canvas. It should contain the following files:
• data/dataset.csv
• data/heldout.csv
• data/imbalanced.csv
• project1.py
• helper.py
• test_output.py
The files dataset.csv and imbalanced.csv have reviews from Amazon. These csv files have 6
columns: reviewText, summary, unixReviewTime, reviewTime, rating, label. Each row in the csv file corresponds to one review. The reviewText column contains the text of the actual review. The label column is a
multiclass label: 1 if positive (4 or 5 star rating), 0 if neutral (3 stars), and −1 if negative (1 or 2 stars).
You will use the reviewText and label columns for most of the project (we will ignore the 0 label reviews
in order to make the label binary). The final challenge portion, however, will utilize all -1, 0, and 1 labeled
reviews.
The helper file helper.py provides functions that allow you to read in the data from csv files. The file
project1.py contains skeleton code for the project, along with the helper function select_classifier
which you may implement to return SVM classifiers depending on the given input parameters. The file
test_output.py allows you to test your output csv file before submission to make sure the format is
correct.
The data for each part of the project has already been read in for you in the main function of the
skeleton code. Please do not change how the data is read in; doing so may affect your results.
The skeleton code project1.py provides specifications for functions that you will implement:
• extract_dictionary(df)
• generate_feature_matrix(df, word_dict)
• cv_performance(clf, X, y, k=5, metric='accuracy')
• select_param_linear(X, y, k=5, metric='accuracy', C_range=[], penalty='l2')
• plot_weight(X, y, penalty, metric, C_range)
• select_param_quadratic(X, y, k=5, metric='accuracy', param_range=[])
• Optional: select_classifier(penalty='l2', c=1.0, degree=1, r=0.0, class_weight='balanced')
• Optional: performance(y_true, y_pred, metric='accuracy')
2 Feature Extraction [20 pts]
Given a dictionary containing d unique words, we can transform the n variable-length reviews into n feature
vectors of length d by setting the i-th element of the j-th feature vector to 1 if the i-th word is in the j-th review,
and 0 otherwise. Given that the four words {‘the’: 0, ‘movie’: 1, ‘was’: 2, ‘best’: 3} are
the only four words we ever encounter, the review “BEST movie ever!!” would map to the feature vector
[0, 1, 0, 1].
Note that we do not consider case. Also, note that since the word “ever” was not in the original dictionary,
it is ignored as a feature. In real-world scenarios, there may be words in test data that you do not encounter
in training data. There are many interesting methods for dealing with this that you may explore in part 5.
(a) Start by implementing the extract_dictionary(df) function. You will use this function to read
all unique words contained in dataset.csv into a dictionary (as in the example above). You can start
implementing this function by removing all the punctuation in the dataset. While removing punctuation,
please make sure that you do not accidentally combine two different words that are separated by a
punctuation mark. For instance, after you remove punctuation from “Movie was awesome!Yay”, you
should produce “Movie was awesome Yay”, not “Movie was awesomeYay”. After removing all the
punctuation, you should convert all the words to lowercase and start building your dictionary. Your
function should return a dictionary of d unique words.
Note: You will need to report the number of unique words in 2(c).
Hint: You might find string.punctuation along with the method string.replace() useful.
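For reference, here is a minimal sketch of one possible approach (not the required implementation; it assumes the review text lives in the reviewText column described in Section 1.2):

```python
import string

def extract_dictionary(df):
    """Map each unique, lowercased, punctuation-stripped word to an index (sketch)."""
    word_dict = {}
    for text in df['reviewText']:
        # Replace punctuation with spaces so "awesome!Yay" becomes "awesome Yay",
        # not "awesomeYay".
        for ch in string.punctuation:
            text = text.replace(ch, ' ')
        for word in text.lower().split():
            if word not in word_dict:
                word_dict[word] = len(word_dict)
    return word_dict
```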
(b) Next, implement the generate_feature_matrix(df, word_dict) function. For each review
j, construct a feature vector of length d, where the i-th entry in the feature vector is 1 if the i-th word
in the dictionary is present in review j, and 0 otherwise. Assuming that there are n reviews in total, return
the feature vectors as an (n, d) feature matrix, where each row represents a review and each column
represents whether or not a specific word appeared in that review.
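A matching sketch for the feature matrix, under the same assumptions as above:

```python
import string
import numpy as np

def generate_feature_matrix(df, word_dict):
    """Build an (n, d) binary bag-of-words matrix (sketch)."""
    n, d = len(df), len(word_dict)
    X = np.zeros((n, d))
    for j, text in enumerate(df['reviewText']):
        for ch in string.punctuation:
            text = text.replace(ch, ' ')
        for word in text.lower().split():
            if word in word_dict:      # words not seen in training are ignored
                X[j, word_dict[word]] = 1
    return X
```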
(c) The function get_split_binary_data in helper.py uses the functions you implemented in (a)
and (b). Examine how it is implemented. Then, use get_split_binary_data to get the training
feature matrix X_train.
Note the class_size parameter in get_split_binary_data. If at any point you are not confident
that your algorithm is working as intended, it might be worth reducing class_size to try running
on a smaller input. Otherwise, your algorithm may take a few minutes to terminate, only for you to find out
that it is not working properly. However, for all exercises that ask for results, please use the default
parameters.
In your write-up, include the following:
• The value of d that you recorded after extracting the training data (the number of unique words).
You should be able to extract d from the size of the training feature matrix.
• The average number of non-zero features per rating in the training data. You will need to calculate
this on X_train (see the sketch below).
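Both quantities can be read directly off the feature matrix. A minimal sketch, assuming X_train has already been built by get_split_binary_data:

```python
import numpy as np

# d is the number of columns of the training feature matrix.
d = X_train.shape[1]

# Average number of non-zero features per rating, i.e., the number of distinct
# dictionary words appearing in each review, averaged over all training reviews.
avg_nonzero = np.count_nonzero(X_train) / X_train.shape[0]

print(f'd = {d}, average non-zero features per review = {avg_nonzero:.2f}')
```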
3 Hyperparameter and Model Selection [40 pts]
In question 2, you implemented functions that transform the reviews into a feature matrix X_train
and a label vector y_train. Test data X_test, y_test has also been read in for you. You will use these
data for all of question 3. You may notice that X_train, y_train only contain 1000 reviews, while the
dataset.csv file has 4500 reviews. Here, we only give you a subset of the data to train on. You will
choose the number of reviews you want to work with in question 5.
We will learn a classifier to separate the training data into positive and non-positive (i.e., “negative”) labels.
The labels in y train are transformed into binary labels in {−1, 1}, where −1 means “poor” and 1 means
“good.” This is a binary classification problem, which you know how to solve!
For the classifier, we will use SVMs with two different kernels: linear and quadratic. In parts 3.1-3.3 we
will make use of the sklearn.svm.SVC class. At first, we will explicitly set only two of the initialization
parameters of SVC(): the kernel, and C. In addition, we will use the following methods of the SVC class:
fit(X, y), predict(X), and decision_function(X). Please use predict(X) when measuring any performance metric other than AUROC, and decision_function(X) for AUROC (see the
documentation for more details).
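As a reminder of how these two prediction methods differ, here is a small sketch (the variable names are placeholders for the data described above):

```python
from sklearn import metrics
from sklearn.svm import SVC

clf = SVC(kernel='linear', C=1.0, class_weight='balanced')
clf.fit(X_train, y_train)

# Hard label predictions in {-1, 1}: use for accuracy, F1-score, precision, etc.
y_pred = clf.predict(X_test)
accuracy = metrics.accuracy_score(y_test, y_pred)

# Signed distances to the decision boundary: use for AUROC, which needs a
# continuous score rather than hard labels.
y_score = clf.decision_function(X_test)
auroc = metrics.roc_auc_score(y_test, y_score)
```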
As discussed in lecture, SVMs have hyperparameters that must be set by the user. For both linear-kernel and
quadratic-kernel SVMs, we will select hyperparameters using 5-fold cross-validation (CV) on the training
data. We will select the hyperparameters that lead to the ‘best’ mean performance across all five folds. The
result of hyperparameter selection often depends upon the choice of performance measure. Here, we will
consider the following performance measures: Accuracy, F1-Score, AUROC, Precision, Sensitivity, and
Specificity.
Note: When calculating the F1-score, it is possible that a divide-by-zero may occur which throws a warning.
Consider how this metric is calculated, perhaps by reviewing the relevant scikit-learn documentation.
Some of these measures are already implemented as functions in the sklearn.metrics submodule.
Please use sklearn.metrics.roc_auc_score for AUROC. You can use the values from
sklearn.metrics.confusion_matrix to calculate the others (note: the confusion matrix is just
the table of Predicted vs. Actual label counts, that is, the True Positive, False Positive, True Negative, and
False Negative counts for binary classification). Make sure to read the documentation carefully: when
calling this function, you will want to set labels=[1, -1] for a deterministic ordering of your confusion
matrix output.
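For instance, with labels=[1, -1] the counts can be unpacked as follows (a sketch; y_true and y_pred are placeholders for a fold's labels and predictions, and you may organize your own performance helper differently):

```python
from sklearn import metrics

# With labels=[1, -1], row/column 0 is the positive class and row/column 1 is
# the negative class; rows are true labels, columns are predicted labels.
conf = metrics.confusion_matrix(y_true, y_pred, labels=[1, -1])
tp, fn, fp, tn = conf[0, 0], conf[0, 1], conf[1, 0], conf[1, 1]

sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
precision = tp / (tp + fp)
```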
3.1 Hyperparameter Selection for a Linear-Kernel SVM [20 pts]
(a) To begin, implement the function cv_performance(clf, X, y, k=5, metric='accuracy')
as defined in the skeleton code. Here you will make use of the fit(X, y), predict(X), and
decision_function(X) methods of the SVC class. The function returns the mean k-fold CV
performance for the performance metric passed into the function. The default metric is 'accuracy';
however, your function should work for all of the metrics listed above. It may be useful to have a helper
function that calculates each performance metric, for instance: performance(y_true, y_pred,
metric='accuracy').
You may have noticed that the proportions of the two classes (positive and non-positive) are equal in the
training data. When dividing the data into folds for CV, you should try to keep the class proportions
roughly the same across folds; in this case, the class proportions should be roughly equal across folds,
since the original training data has equal class proportions.
You must implement this function without using the scikit-learn implementation of CV. You will
need to employ the following class for splitting the data: sklearn.model_selection.StratifiedKFold().
Do not shuffle points when using this class (i.e., do not set shuffle=True), so that the generated folds are consistent for the same dataset across runs of the entire program.
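A minimal sketch of how StratifiedKFold can drive this loop; the call to performance() assumes the helper suggested above:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cv_performance(clf, X, y, k=5, metric='accuracy'):
    """Mean k-fold CV performance without sklearn's built-in CV scorers (sketch)."""
    scores = []
    skf = StratifiedKFold(n_splits=k)            # no shuffling, per the instructions
    for train_idx, test_idx in skf.split(X, y):
        clf.fit(X[train_idx], y[train_idx])
        if metric == 'auroc':
            y_pred = clf.decision_function(X[test_idx])
        else:
            y_pred = clf.predict(X[test_idx])
        scores.append(performance(y[test_idx], y_pred, metric=metric))
    return np.array(scores).mean()
```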
In your write-up, briefly describe why it might be beneficial to maintain class proportions across folds.
(b) Now implement the select_param_linear(X, y, k=5, metric='accuracy', C_range=[], penalty='l2') function to choose a value of C for a linear SVM based on the training data
and the specified metric. Note that scikit-learn uses a slightly different formulation of SVM from the
one we introduced in lecture, namely:
$$
\begin{aligned}
\min_{\bar{\theta},\, b,\, \bar{\xi}} \quad & \|\bar{\theta}\|_2^2 + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y^{(i)} \left( \bar{\theta} \cdot \bar{x}^{(i)} + b \right) \geq 1 - \xi_i, \\
& \xi_i \geq 0, \quad \forall\, i = 1, 2, \ldots, n
\end{aligned}
$$
Essentially, C is inversely proportional to the λ we used in lecture. Your function should call
your CV function (implemented earlier), passing in instances of SVC(kernel='linear', C=c,
class_weight='balanced') with a range of values for C chosen in powers of 10 between $10^{-3}$
and $10^{3}$ (i.e., $10^{-3}, 10^{-2}, \ldots, 10^{2}, 10^{3}$). You may choose to implement and use the helper function
select_classifier to instantiate the needed classifier.
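One way to generate the candidate values of C and search over them, building on the cv_performance sketch above (again, only a sketch):

```python
import numpy as np
from sklearn.svm import SVC

def select_param_linear(X, y, k=5, metric='accuracy', C_range=[], penalty='l2'):
    """Return the C with the best mean k-fold CV performance (sketch)."""
    best_C, best_score = None, -np.inf
    for c in C_range:
        clf = SVC(kernel='linear', C=c, class_weight='balanced')
        score = cv_performance(clf, X, y, k=k, metric=metric)
        print(f'C = {c}: mean CV {metric} = {score:.4f}')
        if score > best_score:              # strict '>' keeps the smaller C on ties,
            best_C, best_score = c, score   # since C_range is increasing
    return best_C

C_range = [10.0 ** p for p in range(-3, 4)]    # 1e-3, 1e-2, ..., 1e3
```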
(c) Finally, using the training data from question 2 and the functions implemented here, find the best setting
for C for each performance measure (if there is a tie, choose the smaller C value). Report your findings
in tabular format with three columns: the name of the performance measure, the corresponding value of C, and the mean cross-validation performance. The table should follow the format given
below:
Performance Measures C Performance
Accuracy
F1-Score
AUROC
Precision
Sensitivity
Specificity
Your select param linear function returns the ‘best’ value of C given a range of values. Note:
as we are working with a fairly large feature matrix, this may take several minutes (our project solution
time is about 32 minutes for this question on our test computer).
Also, in your write-up, describe how the 5-fold CV performance varies with C. If you have to train
a final model, which performance measure would you optimize for when choosing C? Explain your
choice. This performance measure will be used in part d.
(d) Now, using the value of C that maximizes your chosen performance measure, create an SVM as in the
previous question. Again, you may choose to use the helper function select_classifier. Train
this SVM on the training data X_train, y_train. Report the performance of this SVM on the test
data X_test, y_test for each metric below.
Performance Measures Performance
Accuracy
F1-Score
AUROC
Precision
Sensitivity
Specificity
(e) Finish the implementation of the plot_weight(X, y, penalty, metric, C_range) function. In this function, you need to find the L0-norm of $\bar{\theta}$, the parameter vector learned by the SVM, for
each value of C in the given range. Finding out how to get the vector $\bar{\theta}$ from an SVC object may require
you to dig into the documentation. The L0-norm is given as follows, for $\bar{\theta} \in \mathbb{R}^d$:

$$\|\bar{\theta}\|_0 = \sum_{i=1}^{d} I\{\theta_i \neq 0\}$$

where $I\{a \neq 0\}$ is 0 if a is 0 and 1 otherwise.
Use the complete training data X_train, Y_train, i.e., do not use cross-validation for this part. Once
you implement the function, the existing code will plot the L0-norm $\|\bar{\theta}\|_0$ against C and save it to a file. In
your write-up, include the produced plot and describe any interesting trends you observe.
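The learned weight vector of a fitted linear-kernel SVC is exposed through its coef_ attribute, so the L0-norm is essentially one line (a sketch, assuming clf has already been fit on X_train, Y_train):

```python
import numpy as np

theta = clf.coef_.flatten()          # coef_ has shape (1, d) for a binary SVC
l0_norm = np.count_nonzero(theta)    # number of non-zero entries of theta
```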
(f) Recall that each coefficient of $\bar{\theta}$ is associated with a word. The more positive a coefficient is, the more
the presence of the associated word indicates that the review is positive, and similarly with negative
coefficients. In this way, we can use these coefficients to find out what word-rating associations our
SVM has learned.
Using C = 0.1 (for consistency with our results), train an SVM on X_train, Y_train. On this
trained SVM, find the top 4 most positive coefficients and the top 4 most negative coefficients of $\bar{\theta}$.
Using the dictionary created on the training data, dictionary_binary, find the words that these
coefficients correspond to, and report both the coefficients and the corresponding words. As before, you
may choose to use the helper function select_classifier.
Positive Coefficient Word
Negative Coefficient Word
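A sketch of how the extreme coefficients and their words might be recovered, assuming clf is the SVM trained with C = 0.1 and dictionary_binary maps words to column indices as in part 2:

```python
import numpy as np

theta = clf.coef_.flatten()
# Invert the word -> index dictionary so columns can be mapped back to words.
index_to_word = {idx: word for word, idx in dictionary_binary.items()}

order = np.argsort(theta)            # indices sorted by coefficient, ascending
most_negative = [(theta[i], index_to_word[i]) for i in order[:4]]
most_positive = [(theta[i], index_to_word[i]) for i in order[-4:][::-1]]
```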
3.2 Hyperparameter Selection for a Quadratic-Kernel SVM [10 pts]
Similar to the hyperparameter selection for a linear-kernel SVM, you will perform hyperparameter selection for a quadratic-kernel SVM. Here we are assuming a kernel of the form $(\bar{x} \cdot \bar{x}' + r)^2$, where r is a
hyperparameter.
(a) Implement the select_param_quadratic(X, y, k=5, metric='accuracy', param_range=[]) function to choose a setting for C and r as in the previous part. Your function should call your
CV function (implemented earlier), passing in instances of SVC(kernel='poly', degree=2,
C=c, coef0=r, class_weight='balanced') with the same range of C that we used in 3.1(b).
You should also use the same range for r.
Again, you may choose to use the helper function select_classifier. The function argument
param_range should be a matrix with two columns: the first column contains values of C and the second
column contains values of r. You will need to try out the range of parameters via two methods:
i) Grid Search: In this case, we look at all the specified values of C and r in a given set and choose the best
setting after tuning. For this question, the values should be considered in powers of 10 for both C
(between $10^{-3}$ and $10^{3}$) and r (between $10^{-3}$ and $10^{3}$) [a total of 49 pairs]. This code will take a
substantial time to run (our project solution runs in about 50 minutes on our test computer).
ii) Random Search: Instead of assigning fixed values to the parameters C and r, we can also sample
random values from a particular distribution. In this case, we will sample from a log-uniform
distribution, i.e., the log of the random variable follows a uniform distribution:

$$P[a \leq \log C \leq b] = k(b - a)$$

for some constant k so that the distribution is valid. In other words, we sample from a uniform distribution
over the same range of exponents as above to yield exponents $x_i$, and corresponding values $C = 10^{x_i}$.
In your case, the exponents should range over the same powers of 10 that you used in part (i). Choose 25
such sampled pairs of (C, r); a minimal sampling sketch is given after this list. Again, this code may take time to run (our solution runs in
about 50 minutes on our test computer).
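One way to draw the 25 random (C, r) pairs from a log-uniform distribution over the same exponent range (a sketch; seed the generator if you want reproducible pairs):

```python
import numpy as np

rng = np.random.default_rng()
# Sample exponents uniformly in [-3, 3], then exponentiate: C, r = 10 ** exponent.
exponents = rng.uniform(low=-3, high=3, size=(25, 2))
param_range = 10.0 ** exponents      # 25 rows; column 0 holds C, column 1 holds r
```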
(b) Finally, using the training data from question 2 and the function implemented here, find the best values
of C and r for AUROC under both tuning schemes mentioned above. Report your findings in tabular
format. The table should have four columns: Tuning Scheme, C, r, and AUROC. Again, in the case of
ties, report the lowest C and r values that perform best (prioritizing a lower C). Your table
should be similar to the following:
Tuning Scheme C r AUROC
Grid Search
Random Search
How does the 5-fold CV performance vary with C and r? Also, is the use of random search better than
grid search? Give reasons for your conclusion.
3.3 Learning Non-linear Classifiers with a Linear-Kernel SVM [5 pts]
Here, we will explore the use of an explicit feature mapping in place of a kernel. (Note: you do not need to
write any code for question 3.3)
(a) Describe a feature mapping, $\phi(\bar{x})$, that maps your data to a feature space similar to the one implied by
the quadratic kernel from the question above.
(b) Instead of using a quadratic-kernel SVM, we could simply map the data to this higher dimensional
space via this mapping and then learn a linear classifier in this higher-dimensional space. What are the
tradeoffs (pros and cons) of using an explicit feature mapping over a kernel? Explain.
3.4 Linear-Kernel SVM with L1 Penalty and Squared Hinge Loss [5 pts]
In this part of the project, you will explore the use of a different penalty (i.e., regularization term) and a
different loss function. In particular, we will use the L1 penalty and squared hinge loss which corresponds
to the following optimization problem:

$$\min_{\bar{\theta},\, b} \; \|\bar{\theta}\|_1 + C \sum_{i=1}^{n} \mathrm{loss}\!\left(y^{(i)} (\bar{\theta} \cdot \bar{x}^{(i)} + b)\right)$$

where $\mathrm{loss}(z) = \max\{0, (1 - z)\}^2$. We will make use of the LinearSVC() class, which uses the
squared hinge loss and allows us to specify the penalty. We will consider only a linear-kernel SVM
and the original (untransformed) features. When calling LinearSVC, please use the following settings:
LinearSVC(penalty='l1', dual=False, C=c, class_weight='balanced'). As always,
you may implement and use the helper function select_classifier to instantiate your SVM classifier.
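For reference, instantiating this classifier looks like the following (a sketch; c stands for one candidate value from your C range):

```python
from sklearn.svm import LinearSVC

c = 1.0    # placeholder: one candidate value of C
# squared hinge loss is LinearSVC's default, so only the penalty needs to change
clf = LinearSVC(penalty='l1', dual=False, C=c, class_weight='balanced')
```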
(a) Using the training data from question 2 and 5-fold CV, find the best setting for C that maximizes the
AUROC using grid search CV with range specified in 3.1(b). When we say ”grid search” here, we mean
searching a one-dimensional grid as the only hyperparameter that is changing is C. In the case of ties,
report the lower C value. Report your findings.
(b) Similar to 3.1(e), plot the L0-norm of the learned parameter $\bar{\theta}$ against C, using the complete training data
and no cross-validation. You should be able to re-use the function plot_weight with different input
parameters without writing additional code. Include the plot in your write-up.
(c) Beyond any change in performance you may notice, what effect does the L1 penalty have on the optimal
solution? (Hint: Have a careful look at the plots you generated!)
(d) Note that using the Squared Hinge Loss (as opposed to the Hinge Loss) changes the objective function
as shown above. What effect do you expect this will have on the optimal solution?
4 Asymmetric Cost Functions and Class Imbalance [20 pts]
The training data we have provided you with so far is balanced: the data contain an equal number of positive
and negative ratings. But this is not always the case. In many situations, you are given imbalanced data,
where one class may be represented more than the others.
In this section, you will investigate the objective function of the SVM, and how we can modify it to fit
situations with class imbalance. Recall that the objective function for an SVM in scikit-learn is as follows:
$$
\begin{aligned}
\min_{\bar{\theta},\, b,\, \bar{\xi}} \quad & \|\bar{\theta}\|_2^2 + C \sum_{i=1}^{n} \xi_i \\
\text{subject to} \quad & y^{(i)} \left( \bar{\theta} \cdot \phi(\bar{x}^{(i)}) + b \right) \geq 1 - \xi_i, \\
& \xi_i \geq 0, \quad \forall\, i = 1, 2, 3, \ldots, n
\end{aligned}
$$
We can modify it in the following way:
$$
\begin{aligned}
\min_{\bar{\theta},\, b,\, \bar{\xi}} \quad & \|\bar{\theta}\|_2^2 + W_p \, C \sum_{i \mid y^{(i)} = 1} \xi_i + W_n \, C \sum_{i \mid y^{(i)} = -1} \xi_i \\
\text{subject to} \quad & y^{(i)} \left( \bar{\theta} \cdot \phi(\bar{x}^{(i)}) + b \right) \geq 1 - \xi_i, \\
& \xi_i \geq 0, \quad \forall\, i = 1, 2, 3, \ldots, n
\end{aligned}
$$

where $\sum_{i \mid y^{(i)} = 1}$ is a sum over all indices i whose corresponding point is positive ($y^{(i)} = 1$). Similarly,
$\sum_{i \mid y^{(i)} = -1}$ is a sum over all indices i whose corresponding point is negative ($y^{(i)} = -1$).
4.1 Arbitrary class weights [6 pts]
Wp and Wn are called “class weights” and are built-in parameters in scikit-learn.
(a) Describe how this modification will change the solution. If Wn is much greater than Wp, what does this
mean in terms of classifying positive and negative points? Refer to the weighted SVM formulation for
a brief justification of your reasoning.
(b) Create a linear-kernel SVM with hinge loss and L2-penalty with C = 0.01. This time, when calling
SVC, set class_weight={-1: 10, 1: 1}, or implement and use your select_classifier
helper function. This corresponds to setting Wn = 10 and Wp = 1. Train this SVM on the training
data X_train, y_train. Report the performance of the modified SVM on the test data X_test,
y_test for each metric below.
Note: You should be using SVC, not LinearSVC.
Performance Measures Performance
Accuracy
F1-Score
AUROC
Precision
Sensitivity
Specificity
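Concretely, the classifier described in (b) can be instantiated as follows (a sketch using SVC directly rather than select_classifier):

```python
from sklearn.svm import SVC

# Wn = 10, Wp = 1: slack on negative points is penalized ten times as heavily.
clf = SVC(kernel='linear', C=0.01, class_weight={-1: 10, 1: 1})
clf.fit(X_train, y_train)
```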
(c) Also, answer the following: Compared to your work in question 3.1(d), which performance measures
were affected the most by the new class weights? Why do you suspect this is the case?
Note: We set C = 0.01 to ensure that interesting trends can be found regardless of your work in
question 3. This may mean that your value for C differs in 4 and 3.1(d). In a real machine learning
setting, you’d have to be more careful about how you compare models.
4.2 Imbalanced data [5 pts]
You just saw the effect of arbitrarily setting the class weights when our training set is already balanced. Let’s
return to the class weights you are used to: Wn = Wp. We turn our attention to class imbalance. Using the
functions you wrote in part 2, we have provided you with a second feature matrix and vector of binary labels,
IMB_features, IMB_labels. This class-imbalanced data set has 800 positive points and 200 negative
points. It also comes with a corresponding test feature matrix and label vector pair, IMB_test_features,
IMB_test_labels, which has the same class imbalance.
(a) Create a linear-kernel SVM with hinge loss, L2-penalty and, as before, C = 0.01. Set class_weight={-1: 1, 1: 1}, which returns the SVM to the formulation you have seen in class. Now train this SVM
on the class-imbalanced data IMB_features, IMB_labels provided.
Use this classifier to predict the provided test data IMB_test_features, IMB_test_labels and
report the accuracy, specificity, sensitivity, precision, AUROC, and F1-Score of your predictions:
Class Weights Performance Measures Performance
Wn = 1, Wp = 1 Accuracy
Wn = 1, Wp = 1 Sensitivity
Wn = 1, Wp = 1 Specificity
Wn = 1, Wp = 1 Precision
Wn = 1, Wp = 1 AUROC
Wn = 1, Wp = 1 F1-Score
(b) How has training on an imbalanced data set affected performance?
4.3 Choosing appropriate class weights [6 pts]
(a) Now we will return to setting the class weights given the situation we explored in part 4.2. Using what
you have done in the preceding parts, find an appropriate setting for the class weights that mitigates the
situation in part 4.2 and improves the classifier trained on the imbalanced data set. That is, find class
weights that give good mean cross-validation performance. (Think: which performance metric(s) are
informative in this situation, and which metric(s) are less meaningful? Make sure the metric you use
for cross-validation is a good choice given the class imbalance.) Report your strategy
for choosing an appropriate performance metric and weight parameters. This question requires you
to choose hyperparameters based on cross-validation; you should not be using the test data to choose
hyperparameters.
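A minimal sketch of one possible search, reusing the cv_performance function from part 3; the candidate grid and the metric shown here are placeholders, and choosing good ones is part of the exercise:

```python
import numpy as np
from sklearn.svm import SVC

best_weights, best_score = None, -np.inf
for wn in [1, 2, 3, 4, 5]:                 # hypothetical candidate weights
    for wp in [1, 2, 3, 4, 5]:
        clf = SVC(kernel='linear', C=0.01, class_weight={-1: wn, 1: wp})
        # 'auroc' is only an example; pick a metric that stays informative
        # under class imbalance.
        score = cv_performance(clf, IMB_features, IMB_labels, k=5, metric='auroc')
        if score > best_score:
            best_weights, best_score = (wn, wp), score
```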
(b) Use your custom classifier to predict the provided test data IMB_test_features, IMB_test_labels
again, and report the accuracy, specificity, sensitivity, precision, AUROC, and F1-Score of your predictions:
Class Weights Performance Measures Performance
Wn =?, Wp =? Accuracy
Wn =?, Wp =? Sensitivity
Wn =?, Wp =? Specificity
Wn =?, Wp =? Precision
Wn =?, Wp =? AUROC
Wn =?, Wp =? F1-Score
4.4 The ROC curve [3 pts]
Given the above results, we are interested in investigating the AUROC metric further. First, provide a plot of
the ROC curve with labeled axes for both Wn = 1, Wp = 1 and your custom setting of Wn, Wp from above.
Put both curves on the same set of axes. Make sure to label the plot in a way that indicates which curve is
which.
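sklearn.metrics.roc_curve together with matplotlib is enough here. A sketch, assuming clf_equal and clf_custom are the two trained classifiers from parts 4.2 and 4.3:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

for label, clf in [('Wn = 1, Wp = 1', clf_equal), ('custom weights', clf_custom)]:
    scores = clf.decision_function(IMB_test_features)
    fpr, tpr, _ = roc_curve(IMB_test_labels, scores)
    plt.plot(fpr, tpr, label=label)

plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.savefig('roc_curves.png')
```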
5 Challenge [20 pts]
Now, a challenge: in the previous problems, we had transformed the data into a binary dataset by combining
multiple labels to generate two labels.
For this challenge, you will consider the original multiclass labels of the reviews. We have already prepared a
held-out test set heldout_features for this challenge, along with multiclass training data multiclass_features,
multiclass_labels. This training data has 1200 reviews, 400 of each class. You must work only
with the provided data; acquiring new data to train your model is not permitted. (Notice that, if you
look into the dataset, there is additional information that could be leveraged, such as the summary text and
review time.) Your goal is to train a multiclass classifier using the SVC or LinearSVC classes to predict the true ratings of the held-out test set, i.e., you will train your model on multiclass_features
and test on heldout_features. If you wish to take advantage of the other features in the dataset,
you will have to modify either the get_multiclass_training_data function in helper.py or the
generate_feature_matrix function in project1.py.
Note that the class balance of this training set matches the class balance of the heldout set. Also note that,
given the size of the data and the feature matrix, training may take several minutes.
In order to attempt this challenge, we encourage you to apply what you have learned about hyperparameter
selection and consider the following extensions:
1. Try different feature engineering methods. The bag-of-words models we have used so far are
simplistic. There are other methods to extract different features from the raw data, such as:
(a) Using a different method for extracting words from the reviews
(b) Using only a subset of the raw features
(c) Utilizing the raw movie rating (before being converted to label) and/or the review summary
column from the training dataset
(d) Using the number of times a word occurs in a review as a feature (rather than binary 0, 1 features
indicating presence)
(e) Scaling or normalizing the data
(f) Alternative feature representations
2. Read about one-vs-one and one-vs-all. These are the two standard methods for building a multiclass
classifier out of binary classifiers. You should understand the differences between them and implement
at least one; a sketch of scikit-learn's built-in wrappers follows below.
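If you choose not to roll your own, scikit-learn also provides wrappers for both strategies; a sketch of how they could be applied to this data (the LinearSVC settings are placeholders):

```python
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import LinearSVC

# One-vs-all (one-vs-rest): one binary classifier per class (3 here: -1, 0, 1).
ovr = OneVsRestClassifier(LinearSVC(C=1.0, class_weight='balanced'))
ovr.fit(multiclass_features, multiclass_labels)
y_heldout = ovr.predict(heldout_features)

# One-vs-one: one binary classifier per pair of classes (3 pairs here).
ovo = OneVsOneClassifier(LinearSVC(C=1.0, class_weight='balanced'))
```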
You will have to save the output of your classifier into a csv file using the helper function
generate_challenge_labels(y, uniqname) we have provided. The base name of the output file
must be your uniqname followed by the extension csv. For example, the output filename for a user with
uniqname foo would be foo.csv. This file will be submitted according to the instructions at the end of this
document. You may use the file test_output.py to ensure that your output has the correct format. To run this
file, simply run python test_output.py -i uniqname.csv, replacing the file uniqname.csv
with your generated output file.
We will evaluate your performance in this challenge based on two components:
1. Write-Up and Code [10 pts]: We will evaluate how much effort you have applied to attempt this
challenge based on your write-up and code. Ensure that both are present. Within your write-up,
you must provide discussions of the choices you made when designing the following components of
your classifier:
• Feature engineering
• Hyperparameter selection
• Algorithm selection (e.g., quadratic vs. linear kernel)
• Multiclass method (e.g., one-vs-one vs. one-vs-all)
• Any techniques you used that go above and beyond current course material
2. Test Scores [10 pts]: We will evaluate your classifier based on accuracy. Consider the following
confusion matrix:
                Actual: -1   Actual: 0   Actual: 1
Predicted -1        x1
Predicted  0                     x2
Predicted  1        y1                       x3
where each column corresponds to the actual class and each row corresponds to the predicted class.
For instance, y1 in the matrix above is the number of reviews with true rating −1 (poor) that are
classified as having rating 1 (good) by your model. The accuracy for a multiclass classification
problem is defined as follows:

$$\text{accuracy} = \frac{x_1 + x_2 + x_3}{n}$$
where n is the number of samples.
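Equivalently, in terms of scikit-learn's confusion matrix (a sketch; the true held-out labels are of course only available to the course staff):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_true, y_pred, labels=[-1, 0, 1])
accuracy = np.trace(cm) / cm.sum()    # (x1 + x2 + x3) / n
```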
REMEMBER to submit your project report by Tuesday, 11:59pm ET on October 8th, 2019 to
Gradescope.
Include your code as an appendix (copy and pasted) in your report. Please try to format lines of code
so they are visible within the pages.
Upload your file uniqname.csv containing the label predictions for the held-out data at this link:
https://tinyurl.com/445f19p1, providing your umich.edu email.
Appendix A: Approximate Run-times for Programming Problems
• Problem 3.1 c: around 30 minutes
• Problem 3.2 a i: around 35 minutes
• Problem 3.2 a ii: around 20 minutes
• Problem 3.4 a: around 10 seconds
• Problem 3.4 b: around 3 seconds
• Problem 4.3: around 15 minutes
• Problem 5: around 2 minutes per configuration
N.B. these are approximate times, not exact. Different computers will result in different run-times, so do not
panic if yours is a little different. Algorithmic optimization can also improve run-time noticeably in certain
cases. However, if it is taking more than twice as long, something might be wrong.
Appendix B: Topics and Concepts
The relevant topics for each section are as follows:
• Problem 3.1 a, b, e, f, Problem 3.4 c, d
– Support Vector Machines; Primal Formulation; Geometric Margin
• Problem 3.2 a, Problem 3.3 a, Problem 4.1 a
– Dual Formulation; Kernels
• Problem 3.1 c, d, Problem 3.2 b, Problem 3.3 b, Problem 3.4 a, b, Problem 4.1 b, c, Problem 4.2
a, b, Problem 4.3 a, b, Problem 4.4 a, b
– Performance Measures
The date provided for a problem indicates the last lecture in which relevant material is discussed. Some
problems can be completed before their listed date.