
Machine Learning
COMP 5630 / COMP 6630
Assignment #2
Naive Bayes Classifiers
Submission Instructions
Please submit your solutions
via Canvas (https://auburn.instructure.com/). You should submit your assignment as a typeset
PDF. Please do not include scanned or photographed equations as they are difficult for us to grade.
Late Submission Policy
The late submission policy for assignments will be as follows unless otherwise specified:
1. 75% credit within 0-48 hours after the submission deadline.
2. 50% credit within 48-96 hours after the submission deadline.
3. 0% credit after 96 hours after the submission deadline.
Tasks
1 Independent Events and Bayes Theorem [20 pts]
1. [5 Points] For events A, B prove:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}$$

(¬A denotes the event that A does not occur.)
2. Let X, Y , and Z be random variables taking values in {0, 1}. The following table lists the
probability of each possible assignment of 0 and 1 to the variables X, Y , and Z:
                Z = 0               Z = 1
            X = 0    X = 1     X = 0    X = 1
   Y = 0     0.1      0.05      0.1      0.1
   Y = 1     0.2      0.1       0.175    0.175
(a) [5 Points] Is X independent of Y ? Why or why not?
(b) [5 Points] Is X conditionally independent of Y given Z? Why or why not?
(c) [5 Points] Calculate P(X ≠ Y | Z = 0).
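The following sketch is not part of the assignment; it only illustrates, in Python with NumPy, how the independence definitions used in parts (a) and (b) can be checked numerically from a joint probability table (the entries below are the ones from the table in part 2). It implements the definitions only; the written answers still need a justification.

```python
# Minimal sketch: check (conditional) independence numerically from a joint table.
import numpy as np

# joint[x, y, z] = P(X = x, Y = y, Z = z), filled with the entries of the table above.
joint = np.zeros((2, 2, 2))
joint[0, 0, 0], joint[1, 0, 0] = 0.1, 0.05      # Z = 0, Y = 0
joint[0, 1, 0], joint[1, 1, 0] = 0.2, 0.1       # Z = 0, Y = 1
joint[0, 0, 1], joint[1, 0, 1] = 0.1, 0.1       # Z = 1, Y = 0
joint[0, 1, 1], joint[1, 1, 1] = 0.175, 0.175   # Z = 1, Y = 1

# Marginals: sum out the variables that are not of interest.
p_xy = joint.sum(axis=2)            # P(X, Y)
p_x = p_xy.sum(axis=1)              # P(X)
p_y = p_xy.sum(axis=0)              # P(Y)

# X and Y are independent iff P(X = x, Y = y) = P(X = x) P(Y = y) for all x, y.
print("X independent of Y:", np.allclose(p_xy, np.outer(p_x, p_y)))

# Conditional independence given Z: compare P(X, Y | Z = z) with
# P(X | Z = z) P(Y | Z = z) for each value of z.
for z in (0, 1):
    p_xy_z = joint[:, :, z] / joint[:, :, z].sum()
    p_x_z = p_xy_z.sum(axis=1)
    p_y_z = p_xy_z.sum(axis=0)
    print(f"X independent of Y given Z = {z}:",
          np.allclose(p_xy_z, np.outer(p_x_z, p_y_z)))
```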
2 Maximum Likelihood Estimation [40 pts]
This problem explores maximum likelihood estimation (MLE), which is a technique for estimating
an unknown parameter of a probability distribution based on observed samples. Suppose we
observe the values of n i.i.d. (independent, identically distributed) random variables X1, . . . , Xn drawn from a single Bernoulli distribution with parameter θ. In other words, for each Xi, we know that:

$$P(X_i = 1) = \theta \qquad \text{and} \qquad P(X_i = 0) = 1 - \theta$$

Our goal is to estimate the value of θ from these observed values of X1 through Xn.
For any hypothetical value θ̂, we can compute the probability of observing the outcome X1, . . . , Xn if the true parameter value θ were equal to θ̂. This probability of the observed data is often called the data likelihood, and the function L(θ̂) = P(X1, . . . , Xn | θ̂) that maps each θ̂ to the corresponding likelihood is called the likelihood function. A natural way to estimate the unknown parameter θ is to choose the θ̂ that maximizes the likelihood function. Formally,

$$\hat{\theta}_{\mathrm{MLE}} = \underset{\hat{\theta}}{\arg\max}\; L(\hat{\theta})$$

Often it is more convenient to work with the log likelihood function ℓ(θ̂) = log L(θ̂). Since the log function is increasing, we also have:

$$\hat{\theta}_{\mathrm{MLE}} = \underset{\hat{\theta}}{\arg\max}\; \ell(\hat{\theta})$$
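As a quick illustration of the definition above (not required for the written answers below), the log likelihood can be evaluated numerically on a grid of candidate values θ̂ and the maximizer read off directly. The sample sequence in this sketch is made up for illustration and is not the one used in the questions.

```python
# Minimal sketch: locate the Bernoulli MLE numerically by grid search.
import numpy as np

X = np.array([1, 0, 1, 1, 0])             # hypothetical i.i.d. Bernoulli samples
thetas = np.linspace(0.001, 0.999, 999)   # grid of candidate values for theta-hat

# By independence, L(theta-hat) is the product of P(X_i | theta-hat) over the
# samples, so the log likelihood is the sum of log P(X_i | theta-hat), where
# P(X_i | theta-hat) is theta-hat if X_i = 1 and (1 - theta-hat) if X_i = 0.
log_lik = np.array([
    sum(np.log(t) if x == 1 else np.log(1 - t) for x in X)
    for t in thetas
])

print("numerical MLE ~", thetas[np.argmax(log_lik)])
```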
1. [8 Points] Write a formula for the log likelihood function, ℓ(θ̂). Your function should depend on the random variables X1, . . . , Xn and the hypothetical parameter θ̂, and should be simplified as far as possible (i.e., don't just write the definition of the log likelihood function). Does the log likelihood function depend on the order of the random variables?
2. [8 Points] Consider the following sequence of 10 samples: X = (0, 1, 0, 1, 1, 0, 0, 1, 1, 1).
Compute the maximum likelihood estimate for the 10 samples. Show all of your work (hint: recall that if x* maximizes f(x), then f′(x*) = 0).
3. [8 Points] Now we will consider a related distribution. Suppose we observe the values of m i.i.d. random variables Y1, . . . , Ym drawn from a single Binomial distribution B(n, θ). A Binomial distribution models the number of 1's in a sequence of n independent Bernoulli variables with parameter θ. In other words,

$$P(Y_i = k) = \binom{n}{k}\,\theta^k (1-\theta)^{n-k} = \frac{n!}{k!\,(n-k)!}\;\theta^k (1-\theta)^{n-k}$$

Write a formula for the log likelihood function, ℓ(θ̂). Your function should depend on the random variables Y1, . . . , Ym and the hypothetical parameter θ̂.
4. [8 Points] Consider two Binomial random variables Y1 and Y2 with the same parameters,
n = 5 and θ. The Bernoulli variables for Y1 and Y2 resulted in (0, 1, 0, 1, 1) and (0, 0, 1, 1, 1),
respectively. Therefore, Y1 = 3 and Y2 = 3. Compute the maximum likelihood estimate for
the 2 samples. Show your work.
5. [8 Points] How do your answers for parts 1 and 3 compare? What about parts 2 and 4? If
you got the same or different answers, why was that the case?
3 Implementing Naive Bayes [40 pts]
We will now learn how to use Naive Bayes and Logistic Regression to solve a real-world problem: text categorization. Text categorization (also referred to as text classification) is the task of assigning documents to one or more topics. For our homework, we will use a benchmark dataset that is frequently used in text categorization problems. This dataset, Reuters-21578, consists of documents that appeared in the Reuters newswire in 1987. Each document was then manually categorized into one of more than 100 topics. In this homework we are only interested in the earn and acquisition (acq) topics, so we will be using a shortened version of the dataset (documents assigned to topics other than “earn” or “acq” are not in the dataset provided for the homework). As features, we will use the frequency (count) of each word occurring in the document. This model is known as the bag-of-words model and is frequently used in text categorization. You can download the Assignment 2 data from Canvas. In this folder you will find:
• train.csv: Training data. Each row represents a document; the comma-separated columns are the features (word counts). There are 4527 documents and 5180 words.
• train labels.txt: labels for the training data
• test.csv: Test data, 1806 documents and 5180 words
• test labels.txt: labels for the test data
• word indices: words corresponding to the feature indices.
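If you work in Python, loading the data might look like the following minimal sketch. It assumes the CSV files are plain comma-separated count matrices with no header row and that each label file contains one label per line; adjust the file names and parsing if the files provided on Canvas differ.

```python
# Minimal loading sketch (file names as listed above; adjust if they differ on disk).
import numpy as np

# Word-count matrices: one row per document, one comma-separated column per word.
X_train = np.loadtxt("train.csv", delimiter=",")   # expected shape (4527, 5180)
X_test = np.loadtxt("test.csv", delimiter=",")     # expected shape (1806, 5180)

# Labels: assumed to be one label per line.
with open("train labels.txt") as f:
    y_train = np.array([line.strip() for line in f])
with open("test labels.txt") as f:
    y_test = np.array([line.strip() for line in f])
```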
Implement Naive Bayes. To avoid zero probabilities, choose a Beta distribution with equal-valued parameters as a prior when estimating the Naive Bayes parameters using MAP. You may need to work with log probabilities to avoid underflow.
Train your classifier on the given training set. Report the training accuracy, the testing accuracy, and the amount of time spent training the classifier.
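One possible way to structure the classifier is sketched below: a multinomial Naive Bayes over the word counts, trained and evaluated with log probabilities. This is an illustration, not the required solution. It assumes the arrays X_train, y_train, X_test, and y_test from the loading sketch above; the smoothing constant alpha = 1 can be read as the MAP estimate under a symmetric prior whose parameters all equal 2, but the choice of prior parameters is up to you.

```python
# Minimal multinomial Naive Bayes sketch with additive (MAP-style) smoothing.
import time
import numpy as np

def train_nb(X, y, alpha=1.0):
    """Estimate log class priors and per-class log word probabilities."""
    classes = np.unique(y)
    log_priors = np.zeros(len(classes))
    log_word_probs = np.zeros((len(classes), X.shape[1]))
    for i, c in enumerate(classes):
        Xc = X[y == c]                                 # documents of class c
        log_priors[i] = np.log(Xc.shape[0] / X.shape[0])
        counts = Xc.sum(axis=0)                        # total count of each word in class c
        # Adding alpha to every count keeps unseen words from getting probability zero.
        log_word_probs[i] = np.log((counts + alpha) /
                                   (counts.sum() + alpha * X.shape[1]))
    return classes, log_priors, log_word_probs

def predict_nb(X, classes, log_priors, log_word_probs):
    """Pick the class maximizing log P(c) + sum over words of count * log P(word | c)."""
    scores = X @ log_word_probs.T + log_priors
    return classes[np.argmax(scores, axis=1)]

start = time.time()
model = train_nb(X_train, y_train)
train_time = time.time() - start

train_acc = np.mean(predict_nb(X_train, *model) == y_train)
test_acc = np.mean(predict_nb(X_test, *model) == y_test)
print(f"train acc: {train_acc:.4f}  test acc: {test_acc:.4f}  "
      f"training time: {train_time:.2f} s")
```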
