Problem 1: Recall the naive Bayes model.
● For simplicity, we only consider binary features
● The generative model is: first draw the label y ∼ Categorical(θ); then, given y = c, draw each binary feature independently as x_j | y = c ∼ Bernoulli(π_{c,j}).
Here, a Bernoulli distribution parametrized by π means that
Pr[X = 1] = π and Pr[X = 0] = 1 − π.
It is the special case of a categorical distribution with only two outcomes.
● Such a model can represent a document in text classification. For example,
the target indicates Spam or NotSpam, and each feature indicates whether a word in the
vocabulary occurs in the document.
Show that the decision boundary of naive Bayes is also linear.
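A sketch of the intended derivation (the notation θ_c = Pr[y = c] for the class prior and π_{c,j} for the Bernoulli parameter of feature j under class c is assumed here, not taken from the handout): by Bayes' rule and the naive independence assumption, the log-odds is

    \log\frac{\Pr[y=1 \mid x]}{\Pr[y=0 \mid x]}
      = \log\frac{\theta_1}{\theta_0}
      + \sum_j \left[ x_j \log\frac{\pi_{1,j}}{\pi_{0,j}}
      + (1 - x_j) \log\frac{1 - \pi_{1,j}}{1 - \pi_{0,j}} \right]
      = w^\top x + b,

where w_j = \log\frac{\pi_{1,j}(1 - \pi_{0,j})}{\pi_{0,j}(1 - \pi_{1,j})} and b absorbs all the terms that do not depend on x. The decision boundary {x : w^\top x + b = 0} is therefore linear (affine) in x.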
Problem 2: Give a set of parameters for the stack of logistic regression models below
that accomplishes the non-linear classification.
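The figure showing the stack is not reproduced here, so the following is a minimal sketch assuming the usual setup from class: two logistic units h1, h2 feeding one output logistic unit, with XOR as the non-linear target. The weights are hand-picked for illustration, not the official solution.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hidden layer: two logistic units. Output: one logistic unit.
# h1 ≈ OR(x1, x2), h2 ≈ NAND(x1, x2), y ≈ AND(h1, h2) = XOR(x1, x2).
W1 = np.array([[20.0, 20.0],     # weights of h1
               [-20.0, -20.0]])  # weights of h2
b1 = np.array([-10.0, 30.0])
w2 = np.array([20.0, 20.0])
b2 = -30.0

def predict(x):
    h = sigmoid(W1 @ x + b1)   # first stage: two logistic regressions
    return sigmoid(w2 @ h + b2)  # second stage: logistic regression on h

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    v = np.array(x, dtype=float)
    print(x, f"{predict(v):.3f}")
# Prints approximately 0, 1, 1, 0: the XOR labels, which no single
# logistic regression (a linear classifier) can produce.

Large weights (±20) are used so each unit saturates close to 0 or 1, making the two-stage composition behave like the Boolean gates named in the comments.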
Problem 3: Use the above example to show that the optimization of neural networks is
non-convex.
Hint: As mentioned in class, you can rename c_1 as c_2 and c_2 as c_1. Then, you will have two
models. Interpolating them will give you a very bad model.
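A numeric sketch of the hint, reusing the hypothetical XOR parameters from the Problem 2 sketch: model B renames hidden unit c_1 as c_2 and vice versa, so A and B compute exactly the same function (and the same loss), yet their midpoint in parameter space is a nearly constant, very bad classifier. If the loss were convex, the midpoint of two equally good minimizers could be no worse than they are.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(W1, b1, w2, b2, x):
    h = sigmoid(W1 @ x + b1)
    return sigmoid(w2 @ h + b2)

# Model A: the XOR parameters from the Problem 2 sketch.
W1a = np.array([[20.0, 20.0], [-20.0, -20.0]])
b1a = np.array([-10.0, 30.0])
w2a = np.array([20.0, 20.0])
b2 = -30.0

# Model B: rename hidden unit c1 as c2 and c2 as c1, i.e. permute the
# rows of W1 and the entries of b1 and w2. Same function, same loss.
perm = [1, 0]
W1b, b1b, w2b = W1a[perm], b1a[perm], w2a[perm]

# Midpoint of A and B in parameter space.
W1m, b1m, w2m = (W1a + W1b) / 2, (b1a + b1b) / 2, (w2a + w2b) / 2

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    v = np.array(x, dtype=float)
    pa = predict(W1a, b1a, w2a, b2, v)
    pb = predict(W1b, b1b, w2b, b2, v)
    pm = predict(W1m, b1m, w2m, b2, v)
    print(x, f"A={pa:.3f}  B={pb:.3f}  mid={pm:.3f}")
# A and B both output the XOR labels, but the midpoint's first-layer
# weights cancel to zero, so it predicts ≈1 for every input and
# misclassifies (0, 0) and (1, 1). Two global minimizers whose average
# is far from optimal means the loss surface cannot be convex.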
END OF W9