Assignment 2: Mixtures of Gaussians and Logistic Regression
CS480/680
Submit an electronic copy of your assignment via LEARN. Late submissions incur a 2% penalty for every rounded-up hour past the deadline. For example, an assignment submitted 5 hours and 15 minutes late will receive a penalty of ceiling(5.25) * 2% = 12%. Be sure to include your name and student number with your assignment.

1. [50 pts] Implement the following two classification algorithms. Do not use any machine learning library, but feel free to use libraries for linear algebra, and feel free to verify your results with existing machine learning libraries. Use the same dataset (handwritten digits) as in Assignment 1 to train the algorithms.

(a) [25 pts] Mixture of Gaussians: let π = Pr(y = C1) and 1 − π = Pr(y = C2). Let Pr(x|C1) = N(x|µ1, Σ) and Pr(x|C2) = N(x|µ2, Σ). Learn the parameters π, µ1, µ2 and Σ by likelihood maximization. Use Bayes' theorem to compute the probability of each class given an input x: Pr(Cj|x) = k Pr(Cj) Pr(x|Cj), where k = 1/Pr(x) is a normalization constant. (An illustrative sketch of the closed-form estimates appears after the hand-in list below.)

(b) [25 pts] Logistic regression: let Pr(C1|x) = σ(wᵀx + w0) and Pr(C2|x) = 1 − σ(wᵀx + w0). Learn the parameters w and w0 by conditional likelihood maximization. More specifically, use Newton's algorithm derived in class to optimize the parameters; 10 iterations of Newton's algorithm should be sufficient for convergence. Add a penalty of 0.5λ||w||₂² to regularize the weights. Find the optimal hyperparameter λ by 10-fold cross-validation. (Illustrative Newton and cross-validation sketches likewise follow the hand-in list.)

What to hand in:

• Draw a graph that shows the cross-validation accuracy of logistic regression as λ varies. Report the best λ.

• Report the accuracy of mixtures of Gaussians and logistic regression (with the best λ for regularization) on the test set. Measure the accuracy as the fraction of correctly labeled images. An image is correctly labeled when the probability of the correct label is greater than 0.5.

• Print the parameters π, µ1, µ2, Σ found for the mixture of Gaussians. Since the covariance Σ is quite big, print only the diagonal of Σ. Also print the parameters w, w0 found for logistic regression.

• Briefly discuss the results:

  – Mixture of Gaussians and logistic regression both find a linear separator, but they use different parameterizations and different objectives. Compare the number of parameters in each model and the amount of computation needed to find a solution with each model. Compare the results for each model.

  – Mixture of Gaussians and logistic regression find a linear separator, whereas k-Nearest Neighbours (in Assignment 1) finds a non-linear separator. Compare the expressivity of the separators. Discuss under what circumstances each type of separator is expected to perform best. What could explain the results obtained with KNN in comparison to the results obtained with mixtures of Gaussians and logistic regression?

• A copy of your code.
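As referenced in part (a) above, the maximum-likelihood estimates have closed forms: π is the fraction of C1 training examples, µ1 and µ2 are the class-conditional sample means, and Σ is the pooled within-class covariance. Below is a minimal NumPy sketch of that computation, not a prescribed implementation; it assumes X is an N × d matrix, y is a vector of {0, 1} labels with y = 1 denoting C1, and the function names are illustrative.

    import numpy as np

    def fit_mog(X, y):
        # Closed-form MLE for a two-class Gaussian model with a shared covariance.
        # X: (N, d) inputs; y: (N,) labels in {0, 1}, where y = 1 denotes C1.
        N = len(y)
        pi = np.mean(y == 1)                    # pi = Pr(y = C1)
        mu1 = X[y == 1].mean(axis=0)
        mu2 = X[y == 0].mean(axis=0)
        D1 = X[y == 1] - mu1                    # centred C1 points
        D2 = X[y == 0] - mu2                    # centred C2 points
        Sigma = (D1.T @ D1 + D2.T @ D2) / N     # pooled within-class covariance
        return pi, mu1, mu2, Sigma

    def prob_c1(X, pi, mu1, mu2, Sigma):
        # Pr(C1 | x) via Bayes' theorem; with a shared Sigma the log-odds are
        # linear in x, so the posterior is a sigmoid of w^T x + w0.
        P = np.linalg.inv(Sigma)                # precision matrix
        w = P @ (mu1 - mu2)
        w0 = -0.5 * mu1 @ P @ mu1 + 0.5 * mu2 @ P @ mu2 + np.log(pi / (1 - pi))
        return 1.0 / (1.0 + np.exp(-(X @ w + w0)))

On pixel data Σ may be singular (e.g., constant border pixels); adding a small ridge εI to Σ before inversion is a common fix.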
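For part (b), the Newton update takes the iteratively reweighted least squares (IRLS) form w ← w − H⁻¹g, where g and H are the gradient and Hessian of the penalized negative log-likelihood. A sketch of one way to organize it, assuming the bias is handled by appending a constant feature and leaving it unpenalized (a common convention, not something this handout mandates):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def fit_logistic_newton(X, y, lam, iters=10):
        # Newton's method for L2-penalized logistic regression.
        # X: (N, d) inputs; y: (N,) labels in {0, 1}, where y = 1 denotes C1.
        Xb = np.hstack([X, np.ones((len(X), 1))])   # append a constant bias feature
        w = np.zeros(Xb.shape[1])
        reg = lam * np.eye(Xb.shape[1])
        reg[-1, -1] = 0.0                           # leave the bias w0 unpenalized
        for _ in range(iters):
            p = sigmoid(Xb @ w)                     # current Pr(C1 | x)
            g = Xb.T @ (p - y) + reg @ w            # gradient of the penalized NLL
            R = p * (1 - p)                         # diagonal IRLS weights
            H = (Xb * R[:, None]).T @ Xb + reg      # Hessian of the penalized NLL
            w = w - np.linalg.solve(H, g)           # Newton step
        return w[:-1], w[-1]                        # (w, w0)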
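Finally, a possible skeleton for choosing λ by 10-fold cross-validation; the λ grid and the names X_train and y_train are assumptions, and sigmoid and fit_logistic_newton refer to the sketch above.

    def cv_accuracy(X, y, lam, folds=10):
        # Mean validation accuracy of logistic regression across the folds.
        idx = np.random.permutation(len(y))         # shuffle once, then split
        splits = np.array_split(idx, folds)
        accs = []
        for k in range(folds):
            val = splits[k]
            tr = np.concatenate([splits[j] for j in range(folds) if j != k])
            w, w0 = fit_logistic_newton(X[tr], y[tr], lam)
            pred = sigmoid(X[val] @ w + w0) > 0.5   # correct iff Pr(label) > 0.5
            accs.append(np.mean(pred.astype(int) == y[val]))
        return np.mean(accs)

    lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]         # illustrative grid
    best_lam = max(lambdas, key=lambda l: cv_accuracy(X_train, y_train, l))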
2. [50 pts] Linear separability

(a) [25 pts] Consider a threshold perceptron that predicts y = 1 when wᵀx + w0 ≥ 0 and y = 0 when wᵀx + w0 < 0. It is interesting to study the class of Boolean functions that can be represented by a threshold perceptron. Assume that the input space is X = {0, 1}² and the output space is Y = {0, 1}. For each of the following Boolean functions, indicate whether it is possible to encode the function as a threshold perceptron. If it is possible, indicate some values for w and w0. If it is not possible, indicate a feature mapping φ : X → X̂, restricted to the space of polynomial mappings, with values for w and w0 such that wᵀφ(x) + w0 is a linear separator that encodes the function. (A truth-table verification sketch appears at the end of this handout.)

• and
• or
• exclusive-or
• iff

(b) [25 pts] Is the training set used in Question 1 linearly separable? To answer this question, design an experiment involving a logistic regression classifier that allows you to verify whether the training set is linearly separable. Describe your experiment and the results. Indicate whether the training set is linearly separable based on the results. (A possible experimental sketch also appears at the end of this handout.)
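As noted in Question 2(a), a candidate (w, w0) can be checked mechanically by evaluating the threshold rule on all four inputs in {0, 1}² against the function's truth table; a feature mapping φ can be checked the same way by applying it to each input first. A minimal sketch; the AND truth table and the particular weights shown are illustrative examples, not part of the handout:

    import numpy as np

    def check_perceptron(w, w0, truth_table):
        # Verify that the rule "y = 1 iff w^T x + w0 >= 0" reproduces a
        # Boolean function given as {input pair: desired output in {0, 1}}.
        for x, target in truth_table.items():
            pred = 1 if np.dot(w, x) + w0 >= 0 else 0
            if pred != target:
                return False
        return True

    AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
    print(check_perceptron(np.array([1.0, 1.0]), -1.5, AND))   # True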
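For Question 2(b), one possible design (a suggestion, not the required experiment) uses the fact that unregularized logistic regression can classify a linearly separable training set perfectly, with the weight norm growing as Newton iterations proceed; persistent training errors instead point to non-separability. A hypothetical sketch reusing sigmoid and fit_logistic_newton from Question 1 with λ = 0 (beware: near separation the Hessian becomes ill-conditioned):

    # Train without regularization and inspect training accuracy and ||w||.
    w, w0 = fit_logistic_newton(X_train, y_train, lam=0.0, iters=10)
    train_pred = sigmoid(X_train @ w + w0) > 0.5
    print("training accuracy:", np.mean(train_pred.astype(int) == y_train))
    print("||w|| after 10 Newton iterations:", np.linalg.norm(w))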