Homework 3: Stochastic Gradient Descent and Logistic Regression
Your solutions to theoretical questions should be done in Markdown/MathJax directly below the associated question. Your solutions to computational questions should include any specified Python code and results as well as written commentary on your conclusions. Remember that you are encouraged to discuss the problems with your instructors and classmates, but **you must write all code and solutions on your own**. For a refresher on the course
\n",
"\n",
"**NOTES**: \n",
"\n",
"- Do **NOT** load or use any Python packages that are not available in Anaconda 3.6. \n",
"- Some problems with code may be autograded. If we provide a function API **do not** change it. If we do not provide a function API then you're free to structure your code however you like. \n",
"- Submit only this Jupyter notebook to Moodle. Do not compress it using tar, rar, zip, etc. "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"import pickle, gzip\n",
"import numpy as np\n",
"import pandas as pd\n",
"import matplotlib.pylab as plt\n",
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### [25 points] Problem 1 - MLE and SGD for the Exponential Distribution Rate Parameter\n",
"***\n",
"\n",
"Suppose you're given $n$ numbers $x_1, x_2, \\ldots, x_n$ (think training data) and told that they're samples from the exponential distribution $Exp(\\lambda)$ where the rate parameter $\\lambda$ is unknown. Recall that the probability density function for $Exp(\\lambda)$ is given by \n",
"\n",
"$$\n",
"f_\\lambda(x) = \\left\\{\n",
"\\begin{array}{rl}\n",
"0 & \\textrm{if } x < 0 \\\\\n",
"\\lambda e^{-\\lambda x} & \\textrm{if } x \\geq 0\n",
"\\end{array}\n",
"\\right.\n",
"$$\n",
"\n",
"In this problem we'll use Maximum Likelihood Estimation to estimate the rate parameter by hand and with Stochastic Gradient Descent. \n",
"\n",
"**Part A**: Write down the likelihood function $L(\\lambda)$ for the data set $x_1, x_2, \\ldots, x_n$. "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"###### Our likelihood function $L(\\lambda)$ can be written as follows: \n",
"$$L(\\lambda \\ | \\ x_{1} . . . x_{n}) = \\prod_{i=1}^{n} f_{\\lambda}(x_{i} \\ | \\ \\lambda)$$\n",
"\n",
"$$ =\\prod_{i=1}^{n} \\lambda e^{-\\lambda x_{i}} $$\n",
"\n",
"$$ \\boxed{= \\lambda^n e^{-\\lambda \\cdot \\sum_{i=1}^{n} x_{i}}}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part B**: Write down the associated Negative Log-Likelihood $\\textrm{NLL}(\\lambda)$ and simplify it algebraically. "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"$$NLL(\\lambda) = -log(\\lambda^n e^{-\\lambda \\sum_{i=1}^{n} x_{i}})$$\n",
"\n",
"Simplifying using log properties:\n",
"\n",
"$$= -log(\\lambda^n) - log(e^{-\\lambda \\sum_{i=1}^{n} x_{i}})$$\n",
"\n",
"$$\\boxed{ = -n \\cdot log(\\lambda) + \\lambda \\sum_{i=1}^{n} x_{i}}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part C**: Find a formula for the MLE of the rate parameter $\\lambda$ by taking the derivative of $\\textrm{NLL}(\\lambda)$, setting it equal to zero, and solving for $\\hat{\\lambda}$. "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"$$\\frac{dNLL(\\lambda)}{d\\lambda} = \\frac{-n}{\\lambda} + \\sum_{i=1}^{n}x_{i}$$\n",
"\n",
"$$\\frac{-n}{\\lambda} + \\sum_{i=1}^{n}x_{i} = 0$$\n",
"\n",
"$$\\sum_{i=1}^{n}x_{i} = \\frac{n}{\\lambda}$$\n",
"\n",
"$$\\boxed{\\hat{\\lambda} = \\frac{n}{\\sum_{i=1}^{n}x_{i}}}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part D**: Use the formula you found in **Part C** to estimate the rate parameter $\\lambda$ for the following training data. "
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"lam = 3.0; x_train = np.random.exponential(1/lam, size=10) # Note: numpy's exponential sampler expects 1 over rate parameter"
]
},
{
"cell_type": "code",
"execution_count": 53,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MLE for Lambda = 3.747\n"
]
}
],
"source": [
"lamhat = x_train.shape[0] / np.sum(x_train)\n",
"print(\"MLE for Lambda = {:.3f}\".format(lamhat))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part E**: Describe a **Stochastic** Gradient Descent algorithm based on the $\\textrm{NLL}$ you found in **Part B**. **Hint**: Think of what the loss function would be if the training set contained just a single point. "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"##### We can use an SGD update algorithm that can be defined as:$$\\hat{\\lambda} \\leftarrow \\hat{\\lambda} - \\eta \\cdot \\frac{dNLL}{d\\lambda}$$$$\\hat{\\lambda} \\leftarrow \\hat{\\lambda} - \\eta \\cdot (-\\frac{1}{\\lambda} + x_{i})$$where $\\hat{\\lambda}$ is initially a random guess.\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part F**: Implement the scheme described in **Part E** and run it on your training set. Does it converge to the MLE that you found in **Part D**? "
]
},
{
"cell_type": "code",
"execution_count": 110,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3.749911647444503\n"
]
}
],
"source": [
"eta = 0.03\n",
"lam_hat = 4\n",
"epoch = 5000\n",
"for e in range(epoch):\n",
" for ii in x_train:\n",
" lam_hat -= eta*((-1/lam_hat) + ii)\n",
"print(lam_hat)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### [20 points] Problem 2 - Regularized Logistic Regression Intuition \n",
"***\n",
"\n",
"Consider the training set shown below where red dots correspond to training examples with label $y=1$ and blue dots correspond to training examples with label $y = 0$. Suppose you fit a logistic regression model of the form \n",
"\n",
"$$\n",
"p(y = 1 \\mid {\\bf x}) = \\textrm{sigm}(\\beta_0 + \\beta_1 x_1 + \\beta_2 x_2) = \\dfrac{1}{1 + \\exp(-\\boldsymbol{\\beta}^T{\\bf x})}\n",
"$$\n",
"\n",
"where here in $\\boldsymbol{\\beta}^T{\\bf x}$ the vector ${\\bf x}$ has had a $1$ prepended so that it looks like ${\\bf x} = (1, x_1, x_2)$. "
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x113470d68"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"X = np.array([[1,2,1], [1,1,5], [1,2,5], [1,3,5], [1,1,6], [1,2,6], [1,5,1], [1,6,1], [1,7,1], [1,6,2], [1,7,2], [1,5,5]], dtype=float)\n",
"y = np.array([1 if ii < 6 else 0 for ii in range(X.shape[0])], dtype=float)\n",
"fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5,5))\n",
"\n",
"xvals = np.linspace(0,8)\n",
"part_b = lambda x: -7 + (15/5)*x\n",
"part_c = lambda x: 0 + (8/6)*x\n",
"part_d = lambda x: 3.5 + 0*x\n",
"part_e = lambda x: 0 + (x-4)*100000\n",
"\n",
"ax.scatter(X[:,1], X[:,2], color=[\"#a76c6e\" if ii < 6 else \"steelblue\" for ii in range(X.shape[0])], s=250)\n",
"ax.plot(xvals, part_b(xvals), lw=3, label=\"part B\") # no restrictions / regularization\n",
"ax.plot(xvals, part_c(xvals), lw=3, label=\"part C\") # beta_0 is zero\n",
"ax.plot(xvals, part_d(xvals), lw=3, label=\"part D\") # beta_1 is zero\n",
"ax.plot(xvals, part_e(xvals), lw=3, label=\"part E\") # beta_0 and beta_1 both approach infinity\n",
"ax.grid(alpha=0.25); ax.set_xlim([0,8]); ax.set_ylim([0,8]); ax.set_xlabel(r\"$x_1$\", fontsize=16); ax.set_ylabel(r\"$x_2$\", fontsize=16)\n",
"ax.legend(loc=\"upper right\");"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part A**: Suppose you use the the standard Logistic Regression decision rule such that a query point ${\\bf x}$ is predicted to be $\\hat{y} = 1$ if $p(y = 1 \\mid {\\bf x}) \\geq 0.5$ and $\\hat{y} = 0$ otherwise. Describe the decision boundary of such a classifier. How could you plot the decision boundary in a 2D feature space like the one shown above? "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"###### The general formula for a decision boundary in a 2d feature space is: $$x_{2} = \\frac{-\\beta_{0}}{\\beta_{2}} - \\frac{\\beta_{1}}{\\beta_{2}} \\cdot x_{1}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part B**: Suppose you learn a Logistic Regression classifier from this training set by minimizing the negative log-likelihood \n",
"\n",
"$$\n",
"NLL(\\boldsymbol{\\beta}) = -\\displaystyle\\sum_{i=1}^n \\left[y_i \\log \\textrm{sigm}(\\boldsymbol{\\beta}^T{\\bf x}) + (1-y_i)\\log(1 - \\textrm{sigm}(\\boldsymbol{\\beta}^T{\\bf x}))\\right]\n",
"$$\n",
"\n",
"Describe a possible decision boundary that you could learn as a result. Plot the decision boundary on the graph above and label it \"part B\". How many training examples does your learned decision boundary misclassify? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### A possible decision boundary could be one that splits the points with a linear line like shown above (where none of the points are misclassified). This is where our linear line has our learned $\\beta$ values after learning our Logistic Regression classifier.\n",
"##### A possible equation is: $$x_{2} = -7 + \\frac{15}{5} x_{1}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part C**: Suppose you learn a Logistic Regression classifier from this training set by minimizing the negative log-likelihood with the parameter $\\beta_0$ so strongly regularized that it approaches zero, and the other parameters unregularized. \n",
"\n",
"$$\n",
"\\textrm{Loss}(\\boldsymbol{\\beta}) = NLL(\\boldsymbol{\\beta})+ \\lambda \\beta_0^2\n",
"$$\n",
"\n",
"Describe a possible decision boundary that you could learn as a result. Plot the decision boundary on the graph above and label it \"part C\". How many training examples does your learned decision boundary misclassify? "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"##### Since $\\beta_{0}$ is now zero, there is no y-intercept on our graph and our decision boundary line must intersect the origin. Our slope may still vary and can look like the line on our graph. Notice one of our red points has been misclassified.\n",
"##### A possible equation is: $$x_{2} = 0 + \\frac{8}{6} x_{1}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part D**: Suppose you learn a Logistic Regression classifier from this training set by minimizing the negative log-likelihood with the parameter $\\beta_1$ so strongly regularized that it approaches zero, and the other parameters unregularized. \n",
"\n",
"$$\n",
"\\textrm{Loss}(\\boldsymbol{\\beta}) = NLL(\\boldsymbol{\\beta})+ \\lambda \\beta_1^2\n",
"$$\n",
"\n",
"Describe a possible decision boundary that you could learn as a result. Plot the decision boundary on the graph above and label it \"part D\". How many training examples does your learned decision boundary misclassify? "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"##### Now that $\\beta_{1}$ is zero, this means our decision boundary has a slope of zero. However, our y-intercept can be shifted along the y-axis and look like the example on our graph. Notice two points are miscliassifed . . . one red and one blue point.\n",
"##### A possible equation is: $$x_{2} = 3.5 + 0 \\cdot x_{1}$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part E**: Suppose you learn a Logistic Regression classifier from this training set by minimizing the negative log-likelihood with the parameter $\\beta_2$ so strongly regularized that it approaches zero, and the other parameters unregularized. \n",
"\n",
"$$\n",
"\\textrm{Loss}(\\boldsymbol{\\beta}) = NLL(\\boldsymbol{\\beta})+ \\lambda \\beta_2^2\n",
"$$\n",
"\n",
"Describe a possible decision boundary that you could learn as a result. Plot the decision boundary on the graph above and label it \"part E\". How many training examples does your learned decision boundary misclassify? "
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"###### If $\\beta_{2}$ is approaching zero, our $\\beta_{0}$ and $\\beta_{1}$ are going to blow up to $+\\infty$. This will give us a veritcal decision boundary line. Notice that no points are misclassfied for this line.\n",
"##### A possible equation is: $$x_{2} = 0 + 10000 \\cdot (x_{1} - 4)$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### [30 points] Problem 3: SGD for Regularized Logistic Regression \n",
"***\n",
"\n",
"In this problem you'll implement a Logistic Regression class that trains a classifier using Stochastic Gradient Descent with $\\ell_2$-Regularization. In Problem 4 you'll use this class to do document classification. Here your job will be to implement the following methods: \n",
"\n",
"- `train`: Takes in learning rate, regularization strength, and number of epochs to do, and learns model parameters using SGD \n",
"- `predict`: Takes in a matrix of examples and predicts binary labels in $\\{0,1\\}$\n",
"- `accuracy`: Takes in a matrix of examples and true labels, makes predictions, and returns accuracy as value in $[0,1]$ \n",
"\n",
"Note that you should assume that all features have been prepended with a $1$ so that each example is the same length as the parameter vector `beta`. \n",
"\n",
"There are some optional methods that you may implement if you like which might make your life easier in later problems. These will not be unit-tested or graded though. They are \n",
"\n",
"- `predict_proba`: Takes in a matrix of examples and estimates $p(y=1 \\mid {\\bf x})$ for each example. \n",
"- `mean_loss`: Takes in a matrix of examples and true labels and evaluates the negative log-likelihood \n",
"\n",
"Finally, the method `best_text_features` will not be needed until **Problem 4**. \n",
"\n",
"The section below the class skeleton contains more details as well as unit tests. Note that the unit tests are based on a subset of the toy data in **Problem 2**. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"class LogReg:\n",
" \"\"\"\n",
" Class to train a logistic regression classifier on the training data\n",
" \"\"\"\n",
" \n",
" def __init__(self, X_train, y_train, X_valid=None, y_valid=None):\n",
" \"\"\"\n",
" Initialize classifier \n",
" \n",
" :param X_train: ndarray of training features (with column of 1s prepended)\n",
" :param y_train: ndarray of training labels {0,1}\n",
" :param X_valid: ndarray of validation features (with column of 1s prepended)\n",
" :param y_valid: ndarray of validation labels {0,1}\n",
" \"\"\"\n",
" \n",
" self.X_train = X_train \n",
" self.y_train = y_train \n",
" \n",
" self.X_valid = X_valid \n",
" self.y_valid = y_valid \n",
" \n",
" # Array of logistic regression weights \n",
" self.beta = np.random.randn(self.X_train.shape[1])\n",
" \n",
" # list for storing loss function histories \n",
" self.train_history = [] \n",
" self.valid_history = [] \n",
" \n",
" @staticmethod\n",
" def sigmoid(z, threshold=20):\n",
" \"\"\"\n",
" Evaluate the sigmoid function \n",
" :param z: argument of sigmoid function \n",
" :param threshold: threshold parameter to prevent over/underflow \n",
" \"\"\"\n",
" \n",
" if np.abs(z) threshold:\n",
" z = np.sign(z) * threshold\n",
" \n",
" return 1.0 / (1 + np.exp(-z))\n",
" \n",
" def train(self, eta=0.01, lam=0.0, num_epochs=10):\n",
" \"\"\"\n",
" train LogReg model using SGD with regularization \n",
" \n",
" :param eta: the learning rate \n",
" :param lam: the regularization strength\n",
" :param num_epochs: number of epochs to perform in training \n",
" :return : returns nothing, just updates weights\n",
" \"\"\"\n",
" \n",
" for ee in range(0, num_epochs): # Loop through all epochs\n",
" \n",
" shuffled_inds = list(range(self.X_train.shape[0]))\n",
" np.random.shuffle(shuffled_inds)\n",
" \n",
" counter = 0\n",
" for ii in shuffled_inds: # Loop through shuffeled training data indices\n",
" counter += 1\n",
" if counter % 50 == 0: # every 50 training examples, get the accuracy\n",
" self.train_history.append(self.accuracy(self.X_train, self.y_train))\n",
" self.valid_history.append(self.accuracy(self.X_valid, self.y_valid))\n",
" sigII = self.sigmoid(np.dot(self.beta, self.X_train[ii])) - self.y_train[ii]\n",
" for k in range(0, self.beta.shape[0]): # Loop through each feature\n",
" if k == 0:\n",
" self.beta[0] -= eta * sigII\n",
" else:\n",
" self.beta[k] -= eta * (sigII * self.X_train[ii, k] + (2*lam*self.beta[k]))\n",
" \n",
" def predict_proba(self, X):\n",
" \"\"\"\n",
" predict probability p(y = 1 | x) for each row of X (this function is optional)\n",
" \n",
" :param X: ndarray of features \n",
" :return : ndarray of probabilities \n",
" \"\"\"\n",
" probs = np.zeros(X.shape[0])\n",
" for i,v in enumerate(X):\n",
" probs[i] = self.sigmoid(np.dot(self.beta, v))\n",
" return probs\n",
" \n",
" def predict(self, X):\n",
" \"\"\"\n",
" predict binary labels {0,1} for each row of X\n",
" \n",
" :param X: ndarray of features \n",
" :return: ndarray of binary labels {0,1}\n",
" \"\"\"\n",
" probs = self.predict_proba(X)\n",
" labels = [1 if prob = 0.5 else 0 for prob in probs]\n",
" return labels\n",
" \n",
" def accuracy(self, X, y):\n",
" \"\"\"\n",
" report accuracy of prediction\n",
" \n",
" :param X: ndarray of features \n",
" :param y: associated true labels\n",
" :return: accuracy as a float in [0.0,1.0]\n",
" \"\"\"\n",
" correct = 0.0\n",
" predict_labels = self.predict(X)\n",
" for ii in range(0,len(y)):\n",
" if predict_labels[ii] == y[ii]:\n",
" correct += 1.0\n",
" return correct / len(y)\n",
" \n",
" def mean_loss(self, X, y):\n",
" \"\"\"\n",
" report mean log-likelihood (this function is optional)\n",
" \n",
" :param X: ndarray of features \n",
" :param y: associated true labels\n",
" :return: average log-likelihood\n",
" \"\"\"\n",
" from sklearn.metrics import log_loss\n",
" return 0.0\n",
" \n",
" def best_text_features(self, vocab):\n",
" \"\"\"\n",
" Print 10 best features for each class \n",
" \n",
" :param vocab: list of vocab words\n",
" :return: returns nothing \n",
" \"\"\"\n",
" class0 = self.beta.argsort()[:10] # Sort self.beta to get min values\n",
" class1 = np.argpartition(self.beta, -10)[-10:] # Sort self.beta to get max values\n",
" print(\"\\nbest words for class 0\")\n",
" print(\"----------------------\")\n",
" for ind in class0:\n",
" print(vocab[ind])\n",
"\n",
" print(\"\\nbest words for class 1\")\n",
" print(\"----------------------\")\n",
" for ind in class1:\n",
" print(vocab[ind])\n",
" \n",
" "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part A**: Implement the `train` method so that it performs **unregularized** SGD updates of the model parameters by minimizing the negative log-likelihood loss function discussed in lecture: \n",
"\n",
"$$\n",
"\\textrm{NLL}({\\bf \\beta}) = -\\displaystyle\\sum_{i=1}^n \\left[y_i \\log \\textrm{sigm}(\\boldsymbol{\\beta}^T{\\bf x}) + (1-y_i)\\log(1 - \\textrm{sigm}(\\boldsymbol{\\beta}^T{\\bf x}))\\right] \n",
"$$\n",
"\n",
"\n",
"Note that your SGD updates should be vectorized, utilize Numpy routines as much as possible, and not make any assumptions about the number of features. When you think you're done, execute the following code cell to perform three unit tests. "
]
},
{
"cell_type": "code",
"execution_count": 92,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"testPosUnregUpdate (__main__.TestLogReg) ... ok\n",
"testNegUnregUpdate (__main__.TestLogReg) ... ok\n",
"testShuffelUnregUpdate (__main__.TestLogReg) ... ok\n",
"\n",
"----------------------------------------------------------------------\n",
"Ran 3 tests in 0.006s\n",
"\n",
"OK\n"
]
},
{
"data": {
"text/plain": [
"<matplotlib.figure.Figure at 0x114a061d0"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%run -i tests/new_tests.py \"prob 3A\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part B**: Update your implementation of the `train` method so that it performs **regularized** SGD updates of the model parameters to minimize the regularized loss function discussed in lecture\n",
"\n",
"$$\n",
"\\textrm{Loss}({\\bf \\beta}) = -\\displaystyle\\sum_{i=1}^n \\left[y_i \\log \\textrm{sigm}(\\boldsymbol{\\beta}^T{\\bf x}) + (1-y_i)\\log(1 - \\textrm{sigm}(\\boldsymbol{\\beta}^T{\\bf x}))\\right] + \\lambda\\displaystyle\\sum_{k=1}^p \\beta_k^2\n",
"$$\n",
"\n",
"Note that you should **NOT** regularize the bias parameter $\\beta_0$. When you think you're done, execute the following code cell to perform two unit tests. "
]
},
{
"cell_type": "code",
"execution_count": 93,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"testPosRegUpdate (__main__.TestLogReg) ... ok\n",
"testNegRegUpdate (__main__.TestLogReg) ... ok\n",
"\n",
"----------------------------------------------------------------------\n",
"Ran 2 tests in 0.002s\n",
"\n",
"OK\n"
]
}
],
"source": [
"%run -i tests/new_tests.py \"prob 3B\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part C**: Implement the `predict` function to take a matrix of examples and use the learned parameters to return a vector of predictions of $\\{0,1\\}$ for each example. When you think you're done, execute the following code cell to perform one unit test. \n"
]
},
{
"cell_type": "code",
"execution_count": 94,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"testPredict (__main__.TestLogReg) ... ok\n",
"\n",
"----------------------------------------------------------------------\n",
"Ran 1 test in 0.001s\n",
"\n",
"OK\n"
]
}
],
"source": [
"%run -i tests/new_tests.py \"prob 3C\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part D**: Implement the `accuracy` method to take a matrix of examples and a vector of true labels, make predictions, and return the accuracy of those predictions as a decimal value in $[0,1]$. Execute the following code cell to perform one final unit tests. \n"
]
},
{
"cell_type": "code",
"execution_count": 95,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"testAccuracy (__main__.TestLogReg) ... ok\n",
"\n",
"----------------------------------------------------------------------\n",
"Ran 1 test in 0.001s\n",
"\n",
"OK\n"
]
}
],
"source": [
"%run -i tests/new_tests.py \"prob 3D\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### [25 points] Problem 4: Baseball vs Hockey \n",
"***\n",
"\n",
"In this problem you will train a Logistic Regression classifier to determine if a document is talking about baseball or hockey. The following code cell will load training and validation sets, as well as a list that encodes the map from feature index to particular words. "
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"f = gzip.open(\"data/baseball_hockey.pklz\", 'rb')\n",
"X_train, y_train, X_valid, y_valid, vocab = pickle.load(f)\n",
"f.close()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part A**: Look at the encoded features in `X_train` or `X_valid`. Which of the text models discussed in class do these features represent? Briefly justify your response. "
]
},
{
"cell_type": "code",
"execution_count": 97,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0\n",
" 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]\n"
]
}
],
"source": [
"print(X_valid[:,1])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### The `X_valid` data looks like it is a sparse matrix where the entries are 0 or 1. These values represent booleans and if a word is present, then a value of 1 is placed in the matrix. Else the entry will be zero."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part B**: There are two additional files in the data directory called `positive_raw` and `negative_raw`. These are a subset of the actual documents that were cleaned and featurized to obtain our training and validation data. The documents in `positive_raw` correspond to examples with true label $y=1$ and the documents in `negative_raw` correspond to examples with true label $y=0$. Inspect some of the documents and decide which label corresponds to documents about baseball and which label corresponds to documents about hockey. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### After inspecting the `positive_raw` document, it is clear that this document corresponds to the label $y=1$ which is talking about Hockey. The `negative_raw` document is referring to the label $y=0$ which is talking about Baseball."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part C**: Use the class you wrote in **Problem 3** to train a logistic regression classifier to predict baseball vs hockey and report accuracy on the training and validation set. Do you see any signs of overfitting? \n",
"\n",
"**Hint**: You won't need to run very many epochs before convergence on this data. "
]
},
{
"cell_type": "code",
"execution_count": 100,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x114a197f0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training Accuracy 99.58228905597326% and Valid Accuracy: 86.18090452261306%\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x114ab4320"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training Accuracy 99.83291562238931% and Valid Accuracy: 88.81909547738694%\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x114a62b38"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training Accuracy 99.83291562238931% and Valid Accuracy: 91.20603015075378%\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x114a62908"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training Accuracy 99.74937343358395% and Valid Accuracy: 92.71356783919597%\n"
]
}
],
"source": [
"etas = [0.1, 0.2, 0.5, 0.43]\n",
"for ee in etas:\n",
" eta_textModel = LogReg(X_train, y_train, X_valid, y_valid)\n",
" eta_textModel.train(eta=ee, lam=0.0, num_epochs=10)\n",
" train_acc = eta_textModel.accuracy(X_train, y_train)\n",
" valid_acc = eta_textModel.accuracy(X_valid, y_valid)\n",
"\n",
" plt.plot(range(1, len(eta_textModel.train_history)+1), eta_textModel.train_history, label='Training Data')\n",
" plt.plot(range(1, len(eta_textModel.valid_history)+1), eta_textModel.valid_history, label='Valid Data')\n",
" plt.ylabel('% Accurracy')\n",
" plt.xlabel('Epochs')\n",
" plt.title('Training Examples vs Accurracy, ETA=' + str(ee))\n",
" plt.legend()\n",
" plt.show()\n",
" print('Training Accuracy ' + str(train_acc*100) + '% and Valid Accuracy: ' + str(valid_acc*100) +'%')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part D**: Modify your code so that it periodically records accuracy on the training and validation sets throughout the training process (try recording after every $50$ training examples). Experiment with the learning rate `eta` and produce plots like we showed in lecture. Which value of `eta` appears to give the best-ish convergence? "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The best learning rate for `eta` overall was equal to `0.43`"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part E**: Once you've found a reasonable learning rate, experiment with the regularization strength. Show plots of accuracy over the training process for a few different values of `lam`. Which seems to work the best-ish and why? \n",
"\n",
"**Hint**: For this type of text data, you'll want to look at very small values of `lam` (like `lam=1e-3` or maybe even smaller). \n",
"\n",
"Report your final accuracy on the training and validation sets after you've tuned your model in **Parts D** and **E**. "
]
},
{
"cell_type": "code",
"execution_count": 108,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x114a15be0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training Accuracy 96.82539682539682% and Valid Accuracy: 90.45226130653266%\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x103fa27b8"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training Accuracy 99.83291562238931% and Valid Accuracy: 91.4572864321608%\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x115316d30"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training Accuracy 99.83291562238931% and Valid Accuracy: 91.95979899497488%\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x11498b2b0"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training Accuracy 99.83291562238931% and Valid Accuracy: 90.7035175879397%\n"
]
}
],
"source": [
"# Sweep the regularization strength and compare train/validation accuracy.\n",
"lams = [1e-3, 1e-8, 1e-6, 1e-7]\n",
"for lam in lams:\n",
"    textModel2 = LogReg(X_train, y_train, X_valid, y_valid)\n",
"    textModel2.train(eta=0.43, lam=lam, num_epochs=10)\n",
"    train_acc2 = textModel2.accuracy(X_train, y_train)\n",
"    valid_acc2 = textModel2.accuracy(X_valid, y_valid)\n",
"\n",
"    # Plot the accuracy histories recorded during training.\n",
"    plt.plot(range(1, len(textModel2.train_history)+1), textModel2.train_history, label='Training Data')\n",
"    plt.plot(range(1, len(textModel2.valid_history)+1), textModel2.valid_history, label='Valid Data')\n",
"    plt.ylabel('% Accuracy')\n",
"    plt.xlabel('Epochs')\n",
"    plt.title('Training Examples vs Accuracy, lam=' + str(lam))\n",
"    plt.legend()\n",
"    plt.show()\n",
"    print('Training Accuracy ' + str(train_acc2*100) + '% and Valid Accuracy: ' + str(valid_acc2*100) +'%')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### The best `lambda` value was `1e-6`, which gave the highest validation accuracy of the values tried (about 91.96%)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**Part F**: Finally, go back to your LogReg class and complete the `best_text_features` function to print the 10 best predictive words for each class. Show your results here and also **briefly** explain mathematically how you arrived at them. Do they seem to make sense given what you know about baseball and hockey? "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"scrolled": true
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"best words for class 0\n",
"----------------------\n",
"pitching\n",
"morris\n",
"runs\n",
"cubs\n",
"better\n",
"clemens\n",
"yankees\n",
"prime\n",
"hit\n",
"majors\n",
"\n",
"best words for class 1\n",
"----------------------\n",
"biggest\n",
"whos\n",
"sanderson\n",
"playoffs\n",
"nhl\n",
"ice\n",
"playoff\n",
"leafs\n",
"penguins\n",
"hockey\n"
]
}
],
"source": [
"bestWords = LogReg(X_train, y_train, X_valid, y_valid)\n",
"bestWords.train(eta=0.43, lam=1e-6, num_epochs=10)\n",
"bestWords.best_text_features(vocab)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### The betas are the weights learned during training, and the model predicts $p(y=1 \\mid x) = \\sigma(\\beta^T x)$. A word with a large positive beta therefore pushes the prediction toward $y=1$ (hockey), and a word with a large negative beta pushes it toward $y=0$ (baseball). Sorting the betas and taking the 10 largest gives the best words for class 1, and taking the 10 smallest (most negative) gives the best words for class 0."
]
},
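{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ranking described above can be sketched in a few lines. This is only an illustration, not the `best_text_features` implementation from the `LogReg` class: the helper name `top_words`, the assumption that `beta[0]` is the bias term, and the toy weights are all hypothetical. Under the logistic model, a one-unit increase in feature $x_j$ multiplies the odds of $y=1$ by $e^{\\beta_j}$, so sorting the weights orders the words by how strongly they favor each class.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def top_words(beta, vocab, k=10):\n",
"    # Assumption: beta[0] is the bias and beta[1:] lines up with vocab.\n",
"    weights = np.asarray(beta)[1:]\n",
"    order = np.argsort(weights)  # ascending, most negative weight first\n",
"    class0 = [vocab[i] for i in order[:k]]        # strongest evidence for y=0\n",
"    class1 = [vocab[i] for i in order[::-1][:k]]  # strongest evidence for y=1\n",
"    return class0, class1\n",
"\n",
"# Toy example with made-up weights (not the values learned in this notebook).\n",
"vocab = ['pitching', 'ice', 'the', 'hockey', 'runs']\n",
"beta = np.array([0.1, -2.0, 1.5, 0.0, 3.0, -1.0])\n",
"class0, class1 = top_words(beta, vocab, k=2)\n",
"print(class0)  # ['pitching', 'runs']\n",
"print(class1)  # ['hockey', 'ice']\n",
"```\n",
"\n",
"Words with weights near zero, such as `the` in the toy example, land in the middle of the ordering and appear in neither list."
]
},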
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
