Homework 3: Deep Learning


1 Programming (80 points)
1.1 Overview
In this assignment you will learn to use PyTorch (http://pytorch.org/) to train classifiers to recognize images of simple drawings using a subset of the Google QuickDraw dataset (https://quickdraw.withgoogle.com/). Your grade will be based on a mixture of a) completing certain necessary tasks, as outlined below, and b) achieving high accuracy on a test set. We will evaluate your accuracy on Kaggle, using test data for which you do not have the labels. Your grade will depend on the accuracy of your trained models; however, it will not depend on the classification accuracies achieved by your peers. Instead, we will assign points based on previously-determined accuracy thresholds (more details below). Everyone has the opportunity to get full points on this project.
1.2 Requirements
You will need Python 3.5 or Python 3.6, and you will need access to a Linux or macOS machine: PyTorch does not support Windows. (Strictly speaking, you might be able to get PyTorch up and running on Windows, but we have not done this and cannot help with this.) The CS department labs and servers offer support for Python 3. Install PyTorch using pip3.
1.2.1 Documentation
The documentation can be found at http://pytorch.org/docs/stable/index.html, and there are plenty more resources available online.
1.2.2 Introduction to PyTorch
While it is not required and there is nothing to turn in for this section, we strongly recommend that you go through the tutorial at http://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html to learn about PyTorch.
• You may not have access to a GPU or a CUDA-enabled environment. This is okay; just modify these parts of the tutorial to run on the CPU, which should be fine since the tutorial is not computationally heavy.
• The point of this tutorial is to familiarize you with PyTorch, and in particular with the torch.nn and torch.optim packages. We suggest that you work through the examples from scratch, modifying parts whenever you are confused. If you simply copy and paste the code from the tutorial, you will likely have a much tougher time with the rest of this project.
1.3 Data
The QuickDraw dataset comes from a Google project where participants from around the world are asked to sketch an object in a short period of time, with the system attempting to guess what was drawn as the participant draws. To play the game, or to check out the data, visit https://quickdraw.withgoogle.com/.

In this project we'll keep things small (in terms of storage, memory, and computation) by dealing with only a small subset of the QuickDraw dataset. We've provided you with 5 classes of sketch data: apples, baseballs, cookies, clocks, and fans. There are 10,000 examples in each class. You can choose how much data you would like to allocate to the validation set. You can also hold out a test set if you'd like, but here the true test set is supplied separately on Kaggle, so you can use all 50,000 images for training and validation. You can decide the most effective way to use this data.

The data is available for download on the class website under "homeworks", in the files images.npy and labels.npy. The first file contains a 3-D uint8 NumPy array with shape [50000, 26, 26]. This corresponds to 50,000 images, each 26 pixels x 26 pixels. The second file contains a 1-D uint8 array with shape [50000]. This corresponds to 50,000 labels, where the i-th label is associated with the i-th image.

We recommend reshaping your input images consistently. For example, below we assume that each image is a vector rather than an image. NumPy implements this functionality in the flatten function (see https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.ndarray.flatten.html). Note that you would apply this function to each individual image, not the entire matrix of images (i.e., flattening each image would result in a [50000, 676] matrix). However, this is up to you; the only requirement is that your models handle the input data appropriately.

In addition, you should leave all labels intact. In particular, you will need to predict 0, 1, 2, 3, or 4, with each integer corresponding to a particular class. You may not use any data except the data provided by us, i.e., do not download additional data from the web.
1.3.1 Introduction to the Data
1. While it is not required and there is nothing to turn in for this section, we strongly recommend that you work through the following steps to get familiar with the data. In a new notebook, run the following in the first cell. (The first line makes plots show up directly in your notebook.)
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
2. Download images.npy and labels.npy from the course website. As described above, the first file contains a 3-D uint8 NumPy array with shape [50000, 26, 26] (50,000 images, each 26 x 26 pixels), and the second contains a 1-D uint8 array with shape [50000] (the i-th label is associated with the i-th image).
3. Load the images and labels:
images = np.load(PATH_TO_IMAGES)
labels = np.load(PATH_TO_LABELS)
4. Separate out class 0 and visualize the first image:
class_0_images = images[labels == 0]
plt.imshow(class_0_images[0])
plt.set_cmap('gray')
5. Repeat this for classes 1, 2, 3, and 4. Which integer is associated with the apple class? The baseball class? The clock class? The cookie class? The fan class?
6. For most of this project, you will treat each image as a vector rather than as a matrix. As you saw above, your images are currently stored in an array of shape [num_images, height, width]. Reshape the inputs to have shape [num_images, height * width] (see the sketch after this list).
7. Print the new shape of images.
8. Plot the first 5 flattened vectors (corresponding to the first 5 images) using plt.plot. (This should result in 5 different lines on the plot. This is not intended to give you much information, but is included simply to emphasize the fact that your classifiers here will be taking in flattened images as input.)
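For example, here is a minimal sketch of steps 6 through 8, assuming images has been loaded as above:

# Flatten each 26x26 image into a vector of length 676.
images = images.reshape(images.shape[0], -1)
print(images.shape)  # expected: (50000, 676)

# Plot the first 5 flattened images as 5 separate lines.
for i in range(5):
    plt.plot(images[i])
plt.show()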
1.4 Deliverables
In this assignment, all the coding is done in multiple Jupyter notebooks. After finishing sections 1.5–1.8, convert each completed notebook into a PDF, and combine the PDFs into a single PDF to submit on gradescope.com. We will also provide a template for you to answer the specific questions that will be graded. These questions start with *** in these instructions. For each, please answer the question in the LaTeX template. You may be asked to include a figure in your answer.
1.5 Implementing a Simple Two-Layer NN (20 points)
You will implement a simple model to classify images from the QuickDraw dataset as either apple (0), baseball (1), cookie (2), clock (3), or fan (4). You will use the Sequential module to map each input vector of length 26 x 26 through a hidden layer to a vector of length 5, which contains class scores or logits. You can think of these as real-valued scores for each class, and it is these scores that should be the output of your models. Later, these scores will be passed directly to a loss function. For example, they can be exponentiated (making them nonnegative) and normalized (making them sum to 1) using the softmax function, and then used with cross-entropy loss. In this section, you will implicitly rely on the softmax function through PyTorch's cross-entropy loss function; however, you never need to use the softmax function directly. Here we describe it so you know what's going on: the i-th element of the softmax function applied to a vector z is

$$[\operatorname{softmax}(z)]_i = \frac{\exp(z_i)}{\sum_j \exp(z_j)}$$

Note that applying the softmax function forms a valid probability distribution over our classes. In the binary-classification case, you would compute a single score for each example, apply the sigmoid function to obtain a distribution, and train by minimizing binary cross-entropy loss (i.e., log loss). For this multi-class problem, you will instead compute 5 scores for each example, apply the softmax function to obtain a distribution, and train by minimizing cross-entropy loss. Again, you will not explicitly compute the softmax, as in PyTorch the softmax operation and cross-entropy loss are combined inside the torch.nn package. (This is common in other frameworks, too, because it a) is convenient, b) allows for better numerical stability, and c) is faster.)
1. Create a new notebook, name it simple-two-layer-nn, and save it.
2. *** What accuracy would a random classifier (which assigns labels randomly) achieve on this task (approximately, answer in box a)? What accuracy would a majority-vote classifier achieve on this task (approximately, answer in box b)? Round your answers to the nearest whole integer.
3. Add imports as necessary. Our recommendation is:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import torch
from torch import autograd
import torch.nn.functional as F
4. Prepare your data for classification. You may need to be careful with types (unsigned ints vs. ints vs. floats), and you may want to normalize your data (for example, you could normalize your data so that each individual image has a mean of 0.0 and a variance of 1.0).
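For example, a minimal sketch of one possible preparation, assuming images is the flattened [50000, 676] array from the previous section (the small epsilon is our addition, to guard against division by zero):

images = images.astype(np.float32)
# Normalize each image to mean 0.0 and variance 1.0.
mean = images.mean(axis=1, keepdims=True)
std = images.std(axis=1, keepdims=True)
images = (images - mean) / (std + 1e-8)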
5. Prepare constants as necessary. This should include things like HEIGHT, the height of each image; WIDTH, the width of each image; NUM_CLASSES, the number of classes; D_H, the dimension of the hidden layer; and NUM_OPT_STEPS, the number of optimization steps to take during training. Add constants as you see fit.
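For example (a sketch; the values for D_H and NUM_OPT_STEPS are placeholders chosen to match the numbers used later in this section):

HEIGHT = 26           # height of each image in pixels
WIDTH = 26            # width of each image in pixels
NUM_CLASSES = 5       # apple, baseball, cookie, clock, fan
D_H = 100             # dimension of the hidden layer
NUM_OPT_STEPS = 5000  # number of optimization steps during training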
6. Create a new class named TwoLayerNet, which a) uses one linear layer to map from the inputs to 100 hidden units, with ReLU activations, and b) uses another linear layer to map from the hidden units to vectors of length 5. One way to implement ReLU is using clamp in torch, as shown in the example code below. You can also use torch.nn.functional.relu, along with your two linear layers, during the forward pass.
class TwoLayerNet(torch.nn.Module):
    def __init__(self, d_in, d_h, num_classes):
        """
        In the constructor we instantiate two nn.Linear modules
        and assign them as member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(d_in, d_h)
        self.linear2 = torch.nn.Linear(d_h, num_classes)

    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data
        and we must return a Tensor of output data. We can use
        Modules defined in the constructor as well as arbitrary
        operators on Tensors.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred
Note that this is a simple wrapper around the Linear Module. We include it here for your convenience, and we strongly recommend that each of your models is defined using this as a template.
7. *** Behind the scenes, the torch.nn.Linear module is creating parameters for you (and initializing those parameters to reasonable values). In this particular case, how many weights (answer in box a) and how many biases (answer in box b) are being created?
8. Create your model using the following code:
model = TwoLayerNet(HEIGHT * WIDTH, D_H, NUM_CLASSES)
9. Create an SGD optimizer using the following code:
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)
10. Define a function train(batch_size) that is responsible for taking a single optimization step using batch_size randomly-chosen examples. You can do this either by writing it yourself or by reusing code from the following example function:
def train(batch_size):
    # i is a 1-D array with shape [batch_size]
    i = np.random.choice(train_seqs.shape[0], size=batch_size, replace=False)
    x = torch.from_numpy(train_seqs[i].astype(np.float32))
    y = torch.from_numpy(train_labels[i].astype(np.int64))
    # Forward pass: compute predicted y by passing x to the model.
    y_hat_ = model(x)
    # Compute the loss.
    loss = F.cross_entropy(y_hat_, y)
    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
You may want to look up the documentation for cross_entropy or CrossEntropyLoss to determine which two arguments are expected.
11. Define a function accuracy(y, y_hat). You can use the following function if you'd like to, but feel free to compute accuracy directly in PyTorch if you'd prefer.
def accuracy(y, y_hat):
    """Compute accuracy.

    Args:
        y: A 1-D int NumPy array.
        y_hat: A 1-D int NumPy array.

    Returns:
        A float, the fraction of time y[i] == y_hat[i].
    """
    return (y == y_hat).astype(np.float).mean()
12. Define a function approx_train_accuracy() that extracts 1,000 random training instances, creates a single batch with all of these inputs, computes integer predictions for each example in the batch, and returns an accuracy by comparing these predictions to the ground-truth labels. Note that here an integer prediction can be obtained by taking the argmax over the class scores for each example. (You may want to look at the documentation for np.argmax or torch.max; either can be used to obtain the desired argmax over all examples.)
13. Define a function val_accuracy() that creates a single batch with all validation examples, computes integer predictions for each example in the batch, and returns an accuracy by comparing these predictions to the ground-truth labels. (A sketch of both functions follows.)
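A minimal sketch of both functions, assuming train_seqs and train_labels are as in train() above; the names val_seqs and val_labels are placeholders for however you named your validation split:

def approx_train_accuracy():
    # Sample 1,000 random training examples and form a single batch.
    i = np.random.choice(train_seqs.shape[0], size=1000, replace=False)
    x = torch.from_numpy(train_seqs[i].astype(np.float32))
    y = train_labels[i].astype(np.int64)
    with torch.no_grad():
        # Integer predictions: the argmax over the 5 class scores.
        _, y_hat = torch.max(model(x), dim=1)
    return accuracy(y, y_hat.numpy())

def val_accuracy():
    # A single batch containing every validation example.
    x = torch.from_numpy(val_seqs.astype(np.float32))
    y = val_labels.astype(np.int64)
    with torch.no_grad():
        _, y_hat = torch.max(model(x), dim=1)
    return accuracy(y, y_hat.numpy())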
14. Side note: Our structure for approx_train_accuracy() and val_accuracy() above is not mandatory; if you would rather define these functions in different ways, or even collapse them into a single function, feel free to.
15. Finally, train your model and keep track of training and validation accuracies as a function of optimization step with the following code.
train_accs, val_accs = [], []
for i in range(NUM_OPT_STEPS):
    train(batch_size)
    if i % 100 == 0:
        train_accs.append(approx_train_accuracy())
        val_accs.append(val_accuracy())
        print("%6d %5.2f %5.2f" % (i, train_accs[-1], val_accs[-1]))
16. Train this network for 5,000 steps using a batch size of 1, with Adam as the optimizer and a learning rate of 0.001.
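Since an SGD optimizer was created above, you will need to swap it out first, e.g.:

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)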
17. Plot training accuracy and validation accuracy as a function of optimization step (a plotting sketch follows).
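A sketch of one way to produce this plot (recall that accuracies were recorded every 100 steps):

steps = [100 * j for j in range(len(train_accs))]
plt.plot(steps, train_accs, label='train')
plt.plot(steps, val_accs, label='validation')
plt.xlabel('optimization step')
plt.ylabel('accuracy')
plt.legend()
plt.show()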
18. Reset the parameters of your model using reset_parameters. Retrain the network for 5,000 steps using a batch size of 10, again with Adam as the optimizer and a learning rate of 0.001.
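Note that reset_parameters is defined on individual layers such as torch.nn.Linear, not on your model as a whole. For the TwoLayerNet above, whose children are all Linear layers, one way to reset everything is (a sketch):

for layer in model.children():
    layer.reset_parameters()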
19. Again plot training accuracy and validation accuracy as a function of optimization step.
20. *** What was the best validation accuracy obtained with a batch size of 10 (answer in box a, rounded to the nearest whole integer)? Approximately how long did training take (answer in box b, in seconds, rounded to the nearest whole integer)? Notice that we are visiting 10x as many examples during this training run. Did training take 10x longer than in the case of pure SGD (answer in box c)?
21. Experiment with the batch size, the learning rate, and the number of steps to try to maximize validation accuracy.
22. *** Include a plot of training accuracy and validation accuracy as a function of optimization step for your best settings (replace blank.png with your file name in the box).
1.6 Implementing a Simple Convolutional NN (20 points)
Here you'll create a simple convolutional neural network (in fact, too simple) for multi-class classification, again using cross-entropy loss. As a reminder, 2-D convolutional neural networks operate on images rather than vectors. Their key property is that hidden units are formed using local spatial regions of the input image, with weights that are shared over spatial regions. To see an animation of this behavior, see https://github.com/vdumoulin/conv_arithmetic. In particular, pay attention to the 1st animation (no padding, no strides) and to the 9th animation (no padding, with strides). Here, one 3x3 convolutional filter is swept over the image, with the image shown in blue and the output map (which is just a collection of the hidden units) shown in green. You can think of each convolutional filter as a small, translation-invariant pattern detector.
1. Copy the content from simple-two-layer-nn to a new notebook named simple-conv-nn. Feel free to remove any previous code that does not help with this section.
2. *** Above we mentioned that convolutional filters are applied to local image regions, with weights shared across regions. How does this compare to fully-connected neural networks?
3. Create a new class named TooSimpleConvNN using the following code. In short, this model a) reshapes our vectors back into images; b) applies two convolutional layers; c) averages each channel spatially, so that each 'image' ends up with a height and width of 1; and finally d) applies a final convolution to yield 5 channels. In the end, any particular image has 5 channels with 1 pixel, and these 5 channels correspond to class scores. These are then reshaped back into a Tensor with shape [batch_size, NUM_CLASSES].
class TooSimpleConvNN(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # 3x3 convolution that takes in an image with one channel
        # and outputs an image with 8 channels.
        self.conv1 = torch.nn.Conv2d(1, 8, kernel_size=3)
        # 3x3 convolution that takes in an image with 8 channels
        # and outputs an image with 16 channels. The output image
        # has approximately half the height and half the width
        # because of the stride of 2.
        self.conv2 = torch.nn.Conv2d(8, 16, kernel_size=3, stride=2)
        # 1x1 convolution that takes in an image with 16 channels and
        # produces an image with 5 channels. Here, the 5 channels
        # will correspond to class scores.
        self.final_conv = torch.nn.Conv2d(16, 5, kernel_size=1)

    def forward(self, x):
        # Convolutions work with images of shape
        # [batch_size, num_channels, height, width]
        x = x.view(-1, HEIGHT, WIDTH).unsqueeze(1)
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        n, c, h, w = x.size()
        x = F.avg_pool2d(x, kernel_size=[h, w])
        x = self.final_conv(x).view(-1, NUM_CLASSES)
        return x
4. Reusing much of your code from the previous notebook, train this network for 2,000 steps using a batch size of 10, using Adam as the optimizer with a learning rate of 0.001.
5. Plot training accuracy and validation accuracy as a function of optimization step.
6. *** What was the best validation accuracy obtained with a batch size of 10, run for 2,000 steps (answer in box a)? Approximately how many seconds did training take (answer in box b)? (We are not asking for precision with respect to timing; a ballpark estimate is fine).
7. Experiment with the batch size, the learning rate, and the number of steps to try to maximize validation accuracy.
8. Plot training accuracy and validation accuracy as a function of optimization step for your best settings.
9. *** What was the best validation accuracy achieved (answer in box a, rounded to nearest whole integer)? What was the corresponding batch size (answer in box b) and learning rate (answer in box c)? How many optimization steps did you need to take to reach that accuracy (answer in box d)? How long did training take (answer in box e, rounded to nearest whole integer)?
1.7 Maximizing Performance (30 points)
Here your goal is to maximize performance. This section focuses on maximizing validation accuracy, but the end goal will be to maximize accuracy on the test set provided on Kaggle.
1.7.1 Kaggle Details
You can find the Kaggle page here: https://www.kaggle.com/t/7f2d205604c648179563d37355b05cd6. Ignore the deadline listed on the Kaggle page.

Kaggle is a platform that allows you to submit predictions on a given dataset, and then have those predictions evaluated on the (not shared) labels of that dataset. The results are shown on a leaderboard, where you can compare your submission to others. We only require that you submit your predictions to Kaggle. This assignment will award points based on your accuracy on Kaggle.

We have provided you with test data test_images.npy but no labels to accompany these images. See the Kaggle page for the data. You will train your neural network, tune it to get the best performance you can, and then run the trained network on the data in test_images.npy. For each of the images, you will make a (multi-class) prediction. You will submit these predictions to Kaggle.

Create a comma-separated file with 2 columns. The first row of the file should have the names of the columns: "id,label". The id column is an integer id (0-based) for each of the images in test_images.npy, i.e., the first entry in the test data has id 0, the second entry has id 1, and so on up to 4999. The value for label will be your prediction: an integer that is 0, 1, 2, 3, or 4.
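For example, a minimal sketch of writing such a file, assuming test_predictions is a length-5000 array of integer predictions (the output filename is arbitrary):

with open('submission.csv', 'w') as f:
    # Header row, exactly as required.
    f.write('id,label\n')
    # One row per test image: 0-based id, then the predicted class.
    for i, label in enumerate(test_predictions):
        f.write('%d,%d\n' % (i, label))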
You will upload this file as your submission to Kaggle. Name your Kaggle submission with the same username that you use on the class website, and be sure to include the name of your Kaggle submission in your PDF submission for the project by adding a final markdown cell to the end of your notebook and writing your username. If we cannot find your Kaggle submission, you will not receive the points. You may submit multiple output files to Kaggle if you'd like to compare several different trained networks. We will only count your best submission.
1.7.2 Kaggle competition
1. Copy the previous notebook of your choice to a new notebook named maximizing-performance. Clear all outputs and delete all existing Markdown cells. Feel free to remove any previous code that does not help with this section.
2. Do whatever you’d like to your network configurations to maximize validation accuracy, and test your best models by uploading their predictions to Kaggle and obtaining accuracy on the official test set.
3. You can do nearly anything you'd like here: feel free to vary the optimizer, mini-batch sizes, the number of layers and/or the number of hidden units / number of filters per layer; include dropout if you'd like; etc. You can even go the extra mile with techniques such as data augmentation, where input images may be randomly cropped and/or translated and/or rotated, or use ensembles of networks to improve generalization performance further. (A sketch of one simple augmentation appears below.)
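As one example of data augmentation, you could randomly flip half of each training batch horizontally before passing it to the model. A sketch, assuming each row of x_batch is a flattened 26x26 image; whether flips actually help here is for you to verify empirically:

def augment(x_batch):
    # x_batch: float array of shape [batch_size, 676] (flattened images).
    imgs = x_batch.reshape(-1, 26, 26).copy()
    # Choose a random half of the batch and flip those images left-to-right.
    flip = np.random.rand(imgs.shape[0]) < 0.5
    imgs[flip] = imgs[flip, :, ::-1]
    return imgs.reshape(-1, 26 * 26)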
4. However, there is one limitation: You may not add additional training data (which you could obtain through Google’s QuickDraw project).
5. Be sure to keep notes as you explore different options, as you will later need to explain the experimental trajectory that led to your top performance. It is this explanation, along with plots, that will determine how many points you get in this section.
6. Include plots of training accuracy and validation accuracy as a function of optimization step for your typical configuration and two of your best configurations.
7. Hints. You may want to focus on convolutional networks, since they are especially well suited for processing images. You may often find yourself in a situation where training is just too slow (for example where validation accuracy fails to climb after a 5 minute period). It is up to you to cut experiments off early: if training in a reasonable amount of time isn’t viable, then you can try to change your network or hyperparameters to help speed things up. In addition, earlier we repeatedly asked that you vary other parameters as necessary to maximize performance. There is obviously an enormous number of possible configurations, involving optimizers, learning rates, mini-batch sizes, etc. Our advice is to find settings that consistently work well across simple architectures, and to only adjust these settings if insights based on training and validation curves suggest that you should do so. In particular, remember that Adam helps you avoid manual learning-rate tuning, and remember that very small minibatches and very large minibatches will both lead to slow performance, so striking a balance is important.
8. *** Write a paragraph with bullets that (minimally) address the following questions: Explain your starting point. What optimizer and learning rate did you settle on? What mini-batch size did you settle on? Explain what adjustments you tried to make. Was training too slow for any particular configurations? How did you circumvent this problem? What were the most important changes for achieving high accuracy? Describe what your final model is doing in plain English. This explanation will be a major part of your points for this section. A bulleted paragraph is fine, but make sure you are specific in your descriptions.
1.7.3 Performance Awards
As mentioned earlier, points will be allocated according to your top model's performance on the held-out test set available on Kaggle. Details of the Kaggle competition will be available on Piazza. The following accuracy thresholds will be used for point allocation:

Accuracy     Points
Above 80%    5
Above 85%    10
Above 90%    15
Above 95%    25
Above 97%    30

Additionally, we will award each of the top 3 scores 5 points of extra credit. These scores will be based on the private leaderboard. Kaggle shows you your score on a subset of the data, but holds out your accuracy on the private data. Since you can submit multiple runs on the test data and observe the results, the private data is the true held-out data.
1.8 Exploring Failure Modes (10 points)
Here you’ll explore the failure modes of your best model.
1. You can continue to extend the maximizing-performance notebook for this analysis.
2. Locate some success cases and some failure cases in the validation set. (In other words, find some images that were correctly classified by your best model, and some that were misclassified by your best model).
3. Visualize 10 correctly-classified images and 10 incorrectly-classified images using plt.imshow.
4. *** Are there any qualitative differences between these sets of images (answer in box a)? Are the misclassified examples more difficult for you to classify as a human (answer in box b)?
5. Create a copy of the 10 correctly-classified images and add Gaussian noise to them, with a standard deviation that is approximately one tenth of the image’s range. (For example, if the pixel values of the images range from -10 to 10, the range is 20, so you would use a standard deviation of 2.0.)
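A sketch of the perturbation, assuming correct_images is a [10, 676] float array of the correctly-classified images:

noisy = correct_images.copy()
# Standard deviation: approximately one tenth of the pixel-value range.
value_range = noisy.max() - noisy.min()
sigma = value_range / 10.0
noisy = noisy + np.random.normal(0.0, sigma, size=noisy.shape)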
6. *** Visualize these perturbed images and classify them with your model. Select one of these images and place it in the answer box, replacing Blank.png with the image's filename.
7. *** Does the classifier still classify all 10 images correctly?
8. Create a copy of the 10 correctly-classified images and flip them horizontally.
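A sketch of the flip, again assuming correct_images is a [10, 676] float array of flattened 26x26 images:

# Reshape to images, reverse the column (width) axis, and flatten again.
flipped = correct_images.reshape(-1, 26, 26)[:, :, ::-1]
# .copy() makes the array contiguous so torch.from_numpy will accept it.
flipped = flipped.reshape(-1, 26 * 26).copy()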
9. Visualize these flipped images and classify them with your model.
10. *** Does the classifier still classify all 10 images correctly?
11. *** Let’s assume that your model “failed” to classify the flipped images correctly. First of all, is this “failure” necessarily a failure? In other words, do scenarios exist in which you do not want to remain invariant to horizontal flipping? Now, suppose that in this application you do want to remain invariant to horizontal flipping. How could you change your training process so that the model remains robust to such transformations?
2 Analytical (20 points)
1) Dropout (7 points) We've seen how dropout can help us improve generalization in neural networks by randomly selecting nodes to zero out during training. Suppose we tried to do something similar in Pegasos, the algorithm we used to train our SVMs in the last assignment. Recall that the update rule for Pegasos is
$$w_{t+1} \leftarrow \left(1 - \frac{1}{t}\right) w_t + \frac{1}{\lambda t}\, \mathbb{1}\left[ y_{i_t} \langle w_t, x_{i_t} \rangle < 1 \right] y_{i_t} x_{i_t} \qquad (1)$$
Let's say we perform something similar to dropout here: with 25% probability, we zero out a given feature (i.e., an element of the vector in the second term on the right-hand side of the update above) during training. What would be the advantages and disadvantages of this? (Hint: Try it on your implementation of Pegasos!)
2) Neural Networks (6 points) Suppose you are building a model to classify whether a dog is a certain breed. You have plentiful data with many features you could use: various physical measurements, visual attributes, age, location, data from behavioral assessments, etc. However, you have no idea which features are important to classifying this breed.
1. You train a multilayer perceptron with 1 hidden layer. However, you find that your training converges with still relatively low accuracy. Why might you benefit from a deeper network?
2. With this deeper network, your accuracy on the training data dramatically improves, but when you evaluate on your dev data, performance is substantially lower. Why would this likely occur in a deeper network?
3. Propose a solution (i.e. name a technique) to fix this problem.
3) Backpropagation (7 points) Suppose you have a network with the weights and network structure shown, with sigmoid activation functions used for both the hidden and output layers. You are training your network with cross-entropy loss.
1. If we see the positive training example (1,1), would the current network weights allow us to classify this data point correctly? Will we still incur loss in training? If so, give the loss incurred to 2 decimal places. If not, justify your answer.
2. For a positive example, write the loss function in terms of x1 and x2. For consistency, please use σ to refer to the sigmoid function.
3. Show how backpropagation would be done, writing the partial derivative updates, with b representing the linear part of the output layer and a1 and a2 the linear parts of the hidden layer. As we did in class, you can write the updates in terms of other updates.
3 What to Submit
In this assignment you will submit two things.
1. Submit your writeup to gradescope.com. Your writeup must be compiled from LaTeX and uploaded as a PDF. The writeup should contain all of the answers to the analytical questions asked in the assignment, as well as the answers to the questions marked with *** in the programming section. Make sure to include your name in the writeup PDF and to use the provided LaTeX template for your answers. You will submit this to the assignment called "Homework 3: Deep Learning: Template".
2. Submit your Jupyter notebook to gradescope.com. Create a PDF file containing all of your notebooks (e.g. File → Download As → PDF via LaTeX (.pdf), then combine into one document). Be sure you have run the entire notebook so we can see all of your output. You will submit this to the assignment called “Homework 3: Deep Learning: Notebook”.
