Lab 1. PyTorch and ANNs


Total: 30 Points
Late Penalty: There is a penalty-free grace period of one hour past the deadline. Any work that is
submitted between 1 hour and 24 hours past the deadline will receive a 20% grade deduction. No
other late work is accepted. Quercus submission time will be used, not your local computer time.
You can submit your labs as many times as you want before the deadline, so please submit often
and early.
Grading TAs: Justin Beland, Ali Khodadadi
This lab is based on assignments developed by Jonathan Rose, Harris Chan, Lisa Zhang, and Sinisa
Colic.
This lab is a warm-up to get you used to the PyTorch programming environment used in the course,
and also to help you review and renew your knowledge of Python and relevant Python libraries. The
lab must be done individually. Please recall that the University of Toronto plagiarism rules apply.
By the end of this lab, you should be able to:
1. Perform basic PyTorch tensor operations.
2. Load data into PyTorch.
3. Configure an Artificial Neural Network (ANN) using PyTorch.
4. Train ANNs using PyTorch.
5. Evaluate different ANN configurations.
You will need to use the NumPy and PyTorch documentation for this assignment:
https://docs.scipy.org/doc/numpy/reference/
https://pytorch.org/docs/stable/torch.html
You can also reference Python API documentation freely.
What to submit
Submit a PDF file containing all your code, outputs, and write-up from parts 1-5. You can produce a
PDF of your Google Colab file by going to File -> Print and then save as PDF. The Colab
instructions have more information.
Do not submit any other files produced by your code.
Include a link to your Colab file in your submission.
Lab 1. PyTorch and ANNs
Please use Google Colab to complete this assignment. If you want to use Jupyter Notebook, please
complete the assignment and upload your Jupyter Notebook file to Google Colab for submission.
Adjust the scaling to ensure that the text is not cut off at the margins.
Colab Link
Make sure to include a link to your Colab file here.
Colab Link: https://colab.research.google.com/drive/12bdDVWttgezyCyJ9R22MyfI65zAM8osH?usp=sharing
Part 1. Python Basics [3 pt]
The purpose of this section is to get you used to the basics of Python, including working with
functions, numbers, lists, and strings.
Note that we will be checking your code for clarity and efficiency.
If you have trouble with this part of the assignment, please review http://cs231n.github.io/python-numpy-tutorial/
Part (a) -- 1pt
Write a function sum_of_cubes that computes the sum of cubes up to n. If the input to
sum_of_cubes is invalid (e.g. negative or non-integer n), the function should print out "Invalid
input" and return -1.
def sum_of_cubes(n):
    """Return the sum (1^3 + 2^3 + 3^3 + ... + n^3)

    Precondition: n > 0, type(n) == int
    >>> sum_of_cubes(3)
    36
    >>> sum_of_cubes(1)
    1
    """
    # Check the type first so that non-numeric inputs are rejected cleanly
    if type(n) == int and n > 0:
        total = 0
        while n > 0:
            total = total + n ** 3
            n = n - 1
        return total
    print("Invalid input")
    return -1
print(sum_of_cubes(3))
print(sum_of_cubes(1))
36
1
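As an optional cross-check (not required by the handout), the closed-form identity 1^3 + 2^3 + ... + n^3 = (n(n+1)/2)^2 can be used to verify the loop-based result; a minimal sketch:
def sum_of_cubes_closed_form(n):
    # Closed-form identity for the sum of the first n cubes
    if type(n) == int and n > 0:
        return (n * (n + 1) // 2) ** 2
    print("Invalid input")
    return -1

print(sum_of_cubes_closed_form(3) == sum_of_cubes(3))  # True
print(sum_of_cubes_closed_form(1) == sum_of_cubes(1))  # True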
Part (b) -- 1pt
Write a function word_lengths that takes a sentence (string), computes the length of each word in
that sentence, and returns the length of each word in a list. You can assume that words are always
separated by a space character " ".
Hint: recall the str.split function in Python. If you are not sure how this function works, try typing
help(str.split) into a Python shell, or check out
https://docs.python.org/3.6/library/stdtypes.html#str.split
help(str.split)
def word_lengths(sentence):
    """Return a list containing the length of each word in
    sentence.

    >>> word_lengths("welcome to APS360!")
    [7, 2, 7]
    >>> word_lengths("machine learning is so cool")
    [7, 8, 2, 2, 4]
    """
    length_list = []
    words = sentence.split()
    for word in words:
        length_list.append(len(word))
    return length_list
print(word_lengths("welcome to APS360!"))
print(word_lengths("machine learning is so cool"))
[7, 2, 7]
[7, 8, 2, 2, 4]
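For reference, the same result can be written as a one-line list comprehension (an equivalent sketch, not a replacement for the graded solution):
def word_lengths_compact(sentence):
    # Length of each whitespace-separated word, as a list
    return [len(word) for word in sentence.split()]

print(word_lengths_compact("welcome to APS360!"))  # [7, 2, 7]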
Part (c) -- 1pt
Write a function all_same_length that takes a sentence (string), and checks whether every word
in the string is the same length. You should call the function word_lengths in the body of this new
function.
def all_same_length(sentence):
    """Return True if every word in sentence has the same
    length, and False otherwise.

    >>> all_same_length("all same length")
    False
    >>> all_same_length("hello world")
    True
    """
    lengths = word_lengths(sentence)
    first_length = lengths[0]
    for i in range(1, len(lengths)):
        if lengths[i] != first_length:
            return False
    return True
print(all_same_length("all same length"))
False
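An equivalent formulation (a sketch that still calls word_lengths, as the handout requires) uses a set: every word has the same length exactly when the set of lengths contains at most one distinct value.
def all_same_length_compact(sentence):
    # A single distinct length means every word has the same length
    return len(set(word_lengths(sentence))) <= 1

print(all_same_length_compact("all same length"))  # False
print(all_same_length_compact("hello world"))      # True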
Part 2. NumPy Exercises [5 pt]
In this part of the assignment, you'll be manipulating arrays using NumPy. Normally, we use the
shorter name np to represent the package numpy.
import numpy as np
Part (a) -- 1pt
The below variables matrix and vector are numpy arrays. Explain what you think
<NumpyArray>.size and <NumpyArray>.shape represent.
Answer:
<NumpyArray>.size represents the total number of elements in the NumpyArray.
<NumpyArray>.shape represents the number of elements along each dimension of the NumpyArray, as a tuple.
matrix = np.array([[1., 2., 3., 0.5],
                   [4., 5., 0., 0.],
                   [-1., -2., 1., 1.]])
vector = np.array([2., 0., 1., -2.])
matrix.size
12
matrix.shape
(3, 4)
vector.size
4
vector.shape
(4,)
Part (b) -- 1pt
Perform matrix multiplication output = matrix x vector by using for loops to iterate through
the columns and rows. Do not use any builtin NumPy functions. Cast your output into a NumPy
array, if it isn't one already.
Hint: be mindful of the dimension of output
output = None
if len(matrix[0]) != len(vector):
  output = "invalid matrix multiplication"
else:
  array = []
  for row in range(len(matrix)):
    total = 0
    for column in range(len(matrix[0])):
      total = total + matrix[row][column] * vector[column]
    array.append(total)
  output = np.array(array)  # build the result once, after the row loop finishes
output
'invalid matrix multiplication'
Part (c) -- 1pt
Perform matrix multiplication output2 = matrix x vector by using the function numpy.dot.
We will never actually write code as in part (b), not only because numpy.dot is more concise and
easier to read/write, but also performance-wise numpy.dot is much faster (it is written in C and
highly optimized). In general, we will avoid for loops in our code.
output2 = None
if matrix.shape[1]!=vector.shape[0]:
  output2="invalid matrix multiplication"
else:
  output2=np.dot(matrix, vector)
output2
'invalid matrix multiplication'
Part (d) -- 1pt
As a way to test for consistency, show that the two outputs match.
if np.array_equal(output, output2):
  print("Two outputs match.")
else:
  print("Two outputs do not match.")
Two outputs match.
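When both results are numeric NumPy arrays, np.allclose is often preferred over exact equality, since the loop-based product and np.dot can differ by floating-point rounding; a minimal sketch reusing output and output2 from above:
# Tolerant floating-point comparison (only meaningful when both results are arrays)
if isinstance(output, np.ndarray) and isinstance(output2, np.ndarray):
    print(np.allclose(output, output2))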
Part (e) -- 1pt
Show that using np.dot is faster than using your code from part (b).
You may find the below code snippet helpful:
import time
# record the time before running code
start_time = time.time()
# place code to run here
for i in range(10000):
    99*99
    
# record the time after the code is run
end_time = time.time()
# compute the difference
diff = end_time - start_time
diff
0.000843048095703125
import time
# record time for the loop-based code from part (b)
start_time1 = time.time()
output = None
if len(matrix[0]) != len(vector):
  output = "invalid matrix multiplication"
else:
  array = []
  for row in range(len(matrix)):
    total = 0
    for column in range(len(matrix[0])):
      total = total + matrix[row][column] * vector[column]
    array.append(total)
  output = np.array(array)
end_time1 = time.time()
diff1 = end_time1 - start_time1
# record time for the np.dot code from part (c)
start_time2 = time.time()
output2 = None
if matrix.shape[1] != vector.shape[0]:
  output2 = "invalid matrix multiplication"
else:
  output2 = np.dot(matrix, vector)
end_time2 = time.time()
diff2 = end_time2 - start_time2
#compare two diffs
if diff1 < diff2:
  print("np.dot is slower")
elif diff1 > diff2:
  print("np.dot is faster")
else:
  print("same")
np.dot is faster
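Because a single timed run of such a small computation is dominated by noise, a more robust comparison could average over many repetitions with Python's standard timeit module; a sketch reusing matrix and vector from above (the repetition count is an arbitrary choice):
import timeit

def loop_matvec(matrix, vector):
    # Same loop-based matrix-vector product as in part (b)
    result = []
    for row in range(len(matrix)):
        total = 0
        for column in range(len(matrix[0])):
            total = total + matrix[row][column] * vector[column]
        result.append(total)
    return np.array(result)

n_reps = 10000  # arbitrary repetition count
t_loop = timeit.timeit(lambda: loop_matvec(matrix, vector), number=n_reps)
t_dot = timeit.timeit(lambda: np.dot(matrix, vector), number=n_reps)
print("loop:", t_loop, "np.dot:", t_dot)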
Part 3. Images [6 pt]
A picture or image can be represented as a NumPy array of “pixels”, with dimensions H × W × C,
where H is the height of the image, W is the width of the image, and C is the number of colour
channels. Typically we will use an image with channels that give the Red, Green, and Blue “level”
of each pixel, which is referred to with the short form RGB.
You will write Python code to load an image, and perform several array manipulations to the image
and visualize their effects.
import matplotlib.pyplot as plt
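As a small illustration of the H × W × C layout described above (a sketch with made-up values, not part of the graded work), indexing a single pixel returns its three colour-channel values:
import numpy as np

tiny_img = np.zeros((2, 2, 3))     # a 2 x 2 "image" with 3 colour channels
tiny_img[0, 1] = [1.0, 0.5, 0.0]   # set the pixel at row 0, column 1
print(tiny_img.shape)              # (2, 2, 3) -> H, W, C
print(tiny_img[0, 1])              # [1.  0.5 0. ]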
Part (a) -- 1 pt
This is a photograph of a dog whose name is Mochi.
Load the image from its url (https://drive.google.com/uc?export=view&id=1oaLVR2hr1_qzpKQ47i9rVUIklwbDcews)
into the variable img using the plt.imread function.
Hint: You can enter the URL directly into the plt.imread function as a Python string.
img = plt.imread(
"https://drive.google.com/uc?export=view&id=1oaLVR2hr1_qzpKQ47i9rVUIklwbDcews")
Part (b) -- 1pt
Use the function plt.imshow to visualize img.
This function will also show the coordinate system used to identify pixels. The origin is at the top
left corner, the first dimension indicates the Y (row) direction, and the second dimension
indicates the X (column) dimension.
plt.imshow(img)
<matplotlib.image.AxesImage at 0x7f1b8cd36908>
Part (c) -- 2pt
Modify the image by adding a constant value of 0.25 to each pixel in the img and store the result in
the variable img_add. Note that, since the range for the pixels needs to be between [0, 1], you will
also need to clip img_add to be in the range [0, 1] using numpy.clip. Clipping sets any value that is
outside of the desired range to the closest endpoint. Display the image using plt.imshow.
img_add = None
img_add = np.clip(img + 0.25, 0, 1)
plt.imshow(img_add)
<matplotlib.image.AxesImage at 0x7f1b8a911940>
Part (d) -- 2pt
Crop the original image ( img variable) to a 130 x 150 image including Mochi's face. Discard the
alpha colour channel (i.e. the resulting img_cropped should only have RGB channels).
Display the image.
img_cropped = img[10:160, 20:150, :3]
plt.imshow(img_cropped)
<matplotlib.image.AxesImage at 0x7f1b8a5a2198>
Part 4. Basics of PyTorch [6 pt]
PyTorch is a Python-based neural networks package. Along with TensorFlow, PyTorch is currently
one of the most popular machine learning libraries.
PyTorch, at its core, is similar to NumPy in the sense that both try to make it easier to write code
for scientific computing, and both achieve improved performance over vanilla Python by leveraging a
highly optimized C back-end. However, compared to NumPy, PyTorch offers much better GPU support and
provides many high-level features for machine learning. Technically, NumPy can be used to perform
almost everything PyTorch does. However, NumPy would be a lot slower than PyTorch, especially
with a CUDA GPU, and it would take more effort to write machine-learning-related code compared to
using PyTorch.
import torch
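Since the paragraph above mentions PyTorch's GPU support, here is a minimal sketch (optional, not required for this lab) of how a tensor can be moved to a CUDA device when one is available:
# Use the GPU if CUDA is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.ones(3, 4)
x = x.to(device)   # a no-op move on CPU-only machines
print(x.device)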
Part (a) -- 1 pt
Use the function torch.from_numpy to convert the numpy array img_cropped into a PyTorch
tensor. Save the result in a variable called img_torch.
img_torch = None
img_torch = torch.from_numpy(img_cropped)
Part (b) -- 1pt
Use the method <Tensor>.shape to find the shape (dimension and size) of img_torch.
img_torch.shape
torch.Size([150, 130, 3])
Part (c) -- 1pt
How many floating-point numbers are stored in the tensor img_torch ?
num_of_fp = img_torch.shape[0] * img_torch.shape[1] * img_torch.shape[2]
num_of_fp
58500
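For reference, PyTorch provides a built-in method that returns the same count directly:
print(img_torch.numel())  # 58500, the same as multiplying the shape entries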
Part (d) -- 1 pt
What does the code img_torch.transpose(0,2) do? What does the expression return? Is the
original variable img_torch updated? Explain.
Answer:
The code img_torch.transpose(0,2) swaps dimension 0 of img_torch with dimension 2.
The expression returns a transposed version of img_torch with dimensions 3 x 130 x 150.
The original variable img_torch is not updated: transpose does not modify it in place, and its shape
is still 150 x 130 x 3 afterwards, as the printed shapes below confirm.
print(img_torch.transpose(0,2))
print(img_torch.shape)
print(img_torch.transpose(0,2).shape)
tensor([[[0.5647, 0.5843, 0.5647, ..., 0.6000, 0.5922, 0.5843],
 [0.5412, 0.5647, 0.5529, ..., 0.5843, 0.5922, 0.5961],
 [0.5529, 0.5882, 0.5843, ..., 0.5843, 0.5922, 0.6039],
 ...,
 [0.6000, 0.5961, 0.5961, ..., 0.6275, 0.6314, 0.6314],
 [0.6275, 0.6314, 0.6235, ..., 0.6039, 0.6157, 0.6157],
 [0.6275, 0.6353, 0.6196, ..., 0.5922, 0.6039, 0.6078]],
 [[0.3451, 0.3647, 0.3647, ..., 0.4039, 0.3961, 0.3882],
 [0.3216, 0.3451, 0.3529, ..., 0.4000, 0.3961, 0.4000],
 [0.3373, 0.3725, 0.3882, ..., 0.4000, 0.4039, 0.4157],
 ...,
 [0.3647, 0.3608, 0.3647, ..., 0.5059, 0.5098, 0.5098],
 [0.3922, 0.3961, 0.3922, ..., 0.4902, 0.4941, 0.4941],
 [0.3922, 0.4000, 0.4000, ..., 0.4784, 0.4902, 0.4863]],
 [[0.1137, 0.1333, 0.1490, ..., 0.2039, 0.1961, 0.1882],
 [0.0824, 0.1059, 0.1294, ..., 0.1961, 0.1961, 0.2078],
 [0.0824, 0.1176, 0.1569, ..., 0.1961, 0.2118, 0.2235],
 ...,
 [0.1294, 0.1255, 0.1451, ..., 0.3843, 0.3882, 0.3882],
 [0.1569, 0.1608, 0.1725, ..., 0.3647, 0.3725, 0.3725],
 [0.1490, 0.1569, 0.1765, ..., 0.3529, 0.3647, 0.3647]]])
torch.Size([150, 130, 3])
torch.Size([3, 130, 150])
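One way to confirm that transpose returns a view rather than a copy (a small optional check, not part of the graded work) is to compare the underlying data pointers:
# Both tensors point at the same underlying storage, so no data was copied
print(img_torch.transpose(0, 2).data_ptr() == img_torch.data_ptr())  # True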
Part (e) -- 1 pt
What does the code img_torch.unsqueeze(0) do? What does the expression return? Is the
original variable img_torch updated? Explain.
Answer:
The code img_torch.unsqueeze(0) inserts an extra dimension of size one at position 0.
The expression returns a new tensor with shape 1 x 150 x 130 x 3.
The original variable img_torch is not updated: unsqueeze does not modify it in place, and its shape
is still 150 x 130 x 3 afterwards, as the printed shapes below confirm.
print(img_torch.unsqueeze(0))
print(img_torch.shape)
print(img_torch.unsqueeze(0).shape)
tensor([[[[0.5647, 0.3451, 0.1137],
 [0.5412, 0.3216, 0.0824],
 [0.5529, 0.3373, 0.0824],
 ...,
 [0.6000, 0.3647, 0.1294],
 [0.6275, 0.3922, 0.1569],
 [0.6275, 0.3922, 0.1490]],
 [[0.5843, 0.3647, 0.1333],
 [0.5647, 0.3451, 0.1059],
 [0.5882, 0.3725, 0.1176],
 ...,
 [0.5961, 0.3608, 0.1255],
 [0.6314, 0.3961, 0.1608],
 [0.6353, 0.4000, 0.1569]],
 [[0.5647, 0.3647, 0.1490],
 [0.5529, 0.3529, 0.1294],
 [0.5843, 0.3882, 0.1569],
 ...,
 [0.5961, 0.3647, 0.1451],
 [0.6235, 0.3922, 0.1725],
 [0.6196, 0.4000, 0.1765]],
 ...,
 [[0.6000, 0.4039, 0.2039],
 [0.5843, 0.4000, 0.1961],
 [0.5843, 0.4000, 0.1961],
 ...,
 [0.6275, 0.5059, 0.3843],
 [0.6039, 0.4902, 0.3647],
 [0.5922, 0.4784, 0.3529]],
 [[0.5922, 0.3961, 0.1961],
 [0.5922, 0.3961, 0.1961],
 [0.5922, 0.4039, 0.2118],
 ...,
 [0.6314, 0.5098, 0.3882],
 [0.6157, 0.4941, 0.3725],
 [0.6039, 0.4902, 0.3647]],
 [[0.5843, 0.3882, 0.1882],
 [0.5961, 0.4000, 0.2078],
 [0.6039, 0.4157, 0.2235],
 ...,
 [0.6314, 0.5098, 0.3882],
 [0.6157, 0.4941, 0.3725],
 [0.6078, 0.4863, 0.3647]]]])
torch.Size([150, 130, 3])
torch.Size([1, 150, 130, 3])
Part (f) -- 1 pt
Find the maximum value of img_torch along each colour channel. Your output should be a
one-dimensional PyTorch tensor with exactly three values.
Hint: look up the function torch.max.
torch.tensor([torch.max(img_torch[:,:,0]), torch.max(img_torch[:,:,1]),
              torch.max(img_torch[:,:,2])])
tensor([0.8941, 0.7882, 0.6745])
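An equivalent, more compact version (assuming a PyTorch release recent enough to provide torch.amax, which reduces over several dimensions at once):
# Maximum over the height and width dimensions, one value per colour channel
print(torch.amax(img_torch, dim=(0, 1)))  # tensor([0.8941, 0.7882, 0.6745])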
Part 5. Training an ANN [10 pt]
The sample code provided below is a 2-layer ANN trained on the MNIST dataset to identify digits
less than 3 or greater than or equal to 3. Modify the code by changing any of the following and
observe how the accuracy and error are affected:
number of training iterations
number of hidden units
number of layers
types of activation functions
learning rate
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
import matplotlib.pyplot as plt # for plotting
import torch.optim as optim
torch.manual_seed(1) # set the random seed
# define a 2-layer artificial neural network
class Pigeon(nn.Module):
    def __init__(self):
        super(Pigeon, self).__init__()
        self.layer1 = nn.Linear(28 * 28, 30)
        self.layer2 = nn.Linear(30, 1)
    def forward(self, img):
        flattened = img.view(-1, 28 * 28)
        activation1 = self.layer1(flattened)
        activation1 = F.relu(activation1)
        activation2 = self.layer2(activation1)
        return activation2
pigeon = Pigeon()
# load the data
mnist_data = datasets.MNIST('data', train=True, download=True)
mnist_data = list(mnist_data)
mnist_train = mnist_data[:1000]
mnist_val   = mnist_data[1000:2000]
img_to_tensor = transforms.ToTensor()
      
    
# simplified training code to train `pigeon` on the "small digit recognition" task
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(pigeon.parameters(), lr=0.005, momentum=0.9)
for (image, label) in mnist_train:
    # actual ground truth: is the digit less than 3?
    actual = torch.tensor(label < 3).reshape([1,1]).type(torch.FloatTensor)
    # pigeon prediction
    out = pigeon(img_to_tensor(image)) # step 1-2
    # update the parameters based on the loss
    loss = criterion(out, actual)      # step 3
    loss.backward()                    # step 4 (compute the updates for each parameter)
    optimizer.step()                   # step 4 (make the updates for each parameter)
    optimizer.zero_grad()              # a clean up step for PyTorch
# computing the error and accuracy on the training set
error = 0
for (image, label) in mnist_train:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Training Error Rate:", error/len(mnist_train))
print("Training Accuracy:", 1 - error/len(mnist_train))
# computing the error and accuracy on a test set
error = 0
for (image, label) in mnist_val:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
        
print("Test Error Rate:", error/len(mnist_val))
print("Test Accuracy:", 1 - error/len(mnist_val))
Training Error Rate: 0.312
Training Accuracy: 0.688
Test Error Rate: 0.297
Test Accuracy: 0.7030000000000001
Part (a) -- 3 pt
Comment on which of the above changes resulted in the best accuracy on training data? What
accuracy were you able to achieve?
Answer: Changing the number of training iterations resulted in the best accuracy on the training
data. I was able to achieve a training accuracy of 1.0 by increasing the number of passes over the
training set to 30.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
import matplotlib.pyplot as plt # for plotting
import torch.optim as optim
torch.manual_seed(1) # set the random seed
# define a 2-layer artificial neural network
class Pigeon(nn.Module):
    def __init__(self):
        super(Pigeon, self).__init__()
        self.layer1 = nn.Linear(28 * 28, 30)
        self.layer2 = nn.Linear(30, 1)
    def forward(self, img):
        flattened = img.view(-1, 28 * 28)
        activation1 = self.layer1(flattened)
        activation1 = F.relu(activation1)
        activation2 = self.layer2(activation1)
        return activation2
pigeon = Pigeon()
# load the data
mnist_data = datasets.MNIST('data', train=True, download=True)
mnist_data = list(mnist_data)
mnist_train = mnist_data[:1000]
mnist_val   = mnist_data[1000:2000]
img_to_tensor = transforms.ToTensor()
      
    
# simplified training code to train `pigeon` on the "small digit recognition" task
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(pigeon.parameters(), lr=0.005, momentum=0.9)
for i in range(30): 
  for (image, label) in mnist_train:
      # actual ground truth: is the digit less than 3?
      actual = torch.tensor(label < 3).reshape([1,1]).type(torch.FloatTensor)
      # pigeon prediction
      out = pigeon(img_to_tensor(image)) # step 1-2
      # update the parameters based on the loss
      loss = criterion(out, actual)      # step 3
      loss.backward()                    # step 4 (compute the updates for each parameter)
      optimizer.step()                   # step 4 (make the updates for each parameter)
      optimizer.zero_grad()              # a clean up step for PyTorch
# computing the error and accuracy on the training set
error = 0
for (image, label) in mnist_train:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Training Error Rate:", error/len(mnist_train))
print("Training Accuracy:", 1 - error/len(mnist_train))
# computing the error and accuracy on a test set
error = 0
for (image, label) in mnist_val:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
        
print("Test Error Rate:", error/len(mnist_val))
print("Test Accuracy:", 1 - error/len(mnist_val))
Training Error Rate: 0.0
Training Accuracy: 1.0
Test Error Rate: 0.059
Test Accuracy: 0.9410000000000001
Part (b) -- 3 pt
Comment on which of the above changes resulted in the best accuracy on testing data? What
accuracy were you able to achieve?
Answer: Changing both the number of training iterations and the learning rate resulted in the best
accuracy on the testing data. I was able to achieve a test accuracy of 0.942 by increasing the number
of passes over the training set to 30 and increasing the learning rate to 0.009.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
import matplotlib.pyplot as plt # for plotting
import torch.optim as optim
torch.manual_seed(1) # set the random seed
# define a 2-layer artificial neural network
class Pigeon(nn.Module):
    def __init__(self):
        super(Pigeon, self).__init__()
        self.layer1 = nn.Linear(28 * 28, 30)
        self.layer2 = nn.Linear(30, 1)
    def forward(self, img):
        flattened = img.view(-1, 28 * 28)
        activation1 = self.layer1(flattened)
        activation1 = F.relu(activation1)
        activation2 = self.layer2(activation1)
        return activation2
pigeon = Pigeon()
# load the data
mnist_data = datasets.MNIST('data', train=True, download=True)
mnist_data = list(mnist_data)
mnist_train = mnist_data[:1000]
mnist_val   = mnist_data[1000:2000]
img_to_tensor = transforms.ToTensor()
      
    
# simplified training code to train `pigeon` on the "small digit recognition" task
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(pigeon.parameters(), lr=0.009, momentum=0.9)
for i in range(30): 
  for (image, label) in mnist_train:
      # actual ground truth: is the digit less than 3?
      actual = torch.tensor(label < 3).reshape([1,1]).type(torch.FloatTensor)
      # pigeon prediction
      out = pigeon(img_to_tensor(image)) # step 1-2
      # update the parameters based on the loss
      loss = criterion(out, actual)      # step 3
      loss.backward()                    # step 4 (compute the updates for each parameter)
      optimizer.step()                   # step 4 (make the updates for each parameter)
      optimizer.zero_grad()              # a clean up step for PyTorch
# computing the error and accuracy on the training set
error = 0
for (image, label) in mnist_train:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Training Error Rate:", error/len(mnist_train))
print("Training Accuracy:", 1 - error/len(mnist_train))
# computing the error and accuracy on a test set
error = 0
for (image, label) in mnist_val:
    prob = torch.sigmoid(pigeon(img_to_tensor(image)))
    if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
        error += 1
print("Test Error Rate:", error/len(mnist_val))
print("Test Accuracy:", 1 - error/len(mnist_val))
Training Error Rate: 0.0
Training Accuracy: 1.0
Test Error Rate: 0.058
Test Accuracy: 0.942
Part (c) -- 4 pt
Which model hyperparameters should you use, the ones from (a) or (b)?
Answer:
I should use the ones from (b). Training data is used to fit the model, while testing data is used to
validate the model that was built. If hyperparameters are chosen by training accuracy alone, the ANN
can overfit: it kept yielding a training accuracy of 1.0 after 30 passes over the training set, even
when the learning rate was increased to 0.009, so training accuracy cannot distinguish the two
configurations. Meanwhile, the test accuracy increased to 0.942 with the settings from (b). Because
the testing data was never used to train the ANN, its accuracy reflects predictions on new data
rather than on the same data used for training, so the hyperparameters from (b) are the better choice.
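A minimal sketch of this selection logic (the helper and the candidate_models dictionary are hypothetical, and the accuracy computation mirrors the evaluation loops above): compute validation accuracy for each trained configuration and keep the one that scores highest on data the model never trained on.
def evaluate(model, dataset):
    # Accuracy of `model` on `dataset`, using the same decision rule as above
    errors = 0
    for (image, label) in dataset:
        prob = torch.sigmoid(model(img_to_tensor(image)))
        if (prob < 0.5 and label < 3) or (prob >= 0.5 and label >= 3):
            errors += 1
    return 1 - errors / len(dataset)

# Hypothetical: candidate_models maps a description to an already-trained network, e.g.
# candidate_models = {"(a) lr=0.005, 30 epochs": pigeon_a, "(b) lr=0.009, 30 epochs": pigeon_b}
# best = max(candidate_models, key=lambda name: evaluate(candidate_models[name], mnist_val))
# print("Selected configuration:", best)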
