Lab 3-D41
ECE 449 - Intelligent Systems Engineering
Lab 3: Neural Networks, Perceptrons and
Hyperparameters
1. Objective:
The objective of this lab is to gain familiarity with the concepts of linear models and to gain a feeling
for how changing hyperparameters affects the performance of the model. The exercises in the lab will
help bring to light the weaknesses and strengths of linear models and how to work with them.
2. Expectation:
Complete the pre-lab, and hand it in before the lab starts. A formal lab report is required for this lab,
which will be the completed version of this notebook. There is a marking guide at the end of the lab
manual. If figures are required, label all the axes and provide a legend when appropriate. An
abstract, introduction, and conclusion are required as well, for which cells are provided at the end of
the notebook. The abstract should be a brief description of the topic, the introduction a description of
the goals of the lab, and the conclusion a summary of what you learned, what you found difficult, and
your own ideas and observations.
3. Pre lab:
1. Read through the code. What kind of models will be used in this lab?
2. Explain why the differentiability of an activation function plays an important role in the learning of
these neural networks. Why might the linear activation function be a poor choice in some cases?
4. Introduction:
During this lab, you will be performing a mix of 2 common machine learning tasks: regression and
classification. Before defining these tasks mathematically, it is important to understand the core
process behind the two tasks. Regression is defined as reasoning backwards. In the context of
machine learning, regression is about predicting the future based on the past. Classification is defined
as the act of arranging things based on their properties. These definitions give insight into how these
problems are broken down.
28/10/2019 Lab 3-D41
https://cybera.syzygy.ca/jupyter/user/awoosare/notebooks/Lab 3-D41.ipynb# 2/22
Suppose you, a human being, want to make a prediction. What steps do you take? You first
collect data on the subject that you want to predict. Then you weigh the relevance of each piece of
information that you get, attributing varying levels of importance to each piece of data. Once you have
enough relevant data, you become certain of an outcome. Finally, you act on that certainty. This
pipeline is shown in the figure below:
[Figure: prediction pipeline]
Now consider the classification task. One usually begins with a topic to classify. This is
usually accompanied by a list of candidate categories, one of which is the correct category for the
topic in question. Since classification relies on properties of the topic, the next step is to list the
notable features that may help in discerning the correct category. Similarly to the prediction, the
relevance of each piece of information is then weighed and a decision is made when you have
enough data. Once this is done, the guess is compared to reality in order to judge if the classification
was correct. This pipeline is shown in the figure below:
[Figure: classification pipeline]
The mathematical models that we will use in this lab to describe such behaviors are called linear
models. The simplest linear model is the perceptron.
A perceptron is a simple type of neural network that uses supervised learning, where the expected
values, or targets, are provided to the network in addition to the inputs. The network operates by first
calculating the weighted sum of its inputs (and bias). These weights are typically randomly assigned.
Then, the sum is processed with an activation function that "squashes" the summed total to a smaller
range, such as (0, 1).
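As a rough illustration of the "squashing" step, the sketch below implements the four activation functions that appear later in this lab (identity, logistic, tanh, relu) in plain Python. The function names are chosen here for illustration and are not taken from the lab code.

```python
import math

def identity(total):
    """Linear activation: returns the weighted sum unchanged."""
    return total

def logistic(total):
    """Logistic sigmoid: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-total))

def tanh(total):
    """Hyperbolic tangent: squashes any real number into (-1, 1)."""
    return math.tanh(total)

def relu(total):
    """Rectified linear unit: zero for negative sums, identity otherwise."""
    return max(0.0, total)

# Compare the activations on a negative, a zero, and a positive weighted sum.
for total in (-5.0, 0.0, 5.0):
    print(total, identity(total), logistic(total), tanh(total), relu(total))
```

Only the logistic function maps onto the (0, 1) range mentioned above; tanh uses (-1, 1), while identity and relu are unbounded above.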
The perceptron's way of reasoning is formulated in the same way as a human's. It takes in input data
in the form of the x vector. It then weighs the relevance of each input using the mathematical
operation of multiplication. Following this, the total sum of all weighted inputs is passed through an
activation function, analogous to the moment that you have enough data to confirm an outcome. It
then outputs a value, y, that is effectively the action that you take based on your prediction.
The math behind the perceptron's operations is described by the following formulae:
Training a perceptron involves calculating the error by taking the difference between the targets and
the actual outputs. This allows it to determine how to update its weights such that a closer output
value to the target is obtained.
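The training idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the scikit-learn `Perceptron` used later in the lab: the step activation, the zero initial weights (the text notes that weights are typically random), the learning rate of 1, and the AND-gate data are all assumptions made here so that the arithmetic stays exact and easy to follow.

```python
def step(total):
    """Threshold activation: output 1 when the weighted sum is non-negative."""
    return 1 if total >= 0 else 0

def predict(weights, bias, inputs):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(total)

def train_perceptron(samples, learning_rate=1, epochs=20):
    """Update the weights using the difference between target and output."""
    weights = [0, 0]  # assumption: zero start instead of random weights
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Hypothetical, linearly separable toy problem: the logical AND gate.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(predictions)
```

Because the AND gate is linearly separable, the updates stop once every sample is classified correctly; the remaining epochs leave the weights unchanged.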
Perceptrons are commonly employed to solve two-class classification problems where the classes are
linearly separable. However, the applications for this are evidently very limited. Therefore, a more
practical extension of the perceptron is the multi-layer perceptron, which adds extra hidden layer(s)
between the inputs and outputs to yield more flexibility in what the MLP can classify.
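To see why a hidden layer adds flexibility, consider XOR, a standard example of a problem that is not linearly separable. The sketch below wires up a tiny two-layer network by hand. The weights are picked manually for illustration (they are not learned and not taken from the lab code), and the step activation is an assumption made for readability.

```python
def step(total):
    """Threshold activation: output 1 when the weighted sum is positive."""
    return 1 if total > 0 else 0

def neuron(weights, bias, inputs):
    """One perceptron unit: weighted sum plus bias, then the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_mlp(x1, x2):
    """Two hidden neurons feeding one output neuron."""
    h_or = neuron([1, 1], -0.5, [x1, x2])    # fires if at least one input is 1
    h_and = neuron([1, 1], -1.5, [x1, x2])   # fires only if both inputs are 1
    return neuron([1, -1], -0.5, [h_or, h_and])  # "OR but not AND"

outputs = [xor_mlp(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

No single perceptron can produce this output pattern, because no straight line separates the two XOR classes; the hidden layer first re-describes the inputs in a form that the output neuron can separate.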
The most common learning algorithm used is backpropagation (BP). It employs gradient descent in
an attempt to minimize the squared error between the network outputs and the targets.
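Perceptron output:
$$tot = \sum_{i=1}^{n} x_i w_i + \theta = \sum_{i=0}^{n} x_i w_i$$
$$o = f_{act}(tot)$$
Squared error minimized by backpropagation:
$$E = \frac{1}{2} \sum_{k=1}^{n} \sum_{i=1}^{q} \left[ t_i(k) - o_i(k) \right]^2$$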
This error value is propagated backwards through the network, and small changes are made to the
weights in each layer. Using a gradient descent approach, the weights in the network are updated as
follows:
where $\eta$ is the learning rate. The network is trained on the same data multiple times; each pass is called an
epoch. Typically, this continues until the network has reached a convergence point that is defined by
the user through a tolerance value. For the case of this lab, the tolerance value is ignored and training
will continue until the specified number of epochs is reached. More details of backpropagation can be
found in the lecture notes.
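The update rule can be sketched for the simplest possible case: a single logistic neuron trained by gradient descent on the squared error. This is an illustrative sketch under stated assumptions, not the backpropagation implementation inside scikit-learn; the two-sample data set, the learning rate of 0.5, and the 200 epochs are made up for the example.

```python
import math

def logistic(total):
    return 1.0 / (1.0 + math.exp(-total))

def output(weights, inputs):
    """o = f_act(tot), with the bias folded in as the weight of x0 = 1."""
    return logistic(sum(w * x for w, x in zip(weights, inputs)))

def squared_error(weights, samples):
    """E = 1/2 * sum over samples of (target - output)^2."""
    return 0.5 * sum((t - output(weights, xs)) ** 2 for xs, t in samples)

def gradient_step(weights, samples, eta):
    """One update: w <- w - eta * dE/dw."""
    gradient = [0.0] * len(weights)
    for xs, t in samples:
        o = output(weights, xs)
        # Chain rule: dE/dtot = -(t - o) * f'(tot), with f'(tot) = o * (1 - o)
        delta = -(t - o) * o * (1.0 - o)
        for j, x in enumerate(xs):
            gradient[j] += delta * x
    return [w - eta * g for w, g in zip(weights, gradient)]

# Hypothetical data: x0 = 1 is the bias input, x1 is the actual feature.
samples = [((1.0, 0.0), 0.0), ((1.0, 1.0), 1.0)]
weights = [0.0, 0.0]
errors = []
for epoch in range(200):
    errors.append(squared_error(weights, samples))
    weights = gradient_step(weights, samples, eta=0.5)

print(errors[0], errors[-1])
```

The `delta` line is where differentiability of the activation function matters: without a derivative of `logistic`, there would be no gradient to follow.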
Two kinds of values affect the performance of a neural network:
parameters and hyperparameters. Parameters are the values that the model
learns during the training process. Hyperparameters are values that are set before training begins.
The parameters of linear models are the weights. The hyperparameters include:
Learning algorithm
Loss function
Learning rate
Activation function
Hyperparameter selection is very important in the field of AI in general. The performance of the
learning systems that are deployed relies heavily on the selection of hyperparameters, and some
advances in the field have even been solely due to changes in hyperparameters. More on
hyperparameters can be found in the lecture notes and in the literature.
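As a small illustration of how much a single hyperparameter can matter, the sketch below runs gradient descent on the one-dimensional function $f(w) = w^2$ with three different learning rates. The function, the starting point, and the three rates are assumptions chosen for the example and do not come from the lab.

```python
def descend(eta, steps=20, start=1.0):
    """Run gradient descent on f(w) = w^2 and return the final w."""
    w = start
    for _ in range(steps):
        w -= eta * 2.0 * w  # the gradient of w^2 is 2w
    return w

# Distance from the minimum (w = 0) after 20 steps for each learning rate.
for eta in (0.01, 0.1, 1.1):
    print(eta, abs(descend(eta)))
```

A very small learning rate approaches the minimum slowly, a moderate one gets close within the 20 steps, and a rate that is too large overshoots further on every step and diverges.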
5. Experimental Procedure:
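Weight update rule (gradient descent):
$$\Delta w^{(l)} = -\eta \nabla_{w}^{(l)} = -\eta \frac{\partial E(k)}{\partial w^{(l)}}, \qquad \eta > 0$$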
Exercise 1: Perceptrons and their limitations
The objective of this exercise is to show how adding depth to the network makes it learn better. This
exercise will involve running the following cells and examining the data. This exercise will showcase
the classification task and it will be performed on the Iris dataset. Also, ensure that all files within "Lab
3 Resources" are placed in the same directory as this Jupyter notebook.
Run the following cell to import all the required libraries.
In [2]:
The Iris dataset: This dataset contains data points on three different species of Iris, a type of flower.
The dataset has 50 entries for each of the species and has 4 different features:
1. Sepal Length
2. Sepal Width
3. Petal Length
4. Petal Width
This dataset has one obvious class that is separate from a cluster of the other two classes, making it
a typical exercise in classification for machine learning. The next cell loads the dataset into 2
variables, one for the features and one for the classes.
In [3]:
%matplotlib inline
import numpy as np # General math operations
import scipy.io as sio # Loads .mat variables
import matplotlib.pyplot as plt # Data visualization
from sklearn.linear_model import Perceptron # Perceptron toolbox
from sklearn.neural_network import MLPRegressor # MLP toolbox
import seaborn as sns
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import datasets
from sklearn.neural_network import MLPClassifier
from sklearn import preprocessing
from sklearn import linear_model # Linear models
from sklearn.tree import DecisionTreeRegressor
import warnings
warnings.filterwarnings('ignore')
# load the data
iris = datasets.load_iris()
Y = iris.target
X = iris.data
# set up the pandas dataframes
X_df = pd.DataFrame(X, columns = ['Sepal length','Sepal width', 'Petal lengt
Y_df = pd.DataFrame(Y, columns = ['Iris class'])
# this code changes the class labels from numerical values to strings
Y_df = Y_df.replace({
0:'Setosa',
1:'Versicolor',
2:'Virginica'
})
#Joins the two dataframes into a single data frame for ease of use
Z_df = X_df.join(Y_df)
Visualizing the data is an important tool for data exploration. Visualizing the data will allow you to
intuitively understand obvious relationships that are present in the data, even before you begin to
analyse it. The next cell will plot all of the features against each other.
In [4]:
This type of plot is called a pairplot. It plots each feature against all other features including itself; this
is done for all four features. This results in 2 different types of plots being present in the plot, scatter
and histogram.
The following cell will train a perceptron on the features and labels and display the result on the test
set in a pairplot.
# show the data using seaborn
sns.set(style='dark', palette= 'deep')
pair = sns.pairplot(Z_df, hue = 'Iris class')
plt.show()
In [5]: RANDOM_SEED = 6
xTrain, xTest, yTrain, yTest = train_test_split(X_df, Y_df, test_size =0.3,\
random_state=RANDOM_SEED)
#plot the testing data
test_df = xTest.join(yTest)
# print(test_df.head)
# perceptron training
percep = Perceptron(max_iter = 1000)
percep.fit(xTrain, yTrain)
prediction = percep.predict(xTest)
# print(prediction)
# display the classifiers performance
prediction_df = pd.DataFrame(prediction, columns=['Predicted Iris class'], i
# print(prediction_df.head)
prediction_df_index_df = prediction_df.join(xTest)
# print(prediction_df_index_df.head)
pair = sns.pairplot(prediction_df_index_df, hue = 'Predicted Iris class')
#pair_test = sns.pairplot(test_df, hue ='Iris class')
plt.show()
In [5]:
Question 1:
Comment on the performance of the perceptron, how well does it handle the task?
The next cell replaces the perceptron with a multi-layer perceptron (MLP) and trains it on the same data. This MLP consists of 2 hidden
layers: one with 8 neurons and a second one with 3.
pair_test = sns.pairplot(test_df, hue ='Iris class') #test data from the dat
The scatterplots look the same, but the histograms seem slightly different.
On some runs the predicted classes look closer to the test data, but for
the most part the histograms are noticeably different.
Thus, we can conclude that the perceptron handles the task acceptably, but
not ideally. This is likely because the classification problem is non-linear
in nature, and perceptrons are only able to solve classification problems
that are linearly separable.
In [7]:
Question 2:
# change the layers, retrain the mlp
cls = MLPClassifier(solver = 'sgd' ,activation = 'relu' , \
hidden_layer_sizes = (8,3,), max_iter = 100000)
for i in range(0,5):
    cls.fit(xTrain, yTrain)
mlp_z = cls.predict(xTest)
mlp_z.reshape(-1,1)
cls_df = pd.DataFrame(mlp_z, columns = ["Mlp prediction"], index=xTest.index
# cls_df_index = cls_df.join(Test_index_df).set_index('Test index')
# cls_df_index.index.name = None
# Join with the test_index frame
cls_prediction_df = cls_df.join(xTest)
# Display the MLP classifier
cls_pairplot = sns.pairplot(cls_prediction_df, hue = 'Mlp prediction')
Answer the following questions:
How does the MLP compare to the perceptron in the classification task?
Did it do well?
Was it able to classify the three classes?
What happens when you run it again?
Can you offer an explanation for what happened?
Fill the box below with your answer:
Exercise 2: Getting your hands dirty with regression
NOTE: The code in this exercise is computationally intensive and may require up to 5 minutes
to finish running.
In order to improve the energy management of monitoring stations, a team of climatologists would like
to implement a more robust strategy of predicting the solar energy available for the next day, based
on the current day's atmospheric pressure. They plan to test this with a station situated in Moxee, and
are designing a multi-layer perceptron that will be trained with the previous year's worth of Moxee
data. They have access to the following values:
Inputs: Pressure values for each hour, along with the absolute differences between them
Targets: Recorded solar energy for the day after
The individual who was in charge of this project before had created a traditional machine learning
approach to predict the solar energy availability of the next day. The individual recently retired and
you have been brought on to the team to try to implement a more accurate system. You find some
code that was left over that uses an MLP. The MLP is initially formed using one hidden layer of 50
neurons, a logistic sigmoid activation function, and a total of 500 iterations. Once it is trained, the
MLP is used to predict the results of both the training cases and new test cases. As a measure of
accuracy, the root mean square error (RMSE) is displayed after inputting data to the MLP.
First, read through the code to understand what results it produces, and then run the script.
When it successfully classifies all 3 classes, the MLP performs a lot better
than the perceptron. The output is indistinguishable from the test data.
This is likely because the classification problem is non-linear, and
perceptrons are only able to solve classification problems that are
linearly separable. In contrast, the MLP is able to handle non-linear
classification problems.
Sometimes, however, it is only able to identify 1 or 2 classes instead of
all three.
This changes when it is run multiple times.
The reason for the change is likely the random initialization of the
weights: no random seed is fixed for the MLP, so each run starts from
different initial weights. Sometimes it is able to classify everything with
no problem; with other initial weights, training may not reach a solution
that correctly classifies everything.
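The effect of the seed on the starting point can be sketched with the standard library alone. The helper `initial_weights` below is hypothetical and only mimics the idea of random weight initialization; it is not how scikit-learn initializes an MLP internally.

```python
import random

def initial_weights(seed, count=4):
    """Draw `count` starting weights from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(count)]

same_a = initial_weights(seed=6)
same_b = initial_weights(seed=6)
different = initial_weights(seed=7)

print(same_a == same_b)     # same seed, same starting weights
print(same_a == different)  # different seed, different starting weights
```

In scikit-learn, `MLPClassifier` accepts a `random_state` argument for the same purpose; the cell in this exercise does not set it, which is consistent with results changing from run to run.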
Question 1:
Your objective is to play with the parameters of the regressor to see if you can beat the decision
tree. There are parameters that you can change to try to beat it. You can change:
Size of the Hidden Layers: between 1 and 50
Activation Function:
Identity
Logistic
tanh
relu
Number of Iterations, to different values (both lower and higher): Between 1 and 1000
Comment on how this affects the results. Include plots of your final results (using any one of your
values for the parameters). Describe some of the tradeoffs associated with both lowering and
raising the number of iterations.
In order to determine the accuracy of the methods, you will be using the RMSE:
$$RMSE = \sqrt{\frac{\sum \left( \text{Approximated} - \text{observed} \right)^2}{n}}$$
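The RMSE calculation can be written as a short standalone function. This is a plain-Python sketch for illustration; the lab code computes the same quantity with NumPy, and the numbers below are made up.

```python
import math

def rmse(approximated, observed):
    """Root mean square error between two equally long sequences."""
    if len(approximated) != len(observed) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - o) ** 2 for a, o in zip(approximated, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical values: every prediction is off by exactly 2 units.
approximated = [3.0, 5.0, 7.0, 9.0]
observed = [1.0, 3.0, 5.0, 7.0]
print(rmse(approximated, observed))
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones, and the result is in the same units as the targets.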
In [8]: # Obtain training data
moxeeData = sio.loadmat('moxeetrainingdata.mat') # Load variables from th
trainingInputs = moxeeData['pressureData'] # Pressure values and di
trainingTargets = moxeeData['dataEstimate'] # Estimate of incoming s
# Preprocess the training inputs and targets
iScaler = preprocessing.StandardScaler() # Scaler that removes the mean a
scaledTrainingInputs = iScaler.fit_transform(trainingInputs) # Fit and scal
tScaler = preprocessing.StandardScaler()
scaledTrainingTargets = tScaler.fit_transform(trainingTargets)
# Create the multilayer perceptron.
# This is where you will be modifying the regressor to try to beat the decis
mlp = MLPRegressor(
hidden_layer_sizes = (25,), # One hidden layer with 25 neurons
activation = 'logistic', # Logistic sigmoid activation function
solver = 'sgd', # Gradient descent
learning_rate_init = 0.01 ,# Initial learning rate
)
#
############################################################### Create the d
dt_reg = DecisionTreeRegressor(criterion='mse', max_depth = 10)
dt_reg.fit(scaledTrainingInputs, scaledTrainingTargets)
### MODIFY THE VALUE BELOW ###
noIterations = 512 # Number of iterations (epochs) for which the MLP trains
### MODIFY THE VALUE ABOVE ###
trainingError = np.zeros(noIterations) # Initialize array to hold training
# Train the MLP for the specified number of iterations
for i in range(noIterations):
    mlp.partial_fit(scaledTrainingInputs, np.ravel(scaledTrainingTargets)) #
    currentOutputs = mlp.predict(scaledTrainingInputs) # Obtain the outputs
    trainingError[i] = np.sum((scaledTrainingTargets.T - currentOutputs) **
# Plot the error curve
plt.figure(figsize=(10,6))
ErrorHandle ,= plt.plot(range(noIterations), trainingError, label = 'Error 5
plt.xlabel('Epoch')
plt.ylabel('Error')
plt.title('Training Error of the MLP for Every Epoch')
plt.legend(handles = [ErrorHandle])
plt.show()
# Obtain test data
testdataset = sio.loadmat('moxeetestdata.mat')
testInputs = testdataset['testInputs']
testTargets = testdataset['testTargets']
scaledTestInputs = iScaler.transform(testInputs) # Scale the test inputs
# Predict incoming solar energy from the training data and the test cases
scaledTrainingOutputs = mlp.predict(scaledTrainingInputs)
scaledTestOutputs = mlp.predict(scaledTestInputs)
#################################################################### Predict
scaledTreeTrainingOutputs = dt_reg.predict(scaledTrainingInputs)
scaledTreeTestOutputs = dt_reg.predict(scaledTestInputs)
# Transform the outputs back to the original values
trainingOutputs = tScaler.inverse_transform(scaledTrainingOutputs)
testOutputs = tScaler.inverse_transform(scaledTestOutputs)
## DT outputs
treeTrainingOutputs = tScaler.inverse_transform(scaledTreeTrainingOutputs) #
treeTestingOutputs = tScaler.inverse_transform(scaledTreeTestOutputs)
# Calculate and display training and test root mean square error (RMSE)
trainingRMSE = np.sqrt(np.sum((trainingOutputs - trainingTargets[:, 0]) ** 2
testRMSE = np.sqrt(np.sum((testOutputs - testTargets[:, 0]) ** 2) / len(test
## need to add this for the decision tree
trainingTreeRMSE = np.sqrt(np.sum((treeTrainingOutputs - trainingTargets[:,
testTreeRMSE = np.sqrt(np.sum((treeTestingOutputs - testTargets[:, 0]) ** 2)
print("Training RMSE:", trainingRMSE, "MJ/m^2")
print("Test RMSE:", testRMSE, "MJ/m^2")
##################################################################### Print
print("Decision Tree training RMSE:", trainingTreeRMSE, 'MJ/m^2')
print("Decision Tree Test RMSE:", testTreeRMSE, 'MJ/m^2')
day = np.array(range(1, len(testTargets) + 1))
# Plot training targets vs. training outputs
plt.figure(figsize=(10,6))
trainingTargetHandle ,= plt.plot(day, trainingTargets / 1000000, label = 'Ta
trainingOutputHandle ,= plt.plot(day, trainingOutputs / 1000000, label = 'Ou
plt.xlabel('Day')
plt.ylabel(r'Incoming Solar Energy [$MJ / m^2$]')
plt.title('Comparison of MLP Training Targets and Outputs')
plt.legend(handles = [trainingTargetHandle, trainingOutputHandle])
plt.show()
# Plot test targets vs. test outputs -- student
plt.figure(figsize=(10,6))
testTargetHandle ,= plt.plot(day, testTargets / 1000000, label = 'Target val
testOutputHandle ,= plt.plot(day, testOutputs / 1000000, label = 'Outputs 50
plt.xlabel('Day')
plt.ylabel(r'Incoming Solar Energy [$MJ / m^2$]')
plt.title('Comparison of MLP Test Targets and Outputs')
plt.legend(handles = [testTargetHandle, testOutputHandle])
plt.show()
###################################################################### Plot
plt.figure(figsize=(10,6))
testTreeTargetHandle, = plt.plot(day, testTargets / 1000000, label = 'Target
testTreeOutputHandle, = plt.plot(day, treeTestingOutputs / 1000000, label =
plt.xlabel('Day')
plt.ylabel(r'Incoming Solar Energy [$MJ / m^2$]')
plt.title('Comparison of Decision Tree Test Targets and Outputs')
plt.legend(handles = [testTreeTargetHandle, testTreeOutputHandle])
plt.show()
Training RMSE: 2.4766453957406602 MJ/m^2
Test RMSE: 3.1155836162998276 MJ/m^2
Decision Tree training RMSE: 0.18092649714730089 MJ/m^2
Decision Tree Test RMSE: 4.049651411522571 MJ/m^2
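The regression cell standardizes its inputs and targets before training and transforms the predictions back afterwards. The sketch below shows that round trip in plain Python, assuming the usual definition of standardization (subtract the mean, divide by the population standard deviation). The helper names and the pressure values are made up for illustration.

```python
import math

def fit_scaler(values):
    """Return the mean and population standard deviation of the values."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, math.sqrt(variance)

def transform(values, mean, std):
    """Remove the mean and scale to unit variance."""
    return [(v - mean) / std for v in values]

def inverse_transform(scaled, mean, std):
    """Map scaled values back to the original units."""
    return [s * std + mean for s in scaled]

pressures = [990.0, 1000.0, 1010.0]  # hypothetical pressure readings
mean, std = fit_scaler(pressures)
scaled = transform(pressures, mean, std)
restored = inverse_transform(scaled, mean, std)
print(scaled)
print(restored)
```

The scaler is fitted on the training data only and then reused for the test data, which is why the lab code calls `fit_transform` on the training inputs but only `transform` on the test inputs.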
Fill the box below with your answer for question 1:
Repeat the same process but against this SVM:
During a coffee break, you get talking to one of your friends from a different department. He
mentions that at one point there was an intern who was also tasked with predicting the solar energy
and who tried a Support Vector Machine.
When you tell your superiors, they suggest that you try to beat this intern's work as well, since it seems
to work better than the decision tree that your predecessor left.
Question 2
Your objective again is to play with the parameters of the regressor to see if you can beat the Support
Vector Machine. There are parameters that you can change to try to beat it. You can change:
Size of the Hidden Layers: between 1 and 50
I chose a hidden layer size of 25 with the logistic activation function and
512 iterations. This was enough to beat the decision tree on the test data.
Having too many iterations meant that the neural network would overfit to
the training data, resulting in poorer performance against the test data.
As for changing the size of the hidden layers, too few neurons in a single
hidden layer can result in a phenomenon known as underfitting.
Using too many neurons in the hidden layers, however, can result in
overfitting. This happens because a very large hidden layer gives the
network more information processing capacity than the limited information
in the training data can constrain, so there is not enough data to train
all of the neurons in the hidden layer. Additionally, more neurons mean
that the network will take longer to train.
The activation function also had a significant effect on the neural
network's performance. In this case, the logistic sigmoid seemed to work
the best. In general, it can suffer from the vanishing gradient problem,
but compared to the other activation functions, it yielded the best
results. The sigmoid tends to push activations towards either end of the
curve, so maybe that is what helped it perform better here. Tanh is similar
to the sigmoid; however, it may not have performed as well because its
derivative is steeper. ReLU is usually good for fast convergence, but that
did not seem to be as helpful here. It should be noted that the sigmoid is
usually associated with classification problems, whereas this task is a
regression problem.
Activation Function:
Identity
Logistic
tanh
relu
Number of Iterations, to different values (both lower and higher): Between 1 and 1000
Comment on how this affects the results. Include plots of your final results (using any one of your
values for the parameters). Describe some of the tradeoffs associated with both lowering and raising
the number of iterations.
In order to determine the accuracy of the methods, you will be using the RMSE again.
$$RMSE = \sqrt{\frac{\sum \left( \text{Approximated} - \text{observed} \right)^2}{n}}$$
In [9]: #INITIALIZE
from sklearn.svm import LinearSVR
svm_clf = LinearSVR(C=0.6, loss='squared_epsilon_insensitive')
svm_clf.fit(scaledTrainingInputs, np.ravel(scaledTrainingTargets))
# PREDICT the training outputs and the test outputs
scaledTrainingOutputs = svm_clf.predict(scaledTrainingInputs)
scaledTestOutputs = svm_clf.predict(scaledTestInputs)
trainingOutputs = tScaler.inverse_transform(scaledTrainingOutputs)
testOutputs = tScaler.inverse_transform(scaledTestOutputs)
#Calculate and display training and test root mean square error (RMSE)
trainingsvmRMSE = np.sqrt(np.sum((trainingOutputs - trainingTargets[:, 0]) *
testsvmRMSE = np.sqrt(np.sum((testOutputs - testTargets[:, 0]) ** 2) / len(t
#### PLOTTING
plt.rcParams["figure.figsize"] = (10,6)
day = np.array(range(1, len(testTargets) + 1))
testTargetHandle, = plt.plot(day, testTargets / 1000000, label = 'Target Val
testsvmOutputHandle, = plt.plot(day, testOutputs / 1000000, label = 'SVM Pre
plt.xlabel('Day')
plt.ylabel(r'Incoming Solar Energy [$MJ / m^2$]')
plt.title('Comparison of Prediction Targets and SVM Predictions')
plt.legend(handles = [testTargetHandle, testsvmOutputHandle])
plt.show()
print("Support Vector Machine RMSE values and Plots")
print("Training RMSE:", trainingsvmRMSE, "MJ/m^2")
print("Test RMSE:", testsvmRMSE, "MJ/m^2")
Support Vector Machine RMSE values and Plots
Training RMSE: 2.986111859740932 MJ/m^2
Test RMSE: 2.9869590423558248 MJ/m^2
In [21]: # Modify this neural network
mlp = MLPRegressor(
hidden_layer_sizes = (1,), # One hidden layer with 1 neuron
activation = 'relu', # ReLU activation function
solver = 'sgd', # Gradient descent
learning_rate_init = 0.01 ,# Initial learning rate
)
#
############################################################### Create the d
dt_reg = DecisionTreeRegressor(criterion='mse', max_depth = 10)
dt_reg.fit(scaledTrainingInputs, scaledTrainingTargets)
### MODIFY THE VALUE BELOW ###
noIterations = 509 # Number of iterations (epochs) for which the MLP trains
### MODIFY THE VALUE ABOVE ###
trainingError = np.zeros(noIterations) # Initialize array to hold training
# Train the MLP for the specified number of iterations
for i in range(noIterations):
    mlp.partial_fit(scaledTrainingInputs, np.ravel(scaledTrainingTargets)) #
    currentOutputs = mlp.predict(scaledTrainingInputs) # Obtain the outputs
    trainingError[i] = np.sum((scaledTrainingTargets.T - currentOutputs) **
# Predict
scaledTrainingOutputs = mlp.predict(scaledTrainingInputs)
scaledTestOutputs = mlp.predict(scaledTestInputs)
#Training output conversion
trainingOutputs = tScaler.inverse_transform(scaledTrainingOutputs)
testOutputs = tScaler.inverse_transform(scaledTestOutputs)
#RMSE calculation
trainingRMSE = np.sqrt(np.sum((trainingOutputs - trainingTargets[:, 0]) ** 2
testRMSE = np.sqrt(np.sum((testOutputs - testTargets[:, 0]) ** 2) / len(test
# Plot the error curve
plt.figure(figsize=(10,6))
ErrorHandle ,= plt.plot(range(noIterations), trainingError, label = 'Error 5
plt.xlabel('Epoch')
plt.ylabel('Error')
plt.title('Training Error of the MLP for Every Epoch')
plt.legend(handles = [ErrorHandle])
plt.show()
print("MLP Training and test RMSE values:")
print("Training RMSE: " , trainingRMSE)
print("Test RMSE: " , testRMSE)
Fill the box below with your answer for question 2:
MLP Training and test RMSE values:
Training RMSE: 2.9041504000608227
Test RMSE: 2.904108338546406
Using a hidden layer size of 1,
the relu activation function,
and 509 iterations, the SVM was beaten.
Having too many iterations meant that the neural network would overfit to
the training data, resulting in poorer performance against the test data.
As for changing the size of the hidden layers, too few neurons in a single
hidden layer can result in a phenomenon known as underfitting.
Using too many neurons in the hidden layers, however, can result in
overfitting. This happens because a very large hidden layer gives the
network more information processing capacity than the limited information
in the training data can constrain, so there is not enough data to train
all of the neurons in the hidden layer. Additionally, more neurons mean
that the network will take longer to train. Interestingly, we seemed to get
better results with a hidden layer of size 1. This seems to suggest that
the problem is better solved when compressing the data into a
lower-dimensional form.
The activation function also had a significant effect on the neural
network's performance. In this case, the relu activation function seemed
to work best. Perhaps faster convergence helped the relu function work
better in this case than the tanh, sigmoid, and identity functions.
Abstract
The purpose of this lab was to become familiar with the concepts of linear models and to gain a
feeling for how changing hyperparameters might affect the performance of a model. Specifically, in
this lab we looked at the perceptron. In Exercise 1, we compared its performance to that of the MLP,
or multilayer perceptron. In Exercise 2, the hyperparameters tweaked were the size of the
hidden layers, the activation function, and the number of iterations for training. By tweaking these
hyperparameters, a decision tree was beaten, and also an SVM, or support vector machine. The exercises in
the lab helped bring to light the weaknesses and strengths of linear models and how to work with
them, and helped us get a feel for how changing these hyperparameters can affect the performance of the
neural network.
Introduction
In the lab, two common machine learning tasks were performed: regression and classification. The
process of reasoning backwards is known as regression. In this context, we are attempting to predict
the future based on past inputs. Meanwhile, classification is the act of arranging things based on their
properties. We are essentially putting into practice something we naturally do as humans, which is
predicting the future based on past experiences. To make this happen, we used the linear model known
as a perceptron. A perceptron is an extremely simple type of neural network that learns through the
process known as supervised learning, where the inputs and expected outputs are given to the
network. The network produces an output using a weighted sum of its inputs, and if the output does
not match the target, the weights are adjusted. To train the neural network, the error is calculated by
taking the difference between the targets and the output. Perceptrons are generally good at
classification problems that are linearly separable. Practical problems are typically more complex than
this, so a multilayer perceptron, or MLP, can be used instead. A technique known as gradient descent,
which is a part of backpropagation, is used to minimize the error. Parameters and hyperparameters both
affect the performance of the network. The hyperparameters include things like the learning algorithm,
loss function, learning rate, and activation function. The hyperparameters tweaked in this lab were the size
of the hidden layers, the activation function, and the number of iterations for training. Differentiability
of the activation function is important so that we can find a direction to minimize error in the error
space. When the activation function is differentiable, we can move in the direction of the negative
gradient in the error space to minimize error. This allows us to backpropagate the model's error when
training so that the weights can be optimized. A linear activation function is usually a poor choice, since
the output would then be just a linear transformation of the input. In other words, the whole network could
just as easily be represented as a single matrix multiplication, so adding layers would not let it model
non-linear relationships.
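The point about linear activation functions can be checked numerically. In the sketch below, two layers with identity activations are applied one after the other and then compared with a single layer whose weight matrix is the product of the two. The matrices and the input vector are arbitrary made-up values, and biases are left out as a simplifying assumption.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns]
            for row in a]

W1 = [[1, 2], [0, -1], [3, 1]]  # layer 1: 2 inputs -> 3 hidden neurons
W2 = [[2, 0, 1], [-1, 1, 0]]    # layer 2: 3 hidden neurons -> 2 outputs
x = [4, -2]

two_layers = matvec(W2, matvec(W1, x))    # identity activation in between
one_layer = matvec(matmul(W2, W1), x)     # a single equivalent linear layer

print(two_layers, one_layer)
```

Both computations give the same result, so stacking layers with a linear activation adds no expressive power; a non-linear activation between the layers is what prevents this collapse.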
Conclusion
In this lab, we experimented with perceptrons and multilayer perceptrons. We looked at the
performance of a perceptron with a nonlinear classification problem, and then we looked at the
performance of a multilayer perceptron against the same classification problem. Additionally, we
tweaked certain hyperparameters of an MLP to beat a decision tree. An MLP was also used to beat the
accuracy of an SVM, or support vector machine. The hyperparameters tweaked in this lab were the size of
the hidden layers, the activation function, and the number of iterations for training.
Compared to the rest of the lab, Question 2 was relatively challenging, as it was (relatively)
computationally expensive, and we had to carefully change the hyperparameters knowing how they might
affect the output. With the model being somewhat computationally expensive, it would take
approximately five minutes to complete each simulation. This, in combination with tweaking each
hyperparameter, meant that there were a lot of configurations to try, but no time to brute force a solution
which beats the SVM. Hence, special thought and care had to be used in tweaking the hyperparameters.
For example, having too many iterations meant that the neural network would overfit to the training
data, resulting in poorer performance against the test data. Changing the size of the hidden layers
also required care. Too few neurons in a single hidden layer can result in a phenomenon known as
underfitting. Using too many neurons in the hidden layers, however, can result in other problems. One
of these problems is known as overfitting. This happens because a very large hidden layer gives the
network more information processing capacity than the limited information in the training data can
constrain, so there is not enough data to train all of the neurons in the hidden
layer. Additionally, more neurons mean that the network will take longer to train. The activation
function also had a significant effect on the neural network's performance. In some cases, for
example, the sigmoid may work well, while in other cases it may suffer from the vanishing gradient
problem. In conclusion, this lab served as a solid introduction to linear machine learning models, and
allowed us to experiment with changing different hyperparameters and to get a feel for how they can affect
the performance of a neural network. We also got a feel for how MLPs are more powerful for solving
complex, nonlinear classification problems.
Lab 3 Marking Guide
Exercise   Item             Total Marks   Earned Marks
           Pre-lab          3
           Abstract         1
           Introduction     1
           Conclusion       2
1          Classification   20
2          Regression       20
           TOTAL            47