Stat 437 HW4
Your Name (Your student ID)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.6 v dplyr 1.0.4
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
##
## Attaching package: 'reshape2'
## The following object is masked from 'package:tidyr':
##
## smiths
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
General rule
Due by 11:59pm Pacific Standard Time, March 30, 2021. Please show your work and submit your
computer code in order to get points. Providing correct answers without supporting details will
not receive full credit. This HW covers
• Bayes classifier
• kNN classifier
• Discriminant analysis
You DO NOT have to submit your HW answers using typesetting software. However, your answers
must be legible for grading. Please upload your answers to the course space. This is a relatively
long HW. So, please complete it very carefully.
For exercises from the Text, solutions may be posted online. Please do not plagiarize those
solutions.
Conceptual exercises: I (Bayes classifier)
1. This exercise is on Bayes theorem and Bayes classifier.
1.1
State clearly the definition of the 0-1 loss function. Can this function be used in multi-class
classification problems?
1.1 Response
State clearly the definition of the 0-1 loss function. From Lecture Notes 4, the definition of
the 0-1 loss function is as follows:
• Let Ĝ(x) be the estimated class label for an observation x from X, i.e., Ĝ(x) = 1 or 2
• For k, l ∈ {1, 2}, the 0-1 loss L is defined as:
L(k, l) = 0 if k = l, and L(k, l) = 1 if k ≠ l
In other words: if an observation is misclassified, the loss is 1; otherwise, the loss is 0.
Can this function be used in multi-class classification problems? Yes, the 0-1 loss function
can be utilized in multi-class classification problems.
TODO
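As a minimal illustration (the function name loss01 is mine, not from the notes), the 0-1 loss depends only on whether the two labels match, so nothing in its definition restricts it to two classes:

```r
# 0-1 loss: 0 if the estimated label equals the true label, 1 otherwise.
# Works identically whether the labels come from {1, 2} or {1, ..., K}.
loss01 <- function(k, l) as.numeric(k != l)

loss01(1, 1)  # 0: correct classification
loss01(2, 5)  # 1: misclassification; K > 2 poses no problem
```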
1.2
Let Y be the random variable for the class label of a random vector X, such that Y ∈ G = {1, . . . , K}
where K ≥ 2 is the number of classes. Let Yˆ be the estimated class label for X. Given the prior
Pr(Y = k) = πk, k ∈ G on Class k and the conditional density fk(x) of X when it comes from
Class k. Provide the formula to obtain the posterior Pr(Y = k|X = x), which is essentially
the Bayes theorem. What is the Bayes classifier and how does it classify a new observation x0
from X? Is the decision boundary of the Bayes classifier linear or quadratic in X? Explain (but
do not have to mathematically prove) why the Bayes classifier minimizes the expected 0-1 loss.
Note that a proof of the fact that the Bayes classifier minimizes the expected 0-1 loss is given in
“LectureNotes4_notes.pdf”. You should not copy and paste the proof. Instead, please provide the
explanation based on your understanding of the proof.
1.2 Response
Provide the formula to obtain the posterior Pr(Y = k|X = x), which is essentially the
Bayes theorem. Lesson 8 - Lecture Notes 4b
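For reference, in the notation of the question the posterior given by Bayes' theorem takes the standard form

```latex
\Pr(Y = k \mid X = x)
  = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)},
  \qquad k \in G,
```

i.e., the prior weight on Class k times its conditional density, normalized over all K classes.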
What is the Bayes classifier and how does it classify a new observation x0 from X?
TODO
Is the decision boundary of the Bayes classifier linear or quadratic in X? TODO
Explain (but do not have to mathematically prove) why the Bayes classifier minimizes
the expected 0-1 loss. TODO
1.3
If K = 2 in subquestion 1.2), what is the threshold value on Pr(Y = 1|X = x0) that is used by
the Bayes classifier to determine the class label for x0? Suppose you use a different threshold value
on Pr(Y = 1|X = x0) to classify x0, is the corresponding classifier still the Bayes classifier, and is
the corresponding loss function still the 0-1 loss? Explain your answer. Provide a scenario where
to classify an observation a different threshold value is more sensible than the threshold value used
by the Bayes classifier.
1.3 Response
If K = 2 in subquestion 1.2), what is the threshold value on Pr(Y = 1|X = x0) that is
used by the Bayes classifier to determine the class label for x0? TODO
Suppose you use a different threshold value on Pr(Y = 1|X = x0) to classify x0, is the
corresponding classifier still the Bayes classifier, and is the corresponding loss function
still the 0-1 loss? TODO
Provide a scenario where to classify an observation a different threshold value is more
sensible than the threshold value used by the Bayes classifier. TODO
1.4
If K = 2 in subquestion 1.2), π1 = 0.6, f1(x) ∼ Gaussian(0, 1) and f2(x) ∼ Gaussian(2, 1) and
x0 = 1.5. Compute Pr(Y = 1|X = x0) and use the Bayes classifier to classify x0.
1.4 Response
Compute Pr(Y = 1|X = x0) and use the Bayes classifier to classify x0. TODO
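A quick numerical sketch of this calculation (plugging the given parameters into Bayes' theorem; the variable names are mine):

```r
# Given: pi1 = 0.6, f1 ~ Gaussian(0, 1), f2 ~ Gaussian(2, 1), x0 = 1.5
pi1 <- 0.6; pi2 <- 0.4; x0 <- 1.5
num   <- pi1 * dnorm(x0, mean = 0, sd = 1)
den   <- num + pi2 * dnorm(x0, mean = 2, sd = 1)
post1 <- num / den
post1  # Pr(Y = 1 | X = x0), approximately 0.356
# Since post1 < 0.5, the Bayes classifier assigns x0 to Class 2.
```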
Conceptual exercises: II (k-NN classifier)
2. Given the training set T of n observations (x1, y1), . . . , (xn, yn), where yi is the class label
of observation xi and yi ∈ G = {1, . . . , K} for K ≥ 2, consider the k-NN classifier, where k is the
neighborhood size.
2.1
Describe how the decision boundary (such as its smoothness and shape) of k-NN classifier changes
as k changes.
2.1 Response
Describe how the decision boundary (such as its smoothness and shape) of k-NN
classifier changes as k changes. The decision boundary becomes smoother and less sharply
defined as k increases. Because a larger “neighborhood” of points is taken into consideration, the
boundary is influenced less by individual local points and better reflects the overall shape of the
class regions. TODO
2.2
Explain why the training error of 1-NN classifier is 0. Provide an estimator of the test error of
a classifier and explain why it can be used as such an estimator. Is it true that a large k leads
to a k-NN classifier with smaller test error? Can the test error of a k-NN classifier be equal to
the test error of the Bayes classifier? When k is large and k/n is small, what is a k-NN classifier
approximately estimating?
2.2 Response
Explain why the training error of 1-NN classifier is 0. With 1-NN, each training point's
nearest neighbor in the training set is the point itself (at distance 0), so its predicted label always
matches its true label, giving a training error of 0. TODO
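This can be checked directly with class::knn by predicting the training set from itself with k = 1 (a small sketch on simulated data; the data are made up for illustration):

```r
library(class)  # provides knn()
set.seed(1)
x <- matrix(rnorm(100 * 2), ncol = 2)          # 100 distinct training points
y <- factor(sample(1:3, 100, replace = TRUE))  # arbitrary 3-class labels
yhat <- knn(train = x, test = x, cl = y, k = 1)
mean(yhat != y)  # training error of 1-NN: 0
```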
Provide an estimator of the test error of a classifier and explain why it can be used as
such an estimator.
The test error can be computed by e_m = (1/m) Σ_{i=1}^{m} L(y_{0i}, ŷ_{0i}). Alternatively, if the test error cannot
be computed (e.g., no test samples), the training error, e_n = (1/n) Σ_{i=1}^{n} L(y_i, ŷ_i), can serve as a good
estimate of the test error because TODO used to estimate test error or 0-1 loss?
https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=4f1a98a7-eeb3-467e-8a53-
acd0005822c0 1:12
TODO
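In code, this estimator is just the average 0-1 loss over the m test observations (a sketch; testError, yTest, and yHat are placeholder names of mine):

```r
# Average 0-1 loss on a held-out test set:
# e_m = (1/m) * sum_i L(y_0i, yhat_0i) = proportion misclassified
testError <- function(yTest, yHat) mean(yTest != yHat)

testError(c(1, 2, 2, 1), c(1, 2, 1, 1))  # 0.25: one of four misclassified
```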
Is it true that a large k leads to a k-NN classifier with smaller test error? A large k will
not necessarily lead to a smaller k-NN test error. If k is too small, the model will be over-fit to
the training samples and will likely not generalize well. Conversely, if k is too large, the model will
likely be too general. Therefore, the optimal value for k requires finding a balance between small
and large neighborhoods.
TODO
Can the test error of a k-NN classifier be equal to the test error of the Bayes classifier?
Yes, given that kNN essentially estimates the unknown conditional probabilities used by the Bayes
classifier, kNN asymptotically approaches the error of Bayes, possibly equaling it.
When k is large and k/n is small, what is a k-NN classifier approximately estimating?
In the case of k being large and k/n being small, the k-NN classifier is effectively estimating the
posterior probability that a point belongs to a given class. This is formally described in the lecture
materials as ĝ_j(x) ≈ Pr(Y = j | X = x).
2.3
When there are K ≥ 2 classes, how does a k-NN classifier classify a test observation x0?
2.3 Response
When there are K ≥ 2 classes, how does a k-NN classifier classify a test observation
x0?
For the case where K = 2, based on the content from the lecture materials:
Let Nk(x) be the neighborhood of x consisting of the k points xi in the training set T that are closest
to x.
Compute the average label as: Ȳ(x) = (1/k) Σ_{xi∈Nk(x)} yi
Specify a threshold β ∈ (0, 1) and estimate the class label of x as Ŷ(x) = 1 if Ȳ(x) > β, or Ŷ(x) = 0
otherwise.
In other words: Once the model is trained, find the k nearest neighbors to the new point being
classified, and predict the label that is most common among the neighbors.
https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=447d454e-bec9-4f4d-9df2-
acd000582323 3:06
https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=4f1a98a7-eeb3-467e-8a53-
acd0005822c0 13:18
TODO
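The majority-vote step for K ≥ 2 classes can be sketched as follows (majorityVote is my own illustrative name; it assumes the labels of the k nearest neighbors have already been collected):

```r
# Given the labels of the k nearest neighbors of x0, predict the
# most frequent label; ties could additionally be broken at random.
majorityVote <- function(neighborLabels) {
  counts <- table(neighborLabels)
  names(counts)[which.max(counts)]
}

majorityVote(c("AA", "DL", "DL", "UA", "DL"))  # "DL"
```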
2.4
When should data be standardized before applying a k-NN classifier? When standardizing data,
do we standardize each observation or each feature?
2.4 Response
When should data be standardized before applying a k-NN classifier?
Data should be standardized when the features are measured on very different scales, since k-NN
uses Euclidean distance as its dissimilarity measure, which would otherwise be dominated by the
features with the largest ranges.
When standardizing data, do we standardize each observation or each feature?
When standardizing data, we standardize each feature (i.e., each column), not each observation.
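In R, scale() does exactly this column-by-column standardization (a small sketch on made-up numbers):

```r
# scale() centers each column at mean 0 and rescales it to sd 1,
# so each feature contributes comparably to Euclidean distance.
x <- data.frame(delay = c(2, 83, -7), distance = c(733, 2565, 2565))
xStd <- scale(x)
round(colMeans(xStd), 10)  # both column means are 0
apply(xStd, 2, sd)         # both column sds are 1
```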
2.5
Using your understanding of Example 3 in “LectureNotes4b_notes.pdf”, provide a step-by-step
guide on how to choose an optimal k for k-NN classifier using cross-validation. You can provide
such as guide in the form of “pseudocode” (see, e.g., https://en.wikipedia.org/wiki/Pseudocode for
some details on pseudocode). Suppose the training set has few observations, can you still perform
cross-validation in order to choose an optimal k? Explain your answer. (Hint: for the 2nd part,
think about if having more observations helps better estimate test error.)
2.5 Response
Using your understanding of Example 3 in “LectureNotes4b_notes.pdf”, provide a
step-by-step guide on how to choose an optimal k for k-NN classifier using cross-validation.
Start by defining a range of k-values to test, as well as a number of folds for the cross-validation
process. Then, for each fold, test the determined range of k-values and obtain a cross-validated
error measure. The value of k that produces the lowest cross-validated error is then chosen as the
estimated optimal k. TODO
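The steps above can be written as pseudocode (a sketch of the procedure; cvError is a placeholder name):

```
choose a grid of candidate values k = 1, ..., kmax and a number of folds m
randomly assign each training observation to one of the m folds
for each candidate k:
    for each fold s = 1, ..., m:
        fit k-NN on all folds except fold s
        classify the observations in fold s and record the 0-1 error
    cvError[k] <- average of the m fold errors
return the k with the smallest cvError[k]
```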
Suppose the training set has few observations, can you still perform cross-validation
in order to choose an optimal k?
Cross-validation can still be used to determine k if only a few observations are present; however,
with few observations each fold's error estimate is highly variable, so the selected k may not be a
good choice, making the method not recommended in this case.
Conceptual exercises: III (Discriminant analysis)
3. Exercise 2 of Section 4.7 of the Text, which starts with “It was stated in the text that
classifying an observation to the class for which (4.12) is largest is equivalent to classifying an
observation to the class for which (4.13) is largest. Prove that this is the case.” (Helpful information
on how to prove this is contained in the lecture video on LDA and “LectureNotes5b_notes.pdf”.)
3 Response
Exercise 2 of Section 4.7 of the Text, which starts with “It was stated in the text
that classifying an observation to the class for which (4.12) is largest is equivalent to
classifying an observation to the class for which (4.13) is largest. Prove that this is
the case.”
TODO
4. Exercise 3 of Section 4.7 of the Text, which starts with “This problem relates to the QDA
model, in which the observations within each class are drawn from a normal distribution with a
class specific mean vector and a class specific covariance matrix. We consider the simple case where
p = 1; i.e. there is only one feature.” (Helpful information on how to prove this is contained in the
lecture video on QDA and “LectureNotes5b_notes.pdf”.)
4 Response
Exercise 3 of Section 4.7 of the Text, which starts with “This problem relates to the
QDA model, in which the observations within each class are drawn from a normal
distribution with a class specific mean vector and a class specific covariance matrix.
We consider the simple case where p = 1; i.e. there is only one feature.”
TODO
5. Exercise 5 of Section 4.7 of the Text, which starts with “We now examine the differences
between LDA and QDA.” (Hint: for this question, you may also use information from Figures 4.9,
4.10 and 4.11 in the Text.)
5 Response
Exercise 5 of Section 4.7 of the Text, which starts with “We now examine the differences between LDA and QDA.”
5 (a)
Given a linear Bayes decision boundary, LDA is expected to perform best on the test set. On the
training set, QDA may achieve a lower error rate than LDA, but only by overfitting the training
data in the case where the data are not perfectly linearly separable.
TODO
5 (b)
Given a non-linear Bayes decision boundary, we would expect QDA to perform best (most closely approximate the Bayes decision boundary) on both the training and the test sets.
TODO
5 (c)
As the sample size n increases, one would expect QDA to perform better on any data set that is
not perfectly linearly separable, since a linear decision boundary will incur more and more errors
as the sample size grows for non-linearly separable data. In the case of perfectly linearly separable
data, one would expect the two to perform the same, since a linear decision boundary is capable of
optimally separating the classes.
TODO
5 (d)
False, for two reasons. First, the Bayes decision boundary is considered the “gold standard”
that other classification methods approximate based on available information; if the Bayes decision
boundary is linear, then the most desirable decision boundary is linear. Second, QDA does not
produce a linear decision boundary. It may “model” a linear decision boundary by producing one
that is very nearly linear, but it never produces a truly linear boundary; for that we would use LDA.
An example where QDA will certainly not perform better than LDA is data in which the classes
are perfectly linearly separable: there, QDA cannot outperform LDA because LDA can fit a
boundary that yields 0% error. This may be rare in practice, but it serves as a counterexample to
an answer of “True” for this question.
TODO - SVM vs LDA/QDA? Does perfect linear separability matter?
6. Let Y be the random variable for the class label of a random vector X ∈ R^p (where p is
the number of features), such that Y ∈ G = {1, . . . , K} and Pr(Y = k) = πk for Class k with
k ∈ G, where K ≥ 2 is the number of classes. Consider the Gaussian mixture model such that
the conditional density of X when it comes from Class k is fk(x) ∼ Gaussian(µk, Σk). Given
the training set T of n observations (x1, y1), . . . , (xn, yn) on (X, Y ), where yi is the class label of
observation xi, do the following:
6.1
Provide the MLEs of πk, µk and Σk for each k ∈ G respectively for the case where all Σk’s are
equal and for the case where not all Σk’s are equal. When p ≫ n, is the MLE of Σk still accurate?
If not, recommend a different estimator for estimating Σk and provide details on this estimator.
6.1 Response
Provide the MLEs of πk, µk and Σk for each k ∈ G respectively for the case where all
Σk’s are equal and for the case where not all Σk’s are equal.
TODO
When p ≫ n, is the MLE of Σk still accurate? If not, recommend a different estimator
for estimating Σk and provide details on this estimator.
TODO
6.2
Assume p = 2 and K = 2 and k = 1. For the density fk(x) ∼ Gaussian(µk, Σk), what shape do
its contours take, and how does Σk control the shape of these contours? How do you check if the
conditional density of X given that it comes from Class k is Gaussian?
6.2 Response
For the density fk(x) ∼ Gaussian(µk, Σk), what shape do its contours take, and how
does Σk control the shape of these contours?
TODO
How do you check if the conditional density of X given that it comes from Class k is
Gaussian?
TODO
6.3
Is it true that discriminant analysis will perform badly if the Gaussian assumption is violated?
(Hint: for this question, you may also use the information provided by Figures 4.10 and 4.11 of
the Text.) Let X = (X1, . . . , Xp)^T, i.e., X1 up to Xp are the feature variables. Can discriminant
analysis be applied to observations of X when some Xj, j = 1, . . . , p, is a discrete variable (such
as a categorical variable)? Explain your answer.
6.3 Response
Is it true that discriminant analysis will perform badly if the Gaussian assumption is
violated?
TODO
Can discriminant analysis be applied to observations of X when some Xj, j = 1, . . . , p,
is a discrete variable (such as a categorical variable)?
TODO
6.4
What is a ROC curve, and what is AUC? How is AUC used to gauge the performance of a classifier?
If you apply the same classifier, say, LDA or QDA under the same Gaussian mixture model, to
two data sets that are independently generated from the same data generating process, i.e., that
are independently generated from (X, Y ) for classification problems, and obtain two ROC curves,
would the two ROC curves be quite different? Explain your answer. When there are 3 or more
classes, are the codes provided in the lecture notes able to obtain ROC curves and their AUC’s for
LDA and QDA?
6.4 Response
What is a ROC curve, and what is AUC?
TODO
How is AUC used to gauge the performance of a classifier?
TODO
If you apply the same classifier, say, LDA or QDA under the same Gaussian mixture
model, to two data sets that are independently generated from the same data generating process, i.e., that are independently generated from (X, Y ) for classification
problems, and obtain two ROC curves, would the two ROC curves be quite different?
TODO
When there are 3 or more classes, are the codes provided in the lecture notes able to
obtain ROC curves and their AUC’s for LDA and QDA?
TODO
6.5
Describe the key similarities and differences, respectively, between LDA and logistic regression.
Provide a situation where discriminant analysis can still be sensibly applied but logistic regression
is not well-defined.
6.5 Response
Describe the key similarities and differences, respectively, between LDA and logistic
regression.
https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=4f1a98a7-eeb3-467e-8a53-
acd0005822c0 12:06 ?
TODO
Provide a situation where discriminant analysis can still be sensibly applied but logistic
regression is not well-defined.
TODO
Applied exercises: I (k-NN classifier)
7. Please refer to the NYC flight data nycflights13 that has been discussed in the lecture notes and
whose manual can be found at https://cran.r-project.org/web/packages/nycflights13/index.html.
We will use flights, a tibble from nycflights13.
Please use set.seed(123) for the whole of this exercise. Randomly select from flights for each
of the 3 carriers “UA”, “AA” or “DL” 500 observations for the 3 features dep_delay, arr_delay
and distance. Let us try to see if we can use the 3 features to identify if an observation belongs to a
specific carrier. The following tasks and questions are based on the extracted observations. Note
that you need to remove rows with NAs from the extracted observations.
# Setting the seed for the chunk
set.seed(123)
# Inspecting the raw data
head(flights)
## # A tibble: 6 x 19
## year month day dep_time sched_dep_time dep_delay arr_time sched_arr_time
## <int> <int> <int> <int> <int> <dbl> <int> <int>
## 1 2013 1 1 517 515 2 830 819
## 2 2013 1 1 533 529 4 850 830
## 3 2013 1 1 542 540 2 923 850
## 4 2013 1 1 544 545 -1 1004 1022
## 5 2013 1 1 554 600 -6 812 837
## 6 2013 1 1 554 558 -4 740 728
## # ... with 11 more variables: arr_delay <dbl>, carrier <chr>, flight <int>,
## # tailnum <chr>, origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>,
## # hour <dbl>, minute <dbl>, time_hour <dttm>
# Extracting bulk data, filtering, and removing missing
# values
filtered <- flights %>% dplyr::select(carrier, dep_delay, arr_delay,
    distance) %>% filter(carrier %in% c("UA", "AA", "DL")) %>%
    na.omit()
# Randomly selecting 500 observations from the bulk data
fl500 <- sample_n(filtered, size = 500, replace = FALSE)
# Inspecting the pre-processed data
head(fl500)
## # A tibble: 6 x 4
## carrier dep_delay arr_delay distance
## <chr> <dbl> <dbl> <dbl>
## 1 AA -9 -48 733
## 2 UA 83 41 2565
## 3 UA -7 -41 2565
## 4 AA -4 -39 2475
## 5 UA 9 -4 1400
## 6 DL -7 -42 2475
7.1
First, you need to standardize the features since they are on very different scales. Then randomly
split the observations into a training set that contains 70% of the observations and a test set that
contains the remaining observations.
7.1 Code
# Setting the seed for the chunk
set.seed(123)
# Standardizing features and splitting data and label columns
fl500Labels <- fl500 %>% dplyr::select(carrier) %>% as.data.frame()
fl500Std <- as.data.frame(scale(fl500[, 2:4]))
# Splitting the samples into train and test sets
trainingIndices <- sample(1:nrow(fl500Std), floor(0.7 * nrow(fl500Std)),
replace = FALSE)
testIndices <- (1:nrow(fl500Std))[-trainingIndices]
trainData <- as.data.frame(fl500Std[trainingIndices, ])
trainLabels <- as.data.frame(fl500Labels[trainingIndices, ])
colnames(trainLabels) <- "carrier"
testData <- as.data.frame(fl500Std[testIndices, ])
testLabels <- as.data.frame(fl500Labels[testIndices, ])
colnames(testLabels) <- "carrier"
7.2
Consider the observations as forming 3 classes that are determined by carrier. To the training
set, apply 10 fold cross-validation to k-NN classifier with features arr_delay, dep_delay, and
distance to determine the optimal k from the values {1, . . . , 15}. Apply the optimal k-NN to the
test set, provide the classification table and the overall error rate, and provide visualization of the
classification results. Do you think the error rate is reasonable? Explain your answer. (Hint: you
can follow the strategy provided by Example 3 in “LectureNotes4b_notes.pdf”. )
7.2 Code
# Setting chunk seed
set.seed(123)
# knn() comes from the class package
library(class)
# 10-folds CV to determine best k from 1-15
m <- 10
folds <- sample(1:m, nrow(trainData), replace = TRUE)
kMax <- 15
testErrors <- matrix(0, nrow = 2, ncol = kMax)
for (k in 1:kMax) {
testError1 <- double(m)
for (s in 1:m) {
trainingTmp <- trainData[folds != s, ]
testTmp <- trainData[folds == s, ]
trainingLabsTmp <- trainLabels[folds != s, ]
testLabsTmp <- trainLabels[folds == s, ]
knntmp <- knn(trainingTmp, testTmp, trainingLabsTmp,
k)
nMissedObs <- sum(1 - as.numeric(knntmp == testLabsTmp))
tError <- nMissedObs/length(testLabsTmp)
testError1[s] <- tError
}
testErrors[, k] <- c(mean(testError1), sd(testError1))
}
colnames(testErrors) <- paste("k=", 1:kMax, sep = "")
rownames(testErrors) <- c("mean(TestError)", "sd(TestError)")
testErrors <- as.data.frame(testErrors)
as.numeric(testErrors[1, ])
## [1] 0.5772902 0.5775851 0.5623222 0.5594703 0.5369742 0.5313338 0.5313181
## [8] 0.5366948 0.5329016 0.5386426 0.5388035 0.5336004 0.5449212 0.5380716
## [15] 0.5372920
hatk <- which(testErrors[1, ] == min(testErrors[1, ]))
hatk
## [1] 7
# Applying kNN
knnOut <- knn(trainData, testData, trainLabels$carrier, hatk)
missedObs <- sum(1 - as.numeric(knnOut == testLabels$carrier))
tErrorOpt <- missedObs/length(testLabels$carrier)
# Classification table and error rate
table(knnOut, testLabels$carrier)
##
## knnOut AA DL UA
## AA 2 2 6
## DL 14 29 21
## UA 17 18 41
cat("Error Rate:", tErrorOpt, "\n")
## Error Rate: 0.52
# Visualizing results
testResData <- as.data.frame(testData)
testResData$Carrier <- unlist(testLabels)
testResData$EstimatedCarrier <- unlist(knnOut)
ggplot(testResData, aes(arr_delay, distance)) + geom_point(aes(shape = EstimatedCarrier,
color = Carrier)) + theme_bw() + labs(title = paste0(hatk,
"-NN Results"))
[Figure: “7-NN Results” — test observations plotted by arr_delay (x-axis) and distance (y-axis), colored by true Carrier (AA, DL, UA) and shaped by EstimatedCarrier.]
7.2 Response
Do you think the error rate is reasonable? No; in this case the model did not perform
adequately, having produced an error rate of 52%.
7.3
Note that you have standardized the features arr_delay, dep_delay, and distance. However,
with the unstandardized features, you would surely know that none of them follows a Gaussian
distribution no matter with respect to which class (i.e., carrier) you look at their observations,
since these features have non-negative values (whereas a Gaussian distribution can generate negative
values). Again, the 3 classes are determined by carrier. So, you will apply QDA based on the 3
standardized features to the training set to train the model, and then apply the trained model
to the test set (which contains the 3 standardized features) to classify its observations.
7.3a
First, check if the Gaussian assumption is satisfied. For this, note that if the standardized features
arr_delay, dep_delay, and distance follow a trivariate Gaussian distribution for each individual
class, then any pair among the 3 standardized features follows a bivariate Gaussian distribution for
each individual class.
7.3a Code
# Setting the seed for the chunk
set.seed(123)
# Inspecting the data
head(fl500Std)
## dep_delay arr_delay distance
## 1 -0.57254000 -1.2198869 -0.88996180
## 2 1.90785556 0.8324748 1.54589275
## 3 -0.51861836 -1.0584652 1.54589275
## 4 -0.43773589 -1.0123447 1.42622741
## 5 -0.08724522 -0.2052362 -0.00310864
## 6 -0.51861836 -1.0815255 1.42622741
# Collecting data
fl500StdQDA <- as.data.frame(fl500Std)
fl500StdQDA$Carrier <- fl500Labels$carrier
# Melting the data to prepare for ggplot application
fl500Melt <- melt(fl500StdQDA, id = "Carrier")
# Plotting the densities of the class variables
ggplot(fl500Melt, aes(x = value, fill = Carrier)) + geom_density(alpha = 0.25) +
theme_bw() + labs(title = "Class Distributions", x = "Value",
y = "Density")
[Figure: “Class Distributions” — overlaid density curves of the standardized feature values for carriers AA, DL, and UA.]
# TODO: add contour plots for the three variables,
# LectureNotes5b
# https://wsu.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=cd443046-bb83-4fc6-b94d-acdb011e6a73
# Splitting data into training and test sets, 70-30 ratio
trainingIndices <- sample(1:nrow(fl500StdQDA), floor(0.7 * nrow(fl500StdQDA)),
replace = FALSE)
testIndices <- (1:nrow(fl500StdQDA))[-trainingIndices]
trainData <- fl500StdQDA[trainingIndices, ]
testData <- fl500StdQDA[testIndices, ]
# Training the model
qdaTrain <- qda(Carrier ~ ., data = trainData)
7.3a Response
Visualizing the observations from each of the three classes “AA”, “DL”, and “UA”, the class data
(melted variables) do not appear to exhibit a trivariate Gaussian distribution, as their density
curves are highly irregular, appear to be multi-modal rather than uni-modal, and also have very
long right tails.
TODO Double check w/ Carl
7.3b
Apply the estimated (i.e., trained) QDA model to the test set, provide the estimated mixing
proportion and estimated mean vector for each class, and provide the classification table. If you
randomly pick an observation on the 3 standardized features, is it approximately equally likely to
belong to each of the 3 carriers? (You do not need to mathematically prove your answer. However,
think along this line: we are testing the equality of 3 population proportions, and each estimated
population proportion is based on around 350 observations, which approximately can be done via
a z-test since the central limit theorem is in effect.) How is the performance of QDA on this test
set? Explain your answers.
7.3b Code
# Setting the seed for the chunk
set.seed(123)
# Testing the model
qdaTest <- predict(qdaTrain, newdata = testData)
# Estimated mixing proportions
cat("Estimated Proportion - AA:", sum(fl500$carrier == "AA")/nrow(fl500),
"\n")
## Estimated Proportion - AA: 0.202
cat("Estimated Proportion - DL:", sum(fl500$carrier == "DL")/nrow(fl500),
"\n")
## Estimated Proportion - DL: 0.348
cat("Estimated Proportion - UA:", sum(fl500$carrier == "UA")/nrow(fl500),
"\n")
## Estimated Proportion - UA: 0.45
# Estimated mean vector
mv <- fl500 %>% dplyr::select(carrier, dep_delay, arr_delay,
    distance) %>% dplyr::group_by(carrier) %>% dplyr::summarise(mean_dep_delay = mean(dep_delay),
    mean_arr_delay = mean(arr_delay), mean_distance = mean(distance))
mv
## # A tibble: 3 x 4
## carrier mean_dep_delay mean_arr_delay mean_distance
## * <chr> <dbl> <dbl> <dbl>
## 1 AA 5.63 -4.36 1434.
## 2 DL 12.2 6.10 1261.
## 3 UA 15.2 8.13 1497.
# TODO: mean vector standardized or not?
# Classification table and error
table(testData$Carrier, qdaTest$class)
##
## AA DL UA
## AA 0 22 11
## DL 1 23 25
## UA 0 37 31
cat("Classification Error:", 1 - mean(qdaTest$class == testData$Carrier))
## Classification Error: 0.64
7.3b Response
If you randomly pick an observation on the 3 standardized features, is it approximately
equally likely to belong to each of the 3 carriers? No, observations belonging to UA are about
twice as likely as that of AA. The mixing proportions for all three carriers (classes) are:
• AA: 20.2%
• DL: 34.8%
• UA: 45%
TODO: is this more about whether any given sample could be predicted to be in all three? or the
actual known sample proportions?
How is the performance of QDA on this test set? QDA performed poorly on the test set
with an error rate of 64%. This is likely because the Gaussian mixture model assumption is
strongly violated, based on the density plots of each class.
7.3c
Extract observations that are for “UA” or “DL” from the training set and the test set, respectively,
to form a new training set and a new subset, so that there are now 2 classes “UA” and “DL”.
Apply QDA to the new training set and then apply the trained model to the new test set. Report
the overall error rate on the test set, provide the ROC curve, and calculate the AUC. How is the
performance of QDA on this test set? Explain your answer.
7.3c Code
# Setting the seed for the chunk
set.seed(123)
# Filtering observations in classes 'UA' and 'DL' from
# previous train/test sets
trainData2 <- trainData %>% dplyr::select(dep_delay, arr_delay,
    distance, Carrier) %>% dplyr::filter(Carrier %in% c("UA",
    "DL"))
testData2 <- testData %>% dplyr::select(dep_delay, arr_delay,
    distance, Carrier) %>% dplyr::filter(Carrier %in% c("UA",
    "DL"))
# Training the QDA model
qdaTrain2 <- qda(Carrier ~ ., data = trainData2)
qdaTest2 <- predict(qdaTrain2, newdata = testData2)
# Classification table and error rate
table(testData2$Carrier, qdaTest2$class)
##
## DL UA
## DL 23 26
## UA 37 31
cat("Classification Error:", 1 - mean(qdaTest2$class == testData2$Carrier),
"\n")
## Classification Error: 0.5384615
# ROC and AUC
predQDA <- ROCR::prediction(qdaTest2$posterior[, 2], testData2$Carrier)
perfROC <- ROCR::performance(predQDA, "tpr", "fpr")
plot(perfROC, avg = "threshold", spread.estimate = "boxplot",
colorize = TRUE, main = "ROC Curve")
[Figure: “ROC Curve” — average true positive rate vs. average false positive rate, colorized by threshold (0.37 to 1), with box-plot spread estimates.]
AUC <- predQDA %>% ROCR::performance(measure = "auc") %>% .@y.values
cat("AUC:", AUC[[1]], "\n")
## AUC: 0.4846939
7.3c Response
How is the performance of QDA on this test set? The error rate for this model (compared
to the last with all three carriers/classes) was somewhat lower at about 53.85%, but the model still
failed to perform adequately. Based on the ROC curve and the AUC (≈ 0.48), the model performed
no better than random guessing.
Applied exercises: II (Discriminant analysis)
8. The following is on software commands:
8.1
What is the main cause of the message “Warning in lda.default(x, grouping, . . . ): variables are
collinear”? What is the main cause of the message “Error in qda.default(x, grouping, . . . ) : some
group is too small for ‘qda’ ”?
8.1 Response
What is the main cause of the message “Warning in lda.default(x, grouping, . . . ):
variables are collinear”?
This warning indicates that the covariance matrix is (nearly) singular, i.e., some variables are
(approximately) linear combinations of the others rather than linearly independent.
What is the main cause of the message “Error in qda.default(x, grouping, . . . ) : some
group is too small for ‘qda’ ”?
This error occurs when some class has too few observations to estimate its own covariance matrix: qda{MASS} requires each group to contain more observations than there are predictor variables, since otherwise the class-specific covariance estimate is singular.
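As a sketch, both messages can be reproduced on simulated data (not the HW data; all variable names below are made up for illustration):

```r
library(MASS)

set.seed(1)
# Collinearity warning: x3 is an exact linear combination of x1 and x2,
# so the pooled covariance matrix is singular
x1 <- rnorm(30)
x2 <- rnorm(30)
dfCol <- data.frame(x1 = x1, x2 = x2, x3 = x1 + x2,
                    g = factor(rep(c("a", "b"), 15)))
ldaFit <- lda(g ~ ., data = dfCol)
## Warning in lda.default(x, grouping, ...): variables are collinear

# Group-size error: class 'b' has only 2 observations but there are
# 3 predictors, so its class-specific covariance cannot be estimated
dfSmall <- data.frame(x1 = rnorm(6), x2 = rnorm(6), x3 = rnorm(6),
                      g = factor(c("a", "a", "a", "a", "b", "b")))
qdaTry <- try(qda(g ~ ., data = dfSmall), silent = TRUE)
## Error in qda.default(x, grouping, ...) : some group is too small for 'qda'
```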
8.2
Provide details on the list that predict{MASS} returns.
8.2 Response
The list returned by predict{MASS} contains list$class, the MAP (maximum a posteriori) classification, and list$posterior, the posterior probabilities for each class in the form of an n × k matrix, where k is the number of classes and n is the number of observations. For LDA fits, the list additionally contains list$x, the scores of the observations on the linear discriminant functions.
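A quick illustration on the built-in iris data (not the HW data) shows these components:

```r
library(MASS)

ldaFit <- lda(Species ~ ., data = iris)
pr <- predict(ldaFit)

names(pr)          # "class" "posterior" "x"
head(pr$class)     # MAP class labels
dim(pr$posterior)  # 150 x 3: one posterior probability column per class
head(pr$x)         # discriminant (LD) scores; returned for LDA fits
```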
8.3
The arguments gamma and lambda of rda{klaR} are usually determined by cross-validation. Can
they be set manually?
8.3 Response
Yes, they can be set manually by specifying gamma= and lambda= in the rda{klaR} function call; when both are supplied, no cross-validation is performed.
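For instance, a minimal sketch on the iris data (the gamma/lambda values here are arbitrary, not tuned):

```r
library(klaR)

# gamma shrinks the class covariance estimates toward a multiple of the
# identity; lambda shrinks them toward the pooled covariance; both lie in [0, 1]
rdaFit <- rda(Species ~ ., data = iris, gamma = 0.1, lambda = 0.5)
rdaFit$regularization  # the fixed gamma and lambda; no cross-validation is run
```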
9. We will use the human cancer microarray data that were discussed in the lectures and are
provided by the R library ElemStatLearn (available at https://cran.r-project.org/src/contrib/
Archive/ElemStatLearn/). Pick 3 cancer types “MELANOMA”, “OVARIAN” and “RENAL”,
and randomly select the same set of 60 genes for each cancer type. Please use set.seed(123) for
the whole of this exercise. Your analysis will be based on observations for these genes and cancer
types.
# Using:
# https://cran.r-project.org/src/contrib/Archive/ElemStatLearn/ElemStatLearn_2015.6.26.tar.gz
# Setting the seed for the chunk
set.seed(123)
# Inspecting the raw data
nciRaw <- as.data.frame(nci)
head(nciRaw)
## CNS CNS CNS RENAL BREAST CNS CNS BREAST NSCLC
## 1 0.300 0.679961 0.940 2.800000e-01 0.485 0.310 -0.830 -0.190 0.460
## 2 1.180 1.289961 -0.040 -3.100000e-01 -0.465 -0.030 0.000 -0.870 0.000
## 3 0.550 0.169961 -0.170 6.800000e-01 0.395 -0.100 0.130 -0.450 1.150
## 4 1.140 0.379961 -0.040 -8.100000e-01 0.905 -0.460 -1.630 0.080 -1.400
## 5 -0.265 0.464961 -0.605 6.250000e-01 0.200 -0.205 0.075 0.005 -0.005
## 6 -0.070 0.579961 0.000 -1.387779e-17 -0.005 -0.540 -0.360 0.350 -0.700
## NSCLC RENAL RENAL RENAL RENAL RENAL RENAL RENAL BREAST NSCLC RENAL
## 1 0.760 0.270 -0.450 -0.030 0.710 -0.360 -0.210 -0.500 -1.060 0.150 -0.290
## 2 1.490 0.630 -0.060 -1.120 0.000 -1.420 -1.950 -0.520 -2.190 -0.450 0.000
## 3 0.280 -0.360 0.150 -0.050 0.160 -0.030 -0.700 -0.660 -0.130 -0.320 0.050
## 4 0.100 -1.040 -0.610 0.000 -0.770 -2.280 -1.650 -2.610 0.000 -1.610 0.730
## 5 -0.525 0.015 -0.395 -0.285 0.045 0.135 -0.075 0.225 -0.485 -0.095 0.385
## 6 0.360 -0.040 0.150 -0.250 -0.160 -0.320 0.060 -0.050 -0.430 -0.080 0.390
## UNKNOWN OVARIAN MELANOMA PROSTATE OVARIAN OVARIAN OVARIAN OVARIAN OVARIAN
## 1 -0.200 0.430 -0.490 -0.530 -0.010 0.640 -0.480 0.140 0.640
## 2 0.740 0.500 0.330 -0.050 -0.370 0.550 0.970 0.720 0.150
## 3 0.080 -0.730 0.010 -0.230 -0.160 -0.540 0.300 -0.240 -0.170
## 4 0.760 0.600 -1.660 0.170 0.930 -1.780 0.470 0.000 0.550
## 5 -0.105 -0.635 -0.185 0.825 0.395 0.315 0.425 1.715 -0.205
## 6 -0.080 -0.430 -0.140 0.010 -0.100 0.810 0.020 0.260 0.290
## PROSTATE NSCLC NSCLC NSCLC LEUKEMIA K562B-repro K562A-repro LEUKEMIA
## 1 0.070 0.130 0.320 0.515 0.080 0.410 -0.200 -0.36998050
## 2 0.290 2.240 0.280 1.045 0.120 0.000 0.000 -1.38998000
## 3 0.070 0.640 0.360 0.000 0.060 0.210 0.060 -0.05998047
## 4 1.310 0.680 -1.880 0.000 0.400 0.180 -0.070 0.07001953
## 5 0.085 0.135 0.475 0.330 0.105 -0.255 -0.415 -0.07498047
## 6 -0.620 0.300 0.110 -0.155 -0.190 -0.110 0.020 0.04001953
## LEUKEMIA LEUKEMIA LEUKEMIA LEUKEMIA COLON COLON COLON
## 1 -0.370 -0.430 -0.380 -0.550 -0.32003900 -0.620 -4.900000e-01
## 2 0.180 -0.590 -0.550 0.000 0.08996101 0.080 4.200000e-01
## 3 0.000 -0.500 -1.710 0.100 -0.29003900 0.140 -3.400000e-01
## 4 -1.320 -1.520 -1.870 -2.390 -1.03003900 0.740 7.000000e-02
## 5 -0.825 -0.785 -0.585 -0.215 0.09496101 0.205 -2.050000e-01
## 6 -0.130 0.520 0.120 -0.620 0.05996101 0.000 -1.387779e-17
## COLON COLON COLON COLON MCF7A-repro BREAST MCF7D-repro
## 1 0.07001953 -0.120 -0.290 -0.8100195 0.200 0.37998050 0.3100195
## 2 -0.82998050 0.000 0.030 0.0000000 -0.230 0.44998050 0.4800195
## 3 -0.59998050 -0.010 -0.310 0.2199805 0.360 0.65998050 0.9600195
## 4 -0.90998050 0.130 1.500 0.7399805 0.180 0.76998050 0.9600195
## 5 0.24501950 0.555 0.005 0.1149805 -0.315 0.05498047 -0.2149805
## 6 -0.43998050 -0.550 -0.540 0.1199805 0.410 0.54998050 0.3700195
## BREAST NSCLC NSCLC NSCLC MELANOMA BREAST BREAST MELANOMA
## 1 0.030 -0.42998050 0.160 0.010 -0.620 -0.380 0.04998047 0.650
## 2 0.220 -0.38998050 -0.340 -1.280 -0.130 0.000 -0.72001950 0.640
## 3 0.150 -0.17998050 -0.020 -0.770 0.200 -0.060 0.41998050 0.150
## 4 -1.240 0.86001950 -1.730 0.940 -1.410 0.800 0.92998050 -1.970
## 5 -0.305 0.78501950 -0.625 -0.015 1.585 -0.115 -0.09501953 -0.065
## 6 0.050 0.04001953 -0.140 0.270 1.160 0.180 0.19998050 0.130
## MELANOMA MELANOMA MELANOMA MELANOMA MELANOMA
## 1 -0.030 -0.270 0.210 -5.000000e-02 0.350
## 2 -0.480 0.630 -0.620 1.400000e-01 -0.270
## 3 0.070 -0.100 -0.150 -9.000000e-02 0.020
## 4 -0.700 1.100 -1.330 -1.260000e+00 -1.230
## 5 -0.195 1.045 0.045 4.500000e-02 -0.715
## 6 0.410 0.080 -0.400 -2.710505e-20 -0.340
# Extracting data based on required cancer types
n0 <- dim(nci)[2]
p0 <- dim(nci)[1]
rSel <- sample(1:p0, size = 60, replace = FALSE)
chk <- colnames(nci) %in% c("MELANOMA", "OVARIAN", "RENAL")
cSel <- which(chk == TRUE)
ncia <- nci[rSel, cSel]
colnames(ncia) <- colnames(nci)[cSel]
# Creating the dataframe for storing the required information
# with labels in the correct format
nciData <- data.frame(t(ncia))
nciData$Class <- colnames(ncia)
rownames(nciData) <- NULL
dim(nciData)
## [1] 23 61
head(nciData)
## X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
## 1 -1.030000e+00 0.08 -0.06 0.8500195 -0.10 0.00 -0.23 -0.55 0.00 0.11
## 2 4.900000e-01 -0.30 -0.42 0.7200195 1.63 -0.80 0.08 -0.16 0.00 -0.83
## 3 -1.380000e+00 -0.94 -0.68 -0.2299805 -0.20 0.51 0.14 0.60 0.00 -0.18
## 4 -8.605855e-19 0.30 -0.35 0.1300195 0.85 0.13 -0.69 0.38 -0.17 -0.55
## 5 1.420000e+00 -0.50 -0.26 0.2900195 0.89 -0.06 -0.76 0.15 0.17 0.51
## 6 2.000000e-01 -0.27 0.03 -0.2299805 0.46 -0.53 -0.62 0.40 0.00 1.54
## X11 X12 X13 X14 X15 X16 X17
## 1 -0.669961 -0.1449902 0.00 -0.329990200 0.2600195 -0.40 -0.7300195
## 2 0.480039 0.4050098 0.68 -0.459990200 0.7700195 -0.16 -0.6200195
## 3 1.320039 -0.2649902 0.09 -0.009990234 2.8000190 0.10 0.1499805
## 4 -0.249961 -0.1649902 0.63 -0.249990200 -0.2499805 -0.01 -0.2000195
## 5 0.570039 -0.4449902 0.41 -0.189990200 0.4900195 -0.35 -0.4000195
## 6 0.130039 -0.4449902 0.66 -0.769990200 0.6500195 -0.62 -0.1000195
## X18 X19 X20 X21 X22 X23 X24 X25 X26 X27
## 1 -0.30498050 -0.46 -0.49 -0.5999902 0.24 0.255 -0.28 0.60 0.98500980 -0.49
## 2 -0.34498050 0.24 -0.26 -0.3099902 0.72 0.235 -0.41 -0.07 -0.36499020 -0.29
## 3 0.08501949 -0.67 0.22 0.7800098 0.12 -0.875 -0.37 0.04 1.42501000 0.68
## 4 -0.09498051 0.23 0.26 0.1200098 -0.21 -0.145 -0.43 -0.02 0.15500980 0.52
## 5 -0.27498050 1.29 0.13 -0.1899902 -0.06 0.755 -0.54 -0.36 0.01500977 0.37
## 6 -0.67498050 1.68 -0.36 0.5700098 -0.19 0.335 0.00 0.23 0.45500980 0.03
## X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38
## 1 -1.255 -0.025 -0.61 0.80 0.68 -0.96 2.215 0.175 0.16 0.68 0.68001950
## 2 -0.195 -0.665 -0.03 0.10 -0.50 -0.09 1.255 -0.165 0.07 -0.10 -1.15998100
## 3 -0.345 0.255 -0.01 -0.40 0.00 0.13 1.765 -0.055 0.39 -0.18 0.76001950
## 4 0.000 -0.455 -0.50 -0.62 -0.68 -0.24 -1.085 0.315 -0.24 0.18 -0.55998050
## 5 -0.635 -0.075 -0.12 -0.73 -0.50 0.17 -0.415 0.055 -0.17 -0.01 0.05001949
## 6 -0.415 0.665 0.44 0.00 -1.04 0.09 -0.535 0.045 0.62 -0.29 -0.49998050
## X39 X40 X41 X42 X43 X44 X45 X46 X47 X48
## 1 -0.935 -0.30 -0.16 -0.16 0.105 1.18003900 4.500000e-01 -0.0925 -0.505 0.83
## 2 -0.505 -0.31 -0.11 0.10 1.955 1.30003900 -4.700000e-01 0.3375 -0.465 0.10
## 3 0.475 0.15 -0.67 -1.04 0.475 -0.77996100 -1.214306e-17 -0.6525 -0.045 -0.36
## 4 0.335 -0.15 -0.41 -0.32 1.455 -0.63996100 3.200000e-01 0.0075 0.765 -0.20
## 5 0.165 -0.48 -0.04 -0.24 0.795 -0.08996101 -2.100000e-01 0.1075 -0.715 0.03
## 6 -0.265 0.16 -0.10 -1.08 1.375 0.93003900 -1.500000e-01 0.0875 0.015 -0.55
## X49 X50 X51 X52 X53 X54 X55 X56 X57
## 1 0.1649902 -0.4450098 0.385 0.04001949 0.18 -0.51 0.28 -0.23 0.1300195
## 2 0.0000000 0.2249902 0.305 -0.78998050 -0.72 0.18 -0.15 -0.69 -0.1799805
## 3 1.6349900 -0.1150098 -0.475 -0.40998050 0.24 -0.22 -0.08 -0.03 -0.4099805
## 4 2.4049900 0.3149902 0.485 -0.24998050 -0.30 -0.01 -0.17 -0.63 0.3200195
## 5 1.6749900 0.5649902 0.245 -0.04998051 -0.22 0.13 -0.25 -0.03 -0.2299805
## 6 2.7449900 0.8549902 0.595 -0.02998051 -0.11 0.05 -0.48 0.33 0.4600195
## X58 X59 X60 Class
## 1 0.66 -0.03 3.50 RENAL
## 2 0.02 -0.04 0.43 RENAL
## 3 0.04 0.00 -0.85 RENAL
## 4 -0.02 -0.17 -0.12 RENAL
## 5 0.10 0.65 -1.00 RENAL
## 6 0.45 0.28 0.00 RENAL
9.1
Pick 2 features and visualize the observations using the 2 features. Do you think it is hard to classify
the observations based on the amount of overlap among the 3 neighborhoods of observations in each
of the 3 classes? Here “a neighborhood of observations in a class” is a “open disk that contains the
observations in the class”.
9.1 Code
# Setting the seed for the chunk
set.seed(123)
# Preparing the data for visualization
nciSub <- nciData %>% dplyr::select(X1, X2, Class)
# Plotting the data with two features (X1, X2)
ggplot(nciSub, aes(X1, X2)) + geom_point(aes(color = Class)) +
theme_bw() + ggtitle("Data by Class for Features X1, X2")
[Figure: "Data by Class for Features X1, X2" — scatter plot of X2 vs. X1, points colored by Class (MELANOMA, OVARIAN, RENAL).]
9.1 Response
Do you think it is hard to classify the observations based on the amount of overlap
among the 3 neighborhoods of observations in each of the 3 classes? Yes, the classes
appear difficult to separate given the overlap. There is some visual separation: RENAL tends
to group toward the bottom left, MELANOMA toward the bottom right, and OVARIAN toward the
top center. That said, the plot shows the data in a 2D space, whereas the actual observations
live in a 60-dimensional space, so it is difficult to say with certainty that the classes
cannot be separated in their ambient space.
9.2
Apply LDA and report the classwise error rate for each cancer type.
9.2 Code
# Setting the seed for the chunk
set.seed(123)
# Training the LDA model
lda.fit <- lda(Class ~ ., data = nciData)
## Warning in lda.default(x, grouping, ...): variables are collinear
# Testing the model
lda.pred <- predict(lda.fit, nciData)
TrueClassLabel <- nciData$Class
LDAEstimatedClassLabel <- lda.pred$class
# Generating the 3x3 prediction table and computing class-wise
# error rates
predTable <- table(LDAEstimatedClassLabel, TrueClassLabel)
predTable
## TrueClassLabel
## LDAEstimatedClassLabel MELANOMA OVARIAN RENAL
## MELANOMA 6 0 0
## OVARIAN 0 5 0
## RENAL 2 1 9
cwDF <- as.matrix.data.frame(predTable)
correct <- diag(cwDF)
total <- colSums(cwDF)
cwError <- 1 - (correct/total)
cat("Classwise Error:", cwError, "\n")
## Classwise Error: 0.25 0.1666667 0
9.3
Use the library klaR, and apply regularized discriminant analysis (RDA) by setting the arguments
gamma and lambda of rda{klaR} manually so that the resulting classwise error rate for each cancer
type is zero.
9.3 Code
# Setting the seed for the chunk
set.seed(123)
# Training the RDA model
rda.fit <- rda(Class ~ ., data = nciData, gamma = 0.825, lambda = 0.37)
# Testing the model
rda.pred <- predict(rda.fit, nciData)
# Collecting performance metrics
TrueClassLabel <- nciData$Class
rDAEstimatedClassLabel <- rda.pred$class
# Computing and displaying performance metrics
predTable <- table(rDAEstimatedClassLabel, TrueClassLabel)
predTable
## TrueClassLabel
## rDAEstimatedClassLabel MELANOMA OVARIAN RENAL
## MELANOMA 8 0 0
## OVARIAN 0 6 0
## RENAL 0 0 9
cwDF <- as.matrix.data.frame(predTable)
correct <- diag(cwDF)
total <- colSums(cwDF)
cwError <- 1 - (correct/total)
cat("Classwise Error:", cwError, "\n")
## Classwise Error: 0 0 0
9.4
Obtain the estimated covariance matrices from the RDA and visualize them using the same
strategy in Example 3 in "LectureNotes5c_notes.pdf". What can you say about the degree of
dependence among these genes for each of the three cancer types? (Hint and caution: the class
labels "MELANOMA", "OVARIAN" and "RENAL" will be ordered alphabetically by R. So, you need to
keep track of which estimated covariance matrix is for which class. Otherwise, you will get a
wrong visualization.)
9.4 Code
# Setting the seed for the chunk
set.seed(123)
# Extracting covariance matrices (classes are ordered
# alphabetically: MELANOMA, OVARIAN, RENAL)
hatSigma1 <- rda.fit$covariances[, , 1]  # MELANOMA
hatSigma2 <- rda.fit$covariances[, , 2]  # OVARIAN
hatSigma3 <- rda.fit$covariances[, , 3]  # RENAL
# Melting covariance matrices and prepping data for
# visualization
melted_hatSigma1 <- melt(hatSigma1)
melted_hatSigma2 <- melt(hatSigma2)
melted_hatSigma3 <- melt(hatSigma3)
EstSigmaAll <- rbind(melted_hatSigma1, melted_hatSigma2, melted_hatSigma3)
EstSigmaAll$Cancer <- rep(c("MELANOMA", "OVARIAN", "RENAL"),
each = nrow(melted_hatSigma1))
EstSigmaAll$Cancer <- factor(EstSigmaAll$Cancer)
# Plotting the data
ggplot(EstSigmaAll, aes(Var1, Var2, fill = value)) + geom_tile() +
scale_fill_gradient2(low = "blue", high = "red") + facet_grid(~Cancer) +
labs(title = "Estimated Covariance Matrices", x = "", y = "") +
theme(plot.title = element_text(hjust = 0.5))
[Figure: "Estimated Covariance Matrices" — heatmaps faceted by cancer type (MELANOMA, OVARIAN, RENAL), genes X1 to X60 on both axes, fill gradient from -2 (blue) to 2 (red).]
9.4 Response
What can you say about the degree of dependence among these genes for each of the
three cancer types? Based on the heatmaps, RENAL cancer shows the strongest dependence among
these genes (the largest off-diagonal covariances), while OVARIAN cancer shows the weakest.