CECS 451
Assignment 12
Total: 34 Points
General Instruction
• Submit uncompressed file(s) to the Dropbox folder via BeachBoard (not by email).
1. Using scikit-learn, evaluate the classification accuracy of a decision tree, bagging,
AdaBoost, and a random forest.
(a) Load the breast cancer dataset using sklearn.datasets.load_breast_cancer.
(b) (2 points) Print out the names of the features (X) and the name of the target (y).
(c) (2 points) Allocate half of the data to Train (X_train, y_train) and the remaining half to Test (X_test, y_test).
(d) The common goal of the classifiers is to predict the target from the features.
(e) The classifiers should be trained on the Train set and tested on the Test set.
(f) Use the ‘Gini’ index as the criterion and fix the maximum depth of the trees at 2.
(g) (5 points) Write a program that generates a decision tree from X_train, y_train
and predicts y_pred from X_test. You can compute the accuracy of the classifier by
comparing y_pred and y_test. Please print out the accuracy.
(h) (5 points) Visualize the tree using sklearn.tree.plot_tree. Each node of the tree
should include the feature name.
(i) (5 points) Similarly, write a program that generates multiple decision trees using
bagging. The program should record its prediction accuracy in bagging_score
while varying the parameter n_estimators. Draw a 2D line plot whose X-axis is
n_estimators and whose Y-axis is bagging_score. The plot should have more than 20
data points with different X-axis values.
(j) (5 points) Similarly, write a program that generates multiple decision trees using
AdaBoost. Draw a 2D line plot whose X-axis is n_estimators and whose Y-axis is
boost_score. The plot should have more than 20 data points with different X-axis
values.
(k) (10 points) Similarly, write a program that generates multiple decision trees using
a random forest. Draw a 3D surface plot whose X-axis is n_estimators, Y-axis is
max_features, and Z-axis is forest_score. The plot should have more than 100 data
points, each with a different pair of X-axis and Y-axis values.
(l) Submit your Assn12.ipynb, which should include all the plots.
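
As a rough starting point, the loading, splitting, single-tree, and bagging-sweep steps above might be sketched as follows. This is only one possible reading of the steps, not a reference solution. The names SEED, make_tree, n_values, and tree_score are hypothetical, the fixed random seed is an assumption (the assignment does not specify one), and printing target_names is one interpretation of "the name of the target".

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumption: a fixed seed for repeatable results; the assignment does not require one.
SEED = 0

# (a), (b): load the dataset and print the feature and target names.
data = load_breast_cancer()
X, y = data.data, data.target
print(data.feature_names)
print(data.target_names)

# (c): half of the samples go to Train, the remaining half to Test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=SEED
)


def make_tree():
    """Hypothetical helper: a tree configured as in step (f)."""
    return DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=SEED)


# (g): fit one tree, predict on the Test set, and print the accuracy.
tree = make_tree().fit(X_train, y_train)
y_pred = tree.predict(X_test)
tree_score = accuracy_score(y_test, y_pred)
print(tree_score)

# (i): record the Test accuracy of bagging for 25 values of n_estimators.
n_values = list(range(1, 26))
bagging_score = []
for n in n_values:
    # The base tree is passed positionally because its keyword name
    # differs between scikit-learn releases.
    clf = BaggingClassifier(make_tree(), n_estimators=n, random_state=SEED)
    clf.fit(X_train, y_train)
    bagging_score.append(clf.score(X_test, y_test))
```

The same loop shape should carry over to the AdaBoost step by swapping in sklearn.ensemble.AdaBoostClassifier. For the random forest step, a nested loop over n_estimators and max_features would fill a 2D grid of scores for the surface plot. The plots themselves are left out of this sketch.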