Homework #3: Implement a fixed-depth decision tree algorithm

CS6375: Machine Learning
Problem: Implement a fixed-depth decision tree algorithm; that is, the input to the ID3 algorithm will include
the training data and the maximum depth of the tree to be learned. The code skeleton as well as the data sets
for this assignment can be found on e-Learning.
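For orientation, here is a minimal sketch of the fixed-depth ID3 recursion on categorical features. The function names (entropy, mutual_information, id3, predict_example) and the nested-dict tree representation are assumptions of this sketch only; the function headers in the provided skeleton take precedence.

```python
import numpy as np

def entropy(y):
    """Shannon entropy (in bits) of a label vector y."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def mutual_information(x, y):
    """Information gain from splitting the labels y on the attribute column x."""
    gain = entropy(y)
    for v, c in zip(*np.unique(x, return_counts=True)):
        gain -= (c / len(x)) * entropy(y[x == v])
    return gain

def id3(X, y, depth=0, max_depth=3):
    """Grow a multiway ID3 tree to at most max_depth.

    Returns either a class label (leaf) or a nested dict mapping
    (attribute_index, attribute_value) pairs to subtrees.
    """
    values, counts = np.unique(y, return_counts=True)
    if len(values) == 1 or depth >= max_depth:
        # Pure node, or depth budget spent: return the majority label.
        return values[np.argmax(counts)]
    # Split on the attribute with the highest information gain.
    best = max(range(X.shape[1]), key=lambda j: mutual_information(X[:, j], y))
    tree = {}
    for v in np.unique(X[:, best]):
        mask = X[:, best] == v
        tree[(best, v)] = id3(X[mask], y[mask], depth + 1, max_depth)
    return tree

def predict_example(x, tree):
    """Walk the nested-dict tree for a single example x."""
    if not isinstance(tree, dict):        # leaf: a class label
        return tree
    for (attr, val), subtree in tree.items():
        if x[attr] == val:
            return predict_example(x, subtree)
    # Attribute value unseen during training: fall back to the first branch.
    return predict_example(x, next(iter(tree.values())))
```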
Data Sets: The MONK’s Problems were the basis of a first international comparison of learning algorithms [1].
The training and test files for the three problems are named monks-X.train and monks-X.test. There are six
attributes/features (columns 2–7), and binary class labels (column 1). See monks.names for more details.
Visualization: The code skeleton provided contains a function render_dot_file(), which can be used to generate
.png images of the trees learned by both scikit-learn and your code. See the documentation for render_dot_file()
for additional details on usage.
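On the scikit-learn side, the visualization workflow might look like the sketch below, assuming the raw whitespace-separated MONK's file layout (label, six attributes, then an ID column). The call to render_dot_file() is only an assumed usage pattern; consult the skeleton's documentation for its actual signature.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Label in column 1, attributes in columns 2-7; usecols drops the trailing ID.
trn = np.genfromtxt('monks-1.train', dtype=int, usecols=range(7))
clf = DecisionTreeClassifier(criterion='entropy', max_depth=3)
clf.fit(trn[:, 1:], trn[:, 0])

dot_str = export_graphviz(clf, out_file=None)  # DOT-language description of the tree
# Assumed call pattern for the skeleton's helper; check its docstring first.
render_dot_file(dot_str, './monks1_depth3')
```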
a. (Autograder Score, 20 points) Your code will be auto-graded and cross-checked against other submissions. The
autograder will evaluate your code on several different data sets as a sanity check. To ensure that your code
passes the autograder, do not modify the function headers. In addition, do not hard-code any values (such as
assuming the class labels are 0 and 1) and make your code as general as possible.
b. (Learning Curves, 20 points) For depth = 1, . . . , 10, learn decision trees and compute the average training
and test errors on each of the three MONK’s problems. Make three plots, one per problem, showing the
training and test error curves together, with tree depth on the x-axis and error on the y-axis.
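A minimal plotting sketch for this part, assuming the id3() and predict_example() helpers from the sketch above and the same file layout:

```python
import numpy as np
import matplotlib.pyplot as plt

for prob in (1, 2, 3):
    trn = np.genfromtxt(f'monks-{prob}.train', dtype=int, usecols=range(7))
    tst = np.genfromtxt(f'monks-{prob}.test', dtype=int, usecols=range(7))
    depths, trn_err, tst_err = range(1, 11), [], []
    for d in depths:
        tree = id3(trn[:, 1:], trn[:, 0], depth=0, max_depth=d)
        # Compute the misclassification rate on both splits.
        for data, errs in ((trn, trn_err), (tst, tst_err)):
            preds = np.array([predict_example(x, tree) for x in data[:, 1:]])
            errs.append(np.mean(preds != data[:, 0]))
    plt.figure()
    plt.plot(depths, trn_err, marker='o', label='training error')
    plt.plot(depths, tst_err, marker='s', label='test error')
    plt.xlabel('tree depth')
    plt.ylabel('error')
    plt.title(f'monks-{prob}')
    plt.legend()
plt.show()
```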
c. (Weak Learners, 20 points) For monks-1, report the visualized learned decision tree and the confusion
matrix on the test set for depth = 1, 3, 5. You may use scikit-learn’s confusion_matrix() function [2].
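For the confusion matrices of your own trees, a sketch along these lines may help (again assuming the hypothetical id3() and predict_example() helpers from the earlier sketch):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

trn = np.genfromtxt('monks-1.train', dtype=int, usecols=range(7))
tst = np.genfromtxt('monks-1.test', dtype=int, usecols=range(7))

for d in (1, 3, 5):
    tree = id3(trn[:, 1:], trn[:, 0], depth=0, max_depth=d)
    preds = [predict_example(x, tree) for x in tst[:, 1:]]
    print(f'depth = {d}')
    print(confusion_matrix(tst[:, 0], preds))
```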
d. (scikit-learn, 20 points) For monks-1, use scikit-learn’s DecisionTreeClassifier [3] to learn a decision
tree using criterion=’entropy’ for depth = 1, 3, 5. You may use scikit-learn’s confusion_matrix()
function [4].
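A minimal scikit-learn counterpart, assuming the same whitespace-separated file layout as above:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix

trn = np.genfromtxt('monks-1.train', dtype=int, usecols=range(7))
tst = np.genfromtxt('monks-1.test', dtype=int, usecols=range(7))

for d in (1, 3, 5):
    clf = DecisionTreeClassifier(criterion='entropy', max_depth=d)
    clf.fit(trn[:, 1:], trn[:, 0])
    print(f'depth = {d}')
    print(confusion_matrix(tst[:, 0], clf.predict(tst[:, 1:])))
```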
e. (Other Data Sets, 20 points) Repeat steps (c) and (d) with your “own” data set and report the confusion
matrices. You can use any other data set from the UCI repository. If you encounter continuous features,
consider a simple discretization strategy that pre-processes them into binary features using the mean. For
example, a continuous feature x can be discretized using its mean µ as
$$x_{\text{binary}} = \begin{cases} 0, & \text{if } x \le \mu, \\ 1, & \text{if } x > \mu. \end{cases}$$
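A one-function sketch of this mean-based binarization (the helper name is hypothetical):

```python
import numpy as np

def binarize_by_mean(X):
    """Map each continuous column of X to {0, 1} using that column's mean."""
    mu = X.mean(axis=0)
    return (X > mu).astype(int)   # 0 where x <= mu, 1 where x > mu
```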
Write a report with the solutions to the questions above, showing the plots and confusion matrices, along with
a brief discussion (4–5 lines) comparing your implementation to that of scikit-learn, which is a widely used,
publicly available, open-source implementation.
[1] https://archive.ics.uci.edu/ml/datasets/MONK’s+Problems
[2] https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
[3] http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html
[4] https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html
Due: via e-Learning, March 27 (Wednesday)
