ENGR 421 DASC 521
Homework 05: Decision Tree Regression
In this homework, you will implement a decision tree regression algorithm in R, Matlab, or
Python. Here are the steps you need to follow:
1. You are given a univariate regression data set in the file named hw05_data_set.csv. It contains 272 data points on the eruption duration and the waiting time between eruptions for the Old Faithful geyser in Yellowstone National Park, Wyoming, USA (https://www.yellowstonepark.com/things-to-do/about-old-faithful). The eruption duration is the input and the waiting time to the next eruption is the output.
2. Divide the data set into two parts by assigning the first 150 data points to the training set
and the remaining 122 data points to the test set.
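Steps 1 and 2 can be sketched in Python with NumPy as follows. The helper names `load_data` and `split_first_n` are hypothetical, and `load_data` assumes that the CSV has one header row followed by the eruption time and waiting time columns, in that order. Check this against your copy of the file.

```python
import numpy as np


def load_data(path="hw05_data_set.csv"):
    """Read the two-column CSV (assumed layout: header row, then eruption, waiting)."""
    data = np.genfromtxt(path, delimiter=",", skip_header=1)
    return data[:, 0], data[:, 1]  # x: eruption time, y: waiting time


def split_first_n(x, y, n_train=150):
    """The first n_train points form the training set; the rest form the test set."""
    return x[:n_train], y[:n_train], x[n_train:], y[n_train:]
```

With the 272-point file, `split_first_n(*load_data())` would return 150 training points and 122 test points, keeping the original row order (no shuffling).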
3. Implement a decision tree regression algorithm using the following pre-pruning rule: if a node has P or fewer data points, convert this node into a terminal node and do not split it further, where P is a user-defined parameter.
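A minimal sketch of such a tree in Python, assuming a split criterion the assignment does not fix: at each node, candidate thresholds are the midpoints between consecutive distinct input values, the threshold that minimizes the total squared error of the two children is chosen, and each terminal node predicts the mean response of its points. The function names `learn_tree`, `predict`, and `predict_many` are hypothetical.

```python
import numpy as np


def learn_tree(x, y, P):
    """Grow a univariate regression tree with pre-pruning.

    A node with P or fewer points (or with no possible split) becomes a
    terminal node that predicts the mean of its responses. Returns nested dicts.
    """
    if len(x) <= P or np.all(x == x[0]):
        return {"leaf": True, "value": float(np.mean(y))}
    values = np.unique(x)                        # sorted distinct inputs
    thresholds = (values[:-1] + values[1:]) / 2  # midpoints between neighbours
    best_err, best_t = np.inf, None
    for t in thresholds:
        left, right = y[x <= t], y[x > t]
        # Assumed criterion: total squared error of both children around their means.
        err = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if err < best_err:
            best_err, best_t = err, t
    mask = x <= best_t
    return {
        "leaf": False,
        "threshold": float(best_t),
        "left": learn_tree(x[mask], y[mask], P),
        "right": learn_tree(x[~mask], y[~mask], P),
    }


def predict(tree, x0):
    """Follow the thresholds from the root down to a terminal node."""
    while not tree["leaf"]:
        tree = tree["left"] if x0 <= tree["threshold"] else tree["right"]
    return tree["value"]


def predict_many(tree, xs):
    """Apply predict to every input, e.g. the test points or a plotting grid."""
    return np.array([predict(tree, v) for v in xs])
```

Because every threshold lies strictly between two observed inputs, both children are always non-empty. Calling `learn_tree(x_train, y_train, 25)` would grow the tree for step 4, and evaluating `predict_many` on a fine, evenly spaced grid of eruption times gives the step-shaped curve to draw.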
4. Learn a decision tree by setting the pre-pruning parameter P to 25. Draw the training data points, the test data points, and your fit in the same figure. Your figure should be similar to the following figure.
[Figure: training and test data points with the decision tree fit for P = 25; x-axis: Eruption time (min); y-axis: Waiting time to next eruption (min); legend: training, test.]
5. Calculate the root mean squared error (RMSE) for the test data points. The formula for RMSE can be written as
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N_{\mathrm{test}}} \left(y_i - \hat{y}_i\right)^2}{N_{\mathrm{test}}}}$$
where $y_i$ is the true waiting time of the $i$th test data point, $\hat{y}_i$ is the tree's prediction for it, and $N_{\mathrm{test}}$ is the number of test data points.
Your output should be similar to the following sentence.
RMSE is 6.4541 when P is 25
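The RMSE formula and the output sentence of step 5 can be sketched in Python as follows. The helper names `rmse` and `report` are hypothetical; `y_pred` is meant to hold the tree's predictions for the test data points.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between true responses and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def report(rmse_value, P):
    """Format the result like the expected output sentence (4 decimals)."""
    return f"RMSE is {rmse_value:.4f} when P is {P}"
```

For example, `report(rmse(y_test, y_pred), 25)` produces a sentence in the same format as the one shown in step 5.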
6. Learn decision trees by setting the pre-pruning parameter P to 5, 10, 15, …, 50. Plot the RMSE for the test data points as a function of P. Your figure should be similar to the following figure.
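The sweep in step 6 can be sketched as a loop that refits the tree for each P and records the test RMSE. In this sketch `rmse_curve` is a hypothetical name, and `fit(x, y, P)` and `predict(model, x0)` are placeholders for your own tree-learning and prediction functions.

```python
import numpy as np


def rmse_curve(x_train, y_train, x_test, y_test, fit, predict,
               P_values=range(5, 55, 5)):
    """Return the P values (5, 10, ..., 50 by default) and the test RMSE for each.

    fit(x, y, P) must return a model and predict(model, x0) a single prediction;
    both are placeholders for your own tree functions.
    """
    P_list = list(P_values)
    errors = []
    for P in P_list:
        model = fit(x_train, y_train, P)
        y_hat = np.array([predict(model, v) for v in x_test])
        errors.append(float(np.sqrt(np.mean((np.asarray(y_test) - y_hat) ** 2))))
    return P_list, errors
```

The two returned lists can then be drawn directly, for instance with `matplotlib.pyplot.plot(P_list, errors, marker="o")`.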
[Figure: test RMSE as a function of the pre-pruning parameter P for P = 5, 10, …, 50; x-axis: Pre-pruning size (P); y-axis: RMSE.]
What to submit: You need to submit your source code in a single file (.R file if you are using R, .m file if you are using Matlab, or .py file if you are using Python) and a short report explaining your approach (.doc, .docx, or .pdf file). Put these two files in a single zip file named STUDENTID.zip, where STUDENTID should be replaced with your 7-digit student number.
How to submit: Submit the zip file you created to Blackboard. Please follow the exact naming style described above, and do not submit a zip file literally named STUDENTID.zip. Submissions that do not follow these guidelines will not be graded.
Late submission policy: Late submissions will not be graded.
Cheating policy: Very similar submissions will not be graded.