COSC 4570/5010 Data Mining
Homework #2
Submission guideline: You need to submit only one .zip file. Please name the file
“YourNetID_Homework2.zip”.
1. Problems from the book (Introduction to Data Mining 2nd Edition by
Tan, Steinbach et al.)
Solve the following:
Chapter 3: Problems 1, 5, 7, and 10.
Chapter 4: Problem 6
OR
Problems from the book (Introduction to Data Mining 1st Edition by
Tan, Steinbach et al.)
Solve the following:
Chapter 4: Problems 1, 5, 6, and 9.
Chapter 5: Problem 6
2. Decision Tree Learning
• What does zero entropy mean?
• What is the maximum value of the entropy of a random variable that can take n
values? Justify your answer.
• What kind of real attributes create problems for entropy-based decision trees? How
can we solve this problem?
• Describe pre-pruning and post-pruning techniques for dealing with decision tree
overfitting.
• Is the Gini gain (the Gini of the parent minus the Gini of the split) always
positive? What about the entropy gain? What if you use classification error? Prove or
provide counterexamples.
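For experimenting with the impurity measures named in the last question, the following is a minimal Python sketch, not a solution. It assumes that the impurity "of the split" means the size-weighted average of the child impurities, which is the usual textbook convention. The function names (class_probs, entropy, gini, classification_error, gain) are hypothetical helpers chosen for this illustration.

```python
import math
from collections import Counter


def class_probs(labels):
    """Relative frequency of each class in a non-empty list of labels."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def entropy(labels):
    """Entropy in bits: sum over classes of -p * log2(p)."""
    return sum(-p * math.log2(p) for p in class_probs(labels))


def gini(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1.0 - sum(p * p for p in class_probs(labels))


def classification_error(labels):
    """Classification error: 1 minus the largest class probability."""
    return 1.0 - max(class_probs(labels))


def gain(impurity, parent, children):
    """Parent impurity minus the size-weighted impurity of the children.

    Assumption: the children partition the parent, so their sizes sum to
    len(parent). Empty children are skipped.
    """
    n = len(parent)
    weighted = sum(len(c) / n * impurity(c) for c in children if c)
    return impurity(parent) - weighted


if __name__ == "__main__":
    # A small hand-made split; replace it with your own candidate splits.
    parent = ["a"] * 4 + ["b"] * 4
    children = [["a", "a", "a", "b"], ["a", "b", "b", "b"]]
    for measure in (entropy, gini, classification_error):
        print(measure.__name__, gain(measure, parent, children))
```

Trying different splits with this helper can suggest where to look for counterexamples, but a numerical experiment does not replace the proof the question asks for.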
3. Naive Bayes Classifier
• What is the time complexity for learning a Naive Bayes Classifier?
• What is the time complexity for classifying using the Naive Bayes Classifier?
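To make the two operations in these questions concrete, here is a minimal sketch of a Naive Bayes classifier for categorical attributes. It is one possible formulation under stated assumptions, not the only one: attributes are categorical, add-one (Laplace) smoothing is used, and the names train_nb and classify_nb are hypothetical. Counting the loop iterations in each function is one way to approach the analysis.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> count
    value_counts = defaultdict(Counter)
    # Distinct values seen per attribute (used for add-one smoothing).
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            value_counts[(j, label)][value] += 1
            values_seen[j].add(value)
    return class_counts, value_counts, values_seen


def classify_nb(model, row):
    """Return the class with the highest smoothed log-posterior score."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for j, value in enumerate(row):
            # Assumption: add-one (Laplace) smoothing of the likelihood.
            numerator = value_counts[(j, label)][value] + 1
            denominator = count + len(values_seen[j])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    # Toy data invented for this illustration.
    rows = [("sunny", "hot"), ("sunny", "mild"),
            ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    model = train_nb(rows, labels)
    print(classify_nb(model, ("sunny", "hot")))
```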