In this assignment, you will explore decision trees. A binary decision tree is a full binary tree used to identify the class to which observed data belongs. The following example illustrates this concept.
[Figure: decision tree predicting whether a person wears glasses, based on age and gender]
This example shows a decision tree that predicts whether a given person wears glasses by simply using their age and gender. Obviously, using only these two pieces of information is very limiting and certainly does not allow us to come to a reliable prediction. The percentages shown are an estimate of the accuracy of these predictions. They were obtained from a large number of samples (for which age, gender and presence of glasses were known). By applying the rules of the decision tree to a large number of observations, it is possible to obtain a good estimate of the accuracy of each of the classifications obtained in the leaves of the tree. Indeed, with these pre-classified samples, it is possible to count how many samples of each class reach a given leaf. For example, for the leaf corresponding to the decision [Age < 45, Female, Age < 24], it was observed that in 62% of cases the sample reaching this leaf was a person without glasses, and in 38% of cases it was a person with glasses. So if an observation to be classified enters at the root and ends on this leaf, the class assigned will be 'without glasses', since a higher percentage of samples (62%) belonging to this class arrived at this leaf. Obviously, a classification associated with a high percentage (say 97%) is far more reliable than one associated with a lower percentage, such as 62%.
To perform the classification, the decision tree uses input vectors of a certain size (2 in our example: age and gender). The decision determines the membership class; in our example, there are 2 classes (with glasses or without glasses), but in general there may be more.
The decision tree consists of nodes (stumps) that each use one of the entries of the feature vector and apply a threshold. Depending on the result of the comparison, we branch left or right. A vector enters the tree at the root and follows a series of comparisons leading to a leaf of the tree.

The file DecisionStump.java provides an implementation for a decision node. Each node specifies an index (featureIndex) in the vector and a threshold (threshold) to be used in the comparison. The leaves of the tree are dummy nodes without comparison, which return the class with the highest probability for that node. These probabilities are obtained either by specifying them directly or by accumulating the results obtained from pre-classified samples (classFreq and total). The main method of this class builds the decision tree shown on the previous page and tests its results with different inputs.
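The source of DecisionStump.java is not reproduced here. The following minimal sketch, assuming the field and method names mentioned above (featureIndex, threshold, classFreq, total, getMaxClass, getMaxProb) plus a hypothetical update helper, shows how such a node can accumulate pre-classified samples and report its majority class; internal details are guesses, not the official handout code.

```java
// Minimal sketch of a decision stump; not the handout's implementation.
public class DecisionStumpSketch {
    int featureIndex;              // which element of the feature vector to test
    double threshold;              // branch left if vector[featureIndex] < threshold
    DecisionStumpSketch left, right;
    int[] classFreq;               // per-class counts accumulated during training
    int total;                     // total samples that reached this node

    DecisionStumpSketch(int featureIndex, double threshold, int nbClasses) {
        this.featureIndex = featureIndex;
        this.threshold = threshold;
        this.classFreq = new int[nbClasses];
    }

    boolean isLeaf() { return left == null && right == null; }

    // record one pre-classified sample reaching this node
    void update(int classLabel) { classFreq[classLabel]++; total++; }

    // class with the highest observed frequency
    int getMaxClass() {
        int best = 0;
        for (int c = 1; c < classFreq.length; c++)
            if (classFreq[c] > classFreq[best]) best = c;
        return best;
    }

    // estimated probability of the majority class (1.0 if untrained)
    double getMaxProb() {
        return total == 0 ? 1.0 : (double) classFreq[getMaxClass()] / total;
    }

    public static void main(String[] args) {
        DecisionStumpSketch leaf = new DecisionStumpSketch(0, 0.0, 2);
        leaf.update(1); leaf.update(1); leaf.update(0);
        System.out.println(leaf.getMaxClass() + " " + leaf.getMaxProb());
    }
}
```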
Part A (marks 50)

Using the DecisionStump class, complete the DecisionTree class by adding the following operations:
public void replace(DecisionStump leaf, int featureIndex, double threshold): replaces a given leaf of the decision tree by a new stump that applies a decision threshold on one element of a feature vector. This operation also creates dummy leaves. It will be used to grow the decision tree.
public DecisionStump getSmallestMaxProb(): returns the leaf having the smallest maximum probability (i.e. the least reliable leaf). Look at the getMaxClass and getMaxProb methods of DecisionStump. This operation will be used to replace an unreliable leaf by a new decision stump.
public void print(): prints all the nodes of a decision tree by calling the toString of the DecisionStump instances in a pre-order traversal.
public int getDecision(double[] vector): returns the class associated with a sample of unknown class. This operation will return the class number having the highest probability.
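As a rough illustration of these four operations (not the official solution), here is a self-contained sketch that substitutes a minimal inner Node class for the handout's DecisionStump; the train method is assumed to accumulate one pre-classified sample along its root-to-leaf path, and all internals are assumptions about the handout's API.

```java
import java.util.ArrayList;
import java.util.List;

public class DecisionTreeSketch {
    static class Node {   // stand-in for the handout's DecisionStump
        int featureIndex; double threshold;
        Node left, right;
        int[] classFreq; int total;
        Node(int f, double t, int k) { featureIndex = f; threshold = t; classFreq = new int[k]; }
        boolean isLeaf() { return left == null; }
        int getMaxClass() {
            int b = 0;
            for (int c = 1; c < classFreq.length; c++) if (classFreq[c] > classFreq[b]) b = c;
            return b;
        }
        double getMaxProb() { return total == 0 ? 1.0 : (double) classFreq[getMaxClass()] / total; }
        public String toString() {
            return (isLeaf() ? "leaf" : "stump[" + featureIndex + " < " + threshold + "]")
                   + " maxProb=" + getMaxProb();
        }
    }

    private final Node root;
    private final int nbClasses;

    DecisionTreeSketch(int nbFeatures, int nbClasses) {
        this.nbClasses = nbClasses;
        this.root = new Node(0, 0.0, nbClasses);   // starts as a single dummy leaf
    }
    Node getRoot() { return root; }

    // replace a leaf with a decision stump and attach two fresh dummy leaves
    void replace(Node leaf, int featureIndex, double threshold) {
        leaf.featureIndex = featureIndex;
        leaf.threshold = threshold;
        leaf.left = new Node(0, 0.0, nbClasses);
        leaf.right = new Node(0, 0.0, nbClasses);
    }

    // leaf whose majority-class probability is the smallest (least reliable)
    Node getSmallestMaxProb() {
        Node best = null;
        List<Node> stack = new ArrayList<>();
        stack.add(root);
        while (!stack.isEmpty()) {
            Node n = stack.remove(stack.size() - 1);
            if (n.isLeaf()) {
                if (best == null || n.getMaxProb() < best.getMaxProb()) best = n;
            } else { stack.add(n.left); stack.add(n.right); }
        }
        return best;
    }

    // pre-order traversal printing every node's toString
    void print() { print(root); }
    private void print(Node n) {
        System.out.println(n);
        if (!n.isLeaf()) { print(n.left); print(n.right); }
    }

    // route the vector from the root to a leaf; return the majority class
    int getDecision(double[] vector) {
        Node n = root;
        while (!n.isLeaf())
            n = (vector[n.featureIndex] < n.threshold) ? n.left : n.right;
        return n.getMaxClass();
    }

    // accumulate one pre-classified sample along its path (assumed train behaviour)
    void train(double[] vector, int classLabel) {
        Node n = root;
        while (true) {
            n.classFreq[classLabel]++; n.total++;
            if (n.isLeaf()) break;
            n = (vector[n.featureIndex] < n.threshold) ? n.left : n.right;
        }
    }

    public static void main(String[] args) {
        DecisionTreeSketch dt = new DecisionTreeSketch(2, 2);
        dt.replace(dt.getRoot(), 0, 45.0);          // grow: test Age < 45
        dt.train(new double[]{30, 0}, 0);
        dt.train(new double[]{60, 0}, 1);
        dt.print();
        System.out.println("decision: " + dt.getDecision(new double[]{25, 0}));
    }
}
```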
Part B (marks 25)

The file iris.data.txt contains a list of samples specifying the length and width of the petals and sepals of 3 different species of iris. This is thus a file made of feature vectors of dimension 4, with 3 possible classes.
1. Generate a decision tree made of one stump that uses element at index 0 with a threshold of 5.0.

       // Create 1-node tree
       DecisionTree dt = new DecisionTree(6, 3);
       dt.replace(dt.getRoot(), 0, 5.0);

2. Call the train method of DecisionTree with each pre-classified sample in the given file.

3. Print all the nodes and their associated probabilities.

4. Find the leaf having the smallest maximal probability. Replace it with a stump on index 2 with threshold 2.5. Train this new decision tree with the same dataset:

       // train a 2-node decision tree
       dt.replace(dt.getSmallestMaxProb(), 2, 2.5);
       dt.resetAll();
       // add training code…

5. Print all the nodes and their associated probabilities.

6. Add another node and re-train the tree as follows:

       // train a 3-node decision tree
       dt.replace(dt.getSmallestMaxProb(), 1, 3.0);
       dt.resetAll();
       // add training code…

Part C (marks 25)
We now ask you to generalize the process described in Part B.
1. Create a 1-node decision tree by adding a stump with randomly chosen index and threshold.

2. Train this tree with the given dataset of pre-classified iris samples.

3. Find the leaf having the smallest maximal probability and replace it with a new decision stump with randomly chosen index and threshold.

4. Repeat steps 2 and 3 up to 50 times.

5. Do you see an improvement in the performance of the classifier? Justify.

Print the answer to this question from your code; you can print relevant information about the decision trees to justify your answer.
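The iris data file is not reproduced here, so the following sketch runs the Part C loop on a tiny synthetic two-class dataset. The worstLeaf, reset and train helpers below are stand-ins for the handout's getSmallestMaxProb, resetAll and train (assumed names and behaviour), and the per-round accuracy printout is one possible way to justify the answer to step 5.

```java
import java.util.Random;

public class GrowLoopSketch {
    static class Node {   // minimal stand-in for a decision stump
        int f; double t; Node l, r;
        int[] freq = new int[2]; int total;
        boolean isLeaf() { return l == null; }
        int maxClass() { return freq[1] > freq[0] ? 1 : 0; }
        double maxProb() { return total == 0 ? 1.0 : (double) freq[maxClass()] / total; }
    }

    static Node root = new Node();

    // leaf with the smallest majority-class probability (least reliable)
    static Node worstLeaf(Node n) {
        if (n.isLeaf()) return n;
        Node a = worstLeaf(n.l), b = worstLeaf(n.r);
        return a.maxProb() <= b.maxProb() ? a : b;
    }

    // clear all accumulated counts before re-training
    static void reset(Node n) {
        n.freq = new int[2]; n.total = 0;
        if (!n.isLeaf()) { reset(n.l); reset(n.r); }
    }

    // accumulate one pre-classified sample along its root-to-leaf path
    static void train(double[] v, int c) {
        Node n = root;
        while (true) {
            n.freq[c]++; n.total++;
            if (n.isLeaf()) break;
            n = v[n.f] < n.t ? n.l : n.r;
        }
    }

    static int decide(double[] v) {
        Node n = root;
        while (!n.isLeaf()) n = v[n.f] < n.t ? n.l : n.r;
        return n.maxClass();
    }

    public static void main(String[] args) {
        // synthetic 2-feature, 2-class samples (stand-in for the iris data)
        double[][] X = {{1, 5}, {2, 6}, {1.5, 5.5}, {8, 1}, {9, 2}, {8.5, 1.5}};
        int[] y = {0, 0, 0, 1, 1, 1};
        Random rnd = new Random(42);
        for (int round = 0; round < 10; round++) {
            // step 3: split the least-reliable leaf with a random test
            Node leaf = worstLeaf(root);
            leaf.f = rnd.nextInt(2);
            leaf.t = rnd.nextDouble() * 10;
            leaf.l = new Node(); leaf.r = new Node();
            // step 2: reset and retrain on the full dataset
            reset(root);
            for (int i = 0; i < X.length; i++) train(X[i], y[i]);
            // track accuracy to help answer step 5
            int correct = 0;
            for (int i = 0; i < X.length; i++) if (decide(X[i]) == y[i]) correct++;
            System.out.println("round " + round + ": accuracy " + correct + "/" + X.length);
        }
    }
}
```

Because the splits are random, accuracy is not guaranteed to improve monotonically; printing it per round is exactly the kind of evidence step 5 asks you to produce.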
WHAT TO HAND IN:
• Your program: DecisionTree.java (with required modifications), DecisionStump.java (no need to modify)
• README.txt file (optional; explain any deviations, e.g. extra classes used, and any known bugs).
• Printout.txt: output of your program showing results for Parts B and C; this is expected to be identical to what we would obtain if we ran the program you submit.
• Put the files specified above in a folder called u9999999 (where 9999999 is your student number); zip the folder and submit the zipped folder.

FAILURE TO FOLLOW THESE STANDARDS WILL GET 5-10 MARKS OFF.