Assignment4
4.1 Maximumlikelihoodestimationofamultinomialdistribution
A 2D-sided die is tossed many times, and the results of each toss are recorded as data. Suppose that in the courseoftheexperiment,the dth sideofthedieisobserved Cd times. Forthisproblem,youshouldassume thatthetossesareidentically,independentdistributed(i.i.d.) accordingtotheprobabilitiesofthedie.
(a) Log-likelihood Let X 2{ 1,2,3,...,2D}denote the outcome of a toss, and let pd = P(X =d) denote the probabilitiesofthedie. Expressthelog-likelihoodL= logP(data) oftheobservedresultsintermsofthe probabilities pd andthecounts Cd. (b) Maximumlikelihoodestimate Derive the maximum likelihood estimates of the die’s probabilities pd. Specifically, maximize your expressionforthelog-likelihoodLinpart(a)subjecttotheconstraints 2D X d=1 pd =1,p d 0. You should use a Lagrange multiplier to enforce the linear equality constraint, but it is sufficient to observethattheresultingsolutionisnonnegative. (c) Evenversusodd Compute the probability P(X 2{ 2,4,6,...,2D}) that the roll of a die is even and also the proba-bility P(X 2{ 1,3,5,...,2D 1}) that the roll of a die is odd. Show that these two probabilitiesareequalwhen 2D X d=1 (1)dpd =0 . (d) Maximumlikelihoodestimate Supposeitisknownapriorithattheprobabilityofaneventossisequaltothatofanoddtoss. Derive the maximum likelihood estimates of the die’s probabilities pd subject to this additional constraint. Specifically,maximizeyourexpressionforthelog-likelihoodLinpart(a)subjecttotheconstraints 2D X d=1 pd =1 , 2D X d=1 (1)dpd =0,p d 0. Hint: introducetwoLagrangemultipliers,oneforeachlinearequalityconstraint.
1
4.2 Maximumlikelihoodestimationinbeliefnetworks Consider the two DAGs shown below, G1 and G2, over the same nodes{X1,X2,...,Xn}, that differ only inthedirectionoftheiredges.
Xn-2 XnX 2 X3X 1 Xn-1 • • •
Xn-2 XnX 2 X3X 1 Xn-1 • • • G2
G1
Suppose that we have a (fully observed) data set{x(t) 1 ,x (t) 2 ,...,x (t) n }T t=1 in which each example provides a complete instantiation of the nodes in these DAGs. Let COUNTn(x) denote the number of examples in which Xn=x,andlet COUNTn(x,x0) denotethenumberofexamplesinwhich Xn=x and Xn+1=x0.
(a) ExpressthemaximumlikelihoodestimatesfortheCPTsin G1 intermsofthesecounts. (b) ExpressthemaximumlikelihoodestimatesfortheCPTsin G2 intermsofthesecounts. (c) Using your answers from parts (a) and (b), show that the maximum likelihood CPTs for G1 and G2 fromthisdatasetgiverisetothesamejointdistributionoverthenodes{X1,X2,...,Xn}. (d) Suppose that some but not all of the edges in these DAGs were reversed, as in the graph G3 shown below. Would the maximum likelihood CPTs for G3 also give rise to the same joint distribution? (Hint: does G3 implyallthesamestatementsofconditionalindependenceas G1 and G2?)
Xn-2 Xn
X2 X3X 1
Xn-1
• • •
G3
X4 Xn-3
2
4.3 Statisticallanguagemodeling
In this problem, you will explore some simple statistical models of English text. Download and examine thedatafilesonPiazzaforthisassignment. (Startwiththe readme.txt file.) Thesefilescontainunigram andbigramcountsfor500frequentlyoccurringtokensinEnglishtext. Thesetokensincludeactualwordsas wellaspunctuationsymbolsandothertextualmarkers. Inaddition,an“unknown”tokenisusedtorepresent all words that occur outside this basic vocabulary. Turn in a printed copy of all your source code and resultsforthefollowingproblems. Youmayprograminthelanguageofyourchoice.
(a) ComputethemaximumlikelihoodestimateoftheunigramdistributionPu(w)overwordsw. Printout atableofallthetokens(i.e.,words)thatstartwiththeletter“M”,alongwiththeirnumericalunigram probabilities(notcounts). (Youdonotneedtoprintouttheunigramprobabilitiesforall500tokens.) (b) Compute the maximum likelihood estimate of the bigram distribution Pb(w0|w). Print out a table of thetenmostlikelywordstofollowtheword“THE”,alongwiththeirnumericalbigramprobabilities. (c) Considerthesentence“Thestockmarketfellbyonehundredpointslastweek.”Ignoringpunctuation,computeandcomparethelog-likelihoodsofthissentenceundertheunigramandbigrammodels: Lu = loghPu(the)Pu(stock)Pu(market) ...Pu(points)Pu(last)Pu(week)i Lb = loghPb(the|hsi)Pb(stock|the)Pb(market|stock) ...P b(last|points)Pb(week|last)i Intheequationforthebigramlog-likelihood,thetokenhsiisusedtomarkthebeginningofasentence. Whichmodelyieldsthehighestlog-likelihood? (d) Consider the sentence “The sixteen officials sold fire insurance.” Ignoring punctuation, compute andcomparethelog-likelihoodsofthissentenceundertheunigramandbigrammodels: Lu = loghPu(the)Pu(sixteen)Pu(officials) ...Pu(sold)Pu(fire)Pu(insurance)i Lb = loghPb(the|hsi)Pb(sixteen|the)Pb(officials|sixteen) ...P b(fire|sold)Pb(insurance|fire)i Which pairs of adjacent words in this sentence are not observed in the training corpus? What effect doesthishaveonthelog-likelihoodfromthebigrammodel? (e) Considertheso-calledmixturemodelthatpredictswordsfromaweightedinterpolationoftheunigram andbigrammodels: Pm(w0|w)=Pu(w0) + (1)Pb(w0|w), where 2 [0,1] determines how much weight is attached to each prediction. Under this mixture model,thelog-likelihoodofthesentencefrompart(d)isgivenby: Lm = loghPm(the|hsi)Pm(sixteen|the)Pm(officials|sixteen) ...Pm(fire|sold)Pm(insurance|fire)i. Computeandplotthevalueof thislog-likelihoodLm asafunctionofthe parameter 2[0,1]. From yourresults,deducetheoptimalvalueof totwosignificantdigits.
3
4.4 Stockmarketprediction
In this problem, you will apply a simple linear model to predicting the stock market. From the course web site,downloadthefilesnasdaq00.txtandnasdaq01.txt,whichcontaintheNASDAQindicesatthe closeofbusinessdaysin2000and2001.
2000 20011K
2K
3K
4K
5K
6K
year
price
NASDAQ
TRAIN TEST
(a) Linearcoefficients Howaccuratelycantheindexononedaybepredictedbyalinearcombinationofthethreepreceding indices? Usingonlydatafromtheyear2000,computethelinearcoefficients(a1,a2,a3)thatmaximize thelog-likelihoodL=Pt logP(xt|xt1,x t2,x t3),where: P(xt|xt1,x t2,x t3)= 1 p2⇡ exp"1 2✓xt a1xt1 a2xt2 a3xt3◆2#, andthesumisoverbusinessdaysintheyear2000(startingfromthefourthday). (b) Meansquaredpredictionerror Forthecoefficientsestimatedinpart(a),comparethemodel’sperformance(intermsofmeansquared error) on the data from the years 2000 and 2001. Would you recommend this linear model for stock marketprediction? (c) Sourcecode Turn in a print-out of your source code. You may program in the language of your choice, and you maysolvetherequiredsystemoflinearequationseitherbyhandorbyusingbuilt-inroutines(e.g.,in Matlab,NumPy).