Starting from:

$29.99

Homework 3 Scalar Machine

P1. Assume a scalar machine with a 5‐stage pipeline which uses branch prediction, assuming there are no branch delay slots in the ISA. 15% of the instructions for a given test program are branches, of which 80% are correctly predicted. The other 20% of the branches suffer a 4‐cycle misprediction penalty. (In other words, when the branch predictor predicts incorrectly, there are four instructions in the pipeline that must be discarded.) 1. Assuming there are no other stalls, develop a formula for the number of cycles it will take to complete N instructions of this program. 2. Now compare this with an equivalent processor (similar pipeline, branches resolved in the same stage, same instruction breakdowns) where the ISA supports a single branch delay slot (like MIPS). What percentage of the branch delay slots must be filled in order for the CPU with the branch‐delay system to achieve better performance than the one with only the branch predictor? P2. Consider 2 branch predictors, a 2‐level local history predictor (local history table indexed by PC, branch history table of predictors indexed by local history table entry), and a gshare predictor (branch history table indexed by the pc xored with the global history register). In each case the BHT is the same size (indexed by the same number of bits). We also discover that for the following code, the 2‐level local predictor predicts the loop back branch B perfectly (after it gets warmed up), while the gshare predictor does not. What do we know about the size of the BHT? Assume the branches do not conflict with each other in the prediction tables. for (j=0;j<10000;j++) { for (i=0;i<7;i++) { // seven iterations ... if (A[i] = = 0) { // branch A ... } } // loop back branch B } // loop back branch C  
P3. The following loop contains 2 branches. done = 0; i = 0; do { i++; if (i == 10) //location of branch B1 done = 1; // some other code that does not touch i } while (!done) //location of branch B2 Consider 2 different possible branch predictor structures. The first is a 2‐level local predictor, with an n‐bit wide pattern history table used to index a 2n‐entry table of 2‐bit predictors (saturating counters). The second is a correlating predictor, simply using an n‐bit global history register (GHR) to index a 2n‐entry table of 2‐bit predictors (also saturating counters). Assume last n branches before entering loop have the same behavior each time we come to this loop. Complete the following table, assuming the loop is entered many times, without any aliasing from other branches in the predictors: Predictor n B1 predict accuracy B2 predict accuracy 2‐level local 5 2‐level local 15 Correlating 5 Correlating 100% 100% Don’t forget this one! Note that the only reasonable choices for the prediction accuracy are numbers like 80%, 90%, or 100% (i.e., multiples of 10). We’re looking for steady‐state behavior, not startup behavior.     P4. Using the web, list at least one processor that used:  1. Local predictor  2. Global predictor  3. Agree predictor  4. Bimodal predictor  5. Skew predictor  Make sure to mention your source.

More products