Homework 4 Latencies

P1. Consider the code and latencies in Figure 3.48. Assume for this problem an in-order, scalar pipeline. Note that Figure 3.48 gives latencies in a different format than we usually use. When it indicates a latency of “+4” (meaning 4 stalls), this is what we would call a latency of 5. For example, an instruction with latency “+4” (really 5) that begins execution in cycle 2 will write the CDB in cycle 6; thus a dependent instruction will begin execution in cycle 7 (5 cycles after the first). I realize it seems odd to have a CDB in an in-order machine, but that allows better consistency with the following problems. Also, in all of problems P1-P4, assume branches are always predicted correctly, targets are resolved in IF, and there is no branch delay slot.

a. Show the execution of the code using the same format as Figure 3.19 in the book. How many cycles does this code take?

b. What is the steady-state CPI of the loop (assuming many iterations)?

P2. Reorder the code from P1 to get better performance. How many cycles per iteration do you expect to save, in steady state?

P3. Repeat P1, but for a dynamically scheduled (assume Tomasulo) scalar processor.

P4. If we instead assumed a Scoreboard (no support for register renaming), would the performance be different?

P5. Consider Tomasulo (plus reorder buffer for speculation) vs. the Instruction Queue (IQ, e.g., MIPS R10000) dynamic scheduler. The differences between them are pretty subtle. Describe some code where the IQ approach would outperform Tomasulo, even if by a small amount. Tell me the assumptions you're making.
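As a quick check on the latency convention stated in P1, the conversion from the figure's "+N" notation to cycle numbers can be sketched as below. This is only an illustrative sketch: the function names `cdb_write_cycle` and `dependent_start_cycle` are hypothetical, and it models just the single producer/consumer pair from the P1 example, not structural hazards or anything else in the pipeline.

```python
def cdb_write_cycle(start_cycle: int, extra_stalls: int) -> int:
    """Cycle in which the producer writes the CDB.

    extra_stalls is the "+N" value from the figure (hypothetical name).
    """
    return start_cycle + extra_stalls


def dependent_start_cycle(start_cycle: int, extra_stalls: int) -> int:
    """Earliest cycle in which a dependent instruction can begin execution."""
    latency = extra_stalls + 1  # "+4" in the figure is a latency of 5
    return start_cycle + latency


# The example from P1: latency "+4", execution begins in cycle 2.
print(cdb_write_cycle(2, 4))        # 6
print(dependent_start_cycle(2, 4))  # 7
```

The printed values match the numbers given in the P1 example (CDB write in cycle 6, dependent instruction starting in cycle 7).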
