1. (30 pts) Assume the $D cache is initially filled with the data at word addresses 0, 1, 2, …, 15
(referenced by the datapath in that order). Below is the next sequence of data memory
address references, given as word addresses:
1, 20, 2, 3, 4, 18, 5, 19, 33, 34, 1, 4
a) Assuming a direct-mapped $D cache with one-word blocks and a total size of
16 blocks, state whether each reference is a hit or a miss. Show the state of the $D cache
after the last reference. What is the hit rate for this reference string?
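With one-word blocks and 16 blocks, each word address maps to index = address mod 16 and tag = address div 16, so a reference hits only if the word currently at its index is the same address. A minimal Python sketch for checking the hit/miss list by hand; `simulate_direct_mapped` is a hypothetical helper name, and the warm-up pass over addresses 0 to 15 is how this sketch models the initial cache contents:

```python
# Hypothetical helper: direct-mapped cache, one-word blocks, 16 blocks.
# index = addr % 16, tag = addr // 16; cache[index] stores the resident word address.
NUM_BLOCKS = 16


def simulate_direct_mapped(initial, refs, num_blocks=NUM_BLOCKS):
    """Return (results, cache) where results lists (address, 'hit'/'miss')."""
    cache = [None] * num_blocks
    # Warm-up: load the initial word addresses in the order they were referenced.
    for addr in initial:
        cache[addr % num_blocks] = addr
    results = []
    for addr in refs:
        index = addr % num_blocks
        if cache[index] == addr:
            results.append((addr, "hit"))
        else:
            results.append((addr, "miss"))
            cache[index] = addr  # replace the block that lived at this index
    return results, cache


refs = [1, 20, 2, 3, 4, 18, 5, 19, 33, 34, 1, 4]
results, cache = simulate_direct_mapped(range(16), refs)
hits = sum(r == "hit" for _, r in results)
print(results)
print(f"hit rate = {hits}/{len(refs)}")
print(cache)  # final state, one word address per index
```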
b) Now, assuming a direct-mapped $D cache with two-word blocks and a total size
of 8 blocks, state whether each reference is a hit or a miss. Show the state of the $D cache
after the last reference. What is the hit rate for this reference string?
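With two-word blocks, the cache works on block addresses rather than word addresses: block = word address div 2, and the 8-block direct-mapped cache uses index = block mod 8. A minimal Python sketch under that mapping; `simulate_direct_mapped_blocks` is a hypothetical name, and the warm-up pass over words 0 to 15 (blocks 0 to 7) models the initial contents:

```python
# Hypothetical helper: direct-mapped cache with multi-word blocks.
# block = addr // words_per_block, index = block % num_blocks.
WORDS_PER_BLOCK = 2
NUM_BLOCKS = 8


def simulate_direct_mapped_blocks(initial, refs,
                                  words_per_block=WORDS_PER_BLOCK,
                                  num_blocks=NUM_BLOCKS):
    """Return (results, cache) where cache[index] holds the resident block address."""
    cache = [None] * num_blocks
    for addr in initial:  # warm-up with the initial references
        block = addr // words_per_block
        cache[block % num_blocks] = block
    results = []
    for addr in refs:
        block = addr // words_per_block  # which two-word block the word is in
        index = block % num_blocks       # which cache line that block maps to
        if cache[index] == block:
            results.append((addr, "hit"))
        else:
            results.append((addr, "miss"))
            cache[index] = block         # fetch the whole two-word block
    return results, cache


refs = [1, 20, 2, 3, 4, 18, 5, 19, 33, 34, 1, 4]
results, cache = simulate_direct_mapped_blocks(range(16), refs)
hits = sum(r == "hit" for _, r in results)
print(results)
print(f"hit rate = {hits}/{len(refs)}")
print(cache)  # block address per index; block b holds words 2b and 2b+1
```

Compared with part (a), the larger block lets a miss on one word bring in its neighbor, which is why references such as 19 after 18 can now hit.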
c) Now, assuming a two-way set-associative $D cache with two-word blocks and
a total size of 8 blocks, state whether each reference is a hit or a miss (assume LRU
replacement). Show the state of the $D cache after the last reference. What is the
hit rate for this reference string?
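Eight blocks split into two ways gives 4 sets, so set = (word address div 2) mod 4, and on a miss to a full set the least recently used block is evicted. A minimal Python sketch that tracks each set as an `OrderedDict` ordered from LRU to MRU; `simulate_set_assoc_lru` is a hypothetical name, and replaying words 0 to 15 through the same access path is this sketch's assumption for setting up the initial LRU order:

```python
from collections import OrderedDict


def simulate_set_assoc_lru(initial, refs, words_per_block=2, num_blocks=8, ways=2):
    """Return (results, state); state lists each set's blocks from LRU to MRU."""
    num_sets = num_blocks // ways
    # One OrderedDict per set; key order is LRU (first) -> MRU (last).
    sets = [OrderedDict() for _ in range(num_sets)]

    def access(addr):
        block = addr // words_per_block
        s = sets[block % num_sets]
        if block in s:
            s.move_to_end(block)       # mark as most recently used
            return "hit"
        if len(s) == ways:
            s.popitem(last=False)      # evict the least recently used block
        s[block] = True
        return "miss"

    for addr in initial:               # warm-up sets the initial LRU order
        access(addr)
    results = [(addr, access(addr)) for addr in refs]
    state = [list(s) for s in sets]
    return results, state


refs = [1, 20, 2, 3, 4, 18, 5, 19, 33, 34, 1, 4]
results, state = simulate_set_assoc_lru(range(16), refs)
hits = sum(r == "hit" for _, r in results)
print(results)
print(f"hit rate = {hits}/{len(refs)}")
print(state)
```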
2. (40 pts) For the following questions, the $I cache has a hit rate of 90%, a hit
latency of 1 cycle, and a miss penalty of 100 cycles (the miss penalty covers both
the time to get the information from main memory and the $I cache's own miss
overhead). The $D cache has a hit rate of 85%, a hit latency of 2 cycles, and a miss
penalty of 120 cycles (again covering both the time to get the information from
main memory and the $D cache's own miss overhead). Memory accesses (lw and sw)
make up 30% of the instruction mix.
a) What is the AMAT for data? Note: the cache must be accessed after memory
returns the data.
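The usual formula is AMAT = hit time + miss rate × miss penalty. Reading the note as "every access, hit or miss, pays the $D hit latency, and a miss additionally pays the miss penalty" gives exactly that formula. A minimal Python sketch under that assumption; `amat` is an illustrative name, not part of the assignment:

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# $D cache numbers from the problem: 2-cycle hit, 85% hit rate, 120-cycle miss penalty.
d_amat = amat(hit_time=2, miss_rate=1 - 0.85, miss_penalty=120)
print(f"data AMAT = {d_amat:.2f} cycles")
```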
b) Calculate the CPI, taking into account stalls due to $D cache and $I cache misses,
broken down into the additional CPI due to $I cache stalls, the additional CPI due to $D
cache stalls, and the overall CPI. The base CPI with a “perfect” memory system is
1.0. Assume everything else works perfectly: loads never stall a dependent
instruction. Also assume the processor waits for stores to finish when they miss in
the $D cache, and that $I cache misses and $D cache misses never occur at the
same time.
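Because $I and $D misses never overlap and every miss (load or store) stalls the processor, the stall contributions simply add: stall CPI = accesses per instruction × miss rate × miss penalty for each cache. A minimal Python sketch, assuming (this is an assumption, not stated in the problem) that the hit latencies are already folded into the base CPI of 1.0, so only miss penalties add cycles; `stall_cpi` is an illustrative name:

```python
def stall_cpi(accesses_per_instr, miss_rate, miss_penalty):
    """Extra cycles per instruction spent waiting on misses in one cache."""
    return accesses_per_instr * miss_rate * miss_penalty


BASE_CPI = 1.0
# Every instruction is fetched once from the $I cache.
i_stall = stall_cpi(1.0, 1 - 0.90, 100)
# lw and sw are 30% of the mix; both stall on a $D miss (stores wait too).
d_stall = stall_cpi(0.30, 1 - 0.85, 120)
# Misses never overlap, so the stall terms add directly.
total = BASE_CPI + i_stall + d_stall
print(f"I$ stall CPI = {i_stall:.2f}, D$ stall CPI = {d_stall:.2f}, CPI = {total:.2f}")
```

If the extra $D hit cycle beyond the base pipeline should also be charged, that would add 0.30 × (2 − 1) cycles per instruction on top of this sketch.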
c) What is the AMAT for data if you add a 1 MB L2 cache with a 95% miss
rate and a 10-cycle hit latency?
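With a second level, the L1 miss penalty becomes the L2's own AMAT: AMAT = L1 hit time + L1 miss rate × (L2 hit time + L2 miss rate × memory penalty). A minimal Python sketch, assuming the 95% figure is the L2's local miss rate and that an L2 miss still costs the original 120-cycle trip to main memory (both are assumptions the problem leaves implicit); `amat_two_level` is an illustrative name:

```python
def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_penalty):
    """AMAT with an L2: L1 misses pay the L2 hit time; L2 misses also pay memory."""
    l2_amat = l2_hit + l2_miss_rate * mem_penalty  # effective L1 miss penalty
    return l1_hit + l1_miss_rate * l2_amat


with_l2 = amat_two_level(l1_hit=2, l1_miss_rate=1 - 0.85,
                         l2_hit=10, l2_miss_rate=0.95, mem_penalty=120)
print(f"data AMAT with L2 = {with_l2:.2f} cycles")
```

Comparing this value with the single-level result from part (a) is a quick check on whether the L2 helps under these assumptions.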
3. (30 pts) Consider the code sequence below executed on a 5-stage pipelined out-of-order
machine. Assume full data forwarding (only a load-use hazard introduces a single nop).
Also assume a load instruction takes three cycles to get the data from memory. Show
the detailed entries of the IQ (reservation station), LSQ and ROB for each clock cycle.
You may make reasonable assumptions if you think they are necessary; state your
assumptions clearly.
lw $t0, 100($s0)
add $t1, $t0, $t2
sub $t5, $t5, $t2
sub $t3, $t1, 200
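Before filling in the IQ, LSQ and ROB, it helps to know which instructions must wait on which results. A minimal Python sketch that finds the read-after-write (RAW) dependences in the four instructions above; `PROGRAM`, `parse` and `raw_dependences` are hypothetical names, and the parser assumes the first register of each instruction is its destination, which holds for the lw, add and sub forms used here (the immediate 200 is not treated as a register):

```python
import re

# The four instructions from the problem, in program order.
PROGRAM = [
    "lw $t0, 100($s0)",
    "add $t1, $t0, $t2",
    "sub $t5, $t5, $t2",
    "sub $t3, $t1, 200",
]


def parse(instr):
    """Return (opcode, destination register, list of source registers)."""
    op, rest = instr.split(maxsplit=1)
    regs = re.findall(r"\$\w+", rest)
    # First register is the destination; the rest (including lw's base $s0) are sources.
    return op, regs[0], regs[1:]


def raw_dependences(program):
    """Map each instruction index to the earlier instructions it must wait on (RAW)."""
    last_writer = {}
    deps = {}
    for i, instr in enumerate(program):
        _, dst, srcs = parse(instr)
        deps[i] = sorted({last_writer[r] for r in srcs if r in last_writer})
        last_writer[dst] = i
    return deps


for i, d in raw_dependences(PROGRAM).items():
    print(PROGRAM[i], "waits on", [PROGRAM[j] for j in d])
```

The result shows that add waits on the lw (the load-use case), the second sub waits on add, and sub $t5 has no producer in the sequence, so an out-of-order core can issue it from the IQ while add is still waiting for the load data; the ROB still retires all four in program order.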