CDA 4205 Computer Architecture Homework #1


Problem 1 (textbook problem 1.5) (15 points)
Consider three different processors P1, P2 and P3 executing the same instruction set. P1 has a 3 GHz clock rate and a CPI of 1.5. P2 has a 2.5 GHz clock rate and a CPI of 1.0. P3 has a 4.0 GHz clock rate and a CPI of 2.2.
a.    Which processor has the highest performance expressed in instructions per second?
b.    If the processors each execute a program in 10 seconds, find the number of cycles and the number of instructions.
c.    Suppose we want to reduce the execution time of the program in part (b) by 30%, but doing so increases the CPI by 20%. What clock rate is needed to achieve this time reduction?
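As a reminder of the relationships Problem 1 relies on, here is a minimal Python sketch of the CPU performance equation (execution time = instruction count × CPI / clock rate). The function names and the 2 GHz, CPI 2.0 processor are hypothetical illustrations and are not part of the assignment.

```python
# Sketch of the CPU performance equation.
# All numeric values below are hypothetical, not taken from Problem 1.

def instructions_per_second(clock_hz: float, cpi: float) -> float:
    """Instruction throughput = clock rate / CPI."""
    return clock_hz / cpi


def cycles_executed(clock_hz: float, seconds: float) -> float:
    """Clock cycles = clock rate x execution time."""
    return clock_hz * seconds


def instruction_count(clock_hz: float, cpi: float, seconds: float) -> float:
    """Instructions = clock cycles / CPI."""
    return cycles_executed(clock_hz, seconds) / cpi


def required_clock_hz(instr: float, cpi: float, seconds: float) -> float:
    """Clock rate needed to run `instr` instructions at `cpi` in `seconds`."""
    return instr * cpi / seconds


if __name__ == "__main__":
    clock, cpi, t = 2.0e9, 2.0, 5.0  # hypothetical processor and run time
    n = instruction_count(clock, cpi, t)
    print("instructions/s:", instructions_per_second(clock, cpi))
    print("cycles:", cycles_executed(clock, t))
    print("instructions:", n)
    # Hypothetical change: 20% shorter run time, 10% higher CPI.
    print("required clock (Hz):", required_clock_hz(n, cpi * 1.1, t * 0.8))
```

Note that a higher clock rate alone does not imply higher throughput; the ratio of clock rate to CPI is what matters.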

Problem 2 (textbook problem 1.6) (10 points)
Consider two different implementations of the same instruction set architecture. The instructions can be divided into four classes according to their CPI (class A, B, C and D). P1 has a clock rate of 2.5 GHz and CPIs of 1, 2, 3 and 3, and P2 has a clock rate of 3 GHz and CPIs of 2, 2, 2 and 2.
Given a program with a dynamic instruction count of 1.0E6 instructions divided into classes as follows: 10% class A, 20% class B, 50% class C and 20% class D, which implementation is faster?
a.    What is the global CPI for each implementation?
b.    Find the clock cycles required in both cases.
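The global CPI asked for here is a weighted average of the per-class CPIs, weighted by each class's share of the dynamic instruction count. A minimal Python sketch under that definition follows; the function names and the three-class mix are hypothetical and deliberately differ from the values in Problem 2.

```python
# Sketch: weighted-average (global) CPI, total cycles, and execution time.
# The instruction mix and CPIs used below are hypothetical.

def global_cpi(mix, cpis):
    """Sum over classes of (fraction of instructions x class CPI)."""
    if len(mix) != len(cpis):
        raise ValueError("mix and cpis must have the same length")
    if abs(sum(mix) - 1.0) > 1e-9:
        raise ValueError("instruction mix fractions must sum to 1")
    return sum(f * c for f, c in zip(mix, cpis))


def total_cycles(instr_count, mix, cpis):
    """Clock cycles = instruction count x global CPI."""
    return instr_count * global_cpi(mix, cpis)


def exec_time_seconds(instr_count, mix, cpis, clock_hz):
    """Execution time = clock cycles / clock rate."""
    return total_cycles(instr_count, mix, cpis) / clock_hz


if __name__ == "__main__":
    mix = [0.25, 0.25, 0.50]   # hypothetical class fractions
    cpis = [1, 2, 4]           # hypothetical per-class CPIs
    print("global CPI:", global_cpi(mix, cpis))
    print("cycles:", total_cycles(2.0e6, mix, cpis))
    print("time (s):", exec_time_seconds(2.0e6, mix, cpis, 2.0e9))
```

Comparing two implementations then comes down to comparing their execution times, since clock rate and CPI can pull in opposite directions.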

Problem 3 (textbook problem 1.9) (15 points)
Assume that for arithmetic, load/store, and branch instructions, a processor has CPIs of 1, 12, and 5, respectively. Also assume that on a single processor a program requires the execution of 2.56E9 arithmetic instructions, 1.28E9 load/store instructions, and 256 million branch instructions. Assume that each processor has a 2 GHz clock frequency.
Assume that, as the program is parallelized to run over multiple cores, the number of arithmetic and load/store instructions per processor is divided by 0.7 × p (where p is the number of processors) but the number of branch instructions per processor remains the same.
a.    Find the total execution time for this program on 1, 2, 4 and 8 processors, and show the relative speedup of the 2, 4, and 8 processor result relative to the single processor result.
b.    If the CPI of the arithmetic instructions were doubled, what would the impact be on the execution time of the program on 1, 2, 4 or 8 processors?
c.    To what should the CPI of load/store instructions be reduced in order for a single processor to match the performance of four processors using the original CPI values?
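The scaling rule in Problem 3 can be sketched in Python as follows. This is one reading of the problem statement, not an official solution: it assumes the 0.7 × p divisor applies only when p > 1, so that p = 1 uses the single-processor instruction counts as given. The function names and cycle counts are hypothetical.

```python
# Sketch: per-processor execution time when only part of the work scales.
# Assumption: for p == 1 the instruction counts are used unscaled.
# The cycle counts and clock rate below are hypothetical.

def per_processor_time(parallel_cycles, serial_cycles, clock_hz, p,
                       efficiency=0.7):
    """Time = (parallel cycles / (efficiency x p) + serial cycles) / clock.

    parallel_cycles: cycles from instruction classes divided across cores
    serial_cycles:   cycles from classes whose per-core count is unchanged
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    divisor = 1.0 if p == 1 else efficiency * p
    return (parallel_cycles / divisor + serial_cycles) / clock_hz


def relative_speedup(parallel_cycles, serial_cycles, clock_hz, p):
    """Speedup of p processors relative to a single processor."""
    t1 = per_processor_time(parallel_cycles, serial_cycles, clock_hz, 1)
    tp = per_processor_time(parallel_cycles, serial_cycles, clock_hz, p)
    return t1 / tp


if __name__ == "__main__":
    par, ser, clock = 7.0e9, 1.0e9, 1.0e9  # hypothetical values
    for p in (1, 2, 10):
        print(p, per_processor_time(par, ser, clock, p),
              relative_speedup(par, ser, clock, p))
```

Because the serial cycles do not shrink, the speedup grows more slowly than the processor count.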

Problem 4 (textbook problem 1.13) (15 points)
Consider a computer running a program that requires 250 s, with 70 s spent executing floating-point (FP) instructions, 85 s spent executing load/store (L/S) instructions, and 40 s spent executing branch instructions.
a.    By how much is the total time reduced if the time for FP operations is reduced by 20%?
b.    By how much is the time for INT operations reduced if the total time is reduced by 20%?
c.    Can the total time be reduced by 20% by reducing only the time for branch instructions?
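Questions of this kind reduce to simple bookkeeping: shrinking one component of the run time shrinks the total by the same number of seconds. A small Python sketch of that bookkeeping is given below; the function names and the 100 s program are hypothetical and do not use the values from Problem 4.

```python
# Sketch: effect of shrinking one component of the total run time.
# All times below are hypothetical.

def total_after_reduction(total_s, component_s, reduction):
    """New total time when one component shrinks by `reduction` (0..1)."""
    return total_s - component_s * reduction


def component_reduction_needed(total_s, component_s, total_reduction):
    """Fraction by which one component must shrink so that the total
    shrinks by `total_reduction`. A result above 1 means the target
    cannot be met by changing that component alone."""
    return total_s * total_reduction / component_s


if __name__ == "__main__":
    total = 100.0  # hypothetical total run time in seconds
    print(total_after_reduction(total, 40.0, 0.25))
    print(component_reduction_needed(total, 40.0, 0.10))
    print(component_reduction_needed(total, 5.0, 0.10))  # above 1: infeasible
```

The second helper makes the feasibility check explicit: a component can lose at most 100% of its own time.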

Problem 5 (20 points)
Assume that we are considering enhancing a machine by adding vector hardware to it. When a computation is run in vector mode on the vector hardware, it is 10 times faster than the normal mode of execution. We call the percentage of time that could be spent using vector mode the percentage of vectorization.
a.    Draw a graph that plots the speedup as a function of the percentage of the computation performed in vector mode. Label the y-axis “Net Speedup” and label the x-axis “Percentage Vectorization.”
b.    What percentage of vectorization is needed to achieve a speedup of 2?
c.    What percentage of computation run time is spent in vector mode if a speedup of 2 is achieved?
d.    What percentage of vectorization is needed to achieve one-half the maximum speedup attainable from using vector mode?
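Problem 5 is an application of Amdahl's law, speedup = 1 / ((1 - f) + f / s), where f is the fraction of the original time that can use the enhancement and s is the enhancement factor. The Python sketch below states the law, its inverse, and the share of the new run time spent in enhanced mode. The function names are hypothetical, and the example uses an enhancement factor of 4 rather than the factor given in the problem.

```python
# Sketch of Amdahl's law and two rearrangements of it.
# The enhancement factor (4) and fractions below are hypothetical.

def amdahl_speedup(f, s):
    """Overall speedup when fraction f of the original time runs s times faster."""
    return 1.0 / ((1.0 - f) + f / s)


def fraction_for_speedup(target, s):
    """Invert Amdahl's law: fraction f needed to reach `target` speedup."""
    if not 1.0 <= target <= s:
        raise ValueError("target speedup must lie between 1 and s")
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / s)


def enhanced_share_of_new_time(f, s):
    """Fraction of the new (enhanced) run time spent in enhanced mode."""
    return (f / s) / ((1.0 - f) + f / s)


if __name__ == "__main__":
    s = 4.0
    # Points that could be plotted as speedup versus percentage enhanced.
    for pct in range(0, 101, 25):
        print(pct, amdahl_speedup(pct / 100.0, s))
    print(fraction_for_speedup(1.6, s))
    print(enhanced_share_of_new_time(0.5, s))
```

The maximum attainable speedup is reached at f = 1 and equals s, which is why `fraction_for_speedup` rejects targets above s.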

Problem 6 (10 points)
Assume that we make an enhancement to a computer that improves some mode of execution by a factor of 10. Enhanced mode is used 50% of the time, measured as a percentage of the execution time when the enhanced mode is in use. Recall that Amdahl’s law depends on the fraction of the original, unenhanced execution time that could make use of enhanced mode. Thus, we cannot directly use this 50% measurement to compute speedup with Amdahl’s law.
a.    What is the speedup we have obtained from fast mode?
b.    What percentage of the original execution time has been converted to fast mode?
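When the usage fraction is measured on the enhanced machine, it helps to normalize the enhanced run time to 1 and reconstruct the original time from it: the time outside fast mode is unchanged, and the time inside fast mode would have taken s times longer. A minimal Python sketch of that reasoning follows. The function names and the example values (25% of enhanced time, factor of 4) are hypothetical, not those of Problem 6.

```python
# Sketch: recovering speedup and the original-time fraction when the
# usage fraction g is measured on the ENHANCED run time.
# The example values below are hypothetical.

def speedup_from_enhanced_share(g, s):
    """Original time / enhanced time, with the enhanced time normalized to 1.

    (1 - g) of the enhanced time is unaffected; the g share spent in fast
    mode would have taken g * s before the enhancement.
    """
    return (1.0 - g) + g * s


def original_fraction(g, s):
    """Fraction of the ORIGINAL time that was converted to fast mode."""
    return g * s / ((1.0 - g) + g * s)


if __name__ == "__main__":
    g, s = 0.25, 4.0
    f = original_fraction(g, s)
    print("speedup:", speedup_from_enhanced_share(g, s))
    print("original fraction:", f)
    # Cross-check against Amdahl's law using the recovered fraction.
    print("Amdahl check:", 1.0 / ((1.0 - f) + f / s))
```

The final line is a consistency check: feeding the recovered fraction into Amdahl's law should reproduce the same speedup.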

Submission Requirements
The following requirements are for electronic submission via Canvas.
    Your solutions must be in a single file named yourname-hw1.
    Upload the file using the link on Canvas from which you downloaded the homework description.
    If your submission is scanned from handwritten copies, the writing must be legible, or credit may be lost.
    Only submissions made via the Canvas link from which this description was downloaded are graded. Submissions to any other location on Canvas will be ignored.
