CS314 Assignment 8 Problem 1 - Dependencies Suppose we have the following program (a sequence of instructions), with each instruction labeled as S<num: S1 : a := 4; S2 : b := 2; S3 : c := 5; S4 : read ( d ); S5 : a := b + 3; S6 : b := a - 3; S7 : c := d * b ; S8 : e := a + 6; S9 : print ( c ); S10 : print ( e ); (a) Give the statement-level dependence graph for the above program. A node in the statement-level dependence graph represents a statement, an edge represents dependence between the statements (i.e. the nodes). Label each edge as a true data dependence, an anti data dependence, or an output data dependence. (b) Assume that each statement takes 1 cycle to execute. What is the execution time of the sequential code? What is the fastest parallel execution time of the program (i.e. the critical path)? You may assume that I/O operations (read and print) can be done in parallel. Problem 2 - Dependence Analysis Give the source and sink references, the type (whether a dependence is true, anti, or output), and the distance vectors for all dependences in the following loops. (a) do i = 3, 100 a(i) = a(i - 1) + a(i + 1) - a(i - 2) enddo (b) do i = 2, 100 a(3 * i) = a(3 * i - 3) + a(3 * i + 3) enddo (c) do i = 1, 10 a(i) = a(5) + a(i) enddo Use aW (i) and aR(i) to annotate the write access to a(i) and the read access to a(i) respectively. Problem 3 - Loop Parallelization Given the following nested loop: 1 do i = 2 , 100 do j = 2 , 100 S1 : a (i , j ) = b ( i - 1 , j - 1) + 2 S2 : b (i , j ) = i + j - 1 enddo enddo (a) Give the statement-level dependence graph. Show the dependence graph for statement instances in a part of the iteration space: i = 2 ... 5, j = 2 ... 5. (b) In its current form, can any loop be parallelized? If so, which loop(s)? If not, justify your answer. (c) Provide one valid affine schedule for statements S1 and S2 such that p(S1)= C11*i + C12*j + d1 and p(S2)= C21*i + C22*j + d2 in order to achieve synchronization-free parallelism. There could be many possible solutions for {C11, C12, C21, C22, d1, d2}. You only need to provide one feasible solution. (Hint: You can let d1 = d2 = 0.) (d) Generate two-level loop code for the affine schedule you provided. Please use p as outermost loop and i as innermost loop in the transformed loop. Calculate the loop bounds for p and i using Fourier-Motzkin elimination. You might need to calculate the overlapping polyhedron for S1 and S2 in order to eliminate the j loop. Please refer to the techniques for code generation in lecture 20 and lecture 21. 2