Starting from:

$30

CS451 Homework 1

1
CS451 “Introduction to Parallel and Distributed Computing”
Homework 1

• Please upload your assignment on the Blackboard with the following name: CS451_
LastName_FirstName_HW1. Please do NOT email your assignment to the instructor
and TA!
1. What are the two main styles of parallelism? Explain.
2. What are the three parallel programming paradigms? Explain.
3. Discuss the difference between shared address space machines and distributed
address space machines. And discuss the advantages and disadvantages of both
architectures.
4. Give an example of anti dependence and give a corresponding solution to remove
the dependence.
5. Consider the search tree shown in the following figure, in which the dark node
represents the solution.

(a) If a sequential search of the tree is performed using the standard depth-first
search (DFS) algorithm, how much time does it take to find the solution if
traversing each arc of the tree takes one unit of time? Note: DFS begins by
expanding the initial node and generating its successors. In each subsequent
step, DFS expands one of the most recently generated nodes. If this node has
no successors (or cannot lead to any solutions), then DFS backtracks and
expands a different node.
(b) Assume that the tree is partitioned between two processing elements that are
assigned to do the search job, as shown in figure b. If both processing
elements perform a DFS on their respective halves of the tree, how much time
does it take for the solution to be found? What is the speedup? Is there a
speedup anomaly? If so can you explain the anomaly?
2
6. Derive the formula for calculating the average access time for a word in a system
with three levels of cache. Assume the following values for a theoretical system
containing an L1, L2, and L3 cache.
 Location Latency
 --------- -------------
 L1 2 ns
 L2 8 ns
 L3 30 ns
 Main Memory 100 ns
If an application has following hit rates, what is the average memory access time
for a memory word?
Location Hit rate
------------ ----------
L1 40%
L2 70%
L3 90%
Main Memory 100%
7. Consider a parallel matrix vector multiplication algorithm:
 C=A*b
14 1 2 3 1
32 4 5 6 2
50 7 8 9 3
é ù é ù éù
ê ú ê ú êú = ´
ë û ë û ëû
14 1 2 3
32 4 1 5 2 6 3
50 7 8 9
é ù éù éù éù
ê ú êú êú êú = ´ + ´ + ´ ê ú êú êú êú
ë û ëû ëû ëû
Sequential code:
matvec() {
int i,j;
for (j=0;j<n;j++){
for (i=0;i<m;i++) {
c[i]=c[i]+a[i][j]*b[j];
}
}
}
3
Using fixed-size speedup and scalability analysis techniques similar to those
employed in the class notes, answer the following.
(a) Analyze the running time of the serial algorithm as a function of the matrix
dimension n and m, you may assume all operations take unit time. (More
appropriate runtime order techniques will be presented later in the course –
“big O” notation.)
(b) Analyze the running time of the parallel algorithm as a function of n, m and p.
You may also assume n and m are evenly divisible by p.
(c) Obtain expression for the speedup, Sp, and the Amdahl’s fraction a.
(d) Determine if the algorithm is effective. Briefly explain.
Note on cheating: There are penalties for cheating. Don’t find out the hard way.
Working in groups is fine for discussing approaches and techniques. Copying
problem solutions or code is cheating. Both the person copying and the person
giving the answers will be equally penalized. Make sure you do your own work.
Parallel code:
void matvec() {
int i,j,nprocs,myid;
float tmp[max];
for (i=0;i<m;i++){
tmp[i]=0;
}
nprocs= no. of processors used;
myid=process id;
for (j=myid;j<n;j+=nprocs){
for (i=0;i<m;i++)
tmp[i]=tmp[i]+a[i][j]*b[j];
}
lock();
 for (i=0;i<m;i++)
c[i]=c[i]+tmp[i];
unlock();
}

More products