ECSE 420 – Parallel Computing –
Lab 2 – CUDA Convolution and Matrix Inversion
In this lab, you will write code for simple signal processing, parallelize it using CUDA,
and write a report summarizing your experimental results. We will use PNG images and
TA-provided data as the test signals.
A. Convolution:
Convolution is a slightly more complicated operation than rectification and pooling from
Lab 1, but it is still highly parallelizable. For each pixel in the input image, a 3x3
convolution computes the corresponding pixel in the output image using a weighted sum
of the input pixel and its neighbors. That is:
output[i][j] = Σ_{ii=0}^{2} Σ_{jj=0}^{2} input[i + ii − 1][j + jj − 1] · w[ii][jj],
for 1 ≤ i ≤ m − 1, 1 ≤ j ≤ n − 1,
where m is the number of rows in the input image, and n is the number of columns in the
input image. Since we are using square weight matrices and the “valid padding”
definition of convolution (1 ≤ i ≤ m − 1, 1 ≤ j ≤ n − 1), the output image will be of size
m − 2 by n − 2. Figure 1 illustrates the convolution operation for a certain weight matrix,
W.
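As a quick sanity check of the indexing, consider one output pixel with made-up numbers:
take the centre pixel at (i, j) = (1, 1), a hypothetical 3x3 input patch

    1 2 3
    4 5 6
    7 8 9

and a uniform averaging weight matrix with w[ii][jj] = 1/9 for all ii, jj. The formula then
gives

    output[1][1] = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) / 9 = 5.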
Write the code for convolution. For this lab, you need to implement convolution
using 3x3, 5x5 and 7x7 weight matrices. Before you convert the output to unsigned chars
and save the file, you should clamp the output of the convolution between 0 and 255 (i.e.,
if a value in the output is less than 0, you should set it equal to 0, and if a value is greater
than 255, you should set it equal to 255). The header file “wm.h” contains the weight
matrix you should use. Analyze, discuss, and show an example image as described in the
first section.
Figure 1: Convolution Example
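A minimal kernel sketch for the 3x3 case is given below. It assumes the image has already
been decoded into an RGBA unsigned-char buffer (for example with lodepng, as in Lab 1)
and that the weight matrix from "wm.h" has been copied to device memory; the names
convolveKernel, input, output and w are illustrative, not prescribed by the lab.

    // Sketch only: 3x3 convolution of an RGBA image with valid padding.
    // input, output and w are assumed to be device pointers; the kernel
    // and variable names are illustrative, not prescribed by the lab.
    __global__ void convolveKernel(const unsigned char *input, unsigned char *output,
                                   const float *w, int m, int n)
    {
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        int outRows = m - 2, outCols = n - 2;
        if (idx >= outRows * outCols) return;

        int i = idx / outCols + 1;   // row of the centre pixel in the input (1 .. m-2)
        int j = idx % outCols + 1;   // column of the centre pixel in the input (1 .. n-2)

        for (int c = 0; c < 3; c++) {                // R, G and B channels
            float sum = 0.0f;
            for (int ii = 0; ii < 3; ii++)
                for (int jj = 0; jj < 3; jj++)
                    sum += input[4 * ((i + ii - 1) * n + (j + jj - 1)) + c] * w[3 * ii + jj];
            // Clamp to [0, 255] before converting back to unsigned char.
            sum = fminf(fmaxf(sum, 0.0f), 255.0f);
            output[4 * ((i - 1) * outCols + (j - 1)) + c] = (unsigned char)sum;
        }
        // Pass the alpha channel through unchanged.
        output[4 * ((i - 1) * outCols + (j - 1)) + 3] = input[4 * (i * n + j) + 3];
    }

On the host, this kernel would be launched with enough threads to cover the
(m − 2) x (n − 2) output pixels, e.g. with a grid of (numPixels + threads − 1) / threads
blocks of threads threads each; the 5x5 and 7x7 variants follow by changing the loop bounds
and offsets (or by passing the kernel size as an extra parameter).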
The grader should be able to change the input arguments to your code: the input file, the
dimension of the weight matrix (3, 5 or 7), and the number of threads. You may either
define these as clearly marked constants in your code or pass them as command-line
arguments to your program.
When the input test image is “test.png”, the output of your code should be identical to
“test_convolve.png”. Both files are available on myCourses.
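One possible way to expose these arguments (a sketch only; the argument order and names
are an assumption, not a requirement) is:

    /* Sketch: read the input file, weight-matrix dimension and thread count
       from the command line. The argument order shown here is an assumption,
       e.g.  ./convolve test.png 3 256                                        */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        if (argc < 4) {
            printf("Usage: %s <input.png> <weight matrix size: 3|5|7> <threads>\n", argv[0]);
            return 1;
        }
        const char *inputFile = argv[1];
        int weightSize        = atoi(argv[2]);
        int threadsPerBlock   = atoi(argv[3]);
        /* ... decode the image, launch the kernel, encode the output ... */
        return 0;
    }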
B. Matrix Inversion and Linear System Solution
In this part, you will write a CUDA program that takes a matrix A and a vector b as input
and finds a vector x such that Ax = b.
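One common way to do this on the GPU (a sketch only, in single precision and without
pivoting; the kernel name eliminateColumn and the augmented-matrix layout are assumptions)
is Gauss-Jordan elimination, where the host loops over pivot rows and a kernel eliminates
the pivot column from every other row in parallel:

    // Sketch: one Gauss-Jordan elimination step on the augmented system [A | b].
    // aug is a device pointer to an n x (n+1) row-major augmented matrix.
    // No pivoting is done here; ill-conditioned inputs may need partial pivoting.
    __global__ void eliminateColumn(float *aug, int n, int pivot)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n || row == pivot) return;

        float factor = aug[row * (n + 1) + pivot] / aug[pivot * (n + 1) + pivot];
        for (int col = pivot; col <= n; col++)
            aug[row * (n + 1) + col] -= factor * aug[pivot * (n + 1) + col];
    }

    // Host loop (sketch): one launch per pivot; launches on the same stream run
    // in order, so no explicit synchronization is needed between steps.
    // for (int p = 0; p < n; p++)
    //     eliminateColumn<<<(n + threads - 1) / threads, threads>>>(d_aug, n, p);
    // Afterwards the matrix part of aug is diagonal, so x[i] = aug[i][n] / aug[i][i].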
You will benchmark your code against several test cases provided on myCourses and
report the performance obtained. At a minimum, you should test the cases where the
matrices are of dimension 32x32, 512x512 and 1024x1024. Furthermore, you should
verify the results of your program by multiplying the matrix A by the obtained vector x
and subtracting the vector b; the result should be (numerically close to) the zero vector.
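The verification can itself be done with a small kernel that computes the residual
r = Ax − b; a minimal sketch (all names are illustrative) is:

    // Sketch: compute the residual r = A*x - b. If x solves the system, every
    // entry of r should be approximately zero (up to floating-point error).
    __global__ void residualKernel(const float *A, const float *x, const float *b,
                                   float *r, int n)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row >= n) return;

        float sum = 0.0f;
        for (int col = 0; col < n; col++)
            sum += A[row * n + col] * x[col];   // (A x)[row]
        r[row] = sum - b[row];
    }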
All demonstrations of the code should be included in the report, so that the performance
and correctness of the program are clearly visible.
Submission Instructions:
NO LAB DEMO. There will not be any more demos for the labs. Please submit your
entire solution (along with input and output images) and the report in a zip file.
Each group should submit a single zip file with the filename Group<your group
number>_Lab2.zip (Ex – Group04_Lab2.zip).
Format for Report:
1. Must be a PDF only (No word document).
2. Must be named Group<your group number>_Lab2_Report.pdf (Ex –
Group03_Lab2_Report.pdf).
3. Must have a cover page.
4. Must follow the logical order for the lab discussions.
5. Have an appendix containing your own code. Copy and paste the code and adjust the
formatting so that the alignment remains the same as in your IDE (MS Visual Studio).