Computer Vision
Homework 3
Stereo and Segmentation
For the third homework assignment, we are asking you to implement a few
functions in a stereo pipeline example provided by mathworks.com
Provided files:
In your Matlab Command Window, type
edit DepthEstimationFromStereoVideoExample
What You Have to Do:
Task 1 (15 points): Calculate disparity using the SSD algorithm
Implement the SSD matching algorithm for a range of window sizes, and create a
disparity image D(x, y) such that Left(x, y) = Right(x + D(x, y), y) when matching
from left to right.
Write a function to replace the disparity built-in function on line 54. Your function
needs to accept as an input value the size of the window search. Call your function
three times for the following window sizes: 1, 5 and 11.
Create a 2x2 subplot and display the disparity map results from these three function
calls, plus the one obtained using Matlab’s built-in disparity function.
Notes:
1. Remember to restrict the search to the corresponding epipolar line (here,
the same image row).
2. You should also restrict the search using the maximum disparity range. You
can use the value 63. This is the maximum disparity value as calculated by
using the built-in Matlab disparity.m function (see line 56 in the original
script). Feel free to try a lower value and notice if your disparity maps
improve.
3. When using a window size bigger than 1, it is common practice to use a
Gaussian weighting of the window, so that the center has greater influence
than the periphery. Convolve your window matrices with a Gaussian filter
prior to computing the sum of squared differences.
4. For this task, you do not need to impose any constraint on the matching
algorithm. But if your matching algorithm finds more than one “best match”,
you will want to come up with a rule for selecting just one match and stick to
it. Don’t forget to let us know in your comments what rule you used.
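To make the window search and the notes above concrete, here is a small NumPy sketch of SSD block matching (the assignment itself must be written in MATLAB; the function names, the Gaussian sigma, and the smallest-disparity tie-breaking rule are illustrative choices, not requirements). It follows the handout's convention Left(x, y) = Right(x + D(x, y), y) and searches only along the row:

```python
import numpy as np

def gaussian_window(win, sigma=1.0):
    """Separable Gaussian weights so the window center dominates (note 3)."""
    ax = np.arange(win) - win // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    w = np.outer(g, g)
    return w / w.sum()

def ssd_disparity(left, right, win, max_disp=63):
    """SSD block matching along scanlines; ties keep the smallest disparity."""
    H, W = left.shape
    half = win // 2
    weights = gaussian_window(win)
    D = np.zeros((H, W), dtype=int)
    for y in range(half, H - half):                  # epipolar line = image row
        for x in range(half, W - half):
            patch_l = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = np.inf, 0
            # search only along the same row, up to max_disp (note 2)
            for d in range(0, min(max_disp, W - 1 - half - x) + 1):
                patch_r = right[y - half:y + half + 1,
                                x + d - half:x + d + half + 1]
                cost = np.sum(weights * (patch_l - patch_r) ** 2)
                if cost < best_cost:                 # strict '<' keeps smallest d
                    best_cost, best_d = cost, d
            D[y, x] = best_d
    return D
```

On a synthetic pair where the right image is the left shifted by two pixels, the interior of the recovered map is uniformly 2.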
Task 2 (15 points): Calculate disparity using the NCC algorithm
Implement the normalized cross correlation (NCC) matching algorithm for a range
of window sizes, and create a disparity image D(x, y) such that Left(x, y) = Right(x +
D(x, y), y) when matching from left to right. Similarly, remember to restrict the
search to epipolar lines and to the disparity range [0, 63] (see line 56 in the
original script).
Information on how to compute NCC for image-processing applications here:
https://en.wikipedia.org/wiki/Cross-correlation#Normalized_cross-correlation
The inner-product-of-vectors approach mentioned on Wikipedia is also described
here, on slide 66:
http://www.gris.tudarmstadt.de/teaching/courses/ws0910/cv2_ws0910/slides/l3-stereo-v1_0.pdf
Write a function to replace the disparity built-in function on line 54. Your function
needs to accept as an input value the size of the window search. Call your function
three times for the following window sizes: 3, 5 and 7.
Create a 2 x 2 subplot and display the disparity map results from these three
function calls, plus the one obtained using Matlab’s built-in disparity function.
Note (same as for Task 1):
1. For this task, you do not need to impose any constraint on the matching
algorithm. But if your matching algorithm finds more than one “best match”,
you will want to come up with a rule for selecting just one match and stick to
it. Don’t forget to let us know in your comments what rule you used.
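The NCC score differs from SSD in that patches are first made zero-mean and then normalized, so the best match maximizes correlation rather than minimizing a squared difference. A NumPy sketch of the idea follows (again, the assignment must be done in MATLAB; names and the tie-breaking rule are illustrative):

```python
import numpy as np

def ncc_disparity(left, right, win, max_disp=63):
    """NCC matching along scanlines, convention Left(x, y) = Right(x + D, y);
    the highest correlation wins, and ties keep the smallest disparity."""
    H, W = left.shape
    half = win // 2
    D = np.zeros((H, W), dtype=int)
    for y in range(half, H - half):
        for x in range(half, W - half):
            pl = left[y - half:y + half + 1, x - half:x + half + 1]
            pl = pl - pl.mean()                      # zero-mean left patch
            norm_l = np.linalg.norm(pl)
            best_score, best_d = -np.inf, 0
            for d in range(0, min(max_disp, W - 1 - half - x) + 1):
                pr = right[y - half:y + half + 1,
                           x + d - half:x + d + half + 1]
                pr = pr - pr.mean()
                denom = norm_l * np.linalg.norm(pr)
                score = np.sum(pl * pr) / denom if denom > 0 else 0.0
                if score > best_score:
                    best_score, best_d = score, d
            D[y, x] = best_d
    return D
```

Because the score is normalized, it is unaffected by a global gain and offset applied to one image, which SSD is not.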
Task 3 (20 points): Uniqueness constraint. So far, you have found matches based
on an epipolar constraint (matches lie on the same scanline) and on a similarity
constraint (because both members of a matching pair of visual features originate
from the same physical surface or object, they naturally look very similar and
should have similar intensity values). Modify your implementation of the disparity
function to incorporate the uniqueness constraint: one visual feature in one retinal
image can have at most one corresponding counterpart in the other image. This
follows from the fact that one physical point/surface ‘causes’ at most one feature
in each retinal image, and cannot project to two retinal locations at the same
time. Your algorithm should enforce the rule that, for each feature in the left (or
right) image, there is at most one corresponding feature in the right (or left)
image.
Note:
1. There is no correct answer here. Think hard and come up with your own
solution to this constraint and don’t be upset if another peer comes up with a
different way.
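As the note says, there is no single correct answer. Purely as one illustration of the kind of rule you might invent, here is a NumPy sketch of a greedy per-scanline post-processing step: accept the cheapest matches first, and once a right-image pixel is claimed, discard any later (costlier) match to it. The function name and the cost-matrix layout are this sketch's own assumptions:

```python
import numpy as np

def enforce_uniqueness(costs):
    """costs[i, d] = match cost of left pixel i at disparity d (left pixel i
    matches right pixel i + d, per the handout's convention). Accept the
    globally cheapest matches first; a right pixel can be claimed only once.
    Unmatched left pixels come back as -1 (treat them as occluded)."""
    n, ndisp = costs.shape
    disp = np.full(n, -1)
    left_taken = np.zeros(n, dtype=bool)
    right_taken = np.zeros(n, dtype=bool)
    for idx in np.argsort(costs, axis=None):         # cheapest first
        i, d = divmod(int(idx), ndisp)
        j = i + d                                    # right-image column
        if j < n and not left_taken[i] and not right_taken[j]:
            disp[i] = d
            left_taken[i] = True
            right_taken[j] = True
    return disp
```

A classmate might instead, say, keep only the single lowest-cost match per right pixel, or resolve conflicts by a second matching pass; any consistent rule that leaves at most one counterpart per feature satisfies the constraint.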
Task 4 (20 points): Generate outliers map - Refine the disparity by performing a
left-right consistency check.
The disparity map dLR(x) is acquired considering as reference image the left image
of the stereo pair. If the right image is considered as reference, then the disparity
map dRL(x) is acquired. The disparity maps dLR(x) and dRL(x) can be useful in
detecting problematic areas, especially outliers in occluded regions and depth
discontinuities. One strategy for detecting outliers is the Left–Right consistency
check introduced by (1). In this strategy, the outliers are disparity values that are
not consistent between the two maps and therefore, they do not satisfy the relation:
| dLR(x) −dRL(x−dLR(x)) | ≤ TLR, where TLR is a user-defined threshold.
Write a function that accepts as inputs two disparity maps (LR and RL) and a value
for TLR and returns a binary image where the outliers have the value 1 and the
inliers have the value 0.
Call this function twice, passing as inputs the LR and RL disparity maps from SSD
and NCC matching with window size of 3. Use a TLR value of 1. With this value, pixels
with difference equal to 1 in the Left–Right consistency check are not considered
outliers. Plot the two resulting binary images side by side.
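The check itself is a single vectorized comparison once each left pixel's probe position in the RL map is known. Here is a NumPy sketch of such a function (the MATLAB version you submit should have the same inputs and outputs; the name lr_outliers is illustrative):

```python
import numpy as np

def lr_outliers(dLR, dRL, T_LR=1):
    """Binary outlier map from the left-right consistency check:
    outlier (1) wherever |dLR(x) - dRL(x - dLR(x))| > T_LR; pixels whose
    probe position falls outside the image are also flagged as outliers."""
    H, W = dLR.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xr = xs - dLR.astype(int)                    # position probed in the RL map
    out = np.ones((H, W), dtype=np.uint8)        # off-image probes stay outliers
    valid = (xr >= 0) & (xr < W)
    diff = np.abs(dLR - dRL[ys, np.clip(xr, 0, W - 1)])
    out[valid] = (diff[valid] > T_LR).astype(np.uint8)
    return out
```

With T_LR = 1, a disparity difference of exactly 1 passes the check, as the task specifies.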
Task 5 (10 points): Compute depth from disparity
Now that we have a disparity image, computing depth at every pixel in the left image
should be very straightforward. Use equation 11.1 from the Szeliski textbook.
Write a function to replace the reconstructScene built-in function on line 64. Your
function should have the same input arguments as the built-in version and it should
return a matrix of depth values for each pixel location from the left image.
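The conversion is one formula per pixel: rearranging the disparity relation of Szeliski's eq. 11.1 gives Z = f * B / d, with f the focal length in pixels and B the baseline. A minimal NumPy sketch (your MATLAB function must instead mirror reconstructScene's argument list):

```python
import numpy as np

def depth_from_disparity(D, f, B):
    """Z = f * B / d: f is the focal length in pixels, B the baseline in
    world units. Zero disparity (a point at infinity) maps to infinite depth."""
    Z = np.full(D.shape, np.inf)
    valid = D > 0
    Z[valid] = f * B / D[valid].astype(float)
    return Z
```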
Task 6 (20 points): Synthetic stereo sequences
As you’re developing stereo matching algorithms, it might prove useful to test them
on synthetic stereo sequences, and to compare with a ground truth disparity map.
Use the provided pair of synthetic images and your disparity implementation
functions from Tasks 1-3 to generate your disparity map(s). Compare your result to
the corresponding ground truth (provided):
1. Compare disparity maps visually by plotting side by side
2. Calculate a map of errors and display it
3. Calculate a histogram of disparity differences and display it – the bin size is
up to you.
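Points 2 and 3 amount to a per-pixel difference against the ground truth plus a binned count of those differences. A NumPy sketch of both computations (names and the default bin width are this sketch's own choices; plotting is left to imshow/bar in MATLAB):

```python
import numpy as np

def disparity_errors(D, D_gt, bin_width=1.0):
    """Per-pixel error map plus a histogram of disparity differences;
    bin_width is the (free) bin-size choice from point 3."""
    err = D.astype(float) - D_gt.astype(float)
    edges = np.arange(err.min(), err.max() + bin_width + 1e-9, bin_width)
    hist, edges = np.histogram(err, bins=edges)
    return err, hist, edges
```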
Dynamic Programming
Task 7 (40 points): Calculate stereo disparity using dynamic programming (as
outlined below)
Here, you will implement a stereo algorithm that uses dynamic programming. This
algorithm enforces the ordering constraint, the uniqueness constraint, and matches
individual pixels based on a cost function. Every pixel in one image can either match
one pixel in another image, or be marked as occluded.
Note: this algorithm assumes the image intensities in the left and right images
are grayscale values scaled to the range 0 to 1.
Part A (20 points): Disparity matching along each epipolar line
Your DP algorithm will replace the disparity.m built-in Matlab function, just like
the ones for Tasks 1 and 2. For each image pair, you seek to output a disparity map for
the left image indicating the disparity to the right image. The cost function is
defined such that the penalty for matching two pixels is the square of the difference
in their intensity. The penalty for a pixel being occluded is a fixed value, occ.
The images (frames in the video) are already rectified, so that the epipolar lines are
horizontal scan lines of the input images. Thus, you just need to run the DP stereo
algorithm on each corresponding line/row in the two images. You will need to call
your function once for each epipolar line. Your function should have the form:
D = stereoDP(e1, e2, maxDisp, occ)
The recommended value for occ is 0.01. For maxDisp, you can start with the
value 63 (this is the maximum disparity value as resulting from using the built-in
Matlab disparity.m function). Feel free to try a lower value and notice if your
disparity maps improve.
Algorithm:
Consider two scanlines Il(i) and Ir(j), 1 ≤ i, j ≤ N, where N is the number of pixels in
each line (the process will be repeated for each row of the image). Pixels in each
scanline may be matched, or skipped if they are considered occluded in either the
left or right image.
Let dij be the cost associated with matching pixel Il(i) with pixel Ir(j). Here we
consider a squared error measure between pixels given by:
dij = (Il(i) − Ir(j))²
The cost of skipping a pixel (in either scanline) is given by a constant occ.
We can compute the optimal (minimal cost) alignment of two scanlines recursively
as follows:
1. D(0,j) = j * occ
2. D(i,0) = i * occ
3. D(1, 1) = d11
4. D(i, j) = min(D(i−1, j−1) + dij, D(i−1, j) + occ, D(i, j−1) + occ)
Note: just like in the LCS (longest common subsequence) problem, you will need to
save which “direction” was used for the calculation of each D(i, j) value: North,
West, or Northwest.
Part B (10 points): Backtracking
Given D we find the optimal alignment (and thus the disparity) by backtracking.
Starting at (i, j) = (N, N), trace your path of minimum cost all the way to (1,1). You
will need to use the “directions” saved in part A. Selecting (i − 1, j) corresponds to
skipping a pixel in Il (a unit increase in disparity), while selecting (i, j − 1)
corresponds to skipping a pixel in Ir (a unit decrease in disparity). Selecting (i − 1, j −
1) matches pixels (i, j), and therefore leaves disparity unchanged.
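Putting Parts A and B together, here is a NumPy sketch of the whole scanline routine (your submission must be MATLAB with the stereoDP signature given above; the direction encoding, the NaN marker for occlusions, and the way maxDisp bounds the allowed matches are this sketch's own assumptions):

```python
import numpy as np

def stereoDP(e1, e2, maxDisp=63, occ=0.01):
    """Scanline DP: fill the cost table D with the Part A recursion
    (match = squared intensity difference, skip = occ), remember the
    direction of each minimum, then backtrack from (N, N) as in Part B.
    Occluded left pixels come back as np.nan. Matches are only allowed
    for disparities j - i in [0, maxDisp]."""
    N = len(e1)
    D = np.zeros((N + 1, N + 1))
    D[0, :] = np.arange(N + 1) * occ                 # rule 1
    D[:, 0] = np.arange(N + 1) * occ                 # rule 2
    back = np.zeros((N + 1, N + 1), dtype=np.uint8)  # 1=NW match, 2=N, 3=W
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            d = j - i
            dij = (e1[i - 1] - e2[j - 1]) ** 2 if 0 <= d <= maxDisp else np.inf
            choices = (D[i - 1, j - 1] + dij,        # rule 4: match
                       D[i - 1, j] + occ,            # left pixel occluded
                       D[i, j - 1] + occ)            # right pixel occluded
            k = int(np.argmin(choices))
            D[i, j] = choices[k]
            back[i, j] = k + 1
    # Part B: follow the saved directions from (N, N) back toward (1, 1)
    disp = np.full(N, np.nan)
    i, j = N, N
    while i > 0 and j > 0:
        if back[i, j] == 1:                          # match: disparity j - i
            disp[i - 1] = j - i
            i, j = i - 1, j - 1
        elif back[i, j] == 2:
            i -= 1
        else:
            j -= 1
    return disp
```

On a scanline pair shifted by one pixel, the routine recovers disparity 1 everywhere except the last left pixel, which has no counterpart and is marked occluded.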
Part C (10 points): Displaying the disparity map, again
For display purposes, the disparity values should be normalized and mapped to the
range [0,1] and, to distinguish valid disparities from occlusions, you should colorize
the image so that occluded pixels are shown in color while the rest of the disparity
map is shown in grayscale. Here is a commented implementation for scaling the
values appropriately and showing occlusions in a different color (it assumes
occluded pixels were marked with NaN):
function [dColor] = display_dmap(d)
% 1. Map disparity into the range [0,1]
max_d = max(d(:));   % max/min skip NaN values by default
min_d = min(d(:));
dScaled = (d - min_d) / (max_d - min_d);
% 2. Colorize occluded pixels to be red
dColor = repmat(dScaled, [1 1 3]);   % each RGB layer = scaled disparity
occluded = isnan(dScaled);           % occlusions were marked with NaN
R = dColor(:,:,1); G = dColor(:,:,2); B = dColor(:,:,3);
R(occluded) = 1;
G(occluded) = 0;
B(occluded) = 0;
dColor = cat(3, R, G, B);
% 3. Display the dColor image
imshow(dColor);
end
Since evaluating these cost functions can be computationally intensive, you may find
it helpful to optimize your implementation to get acceptable run-times. The
following list of suggestions may be of help:
1. Adjust the values for the minimum and maximum disparity for each epipolar
line and only evaluate the match cost for pixels in this disparity range. You
can use the values obtained from a first pass.
2. Think of ways to compute any parts of the match cost using vectorized code
rather than loops
3. Take advantage of Matlab’s profiling tool (available under Desktop → Profiler
in the menu). This will isolate the slowest parts of your code.
Implementing the first suggestion is probably sufficient to make your algorithm run
in a reasonable amount of time.
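As an illustration of the second suggestion, the match costs for a whole scanline can be computed with one array subtraction per disparity instead of a per-pixel loop. A NumPy sketch (in MATLAB the same idea works with array slicing; the function name is illustrative):

```python
import numpy as np

def scanline_costs(e1, e2, max_disp):
    """costs[i, d] = (e1[i] - e2[i + d])^2, vectorized over i for each d.
    Positions where i + d runs off the line are padded with inf."""
    N = len(e1)
    costs = np.full((N, max_disp + 1), np.inf)
    for d in range(max_disp + 1):
        n = max(N - d, 0)
        costs[:n, d] = (e1[:n] - e2[d:]) ** 2
    return costs
```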
Submitting the assignment:
Make sure each script or function file is well commented and includes a block
comment with your name, course number, assignment number, and instructor name.
Save all the figures/plots your program generates as image files. Zip all the .m and
image files together and submit the resulting .zip file through Moodle as homework
3 by Sunday, March 11th, by 11:55pm.
(1) M. Humenberger, T. Engelke, W. Kubinger, A census-based stereo vision algorithm
using modified semi-global matching and plane-fitting to improve matching quality,
CVPR ECV Workshop, 2010.