$30
CS 131 Computer Vision: Foundations and Applications
Problem Set 1
1 Linear Filters
In class, we introduced 2D discrete space convolution. Consider an input image I[i, j] and an m×n
filter F[i, j]. The 2D convolution I ∗ F is defined as
(I ∗ F)[i, j] = X
k,l
I[i − k, j − l]F[k, l]. (1)
Hint: The above operation is run for each pixel (i, j) of the result.
(a) Convolve the following I and F. Assume we use zero-padding where necessary.
I =
?
2 0 1
1 −1 2?
F =
?
1 −1
1 −1
?
(2)
Important: DO NOT use Matlab for this question. Please show all necessary steps to receive full
credit.
(b) Note that the F given in (2) is separable; that is, it can be written as a product of two 1D
filters: F = F1F2. Here, we have
F1 =
?
1
1
?
F2 =
1 −1
(3)
Compute (I ∗ F1) and (I ∗ F1) ∗ F2, i.e. first perform 1D convolution on each column, followed by
another 1D convolution on each row.
Important: DO NOT use Matlab for this question. Please show all necessary steps to receive full
credit.
(c) Prove that for any separable filter F = F1F2,
I ∗ F = (I ∗ F1) ∗ F2 (4)
Hint: Expand equation (1) directly.
(d) In this question, we will count the exact number of multiplications (including those multiplications due to zero-padding) involved in part (a) and (b).
1
(i) When counting part (a), whats the number of multiplications if you flip and shift I?
(ii) Is the number of multiplications the same if you flip and shift F?
(iii) What are the number of multiplications for part (b)?
(iv) Which one of the two methods (a) or (b) requires fewer calculations?
Hint:
• The answers should be exact numerical values. We will not accept any approximation.
• You may find the computation steps you wrote down in parts (a) and (b) helpful here.
(e) Consider a more general case: I is an M1 by N1 image, and F is an M2 by N2 separable filter.
(i) How many multiplications do you need to do a direct 2D convolution?
(ii) How many multiplications you need to do 1D convolutions on rows and columns?
Hint: For (i) and (ii), the results should be two functions of M1, N1, M2 and N2. We will
not accept any approximation.
(iii) Use Big-O notation to argue which one is more efficient in general: direct 2D convolution
or two successive 1D convolutions?
2 Normalized Cross-correlation
Background: In class, we covered cross-correlation, in which a template image is multiplied with
sections of a larger image to measure how similar each section is to the template. Normalized
cross-correlation is a small refinement to this process. In rough terms, it works like this: before
multiplying the template with each small section of the image, the image section is scaled and
offset so it has zero mean and variance of 1. This increases accuracy by penalizing image sections
which have high intensity but do not match the pattern of the template. The MATLAB function
normxcorr2 performs this entire process for you.
(a) Use the provided crossCorrelation.m file to load the provided photo and template. Read
the MATLAB documentation for normxcorr2, and use it to perform cross-correlation to find the
section of the image that best matches the template. Include your MATLAB code in your writeup,
and describe why the peak occurs where it does. Also explain the straight-line artifacts in the
cross-correlation. You don’t need to include the cross-correlation image itself in your report, as it
may not print well.
(b) In crossCorrelation.m, perform cross-correlation using the larger template. Note that the
larger template does not exactly match the image. Describe your results, and why they are different
from part (a). What does this tell you about the limitations of cross-correlation for identifying
objects in real-world photos? Also justify what applications cross-correlation would be good at.
2
Hint: The provided code auto-scales image brightness so that it covers the full range. Therefore,
same colors do not necessarily have the same value in the two cross-correlation images.
(c) Above, we saw that cross-correlation can be fragile. One way to make it less fragile is to perform cross-correlation using many templates to cover the different ways an object may appear in
an image. Suppose we wish to search for NR possible rotations of an object at NS possible sizes.
Assume the image is size n×n and the template is roughly size m×m. How many math operations
will the entire search require?
Hint: We’re just looking for a “Big-O Notation” estimate. In other words, you may neglect constant
factors, such as the effects of image edge padding, and smaller terms.
3 Canny Edge Detector
Figure 1: A rotated detected edge
(a) Suppose that the Canny edge detector successfully detects an edge in an image. The edge (see
Figure 1) is then rotated by θ, where the relationship between a point on the original edge (x, y)
and a point on the rotated edge (x
0
, y0
) is defined as
x
0 = x cos θ y0 = x sin θ (5)
Will the rotated edge be detected using the same Canny edge detector? Provide either a mathematical proof or a counter example.
Hint: The detection of an edge by the Canny edge detector depends only on the magnitude of its
derivative. The derivative at point (x, y) is determined by its components along the x and y directions. Think about how these magnitudes have changed because of the rotation.
3
(b) After running the Canny edge detector on an image, you notice that long edges are broken into
short segments separated by gaps. In addition, some spurious edges appear. For each of the two
thresholds (low and high) used in hysteresis thresholding, explain how you would adjust the threshold (up or down) to address both problems. Assume that a setting exists for the two thresholds
that produces the desired result. Briefly explain your answer.
4 Difference-of-Gaussian (DoG) Detector
(a) The 1-D Gaussian is
gσ(x) = 1
√
2πσ
exp ?
−
x
2
2σ
2
?
(6)
Calculate its 2nd derivative with respect to x, and use Matlab to plot it (use σ = 1).
(b) Use Matlab to plot the difference of Gaussians in 1-D given by
D(x, σ, k) = gkσ(x) − gσ(x)
kσ − σ
(7)
using k = 1.2, 1.4, 1.6, 1.8 and 2.0. State which value of k gives the best approximation to the 2nd
derivative with respect to x. You may assume that σ = 1.
Important: Submit your Matlab code in the pdf as a “code” block.
(c) The 2D equivalents of the plots above are rotationally symmetric. To what type of image structure will a difference of Gaussian respond maximally?
4
5 RANSAC for Fitting Circles
Figure 2: Fitting a circle to a group of 2D points.
In class, we discussed how to fit a line to a series of points using RANSAC. In this problem, you
are going to develop an algorithm that uses RANSAC for fitting a circle to a group of points in two
dimensional space {(xi
, yi)}
n
i=1 (see Figure 2).
Important: Please submit your code and plots for this problem.
(a) A circle C in 2D space is given by its center (cx, cy) and its radius R. Before we implement
RANSAC, we need a way to fit a circle (cx, cy, R) to a chosen set of points (we will need at least
3 points to specify a circle, and with more points we can get an “average circle” that provides a
better fit). Read the provided handout on least-squares model fitting. We will use least-squares to
fit a circle to 3 or more points. Remember that, for each point i, a circle should satisfy the simple
scalar equation:
R
2 = (xi − cx)
2 + (yi − cy)
2
.
From basic algebra, this can be rewritten as
R
2 = x
2
i − 2xicx + c
2
x + y
2
i − 2yicy + c
2
y
2xicx + 2yicy + R
2 − c
2
x − c
2
y = x
2
i + y
2
i
Note that our equations contain squares of unknown variables (R, cx and cy). Least squares can only
handle linear equations, so this is a problem. However, we can define a new variable q = R2−c
2
x−c
2
y
.
Using this trick, we have:
2xicx + 2yicy + q = x
2
i + y
2
i
5
This is indeed a linear equation (note x
2
i
and y
2
i
are constants), so a set of these equations can be
solved by least squares! Afterward, we can recover the value of R2
from q.
Your task is to edit the provided FitCircle.m to set up a system of the above equations (one
equation per input point) and solve the system using least squares. Run the provided TestFit.m
to check your solution. The fit should be good on the first plot, but may be sensitive to outliers
for the second plot. Include your code and the plots from a run of TestFit.m in your submitted
solution.
(b) Using your FitCircle() function as a subroutine, complete the RANSAC.m function. Run the
provided TestRansac.m to test your function. Include your code and the plots from a run of
TestRansac.m in your submitted solution. Hint: review the lecture slides on RANSAC and read
through RANSAC.m from the beginning. Be sure to understand the meaning of each input argument
before coding. Use the given functions to help you. Each “Your Code Here” spot should only take
one or a few lines of code. Look at our default value for each variable to understand the format
expected.
(c) Above, we applied RANSAC to N = 10 points. Edit TestRansac.m to set N = 1000 and
run it a few times to see the results. Briefly explain how the results are different and why. What
RANSAC parameters could you change to improve the solution? You can modify the parameters
in TestRansac.m to test your answer.
6 Pinhole Camera Model
In class, we have discussed the pinhole camera model. In this problem, we will calibrate a pinhole
camera. We have provided an illustration for pinhole camera model in Figure 3, where O is the
location of the pinhole. The focal length, i.e. the distance from the image plane to the pinhole is
f.
(a) Suppose we have a point (x, y, z) in the coordinate system defined by the pinhole and the image
plane origin is aligned to the pinhole. What’s the corresponding point in the image plane?
Hint: You can use f, x, y and z to represent the corresponding point.
(b) Now we have a calibration board (or a checkerboard) as in Figure 4. Each black square on the
checkerboard has an area of S. Assume we have image plane and calibration board face each other
in parallel and the distance between the calibration board and the pinhole is L. What’s the area
of each black square in the image plane?
(c) Now the image plane origin is no longer aligned to the pinhole, which is (cx, cy). Suppose we
have a point (x, y, z) in the coordinate system defined by the pinhole, what’s the corresponding
point in the image plane?
Hint: You can use cx, cy, f, x, y and z to represent the corresponding point.
(d) Continue with part (c). We have our pinhole camera parameters f, cx and cy. Suppose these
parameters are unknown, and we want to measure them by taking photos of our calibration board.
The calibration process works as follows:
1. Place the checkerboard parallel with the image plane at some distance L.
6
Figure 3: Pinhole camera model
2. Mark some corners (or some distinctive points on the checkerboard), measure each point’s
location (x, y, L) on the checkerboard and then take a photo.
3. Find the corresponding points in the image and measure their location in the image.
What’s the minimal number of points to measure in order to get the pinhole camera parameters f,
cx and cy? Explain your reasons.
Hint: We assume that the calibration process involves placing the checkerboard at some distance L1
and then measuring (x1,y1,L1). Then we place the checkerboard at depth L2 and measure (x2,y2,L2)
and so on.
(e) Continue with part (d). Is it possible to get the pinhole camera parameters f, cx and cy with our
checkerboard fixed (i.e. without moving the checkerboard during the calibration process)? Justify
your answers.
Hint: we are asking about the effects of calculating the pinhole camera parameters if we keep the
checkerboard fixed at some distance L and measure (x1,y1,L) and (x2,y2,L) and so on.
7
Figure 4: A sample calibration board
8