$30
CSC202 Homework #5
Problem 1. (25 pts) Let’s continue the stereo problem from before, where your workshop had a stereo rig setup with
the following left and right cameras:
g
W
L =
−0.707 0.000 0.707
−0.354 0.866 −0.354
−0.612 −0.500 −0.612
2.000
10.000
15.000
0 1
, gW
R =
−0.707 0.000 0.707
0.500 0.707 0.500
−0.500 0.707 −0.500
0.000
−30.000
15.000
0 1
,
Furthermore, the left and right cameras both have the same camera matrix,
ΨR = ΨL =
400.000 0.000 320.000
0.000 400.000 240.000
0.000 0.000 1.000
.
(a) Suppose that you imaged two points q1 and q2 in the left and right cameras,
r
L
1 =
?
342.000
236.000 ?
, rL
2 =
?
254.000
351.000 ?
, and r
R
1 =
?
400.000
300.000 ?
, rR
2 =
?
220.000
60.000 ?
.
Where are these two points qW
1
and qW
2
(in world coordinates)?
(b) What depth (or distance along the optical axis) do they have relative to each camera?
(c) Having to always know where the camera is in world coordinates is kind of lame, since the camera can’t really
move. If the camera is expected to move, then what is typically done is that the camera frame from one of the
camera’s is taken to be the world frame. In that case, its transformation of coordinates from world frame to the
chosen camera frame do nothing (basically, the transformation matrix is the identity matrix). In contrast, then
second camera’s transformation describes the changes of coordinates from the chosen camera frame to its own
camera frame (if that makes sense). Doing this trick, then allows one to compute the coordinates of an imaged
point relative to the chosen camera frame. Thus as the stereo rig moves around imaging points, one can always
recover the coordinates of the points in the scene relative to the chosen camera frame at the moment in time that
the images were taken.
Suppose that you move the stereo rig, you want the left camera to be the main camera frame, and you imaged
the following two points in the left and right cameras,
~rL
1 =
?
267.000
290.000 ?
, ~rL
2 =
?
227.000
264.000 ?
and ~rR
1 =
320.000
240.000
1.000
, ~rR
2 =
120.000
40.000
1.000
. (1)
Where are these two points q
L
1
and q
L
2
(relative to the left camera frame)?
Note: In this case, the only transformation needed is g
R
L
, as per the pragraph above.
You are free to use any of the methods discussed in class. The easiest thing would be to write a Matlab function that
takes in the two camera transformations, the matched points in each image, and then returns the necessary info. A
(mostly empty) code stub called stereodepth.m is given that can be used as a template.
Note: To avoid having to enter the matrices and vectors into Matlab, there is a Matlab file stereo02.mat in
the Assignment section.
Problem 2. (20 pts) You’ve got a surveillance camera setup and you’d like to calibrate it in order to reason about
what the camera is seeing. So you go out into the field of view of the camera and place a bunch of markers to image.
Taking a picture of the following markers,
q
W
1 =
1.006
10.080
4.474
, q
W
2 =
−1.580
9.980
7.068
, q
W
3 =
15.150
−0.247
11.200
, q
W
4 =
6.960
5.381
1.199
, q
W
5 =
7.768
6.948
3.091
, q
W
6 =
0.363
6.348
10.520
.
1
Figure 1: Two views of the stereo rig setup and the imaged points in the real world. The pyramid things at the left and
right coordinate frames visualize the image plan using a viewer-centered coordinate system.
leads to the following image coordinates,
r1 =
?
467.000
338.000 ?
, r2 =
?
609.000
353.000 ?
, r3 =
?
72.000
18.000 ?
, r4 =
?
345.000
222.000 ?
, r5 =
?
307.000
243.000 ?
, r6 =
?
479.000
78.000 ?
.
Using the Direct Linear Transform derive the projection matrix M associated to the camera.
Note: To avoid having to enter the world points and the image coordinates, there is a Matlab file calib01.mat in the
Assignment section. The image points are given in ray form. Also, to make your life easier, you may want to consider making the
code a function called calibrateM.m (a code stub is uploaded too).
Problem 3. (20 pts) Your friend started a camera calibration process but did not finish it. He went so far as to compute the
huge matrix to perform the SVD on, but sort of stopped there. The matrix generated is:
7.136 −3.966 3.312 1.000 0.000 0.000 0.000 0.000 3.568 −1.983 1.656 0.500
0.000 0.000 0.000 0.000 7.136 −3.966 3.312 1.000 0.268 −0.149 0.124 0.037
6.523 −4.682 3.858 1.000 0.000 0.000 0.000 0.000 4.843 −3.476 2.865 0.743
0.000 0.000 0.000 0.000 6.523 −4.682 3.858 1.000 −1.027 0.737 −0.608 −0.158
8.835 −3.285 5.794 1.000 0.000 0.000 0.000 0.000 1.171 −0.435 0.768 0.133
0.000 0.000 0.000 0.000 8.835 −3.285 5.794 1.000 −3.114 1.158 −2.042 −0.353
6.364 −1.195 5.239 1.000 0.000 0.000 0.000 0.000 −2.068 0.388 −1.703 −0.325
0.000 0.000 0.000 0.000 6.364 −1.195 5.239 1.000 −1.861 0.350 −1.532 −0.293
8.163 −2.882 2.784 1.000 0.000 0.000 0.000 0.000 1.837 −0.648 0.626 0.225
0.000 0.000 0.000 0.000 8.163 −2.882 2.784 1.000 1.490 −0.526 0.508 0.182
8.413 0.815 7.301 1.000 0.000 0.000 0.000 0.000 −5.532 −0.536 −4.800 −0.658
0.000 0.000 0.000 0.000 8.413 0.815 7.301 1.000 −3.807 −0.369 −3.304 −0.453
but your friend did one funny thing. Since your friend had already had calibrated the intrinsic parameters, rather than use image
coordinates for the generation of the matrix, they were mapped to real world units via the inverse of the camera matrix Ψ. Thus,
the only real unknown is the rotation and translation component (basically the extrinsic parameters).
Finish the calibration job half-done by your friend by computing the singular value decomposition. The matrix is huge, so you
should be able to find a zip file in the modules section with this matrix in the file svd prob.mat which is a Matlab file with the
matrix variable Q. The zero singular value right vector should be converted into a 3 × 4 matrix that encodes for the rotation and
translation of world coordinates to camera coordinates (after correction of the scale). Recall that the matrix is built up across the
rows, then down the columns. When you finish building it, you will have the matrix
R
T −R
T T
.
Compute the transpose of the left-most 3 × 3 sub-matrix to get R, and solve also for T using the last column (relative to world
frame). What is the camera position and orientation?
Given that you know the intrinsic parameters of the camera to be
Ψ =
400 0 320
0 −400 240
0 0 1
2
(the negative is due to using Matlab matrix image coordinates), the world coordinate q
W = (6.6130, 4.2480, 3.7660)T most likely
projects to which coordinate (up to quantization error)
(a) ?
473
109 ?
(b) ?
353
344 ?
(c) ?
237
91 ?
(d) ?
507
268 ?
.
Problem 4. (15 pts) Get together as a group and specify what the challenge project will be. Before meeting, first try to define
what the topic means to you and maybe even described what you think you group will do. Then discuss and try to commit to a
project. Please turn in the following:
(a) What the project will be. Project a description of the capabilities of the system and what kind of imagery or video will be
needed. The description should be in english, plus there should be a separate description that says something to the effect:
“Given as input XXX, the system will provide as output YYY.”
(b) If possible describe some of the basic components of the framework. What this means is that you project will involve getting
several image processing and computer vision algorithms to run in a particular order. See if you can identify these steps.
(c) Also try to identify what would be reasonable to have done by the week before spring break, and then by the week before
finals week.
(d) If any specialized equipment is needed, please state what. I can’t promise to get it to you, but whatever my lab has that might
be of utility can be made available to the groups.
Each of you will have a contact graduate student who will try to help you out if you get stuck on things. I am trying to match based
on experience and expertise, but it may not be a precise fit. Make sure to confirm with your contact graduate student some of your
thoughts on the above answers. We are aiming for reasonable in a semester kind of thing.
3
Singular Value Decomposition and Camera Matrix Estimation. Six known coordinate pairs of q and r lead to a system
of equations that satisfies,
Q ~m = 0,
where Q is a 12 × 12 matrix. Now, this problem does not look like what you are used to because of the 0 on the right hand side.
Certainly, one answer is ~m = 0, but it can’t be the right answer since the zero matrix transformation all points to zero. What
is happening is that the matrix Q has a non-trivial null space. There exists another vector ~m∗
, besides the zero vector, such that
multiplication of Q by this ~m∗
gives the zero vector. Another way of putting it, is that Q is not full rank. It is in fact rank deficient
by 1 (Q has a rank of 11 instead of 12).
We would like to find this special vector ~m∗
to figure out what the rotation and translation are for a given camera placement.
Recall from your linear algebra class that a matrix can be broken up into eigenvectors and eigenvalues by solving for:
Ax = λx.
Combining the eigenvectors into a matrix V and the eigenvalues into a diagonal matrix Σ, the following equation holds
A = V ΣV
T
.
Well, turns out that Q has one eigenvalue that should be 0. If we can find this eigenvalue, then its eigenvector gives us the special
vector ~m∗
. Unfortunately, eigenvalues are sensitive to noise and become complex quite easily, so solving for eigenvalues and
eigenvectors is not the preferred way to do it.
There is another decomposition that is very similar to the eigenvalue/eigenvector decomposition. It is called the singular value
decomposition and is guaranteed to return positive, real eigenvalues. The singular value decomposition returns matrices U, S, and
V such that
A = USV T
. (2)
The matrix S is diagonal and is equal to the (complex) magnitude squared of the eigenvalues in decreasing magnitude. So, the
bottom right corner of the matrix S should be zero if we compute the SVD of Q. Furthermore, the last column of the right vector
V is the solution vector. Matlab has a function called svd that performs this decomposition. Once you find the solution, then the
last column can be used to get the vector ~m. Of course, the solution is only known up to scale.
In general, the SVD will often give a solution that is known only up to scale. Additional information must be used to establish
the proper scale value, which is then multiplied (or divided) to get the correctly scaled matrix solution.
The Direct Linear Transform and Projection Matrix Estimation. Recall from the reading, that some people prefer to
write the projection equations
~r ∼ Ψ
R
C
W T
C
W
q
W
as
~r ∼ MqW
and basically lump everything into one big matrix of intrinsic and extrinsic parameters. The Direct Linear Transform (DLT) utilizes
the cross-product trick,
~r ×
?
MqW
?
= 0,
as a means to solve for the unknown matrix M. Since this equation has only two independent rows, it provides two independent
equations. Since the projection matrix has twelve values in it, then six world to image coorespondences are needed. Having six
(r, qW ) pairings leads to twelve equations and twelve unknowns. Some people use more points in order to improve on the error due
to quantization effects (in effect performing a least-square fit via the SVD). Since the reading did not go over the full derivation, I
will provide it here plus a couple tips on how to implement it in Matlab.
Moving on, remember that the cross product, being a linear operation, can be written as a matrix product instead
rˆ
?
MqW
?
= 0, where rˆ =
0 −r
3
r
2
r
3
0 −r
1
−r
2
r
1
0
.
Recall that we are treating the image coordinate r to be the ray ~r by adding a third coordinate value of 1. Since the above equation
is linear in the elements of M, the elements of M can be pulled out as though the were the vector *m. The DLT reading doesn’t
cover it, so we’ll go over it here a little bit. Manipulating the equation to expose the rows of the matrix and also open up rˆ,
0 −r
3
r
2
r
3
0 −r
1
−r
2
r
1
0
M1
q
W
M2
q
W
M3
q
W
= 0,
4
where Mi
is the i
th row of M. Since the right element is a vector, each row is a scalar. The cool thing about scalars is that the
transpose of a scalar is the same value, so the above equation is equivalent to
0 −r
3
r
2
r
3
0 −r
1
−r
2
r
1
0
(q
W )
T
(M1
)
T
(q
W )
T
(M2
)
T
(q
W )
T
(M3
)
T
= 0.
Ha-ha! Now, we can factor out the pieces of M,
0 −r
3
r
2
r
3
0 −r
1
−r
2
r
1
0
(q
W )
T
0 0
0 (q
W )
T
0
0 0 (q
W )
T
(M1
)
T
(M2
)
T
(M3
)
T
= 0,
where the middle element is a 3x12 matrix (the 0 terms are really row vectors consisting of three 0 values). Let’s defined the
vector *m to consist of the first row elements, the second row elements, and the third row elements horizontally concatenated then
transposed (to give a 12x1 vector). That means we have,
0 −r
3
r
2
r
3
0 −r
1
−r
2
r
1
0
(q
W )
T
0 0
0 (q
W )
T
0
0 0 (q
W )
T
*m = 0,
which is totally solvable using the SVD approach once we build up a 12x12 matrix (currently we have a 3x12 with a dependent
row). With a little manipulation, it is possible to show that a simplified form of the equation is
0 −r
3
(q
W )
T
r
2
(q
W )
T
r
3
(q
W )
T
0 −r
1
(q
W )
T
−r
2
(q
W )
T
r
1
(q
W )
T
0
*m = 0,
and we’ve recovered the equation from the DLT reading. Note that the 0 entries are row vectors with four zero values. The matrix
is a 3x12 matrix as it should be. Define Take2(~r, qW ) to be the function that takes the top two rows from the above matrix given
the image ray ~r and the world coordinate q
W . Given the six (~r, qW ) pairings, form the master matrix equation:
A ~m =
Take2(~r1, qW
1 )
Take2(~r2, qW
2 )
Take2(~r3, qW
3 )
Take2(~r4, qW
4 )
Take2(~r5, qW
5 )
Take2(~r6, qW
6 )
~m = 0
Performing SVD on the master matrix A will give you the proper answer. Be careful to properly reconstruct the matrix M (don’t
confuse the rows and columns).
”Wait!” you say, ”What about the scale?” Well, it turns out that the scale doesn’t matter because the answer will give you a ray.
If you have a new (seventh) world point q
W
7 , then using the computed projection matrix M, gives
~r7 ∼ MqW
7 ,
but with a ray whose last entry is not unit valued. To get the image coordinates requires dividing by the last row in order to make it
unit valued,
*r =
?
~r1
7
~r2
7
?
/~r3
7.
This final division step actually has the effect of cancelling out the unknown scale. Magic (or, that’s the beauty of rays!).
Matlab implementation note: One trick to creating the Take2 function is to use an inline function in Matlab. For this DLT
problem, it would be
Take2 = @(r, q) [ 0 0 0 0 , -r(3)*(q’) , r(2)*(q’) ; ...
r(3)*(q’) , 0 0 0 0 , -r(1)*(q’) ];
Then, it is literally possible to just do:
A = [ Take2( r1, q1 ) ; Take2( r2, q2 ) ; Take2( r3, q3 ) ; ...
Take2( r4, q4 ) ; Take2( r5, q5 ) ; Take2( r6, q6 ) ];
and Matlab will build the 12x12 matrix for you. Remember that r should be in ray-homogeneous coordinates (a 3x1) and q must
be in homogeneous coordinates (a 4x1).
5
Solving for Extrinsic Parameters Given Intrinsic Parameters. If we know the intrinsic parameters of a camera, then
the method derived in class can be used to solve for the extrinsic variables only (a modified version of the DLT can also be used,
but won’t be covered here). Let’s work this out a little bit. Recall that
r
1
r
2
1
∼ Ψ
R
T −R
T T
C
W
q
W .
where R and T are really with respect to the world frame, i.e., R
W
C and T
W
W C (yeah, it’s kinda funny but that’s what it is with
respect to the coordinate we have). Well, if we invert by Ψ, which is assumed to be known, then
Ψ
−1
r
1
r
2
1
∼
R
T −R
T T
C
W
q
W ,
Define the transformed ~r to be given by the vector ~z, then we have
~z ∼
R
T −R
T T
q
W = DqW , where ~z = Ψ−1
r
1
r
2
1
. (3)
Following the derivation from class, the projection similarity equations above lead to the follow set of equations for each world
point plus image coordinate pairing,
?
q
1
q
2
q
3
1 0 0 0 0 −z
1
q
1 −z
1
q
2 −z
1
q
3 −z
1
0 0 0 0 q
1
q
2
q
3
1 −z
2
q
1 −z
2
q
2 −z
2
q
3 −z
2
?
=
?
0
0
?
. (4)
With six such q and r pairs, you can solve for the unknown 3 × 4 matrix D. That’s how your friend created the large matrix.
Now, there is the usual problem with using the SVD to generate the matrix. The length of the vector ~m is not possible to solve
for without additional information. However, we do have one solution at our disposal. The left-most 3 × 3 matrix of the resulting
solution matrix is a rotation matrix (transposed). Thus it must have a determinant of 1. If the value is not 1, then that is the scale
factor we must apply to the matrix. In math talk, we know that
D =
R
T −R
T T
, (5)
where the vector ~m∗
can be used to build D. Further, we know that det(R
T
) = 1. If the determinant is not equal to 1, then we
must scale the solution to make equal to 1. Dividing by the cube root of det(D(1 : 3, 1 : 3)) (in Matlab speak) will rescale the
matrix to be what it should be. The n
th-root in Matlab is logically called nthroot. This then gives the real D matrix, or the best
you can compute given what you have.
In general, the SVD will often give a solution that is known only up to scale. Additional information must be used to establish
the proper scale value, which is then multiplied (or divided) to get the correctly scaled matrix solution.
6