HW2
1 CSE 252A Computer Vision I - Homework 2
1.1 Instructions
• All solutions must be written in this notebook
• Programming aspects of this assignment must be completed using Python in this notebook.
• If you want to modify the skeleton code, you may do so; it is provided only as a
framework for your solution.
• You may use python packages for basic linear algebra (you can use numpy or scipy for basic
operations), but you may not use packages that directly solve the problem.
• If you are unsure about using a specific package or function, then ask the instructor and
teaching assistants for clarification.
• You must submit this notebook exported as a pdf. You must also submit this notebook as
an .ipynb file.
• You must submit both files (.pdf and .ipynb) on Gradescope. You must mark each problem
on Gradescope in the pdf.
• It is highly recommended that you begin working on this assignment early.
• Late policy - Assignments submitted late will receive a 15% grade reduction for each 12
hours late (i.e., 30% per day). Assignments will not be accepted 72 hours after the due
date. If you require an extension (for personal reasons only) to a due date, you must
request one as far in advance as possible. Extensions requested close to or after the due
date will only be granted for clear emergencies or clearly unforeseeable circumstances.
1.2 Problem 1: Perspective Projection and Homogeneous Coordinates [10 pts]
1.2.1 Part 1 [5 pts]
Consider a perspective projection where a point P = [x y z]^T is projected onto an image plane Π′
represented by k = f′ > 0, as shown in the following figure.
The first, second, and third coordinate axes are denoted by i, j, k, respectively.
Consider the projection of two rays in the world coordinate system:

Q1 = [7 −3 1]^T + t[8 2 4]^T
Q2 = [2 −5 9]^T + t[8 2 4]^T

where −∞ < t ≤ −1.
Calculate the coordinates of the endpoints of the projection of the rays onto the image plane.
Identify the vanishing point based on the coordinates.
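Hint (a reminder of the standard pinhole relations, not a full solution): with the image plane at
k = f′, a point [x y z]^T projects to [f′x/z, f′y/z, f′]^T. For a ray X(t) = A + tD with D_k ≠ 0,
the projected point tends to (f′D_i/D_k, f′D_j/D_k) as t → ±∞; this limit depends only on the
direction D, which is what makes it a vanishing point.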
1.2.2 Part 2 [3 pts]
Prove that all parallel lines have the same vanishing point.
1.2.3 Part 3 [2 pts]
Show that the use of homogeneous coordinates can convert an affine transformation into a
linear one. Recall that an affine transformation of any vector x is described by Ax + b.
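Hint (the standard construction, stated here for reference): augment x with a 1; then

[ A  b ] [ x ]   [ Ax + b ]
[ 0  1 ] [ 1 ] = [   1    ]

so in homogeneous coordinates the affine map x ↦ Ax + b is carried out by a single
(n+1) × (n+1) matrix acting linearly on the augmented vector.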
1.3 Problem 2: Image Formation and Rigid Body Transformations [10 points]
In this problem we will practice rigid body transformations and image formation through the
projective camera model. The goal will be to photograph the following four points:

^A P1 = [−1 −0.5 2]^T,  ^A P2 = [1 −0.5 2]^T,  ^A P3 = [1 0.5 2]^T,  ^A P4 = [−1 0.5 2]^T
To do this we will need two matrices. Recall, first, the following formula for rigid body transformation:

^B P = ^B_A R · ^A P + ^B O_A

where ^B P is the point coordinate in the target (B) coordinate system, ^A P is the point coordinate in
the source (A) coordinate system, ^B_A R is the rotation matrix from A to B, and ^B O_A is the origin of
the coordinate system A expressed in B coordinates.
The rotation and translation can be combined into a single 4 × 4 extrinsic parameter matrix P_e,
so that ^B P = P_e · ^A P (with the points written in homogeneous coordinates).
Once transformed, the points can be photographed using the intrinsic camera matrix P_i, which
is a 3 × 4 matrix.
Once these are found, the image of a point ^A P can be calculated as P_i · P_e · ^A P.
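In block form (a standard convention, stated here as a reminder), the extrinsic matrix stacks the
rotation and translation:

P_e = [ ^B_A R   ^B O_A ]
      [ 0 0 0      1    ]

so that P_e applied to [^A P; 1] reproduces the rigid body formula above.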
We will consider four different settings of focal length, viewing angle, and camera position
below. For each of these, calculate:
a) the extrinsic transformation matrix,
b) the intrinsic camera matrix under the perspective camera assumption,
c) the image of the four vertices, plotted using the supplied functions.
Your output should look something like the following image (your output values might not
match; this is just an example):

[Sample Plots]
1. [No rigid body transformation]. Focal length = 1. The optical axis of the camera is aligned
with the z-axis.
2. [Translation]. ^B O_A = [0 0 1]^T. Focal length = 1. The optical axis of the camera is aligned
with the z-axis.
3. [Translation and Rotation]. Focal length = 1. ^B_A R encodes a rotation of 30 degrees around
the z-axis followed by 60 degrees around the y-axis. ^B O_A = [0 0 1]^T.
4. [Translation and Rotation, long distance]. Focal length = 5. ^B_A R encodes a rotation of 30
degrees around the z-axis followed by 60 degrees around the y-axis. ^B O_A = [0 0 13]^T.
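As a reminder (standard elementary rotation matrices; whether the composed product is ^B_A R or
its transpose depends on your convention), a sketch of the two rotations in numpy might look like:

import numpy as np

def rot_z(theta):
    # rotation by theta radians around the z-axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0],
                     [s,  c, 0],
                     [0,  0, 1]])

def rot_y(theta):
    # rotation by theta radians around the y-axis
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

# 30 degrees around the z-axis, then 60 degrees around the y-axis
R = rot_y(np.deg2rad(60)) @ rot_z(np.deg2rad(30))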
You can refer to Richard Szeliski's book, starting at page 36, for image formation and the
extrinsic matrix. Intrinsic matrix calculation for perspective camera models was covered in class
and can be found in lecture 2:
https://cseweb.ucsd.edu/classes/fa19/cse252A-a/lec2.pdf
You can also refer to lecture 3 of the previous year's course for further information if you wish:
http://cseweb.ucsd.edu/classes/fa18/cse252A-a/lec3.pdf
We will not use a full intrinsic camera matrix (e.g. one that maps centimeters to pixels and defines
the coordinates of the center of the image), but only parameterize it with f, the focal length. In
other words: the only parameter in the intrinsic camera matrix under the perspective assumption
is f.
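For reference (a sketch of the standard convention; check it against the lecture slides), with f as
the only parameter the 3 × 4 perspective intrinsic matrix takes the form:

P_i = [ f 0 0 0 ]
      [ 0 f 0 0 ]
      [ 0 0 1 0 ]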
[ ]: import numpy as np
import matplotlib.pyplot as plt
import math

# convert points from Euclidean to homogeneous
def to_homog(points):
    # here, points is always a 3x4 matrix
    # write your code here
    pass

# convert points from homogeneous to Euclidean
def from_homog(points_homog):
    # write your code here
    pass

# project 3D Euclidean points to 2D Euclidean
def project_points(P_int, P_ext, pts):
    # write your code here
    return from_homog(pts_final)
[ ]: # Change the three matrices for the four cases as described in the problem
# in the four camera functions given below. Make sure that we can see the formula
# (if one exists) being used to fill in the matrices. Feel free to document with
# comments anything you feel the need to explain.
def camera1():
    # write your code here
    return P_int_proj, P_ext

def camera2():
    # write your code here
    return P_int_proj, P_ext

def camera3():
    # write your code here
    return P_int_proj, P_ext

def camera4():
    # write your code here
    return P_int_proj, P_ext

# Use the following code to display your outputs
# You are free to change the axis parameters to better
# display your quadrilateral but do not remove any annotations
def plot_points(points, title='', style='.-r', axis=[]):
    inds = list(range(points.shape[1])) + [0]
    plt.plot(points[0, inds], points[1, inds], style)
    for i in range(len(points[0, inds])):
        plt.annotate("{0:.3f}".format(points[0, inds][i]) + "," +
                     "{0:.3f}".format(points[1, inds][i]),
                     (points[0, inds][i], points[1, inds][i]))
    if title:
        plt.title(title)
    if axis:
        plt.axis(axis)
    plt.tight_layout()

def main():
    point1 = np.array([[-1, -.5, 2]]).T
    point2 = np.array([[1, -.5, 2]]).T
    point3 = np.array([[1, .5, 2]]).T
    point4 = np.array([[-1, .5, 2]]).T
    points = np.hstack((point1, point2, point3, point4))
    for i, camera in enumerate([camera1, camera2, camera3, camera4]):
        P_int_proj, P_ext = camera()
        plt.subplot(2, 2, i+1)
        plot_points(project_points(P_int_proj, P_ext, points),
                    title='Camera %d Projective' % (i+1),
                    axis=[-0.6, 2.5, -0.75, 0.75])
    plt.show()

main()
1.4 Problem 3: Homography [12 pts]
Consider a vision application in which components of the scene are replaced by components from
another image scene.
In this problem, we will implement partial functionality of a smartphone camera scanning
application (example: CamScanner) which, in case you've never used one before, takes pictures of
documents and transforms them by warping and aligning to produce an image similar to one
obtained with a scanner.
The transformation can be visualized by imagining the use of two cameras forming an image
of a scene with a document. The scene would be the document you're trying to scan placed
on a table, and one of the cameras would be your smartphone camera, forming the image that
you'll be uploading and using in this assignment. There could also be an ideally placed camera,
oriented in the world in such a way that the image it forms of the scene has the document perfectly
aligned. While it is unlikely you can hold your phone still enough to get such an image, we can
use a homography to transform the image you take into the image that the ideally placed camera
would have taken.
This digital replacement is accomplished by a set of corresponding points for the document in
both the source (your picture) and target (the ideal) images. The task then consists of mapping
the points from the source to their respective points in the target image. In the most general case,
there would be no constraints on the scene geometry, making the problem quite hard to solve.
If, however, the scene can be approximated by a plane in 3D, a solution can be formulated much
more easily even without the knowledge of camera calibration parameters.
To solve this section of the homework, you will begin by understanding the transformation
that maps one image onto another in the planar scene case. Then you will write a program that
implements this transformation and uses it to warp a document into a well-aligned one (see the
given example to understand what we mean by well aligned).
To begin with, we consider the projection of planes in images. Imagine two cameras C1 and
C2 looking at a plane π in the world. Consider a point P on the plane π and its projections
p = [u1, v1, 1]^T in image 1 and q = [u2, v2, 1]^T in image 2.
There exists a unique (up to scale) 3 × 3 matrix H such that, for any point P on π:

q ≈ H p

Here ≈ denotes equality in homogeneous coordinates, meaning that the left and right hand sides
are proportional. Note that H depends only on the plane and the projection matrices of the two
cameras.
The interesting thing about this result is that by using H we can compute the image of P that
would be seen in the camera with center C2 from the image of the point in the camera with center
C1, without knowing its three-dimensional location. Such an H is a projective transformation
of the plane, called a homography.
In this problem, complete the code for the computeH and warp functions used in the
skeleton code that follows.
There are three warp functions to implement in this assignment, example outputs of which are
shown below. In warp1, you will create a homography from points in your image to the target
image (mapping source points to target points). In warp2, the inverse of this process will be done.
In warp3, you will create a homography between a given image and your image, replacing your
document with the given image.
1. [Example output of warp1]
2. [Example output of warp2]
3. [Example output of warp3]
4. In the context of this problem, the source image refers to the image of a document you take,
which needs to be warped into the target.
5. The target image can start out as an empty matrix that you fill in using your code.
6. You will have to implement the computeH function that computes a homography. It takes
in exactly four point correspondences between the source image and target image, in
homogeneous coordinates, and returns a 3 × 3 homography matrix.
7. You will also have to implement the three warp functions in the skeleton code given and
plot the resulting image pairs. For plotting, make sure that the target image is not smaller
than the source image.
Note: We have provided test code to check whether your implementation of computeH is correct.
All the code needed to plot the results is also provided, along with the code to read in the images
and other data required for this problem. Please try not to modify that code.
You may find the following Python built-ins helpful: numpy.linalg.svd, numpy.meshgrid
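For instance, here is a minimal sketch of the classical direct linear transform (DLT) recipe that
computeH can follow (the function name and layout are illustrative, not the required
implementation): each correspondence (u, v) ↦ (u′, v′) contributes two linear equations in the
nine entries of H, and the stacked system A h = 0 is solved by the right singular vector of A with
the smallest singular value.

import numpy as np

def dlt_homography_sketch(src, dst):
    # src, dst: 2xN arrays of corresponding Euclidean points, N >= 4
    rows = []
    for (u, v), (up, vp) in zip(src.T, dst.T):
        rows.append([u, v, 1, 0, 0, 0, -up*u, -up*v, -up])
        rows.append([0, 0, 0, u, v, 1, -vp*u, -vp*v, -vp])
    A = np.array(rows)
    # null-space direction of A = right singular vector with smallest singular value
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)  # homography, defined up to scale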
[ ]: import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

# load image to be used - resize to make sure it's not too large
# You can use the given image as well
# A large image will make testing your code take longer; once you're satisfied
# with your result, you can, if you wish, make the image larger
# (as large as your computer memory allows)
source_image = np.array(Image.open("photo.jpg"))/255

# display image
plt.imshow(source_image)

# Align the polygon such that the corners align with the document in your picture
# This polygon doesn't need to overlap with the edges perfectly; an approximation is fine
# The order of points is clockwise, starting from bottom left.
x_coords = [0,0,0,0]
y_coords = [0,0,0,0]

# plot_points from the previous problem is used to draw over your image
# Note that your coordinates will change once you resize your image again
source_points = np.vstack((x_coords, y_coords))
plot_points(source_points)
plt.show()
print(source_image.shape)
[ ]: def computeH(source_points, target_points):
    # returns the 3x3 homography matrix such that:
    # np.matmul(H, source_points) = target_points
    # where source_points and target_points are expected to be in homogeneous coordinates
    # make sure points are 3D homogeneous
    assert source_points.shape[0]==3 and target_points.shape[0]==3
    # Your code goes here
    H_mtx = np.zeros((3,3))  # Fill in H_mtx with appropriate values.
    return H_mtx

#######################################################
# test code. Do not modify
#######################################################
def test_computeH():
    source_points = np.array([[0,0.5],[1,0.5],[1,1.5],[0,1.5]]).T
    target_points = np.array([[0,0],[1,0],[2,1],[-1,1]]).T
    H = computeH(to_homog(source_points), to_homog(target_points))
    mapped_points = from_homog(np.matmul(H, to_homog(source_points)))
    print(from_homog(np.matmul(H, to_homog(source_points[:,1].reshape(2,1)))))
    plot_points(source_points, style='.-k')
    plot_points(target_points, style='*-b')
    plot_points(mapped_points, style='.:r')
    plt.show()
    print('The red and blue quadrilaterals should overlap if computeH is implemented correctly.')

test_computeH()
[ ]: def warp(source_img, source_points, target_size):
    # Create a target image and select target points to create a homography
    # from source image to target image; in other words, map all source points
    # to target points and then create a warped version of the image based on
    # the homography by filling in the target image.
    # Make sure the new image (of size target_size) has the same number of
    # color channels as the source image
    assert target_size[2]==source_img.shape[2]
    # Your code goes here
    return target_img

# Use the code below to plot your result
result = warp(source_image, source_points, (0,0,0))  # Choose appropriate target size
plt.subplot(1, 2, 1)
plt.imshow(source_image)
plt.subplot(1, 2, 2)
plt.imsave("myop.png", result)
plt.imshow(result)
plt.show()
The output of warp1 probably has some striations or noise, and the larger you make your
target image, the less it will resemble the document in the source image. Why is this happening?
To fix it, implement warp2 by creating an inverse homography matrix and filling in the target
image. A conceptual sketch of the inverse-mapping idea follows.
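Inverse mapping iterates over target pixels rather than source pixels, so every target pixel
receives a value and no holes appear. A minimal sketch (assuming Hinv maps homogeneous
target coordinates to source coordinates, with x as the column index and y as the row index,
and nearest-pixel sampling; the helper name is illustrative):

import numpy as np

def inverse_warp_sketch(source_img, Hinv, target_size):
    h, w = target_size[0], target_size[1]
    target = np.zeros((h, w, source_img.shape[2]))
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))       # all target pixels
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    mapped = Hinv @ coords                                 # map into the source image
    sx = np.round(mapped[0] / mapped[2]).astype(int)       # nearest source column
    sy = np.round(mapped[1] / mapped[2]).astype(int)       # nearest source row
    valid = (sx >= 0) & (sx < source_img.shape[1]) & \
            (sy >= 0) & (sy < source_img.shape[0])
    target[ys.ravel()[valid], xs.ravel()[valid]] = source_img[sy[valid], sx[valid]]
    return target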
[ ]: def warp2(source_img, source_points, target_size):
    # Create a target image and select target points to create a homography
    # from target image to source image; in other words, map each target point
    # to a source point, and then create a warped version of the image based on
    # the homography by filling in the target image.
    # Make sure the new image (of size target_size) has the same number of
    # color channels as the source image
    # Your code goes here
    return target_img

# Use the code below to plot your result
result = warp2(source_image, source_points, (0,0,0))  # Choose appropriate size
plt.subplot(1, 2, 1)
plt.imshow(source_image)
plt.subplot(1, 2, 2)
plt.imshow(result)
plt.imsave("warp2.png", result)
plt.show()
Try playing around with the size of your target image in warp1 versus warp2. You can also
implement nearest-pixel or bilinear interpolation and see whether it makes a difference in your
output.
In warp3, you'll be replacing the document in your image with a provided image. Read in
"ucsd_logo.png" as the source image, keeping your document as the target.
[ ]: # Load the given UCSD logo image (the plotting code below assumes it is
# available, e.g. as source_image2)
def warp3(target_image, target_points, source_image):
    # Your code goes here
    return target_image

# Use the code below to plot your result
result1 = warp3(target_image, target_points, source_image)
plt.subplot(1, 2, 1)
plt.imshow(source_image2)
plt.subplot(1, 2, 2)
plt.imshow(result1)
plt.imsave("warp3.png", result1)
plt.show()
1.5 Problem 4: Surface Rendering [18 pts]
In this portion of the assignment we will be exploring different methods of approximating the
local illumination of objects in a scene. This last section of the homework will be an exercise in
rendering surfaces. Here, you will need to use the surface normals and the masks from the provided
pickle files, with various light sources, different materials, and a number of illumination models.
For the sake of simplicity, multiple reflections of light rays and occlusion of light rays by the
object/scene can be ignored.
1.5.1 Data
The surface normals and masks are to be loaded from the respective pickle files. For comparison,
you should display the rendering results both for normals calculated from the original image and
for the diffuse components. There are two images that we will be playing with: one of a sphere
and the other of a pear.
Assume that the albedo map is uniform.
1.5.2 Lambertian Illumination
One of the simplest models available to render 3D objects with illumination is the Lambertian
model. This model finds the apparent brightness to an observer using the direction of the light
source L and the normal vector N on the surface of the object. The brightness intensity at a given
point on an object's surface, I_d, with a single light source is found using the following relationship:

I_d = (L · N) I_l C

where C and I_l are the color and intensity of the light source, respectively.
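As a quick sanity check of the formula (a worked example, not part of the required output): with
N = (0, 0, 1)^T, L = (0, 0, 1)^T, I_l = 1, and C = (1, 1, 1), we get L · N = 1 and hence
I_d = (1, 1, 1), full brightness; as the surface tilts away from the light, L · N falls toward 0, and
negative values should be clamped to zero (see the hint below).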
1.5.3 Phong Illumination
One major drawback of Lambertian illumination is that it only considers the diffuse light in its
calculation of brightness intensity. The other major component of illumination rendering is the
specular component. Specular reflectance is the component of light that is reflected in a single
direction, as opposed to all directions as in diffuse reflectance. One of the most widely used
models to compute surface brightness with specular components is the Phong illumination
model. This model combines ambient lighting, diffuse reflectance, and specular reflectance
to find the brightness on a surface. Phong shading also considers the material in the scene, which
is characterized by four values: the ambient reflection constant (k_a), the diffuse reflection constant
(k_d), the specular reflection constant (k_s), and the Phong constant α, which is the 'shininess' of an
object. Furthermore, since the specular component produces 'rays', only some of which would be
observed by a single observer, the observer's viewing direction (V) must also be known. For a
scene with known material parameters and M light sources, the light intensity I_phong on a surface
with normal vector N seen from viewing direction V can be computed by:

I_phong = k_a I_a + Σ_{m ∈ M} [ k_d (L_m · N) I_{m,d} + k_s (R_m · V)^α I_{m,s} ],
R_m = 2 N (L_m · N) − L_m,

where I_a is the color and intensity of the ambient lighting, and I_{m,d} and I_{m,s} are the color
values for the diffuse and specular light of the m-th light source.
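As a quick check of the reflection formula (a worked example, not part of the required output):
with N = (0, 0, 1)^T and L_m = (0, 0, 1)^T, we get R_m = 2N(L_m · N) − L_m = (0, 0, 1)^T, so a
viewer at V = (0, 0, 1)^T receives the maximum specular contribution k_s I_{m,s}; as V tilts away
from R_m, the term decays as (R_m · V)^α.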
1.5.4 Rendering
Please complete the following:
1. Write the function lambertian() that calculates the Lambertian light intensity given the
light direction L, with color C and intensity I_l = 1, and the normal vectors N. Then use this
function in a program that calculates and displays the specular sphere and the pear using
each of the two light sources found in Table 1. Note: you do not need to worry about material
coefficients in this model.
2. Write the function phong() that calculates the Phong light intensity given the material
constants (k_a, k_d, k_s, α), V = (0, 0, 1)^T, N, and some number M of light sources. Then use
this function in a program that calculates and displays the specular sphere and the pear using
each of the sets of coefficients found in Table 2, with each light source individually and with
both light sources combined.
Hint: To avoid artifacts due to shadows, ensure that any negative intensities found are set to zero.
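In numpy this clamp is a one-liner, e.g. image = np.maximum(image, 0) (illustrative; apply it
wherever negative dot products can appear).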
Table 1: Light Sources

m  Location             Color (RGB)
1  (−1/3, 1/3, 1/3)^T   (1, 1, 1)
2  (1, 0, 0)^T          (1, 0.5, 0.5)
Table 2: Material Coefficients

Mat.  k_a  k_d  k_s   α
1     0    0.1  0.75  5
2     0    0.5  0.1   5
3     0    0.5  0.5   10
1.5.5 Part 1. Loading pickle files and plotting the normals [4 pts] (Sphere - 2 pts, Pear - 2 pts)
In this first part, you are required to work with two images, one of a sphere and the other of a
pear. The pickle file normals.pickle is a list consisting of 4 numpy matrices:

1) Normal vectors for the sphere with specularities removed (diffuse component)
2) Normal vectors for the sphere
3) Normal vectors for the pear with specularities removed (diffuse component)
4) Normal vectors for the pear

Please load the normals and plot them using the provided function plot_normals.
[ ]: def plot_normals(diffuse_normals, original_normals):
    # Stride in the plot, you may want to adjust it to different images
    stride = 5
    normalss = diffuse_normals
    normalss1 = original_normals
    print("Normals:")
    print("Diffuse")
    # showing normals as three separate channels
    figure = plt.figure()
    ax1 = figure.add_subplot(131)
    ax1.imshow(normalss[..., 0])
    ax2 = figure.add_subplot(132)
    ax2.imshow(normalss[..., 1])
    ax3 = figure.add_subplot(133)
    ax3.imshow(normalss[..., 2])
    plt.show()
    print("Original")
    figure = plt.figure()
    ax1 = figure.add_subplot(131)
    ax1.imshow(normalss1[..., 0])
    ax2 = figure.add_subplot(132)
    ax2.imshow(normalss1[..., 1])
    ax3 = figure.add_subplot(133)
    ax3.imshow(normalss1[..., 2])
    plt.show()
[ ]: # Plot the normals for the sphere and pear for both the original and diffuse components.

# 1: Load the different normals
# LOAD HERE

# 2: Plot the normals using plot_normals
# What do you observe? What are the differences between the diffuse components
# and the original images shown?
# PLOT HERE
1.5.6 Part 2. Lambertian model [6 pts]
Fill in your implementation of the rendered image using the Lambertian model.
[ ]: def normalize(img):
    assert img.shape[2] == 3
    maxi = img.max()
    mini = img.min()
    return (img - mini)/(maxi - mini)

[ ]: def lambertian(normals, lights, color, intensity, mask):
    '''Your implementation'''
    image = np.ones((normals.shape[0], normals.shape[1], 3))
    return image
Plot the rendered results for both the sphere and the pear, for both the original and the diffuse
components. Remember to first load the masks from the masks.pkl file. The masks.pkl file is a list
consisting of 2 numpy arrays:

1) Mask for the sphere
2) Mask for the pear

Remember to plot the normalized image using the provided function normalize.
[ ]: # Load the masks for the sphere and pear
# LOAD HERE

# Output the rendering results for the pear
dirn1 = np.array([[1.0],[0],[0]])
color1 = np.array([[1],[.5],[.5]])
dirn2 = np.array([[-1.0/3],[1.0/3],[1.0/3]])
color2 = np.array([[1],[1],[1]])

# Display the rendering results for the pear, for both diffuse and original
# normals and for both light sources
[ ]: # Output the rendering results for the sphere
dirn1 = np.array([[1.0],[0],[0]])
color1 = np.array([[1],[.5],[.5]])
dirn2 = np.array([[-1.0/3],[1.0/3],[1.0/3]])
color2 = np.array([[1],[1],[1]])

# Display the rendering results for the sphere, for both diffuse and original
# normals and for both light sources
1.5.7 Part 3. Phong model [8 pts]
Please fill in your implementation of the Phong model below.
[ ]: def phong(normals, lights, color, material, view, mask):
    '''Your implementation'''
    return image
With the function completed, plot the rendering results for the sphere and pear (both diffuse
and original components) for all the materials and light sources, and also with the combination
of both light sources.
[ ]: # Output the rendering results for the sphere
view = np.array([[0],[0],[1]])
material = np.array([[0.1,0.75,5],[0.5,0.1,5],[0.5,0.5,10]])
lightcol1 = np.array([[-1.0/3,1],[1.0/3,1],[1.0/3,1]])
lightcol2 = np.array([[1,1],[0,0.5],[0,0.5]])

# Display rendered results for the sphere for all materials and light sources
# and the combination of both light sources
[ ]: # Output the rendering results for the pear
view = np.array([[0],[0],[1]])
material = np.array([[0.1,0.75,5],[0.5,0.1,5],[0.5,0.5,10]])
lightcol1 = np.array([[-1.0/3,1],[1.0/3,1],[1.0/3,1]])
lightcol2 = np.array([[1,1],[0,0.5],[0,0.5]])

# Display rendered results for the pear for all materials and light sources
# and the combination of both light sources