# 程序代写代做代考 Hive matlab scheme algorithm Microsoft Word – 4212 takehome q1.doc

Microsoft Word – 4212 takehome q1.doc

Open Book Exam for EE4212 (Due: 27th Feb, 2017)

Your name and Matric Number:

Your pledge: I have not given or received aid from anyone else in solving the problems.

Your Signature:

Instructions:

1. Your report should be neatly written, typed up or a mix of both. You can print using recycled

papers, or print on both sides! Also submit your codes to the GA via the IVLE workbin->Student

Submission->CA1. You can send executables and/or just the source codes but provide a readme

file on how to compile and run it.

2. Each question is marked with 0 to 3 asterisks *, with more asterisk indicating increasing level of

difficulty. A question with 3 *’s means that it is a challenge question; if you can solve such

challenge questions, you will get A+ and you should seriously consider becoming a vision

researcher!

3. In the following questions, you can assume that the camera is calibrated. Do not bring in the

intrinsic parameters. That is, you can treat all the image quantities given or asked for in the

questions as quantities in terms of the camera reference frame, with the implicit assumption that

you can always convert them to pixel units if needs be.

1. Projection Model; Homogeneous coordinates

(a) * Suppose you have two parallel lines in 3-space, one passing through the point (100,100,1000), the

other through (200,200,1100). The lines are parallel to the vector (1,2,1). The lines are observed by

a unit focal length camera at the origin (i.e. the camera reference frame and the world reference

frame are identical). All coordinates are in camera coordinates. What is their point of intersection in

the image? (Hint: the point of infinity along li in 3-space is given by limλi→∞

⎥

⎥

⎥

⎦

⎤

⎢

⎢

⎢

⎣

⎡

+

⎥

⎥

⎥

⎦

⎤

⎢

⎢

⎢

⎣

⎡

=

c

b

a

Z

Y

X

i

Ci

i

i

λil ,

where

i

i

i

X

Y

Z

⎡ ⎤

⎢ ⎥

⎢ ⎥

⎢ ⎥⎣ ⎦

and

a

b

c

⎡ ⎤

⎢ ⎥

⎢ ⎥

⎢ ⎥⎣ ⎦

are a point on the line and its direction respectively)

(b) Given the 3D coordinates of several corresponding points Pi and Pi’ in two views, you are required

to find the 3D rotation R and translation T that relate the two views (Pi’ = RPi + T). Formulate a

linear least squares algorithm (of the form Ax=b) that ignores the orthogonality constraint associated

with R (that is, it is ok if the solution for R returned by your formulation is not orthogonal). State the

entries of the matrix A, and the vectors x and b.

(c) You are now given two sets of corresponding point clouds Pi and Pi’, in the files pts.txt and

pts_prime.txt respectively. Each row is a 3D coordinate, possibly contaminated with some noise.

Write a Matlab routine to estimate the optimal values of R and T in the least squares sense via SVD,

using the formulation in (b). Compute the determinant of R using the Matlab function det, and

comment on the resulting value.

(d) If the world homogeneous coordinates are (Xw, Yw, Zw, Ww), the image plane homogeneous

coordinates are (u, v, w), and they are related by:

i. find the 3×4 matrix M as a function of α, h, according to the diagram below (in the left diagram,

the axis X and Xw are pointing directly out of the paper). Assume that the only intrinsic camera

parameter is the focal length f (given in pixel unit). Use Pc = RPw +t to relate the camera

coordinates Pc and the world coordinates Pw. R is the orientation of the world with respect to the

camera; t is the world origin expressed in the camera frame.

ii. * Find the 3×3 matrix for the linear transformation that maps points on the world plane Zw = d to

the image plane (u, v, w).

Q2 Using Singular Value Decomposition for Image Compression and PCA.

Many thousands of outdoor cameras are currently connected to the Internet. They can be used to

measure plant growth (e.g. in response to recent warming trends), survey animal populations (e.g. in

national parks), monitor surf conditions, and security, etc. You can see some examples from the

following websites:

Santa Catalina Mountains, Arizona – Dec 5 2006 (http://www.cs.arizona.edu/camera);

Yosemite: Half-dome from Glacier Point – Dec 19 2006 (http://www.halfdome.net/);

Tokyo Riverside Skyline – Jan 08 2007 (http://tokyosky.to/ ).

University of Washington AMOS archive: http://www.cse.wustl.edu/amos

In this question, you are provided with a 150-frame time-lapse video of a city scene taken between 6.30-

7.30pm. Each frame is of the dimension 161×261 pixels. Watch the video, and you can see that clouds

are moving, and planes take off and land from a nearby airport.

a. Using Matlab, load the first image in the video sequence provided with this question and convert it

to an appropriate data type. Just type the following commands in the command window:

I= im2double(imread(‘image001.png’));

b. Do a singular value decomposition using the command ‘svd’:

[U S V]=svd(I);

This will give you the singular values and the singular vectors. The singular values in S have been

sorted in descending order. Plot the singular value spectrum using the command: plot(diag(S),’b.’).

Submit this plot. What do you notice in this plot?

c. Let K=20, Extract the first K singular values and their corresponding vectors in U and V:

K=20;

Sk=S(1:K,1:K);

Uk=U(:,1:K);

Vk=V(:,1:K);

Uk, Vk, Sk contain the compressed version of the image. To see this, form the compressed image:

Imk=Uk*Sk*Vk’; display it by :imshow(Imk).

Print out a copy of this compressed image and submit it.

d. Repeat question c for K=40,60,80. Submit the compressed images for the different values of K.

Compare the 4 compressed images. Briefly describe what you notice.

e. Thus, in image transmission, instead of transmitting the original image you can transmit Uk, Vk, Sk,

which should be much less data than the original. Is it worth transmitting when K=100 (i.e. do you

save any bits in transmission when K=100)? Explain your answer.

f. ** Find an expression that bounds the per-pixel error (the difference between the original image and

the compressed image for a particular pixel (i, j)) in terms of K, the singular values and the elements

in U and V with the largest absolute magnitude (umax, vmax respectively).

g. ** SVD is intimately related to PCA (Principal components analysis). Some of you might have

learned about PCA in the EE3731C course, but it is not a pre-requisite for this question. Basically,

finding the principal components of a matrix X amounts to finding an orthonormal basis that spans the

column space of X (these are the column vectors ui in the matrix U). Here we create the data matrix X by first

vectorizing each of the 150 images into a column vector (by scanning in either row-major order or column-

major order), and then stacking these vectors together into a matrix of size 42021×150. In effect, we have

captured the entire video sequence into the 2D matrix X.

Vectorizing an image.

Before we proceed further, mean subtraction (a.k.a. “mean centering”) to center the data at the origin is

necessary for performing PCA. That is, the images are mean centered by subtracting the mean image vector

from each image vector. This is to ensure that the first principal component u1 really describes the direction of

maximum variance. Again, if you do not have background in PCA, it is ok; just take it as a preprocessing step

(or you can do some independent learning).

Apply SVD to the resultant mean-centered matrix. Take only the first 10 principal components and

reconstruct the image sequence. [NB: you can use the Matlab reshape command to convert a matrix to a

vector and vice versa]. Observe the dynamics in the reconstructed video, comment on what you find, and

explain why. Remember to add back this mean image vector when you are displaying your reconstructed

results.

[Non-mandatory]: You can also explore other possibilities, like investigating what each principal component

means, and their implications for processing and editing of outdoor webcam imagery.

Question 3: Projection Model; Homogeneous coordinates

a. An affine camera is a simplification of the full perspective camera but is more complicated than the

scaled orthographic model. It has a projection relationship given by the following equations:

[x,y]T = A[X,Y,Z]T + b

where A is a 2×3 matrix, and b is a 2×1 vector. If the world point (X,Y,Z) and image point (x,y) are

represented by homogeneous vectors, write down the matrix representing the linear mapping

between their homogeneous coordinates.

* Show that the point at infinity in space (X,Y,Z,0)T is mapped to point of infinity in the image plane.

What does this result imply about the projection of parallel lines in space onto the image plane?

b. ** Given an image of a scene containing a cube as shown in the following.

Each vanishing point is the image of a point at infinity of the form (d, 0), where d is a Euclidean

vector with 3 coordinates expressing the direction of a cube edge. Show that the coordinates of a

vanishing point v can be expressed as v = K R d, where K is the intrinsic calibration matrix and R is

the rotation matrix between the camera and world coordinate system. Hence express an edge

direction d as a function of K, R and v. (Hint: Start from the 3×4 projection matrix equation in Ch 1)

** The 3 directions of the cube edges are mutually perpendicular; therefore the dot product between

any two directions is zero. Show that this condition leads to an equation in terms of the vanishing

points and the unknown calibration matrix K. Note that such an equation can be written for each

pair of the 3 vanishing points.

*** (Note that the derived equation above can be used to solve for K, but you are not required to

explicitly propose a scheme for solving this equation. However, doing so will earn you bonus point.)

c. The projection of a scene point in world coordinates to pixel coordinates in an image can be

represented using a camera projection matrix P as follows:

i. * Given a set of lines in a scene that are all parallel to the world X-axis, what is the vanishing

point, (u, v), of these lines in the image? Can you conclude whether an infinite line in 3D space

always yield an infinite line in the 2D image plane?

ii. What is the significance of the image point (represented as a homogeneous 3-vector) given by

the last column ( 14p , 24p , 34p ) of P? That is, which world point gives rise to ( 14p , 24p , 34p )?

iii. ** Consider the 1-dimensional right null-space of P, i.e., the 4-vector C such that PC=0. In this

case, the image point of C is (0, 0, 0) which is not defined. What is the point C which possesses

this property? Explain.

d. The equation of a conics in inhomogeneous coordinates is given by a x2 + b xy + c y2 +d x + e y + f

=0. Homogenize this equation by the replacement x → x1 / x3, y → x2 / x3, and write down the

homogeneous form. Finally, express this homogeneous form in the matrix form xT C x = 0 where C

is symmetric. Write down the elements of the symmetric matrix C.

** If C has the special form C = l mT + m lT , clearly C is symmetric and therefore represents a

conic. Show that the vector x = l × m is the null vector of C. Since a null vector exists, C is a

degenerate conic. Show in this degenerate case, C contains two lines l and m, that is, show that the

points on the lines l and m satisfy xT C x = 0.

e. Given a set of two-dimensional points (x,y) as follows, write a Matlab routine to find the conics best

(in the least squares sense) represented by these points, in terms of the parameters a, b, c, d, e, and f,

as formulated in Question 3d.

3,-8 4,-9 5,-9 6,-9 7,-9 8,-9 9,-9 10,-9 11,-8 12,-8

13,-8 14,-8 15,-7 16,-6 16,-5 17,-4 17,-3 17,-2 18,-1 18,0

18,1 18,2 18,3 18,4 18,5 18,6 18,7 18,8 18,9 17,10

17,11 16,12 16,13 15,14 15,15 14,16 13,17 12,18 11,18 10,19

9,20 8,20 7,20 6,21 5,21 4,21 3,22 2,22 1,22 0,22

-1,21 -2,21 -3,20 -4,19 -5,18 -6,17 -7,16 -7,15 -8,14 -8,13

-8,12 -8,11 -8,10 -8,9 -8,8 -8,7 -8,6 -8,5 -8,4 -7,3

-7,2 -6,1 -5,0 -4,-1 -4,-2 -3,-3 -2,-4 -2,-5 -1,-6 0,-7

1,-7 2,-8

11 12 13 14

21 22 23 24

31 32 33 34

1

w

w

w

X

su p p p p

Y

sv p p p p

Z

s p p p p

⎛ ⎞

⎛ ⎞ ⎡ ⎤ ⎜ ⎟

⎜ ⎟ ⎢ ⎥ ⎜ ⎟=⎜ ⎟ ⎢ ⎥ ⎜ ⎟⎜ ⎟ ⎢ ⎥ ⎜ ⎟⎝ ⎠ ⎣ ⎦

⎝ ⎠

Q4. Estimate Fundamental Matrix F using 8-Point Algorithm [you should be able to attempt this

question by the end of week 4]

4.1. Introduction:

The fundamental matrix F completely describes the epipolar geometry of a pair of images. 8-Point

Algorithm is the simplest method for estimating the fundamental matrix. In this assignment, you will

implement the 8-Point Algorithm to estimate the fundamental matrix given a pair of images. You would

also need to implement the normalization so as to improve the stability (outlined in steps 4.2.2 and 4.2.4

below). Finally, we have also provided a package to help you reconstruct and visualize the depths (see point

#6 in 4.3; this step is optional but will earn you bonus point). Please read this document completely before

starting work. A list of reading materials you may need to solve this problem is given in Appendix I.

4.2. Tasks:

You are given 2 pairs of images (inria1.tif, inria2.tif, frc1.tif, frc2.tif) for this assignment. The steps for this

assignment are:

a. Using one of the given pairs of images, establish n ( 8)n ≥ point correspondences.

b. Normalize the coordinates of the correspondences.

c. Using the normalized coordinates of the correspondences, compute the fundamental matrix.

d. Perform denormalization so as to find the fundamental matrix corresponding to the original data.

Details of each step are given in Section 4.2.1, 4.2.2, 4.2.3 and 4.2.4. Some useful Matlab commands are

given in Appendix II.

4.2.1 Establish Point Correspondences:

In this step, you need to establish point correspondences between the two images. Using Matlab, manually

or automatically mark and record at least 8 pairs of corresponding points (preferably much more than 8 pairs

for robustness!).

Note:

• Your points should not all lie on the same line, but spread out in the image.

• It is usually easier to mark prominent features in the image, e.g. corners of the buildings.

• You may use the Matlab function ‘cpselect’ for this step. It displays the pair of images side by side,

allowing you to click on corresponding points manually. You can also use the zoom feature to mark the

points more accurately. See 4.3 (5) if you want to automate this step.

At the end of this step: you should have marked out n ( 8)n ≥ point correspondences ‘i i↔x x between

the pair of images. Show where these points are by plotting the correspondences in the same symbol and

color in the pair of images. Submit these plots as jpg files.

4.2.2 Normalization of the data

In 8-point algorithm, the stability of the results can be greatly improved by a simple normalization

(translation and scaling) of the coordinates of the correspondences (originally presented in [Hartley-

1997] and also in [Hartley&Zisserman-2000]). For this normalization, you should find a similarity

transformation T , consisting of a translation and scaling, that takes points ix to a new set of points

ˆ ix ( )ˆi iT=x x such that the centroid of the points ˆ ix is the coordinates origin, and their average distance

(Root Mean Squares) from the origin is 2 . Similarly, find a similarity transformation ‘T that takes

points ‘ix to points ˆ ‘ix ( )ˆ ‘ ‘ ‘i iT=x x .

Note: Homogeneous coordinates are used in this step.

At the end of this step: You should have the normalized homogeneous coordinates of the point

correspondences ˆ ˆ ‘i i↔x x , which are to be used in the next step to compute F. The transformation T and

‘T will be used again in the denormalization step.

4.2.3 Computation of Fundamental Matrix

The computation of the fundamental matrix is described in detail in your textbook or in Section 10.1 of

[Hartley&Zisserman-2000]. In summary the steps are as follows:

(1) Find solution for f

From a set of n point correspondences, we obtain a set of linear equations of the form

‘ ‘ ‘ ‘ ‘ ‘

1 1 1 1 1 1 1 1 1 1 1 1

‘ ‘ ‘ ‘ ‘ ‘

1

0

1n n n n n n n n n n n n

x x x y x y x y y y x y

Af f

x x x y x y x y y y x y

⎡ ⎤

⎢ ⎥

= =⎢ ⎥

⎢ ⎥⎣ ⎦

A is an 9n × matrix ( 8)n ≥ and f is the 9-vector made up of the entries of F in row-major order. If

( ) 8rank A ≤ , non-trivial solution for the system above exists, and if ( ) 8rank A = , the solution is unique (up

to scale). The solution for f should be the last column of V in the ( ) TSVD A UDV= .

(2) Enforce singularity constraint

Fundamental Matrix is singular, in fact of rank 2. The matrix F found by the above method will not in

general have rank 2 and steps should be taken to enforce this singularity constraint. One easy way to do this

is: let TF UDV= be the SVD of F , where D is a diagonal matrix ( , , )diag r s t satisfying r s t≥ ≥ . Let

‘ ( , ,0) TF Udiag r s V= . According to the so-called Frobenius norm, ‘F is the closest singular matrix to F .

Thus replace F by ‘F .

Note:

*The data used in this step is the normalized data obtained from step 4.2.2.

At the end of this step: You should’ve the estimated fundamental matrix ‘F . Keep in mind that because

you’ve performed step 4.2.2, this is actually the fundamental matrix corresponding to the normalized data;

thus we will instead denote it as ˆ ‘F .

4.2.4 Denormalization

Since you have performed step 4.2.2, the fundamental matrix ˆ ‘F you obtained from the above corresponds

to the normalized data ˆ ˆ ‘i i↔x x and need to be denormalized. Set ˆ’ ‘

TF T F T= (T and ‘T are obtained in the

normalization step; you may easily prove this yourself). Then, F is the fundamental matrix corresponding

to the original data ‘i i↔x x .

At the end of this step: You should have the estimated fundamental matrix F .

* Note: To visualize the correctness of your estimated F, use the given function displayEpipolarF in

displayEpipolarF.m (This function needs another function in epipoles.m). The function displayEpipolarF

asks to select an image point in image 1 and displays the corresponding epipolar line in image 2. If F is

estimated correctly, the epipolar line should pass through the corresponding point in image 2. Download

displayEpipolarF.m, epipoles.m (contained in zipped matlab files)

4.3 What to submit

(1) Plots showing the corresponding points. Explain briefly why you chose these points.

(2) The code that implements the computation of F Matrix, including the normalization of the data. Submit

a matlab function with the following signature:

function F = estimateF(x1, x2)

where x1 and x2 are each 2 n× matrices with the columns corresponding to the coordinates in the first

and the second image respectively.

(3) Print out the values of the estimated F, and also the results of your verification for a few points (using

displayEpipolarF). You should repeat the whole process for different values of ( 8)n ≥ and for different

sets of corresponding points, or on different pair of images for thorough testing. Don’t despair if the

results showed a lot of variation!

(4) ** Comment on the accuracy of the results obtained in (3). If they are not as accurate as you have hoped,

attempt to offer an explanation.

(5) *** Bonus: Obtain dense correspondences using KLT tracker. Read information about KLT tracker

from the OpenCV libraries and how to run it (Matlab interface routines are available on the preceding

website). You can also read this book in the RBR for more information: Gary Bradski, Adrian Kaehler.

Learning OpenCV: Computer Vision with the OpenCV Library. From the dense correspondence,

attempt to improve the estimation of F.

(6) *** Bonus: use the package (contained in this zipped matlab package (12MB) and see the readme file for

instructions) provided to reconstruct the 3d depths and visualize the 3d point cloud. All the functions for

estimating 3d depths (from a given F) and visualization have been provided for you in this package. You

only need to interface your codes with the package properly. Print out the final plot.

Note that since you are using only fundamental matrix, without any knowledge of the intrinsic

parameters, you can only reconstruct a scene structure that is related to the true scene depths by a

projective transformation (what is known as projective depths). Thus, the reconstructed plot may look

distorted.

Alternatively, use the two building images here (building1.jpg, building2.jpg). If you are unable to

download the building1 and building2 images through the link in the word document itself, you can also

go to the class website at http://courses.nus.edu.sg/course/eleclf/ee4212/ and download through the links

there (in the assignment section.) The intrinsic parameters of the two images are the same and are as

follows: the focal length is 1333.3 pixel unit, while the principal point is at the center of the image. With

these intrinsic parameters, you are able to do a Euclidean reconstruction instead of a projective

reconstruction. There are two things to do. First, convert the image coordinates into metric coordinates.

In this case, your estimateF function will automatically become a function estimating the essential matrix;

second, tell the package provided that you are working on the ‘calibrated’ case by calling the function

computeSMFromW with an additional parameter as follows

anim = computeSMFromW(true, samples, ‘isCalibrated’, true);

Then you will get an Euclidean 3D point cloud up to a scale factor.

NB: The toolbox is supported on Windows and Linux only. If you are a Mac user, you have 2 options:

1. you can try the computers installed with Windows and Matlab in the computer lab in E5.

2. The GA can teach you to install a virtual machine on Mac to run Windows system. This will take hours.

Appendix I. Reading Materials

*[Hartley&Zisserman-2000] R. Hartley, A. Zisserman, Multiple View Geometry in computer vision, Cambridge, 2000 (Section

10.1, 10.2)

* [Hartley-1997] R. Hartley, In Defense of the Eight-Point Algorithm, IEEE-PAMI, Vol.19, No.6, 1997 [describing the

rationale behind the normalization step]

* E. Trucco, A. Verri, Introductory Technique for 3-D Computer Vision, Prentice Hall, 1998 (Section 7.3)

Appendix II. Useful Matlab Command

These are some Matlab commands that might be useful for your work. For more information please type: help

Matlab or refer to the supplementary material given in the course website: zipped matlab files (which contains Basic

Operations (matlab_ops_tutorial.m), Programming (matlab_prog_tutorial.m) | Working with Images (matlab_image_tutorial.m)).

* imread, imwrite to read and write images. JPEG, TIFF, BMP formats are supported.

* imshow to display images.

* plot(x,y,’b*’) to plot points given by the coordinates (x,y). The point plotted is a blue

asterisk’*’. See the help page for more information about colors and markers.

* “hold on” causes subsequent image operations to display results on top of image, instead

of replacing image. Use this to plot the features points onto the image.

* save

* load loads back the variables saved by save.

* cpselect displays 2 images side by side, allowing you to click on corresponding points.

* [x, y]=ginput(n) captures the coordinates of n mouse clicks in the figure. If n>1, each of x and y is a n×1 vector.

* [U S V]= svd(A) computes the svd of matrix A.

* reshape(X,m,n) returns the m×n matrix whose elements are taken columnwise from vector X (i.e. 1st column of the m×n

matrix comes from the first n elements of vector X).

Q5 Motion Perception [you should be able to attempt (a)-(c) by the end of week 5]

Figure. Two squares translating horizontally in the opposite directions.

(a) In the Figure above, the two squares are translating horizontally in the opposite directions as

indicated by the arrows. Indicate the respective perceived motions if you are looking through the

apertures 1, 2, and 3. Explain your answers.

(b) Barber-pole with Occlusion: The figure in this question illustrates various barber-pole configurations,

with occluders placed along either vertical or horizontal sides of the barber-pole. The arrows indicate

the perceived direction of barber-pole motion. Explain why the perceived motion tends to be biased

in the direction orthogonal to the occlusion boundary.

Figure. Barber-pole with occlusion. Arrows indicate the perceived motion. Left: Barber-pole with no

occluders. Middle: Occluders placed on the top and bottom cause the perceived motion to be mostly

vertical. Right: Occluders placed on the left and right sides of the barber-pole bias the perceived

motion toward horizontal.

(c) *Referring to the diagram below, the stimulus consists of two orthogonal bars that move sinusoidally,

90 degree out of phase (Figure a and b). When presented together within an occluding aperture

(Figure c), the bars perceptually cohere and appear to move in a circle as a solid cross. However,

when presented alone (Figure d), they appear to move separately (the horizontal bar translates

vertically and the vertical bar translates horizontally), even though the image motion is unchanged in

Figures c and d. In either stimulus condition, both percepts are legitimate interpretations of the

image motion. Yet a single interpretation is predominantly seen in each case. Why?

(d) * For a purely rotational motion, the equation that relates the optical flow (u, v) to its rotational

parameters (ωx, ωy, ωz) is of the conic form explored in Questions 3d and 3e:

u = ωx xy − ωy (x2 + 1) + ωz y

v = ωx (y2 + 1) − ωy xy − ωz x

Assume you are given enough optical flow measurements at different (x, y) locations, solve (ωx, ωy,

ωz) in the least squares sense by writing down the least squares equation of the form Ax=b, b ≠ 0.

Explain why in this case, the equation is of the non-homogeneous form, whereas that in Question 3e

is of the homogeneous form Ax=0, and explain whether it is appropriate or inappropriate to change

the formulation of this Question 5d to the Ax=0 form.