It Development

Location:

Hong Kong

Posted:

November 21, 2012

Contact this candidate

Resume:

International Journal of Computer Vision **(*), **5 150, 2007

c 2007 Springer Science + Business Media, LLC. Manufactured in the United States.

DOI: 10.1007/s11263-007-0044-1

Image-Based Modeling by Joint Segmentation

LONG QUAN, JINGDONG WANG, PING TAN AND LU YUAN

The Department of Computer Science and Engineering, The Hong Kong University of Science and Technology

****@**.***.**

********@**.***.**

****@**.***.**

******@**.***.**

Received May 10, 2006; Accepted February 14, 2007

First online version published in March, 2007

Abstract. The paper rst traces the image-based modeling back to feature tracking and factorization that have been

developed in the group led by Kanade since the eighties. Both feature tracking and factorization have inspired and

motivated many important algorithms in structure from motion, 3D reconstruction and modeling. We then revisit the

recent quasi-dense approach to structure from motion. The key advantage of the quasi-dense approach is that it not only

delivers the structure from motion in a robust manner for practical modeling purposes, but also it provides a cloud of

suf ciently dense 3D points that allows the objects to be explicitly modeled. To structure the available 3D points and

registered 2D image information, we argue that a joint segmentation of both 3D and 2D is the fundamental stage for the

subsequent modeling. We nally propose a probabilistic framework for the joint segmentation. The optimal solution

to such a joint segmentation is still generally intractable, but approximate solutions are developed in this paper. These

methods are implemented and validated on real data set.

Keywords: structure from motion, image-based modeling, reconstruction, segmentation

The integral is over a window W around the point x. The

1. Introduction

transformation T may be taken to be a displacement in

its simplest form, x T (x) = x + d. Taylor expanding

Modeling by video taping has been advocated at CMU

the image term I (x + d), and differentiating E (d) with

by Kanade since the early eighties. The development has

respect to d for minima, we obtain Hd = e, where e is

been focused on two fundamental achievements by his

the residual error vector x W ( I I )g(weight)d x and g is

group, Moravec-Lucas-Tomasi-Kanade feature detection

the gradient of the image I . The H is the Hessian matrix,

and tracking, and Tomasi-Kanade factorization method

for reconstruction.

H= gT g(weight)d x.

x W

Lucas-Kanade tracking As done in Tomasi and Kanade (1991) and Shi and

Tomasi (1994), the feature point is de ned to be the pixel

at which we could have a reliable solution, i.e. H is not

The original idea of Lucas and Kanade (1981) is to track

or match an image patch I (x ) against a reference image singular at it.

I (x ) in two successive frames with unknown transforma- The Lucas-Kanade tracking equation has several ram-

tion T such that I (T (x)) = I (x). Due to the inherent i cations. First, it leads to one de nition of good feature

points (Tomasi and Kanade, 1991; Shi and Tomasi, 1994),

aperture problem for each individual pixel, it chooses a

of which H is well conditioned. This is per se the same de-

(weighted) integral error to minimize for the unknown

transformation T, tector of point of interest or corner in Harris and Stephens

(1988) that is motivated by improving Moravec s points

of interest (Moravec, 1981), which in turn is from the

E (T ) = I (T (x)) I (x)2 (weight) d x.

solution of discretization of the equation E (T ) with

x W

Quan et al.

136

d = {( 1 0, 1 0)} for self-matching I = I . It dated, but was related to the concept of af ne shape in-

troduced by Koenderink and Van Doorn in 1988. Koen-

is similar to that developed by F rstner (1994) as well.

derink s paper (Koenderink and van Doorn, 1989) has

All detectors of point of interest (F rstner, 1994; Harris

already been in circulation in 1988, and was published

and Stephens, 1988) only differ at computational imple-

in 1991 (Koenderink and van Doorn, 1989). The techni-

mentation of considering the eigensystem of H . Harris

and Stephen used det kTrace2 = 1 2 k ( 1 + 2 )2 . cal report of Tomas-Kanade factorization (Tomasi, 1991)

Tomasi and Kanade suggested min( 1, 2 ) >, and was available in 1991 and its journal version (Tomasi and

F rstner took aslo the scale into account and proposed Kanade, 1992) in 1992. This af ne shape was innovative,

2 = 2 . but limited to only af ne cameras. It is Faugeras paper

(Faugeras, 1992) that gave a de nite answer to what is

Second, it suggests that a dense correspondence

the parallel to an af ne camera for a projective camera.

between frames is ill-posed in nature as H is not

Hartley (1992) independently reached the same re-

well conditioned everywhere. I.e. there are pixels for

sults. Mohr et al. (1992, 1995) followed up by propos-

which H are close to singularity. The impossibility of

ing a numerical scheme, a kind of bundle adjustment

dense correspondence justi es the necessity of the in-

for projective reconstruction for multiple views, us-

troduction of the quasi-dense correspondence (Lhuil-

ing an explicit projective basis de ned by 5 space

lier and Quan, 2005) as a more achievable goal.

points. Interestingly and importantly, it was shown in

Thirdly, it implies that feature point based approach,

Sturm and Triggs (1996) and Triggs (1996) that the orig-

now the mainstream, is by its de nition the in nites-

inal Tomasi-Kanade af ne factorization for af ne cam-

imal expansion d for matching and self-matching only

eras could be extended to projective reconstruction for

good for close frames. This indeed re ects the actual

projective cameras, this is the projective factorization,

practice in the representative systems implemented in

that is the only alternative to a bundle-adjustment type

Pollefeys et al. (1998) and Nister (2001). Though a more

projective reconstruction. It has even an advantage of

general transformation than the translational displace-

straightforward initialization when an iterative scheme is

ment could be introduced into the development of match-

necessary. This projective tour for structure from mo-

ing and self-matching equations without theoretical dif -

tion is important as it tells us an important fact that

culties (Triggs, 2004), it leads often to inconclusive trade-

the matching problem for a rigid scene or object is

off between invariance and rareness of the descriptors

encoded by this uncalibrated projective geometry. And

(Triggs, 2004).

this projective structure could be ef ciently computed

from the feature points detected and matched in multiple

views.

Tomasi-Kanade factorization algorithm Second, it has been known that the second step of

the enforcement of metric constraints is capital, but

hard to succeed in some cases. This has been primarily

Related to the tracking, given the tracked points over the

viewed as a successive step of the entire reconstruction

sequence, but one step further to reconstruct these fea-

for af ne cameras. Now from an uncalibrated projective

ture points in 3D space. By rst considering a simpli ed

viewpoint, the enforcement of the metric constraints is

camera projection model that is approximated by an or-

thogonal type projection model (x, y, z ) (u, v ) in- equivalent to the autocalibration of the uncalibrated ap-

proach. Indeed, using the orthogonality constraints orig-

stead of a more general central projection model. Tomasi

inally proposed by Tomasi and Kanade gives an alterna-

(1991) proposed the factorization algorithm for recon-

struction. Given n points tracked over m views, and stack tive way of formulating the autocalibration both for af ne

all measurements (u, v ) over all views to form a big and projective cameras, which is usually formulated in

measurement matrix W, which has a rank constraint by terms of the absolute conic borrowed from the projective

its construction W = MS. Using SVD on M results in geometry.

W = U V = (U 1/2 )( 1/2 V ) = MS. The motion M Combining Lucas-Kanade-Tomasi feature detection

and tracking with the projective factorization is an al-

and shape S are de ned up to an arbitrary af ne transfor-

mation A as W = MS = (MA 1 )(AS), so the M and S ternative of the standard uncalibrated approach which

are de facto the af ne motion and af ne shape. Next, the is often implemented (Pollefeys et al., 1998; Hartley and

Zisserman, 2000; Faugeras et al., 2001) by rst extracting

metric constraints to guarantee that M is a valid rotation

the points of interest, Harris corners, then combining with

matrix are used to nalize the Euclidean motion M and

robust statistics such as RANSAC or LMS with a projec-

Euclidean shape S. Each of these two steps contains fun-

tive structure (Zhang et al., 1995), and nally optimizing

damental concepts that are related to the development of

the structure by a bundle-adjustment-like optimization

the uncalibrated approach.

(Mohr et al., 1995, 1992; Triggs et al., 2000) in projective

First, the importance of the pre-metric structure of the

shape, the S resulted from the SVD, though not eluci- space.

Image-Based Modeling by Joint Segmentation 137

Figure 1. One overview on the right of the reconstructed quasi-dense points for the entire scene from 25 images shown on the left.

Quasi-dense approach required. The main contribution of this paper is to intro-

duce a joint segmentation framework, for both 3D points

This sparse structure from motion approach usually re- and 2D pixels, and look for robust and ef cient solutions

quires a dense frame rate and leads to a too sparse set of to it in Section 3. Figure 2 shows one of the original 25

points to be suf cient for object modeling and depiction. images captured by a handheld camera for the example

This insuf ciency motivated the development the quasi- scenario, a rendered image of the nal modeling result

dense approach started since 1998 in Lhuillier (1998) and based on the desired 3D segmentation and 2D segmenta-

matured into the work in Lhuillier and Quan (2005). It tion results by a semi-automatic approach developed in

is robust and handles more distant views. More impor- Quan et al. (2006) using the joint segmentation method

tantly, it produces a set of semi-dense 3D points that was presented in this paper. The segmentation and modeling

impossible with the previous methods. The quasi-dense of such complex objects are almost impossible without

approach will be brie y revisited in Section 2. One exam- the joint segmentation.

ple is shown in Fig. 1. The quasi-dense approach can be

viewed as an extension to the discussed two key subjects,

2. Quasi-Dense Approach Revisited

feature point selection, re-sampled quasi-dense point vs.

interest point, and reconstruction, hierarchical division

The quasi-dense approach developed in Lhuillier (1998);

combined with a bundle-like method vs. batch solution

Lhuillier and Quan (2002, 2005) overcomes the sparse-

combined with a factorization.

ness of feature points and results in a more ef cient

and robust algorithm when combined with a bundle

Joint segmentation approach adjustment like algorithms both at projective and Eu-

clidean stages. The purpose is to view the quasi-dense

The increased density of the reconstructed 3D points from approach as an extension of the discussed two key sub-

multiple views, paves the way for the three-dimensional jects: feature point, quasi-dense point vs. interest point;

modeling of the objects in space, in addition to the re- and reconstruction, hierarchical division combined with a

covery of the camera positions. But the 3D points, even bundle-like method vs. batch solution with a factorization

semi-dense, are unstructured in space, therefore are not scheme.

yet suf cient for creating a geometric model of the un-

derlying objects. It is necessary to group the points and

pixels that belong to the same object into the same cluster 2.1. Quasi-Dense Point

of points and pixels. Obviously, the concept of object is

Initialization by sparse points of interest. We start by

subjective: for the example scenario shown in Fig. 2, the

whole plant might be considered as an object for some ap- detecting the points of interest in each image (Zhang et al.,

plications, whereas each individual leaf should be consid- 1995), then compute the Zero-Mean Normalized Cross-

ered as an independent object for some others, depending Correlation (ZNCC) to match points of interest in two

on the application and the realism details of the modeling images. First, we do the correlation from the rst image

Quan et al.

138

Figure 2. A proper segmentation is fundamental to 3D modeling. (a) One of the 25 original images captured by a handheld camera. (b) A rendered

image at a similar viewpoint of the reconstructed 3D model based on the segmentation results in (c) and (d). (c) A desired segmentation of 3D data

points from the reconstructed quasi-dense points, and (d) a desired segmentation of an input image.

to the second, and then do the same from the second to the First, the key is the de nition of the potential match

candidates N (x, x ) satisfying the disparity gradient

rst to retain a one-to-one consistent matching of points

of interest between two images. Most of the standard (Lhuillier and Quan, 2002; Pollard et al., 1985) as

approaches will continue with this set of feature points by

introducing global geometry constraint. The quasi-dense

N (a, a ) = {(b, b ), b N (a), b N (a ),

approach will not, it is merely initialized at this stage.

(b b) (a a)

Contact this candidate