
Tracking the Small Object through Clutter with Adaptive Particle Filter

Yu Huang, Joan Llach

Thomson Corporate Research, Princeton, New Jersey, USA

**.*******@*****.***, ****.*****@*******.***

Yu Huang is now working at Futurewei Technologies Inc., NJ, USA

Abstract

Cluttered background and occlusion cause large ambiguity in the tracking of video objects. When the object is small (like a soccer ball in broadcast game video signals), the ambiguity gets even more severe. In this paper, we propose an adaptive particle filter with effective proposal distribution to handle these situations. In the proposed tracking approach, motion estimation is embedded into the state transition to tackle abrupt motion changes and generate good proposal distributions. We also propose a mixture model to account for multiple hypotheses in the template correlation surface when estimating the appearance likelihood. In addition, motion continuity and trajectory smoothness are combined with template correlation in the observation likelihood to further filter out visual distracters. As an example of small object tracking, promising results of the ball tracking (as small as 30 pixels) in soccer game videos are presented to illustrate that the proposed scheme handles the cluttered background and occlusion effectively.

1. Introduction

Visual tracking is a crucial element of many computer vision systems. A lot of applications such as visual surveillance, smart rooms, video compression and vision-based interfaces often require a visual tracker to be robust in complex environments and efficient in computation [1, 4-7, 10, 12, 14].

Recently, particle filters (PF) have gained more attention in visual tracking [1, 4, 7, 10, 12, 14]. The efficiency and accuracy of the particle filter depend on two key factors: how a swarm of particles is generated by a proposal distribution and how these particles are weighted to approximate the real posterior state distribution. The pioneering work of Condensation [7] uses the state transition prior as the proposal distribution. This type of particle filter is prone to be distracted by background clutter because the state transition does not take into account the most recent observation.

As an alternative, the unscented particle filter was used in [12] to generate importance densities. However, this approach needs to convert likelihood evaluations into state space measurements. Furthermore, it is still likely to fail in the presence of abrupt motion changes. An adaptive particle filter was proposed in [14], where adaptive state transition utilizing the new observation data on the affine flow constraints is realized and the diversity of particles is adapted based on motion estimation errors.

However, how to estimate the particle weight from the current observation data to differentiate the object from the cluttered background is more critical. Histogram and contour have been proved to be robust features in object tracking through clutter [4, 7, 11, 12], but they may not characterize well the appearance of small objects. The intensity-based appearance model is used in [14] and a mixture model is employed to handle its variation. Nonetheless, cluttered background is not coped with explicitly in [14].

Likewise, in [1] motion estimation is embedded into the state transition, too. Its likelihood function accounts for uncertainty in template matching based on correlation surface (with a fixed size) [9]. Despite that, no motion or trajectory information is involved in the likelihood calculation even though motion continuity and trajectory smoothness are helpful to filter out the visual distracters in complicated cluttered background. Moreover, this approach doesn't take into account multiple candidates [11] in the correlation surface.

In comparison, in [10] not only patch correlation (normalized cross correlation), shape and color information, but also motion measurements are introduced into the likelihood function. Even so, no template is adapted in this method. In contrast, "naïve update" [8] was performed. For this reason, it cannot handle severe occlusion or "drifting" artifacts explicitly.

1.1 Ball tracking in soccer game video

In this paper, our experiments focus on the soccer ball in game videos, an instance of small object tracking. Actually soccer video analysis is receiving increasing attention from researchers [2, 3, 13] as a convergence of computer vision and multimedia technologies, mainly motivated by applications such as event analysis, automatic indexing and object-based encoding. In particular, significant research work aims at obtaining the ball's position since it plays an important role in detecting key events and improving object-based compression performance.

However, detecting and tracking the ball in sports video from broadcast signals is a really challenging problem. The ball may look very similar in appearance to other regions of the image; for example, portions of the players' jerseys could trigger false alarms. It is also frequently merged with field lines or occluded by the players and, additionally, the ball moves fast most of the time and is quite small (less than 30 pixels in size) when the camera is capturing a wide view of the playfield.

1.2 Overview of our approach

In this paper, we propose an adaptive particle filter with effective proposal distribution to deal with severe cluttered background in small object tracking. Our proposal is inspired by [14] with combination of work in both [1] and [10]. To start with, motion estimation is embedded into state transition to tackle abrupt motion changes and generate a good proposal distribution. Then, we propose a mixture model to account for multiple hypotheses in the template correlation surface for estimating the appearance likelihood. Different from [1] and [10], we utilize the explicit motion measures through both the dynamic model and the likelihood function.

The paper is organized as follows. Section 2 discusses the proposed tracking algorithm, an adaptive particle filter. In Section 3, results using broadcast soccer videos are shown. Finally, conclusions are drawn in Section 4.

2. The Proposed Tracking Approach

A particle filter is a state space method for implementing a recursive Bayesian filter by Monte Carlo simulations. The key idea is to approximate the posterior probability distribution by a weighted particle set. Each particle represents one hypothetical state of the object, with a corresponding discrete sampling probability (weight). The mean state of an object is estimated at each time step by weighted average of all the particles. Usually resampling is used to alleviate particles' degeneracy.

The efficiency and accuracy of a particle filter for tracking rely on the definition of a good proposal distribution and an effective observation model for particle weights. Below we will give details on these issues in our proposed algorithm.

We define the object's state vector as $X = (x, y)$, where $(x, y)$ is the window center of the object. The state space model for object tracking is formulated as

$$X_{t+1} = f(X_t, \mu_t), \tag{1}$$

$$Z_t = g(X_t, \xi_t), \tag{2}$$

where $X_t$ represents the object state vector, $Z_t$ is the observation vector, $f$ and $g$ are the dynamic model and the observation model, respectively, and $\mu_t$ and $\xi_t$ represent the process and observation noise, respectively.

2.1. Dynamic Model

The dynamic model characterizes the object state change between frames. Similar to [1, 10, 14], we directly obtain the apparent motion of the object by a hierarchical estimation framework [5, 6]. Since the most recent observation is used for state transition, a better proposal distribution is generated.

2.1.1. Adaptive Motion Model

Let us denote the estimated motion for the object as $V_t$. Accordingly, the dynamic model in (1) can be reformulated as

$$X_{t+1} = X_t + V_t + \mu_t, \tag{3}$$

with $\mu_t$ denoting the state prediction error.

2.1.2. Adaptive Process Noise Variance

To vary the diversity of particles, the process noise variance $\sigma_t \in [\sigma_{\min}, \sigma_{\max}]$ is proportional to the motion estimation error, i.e. a defined residual measure $uI_x + vI_y + I_t$, where $I_x$, $I_y$, $I_t$ are partial derivatives of the intensity function $I$ with respect to $x$, $y$ and $t$. If motion estimation fails (by thresholding the average of absolute difference), motion is set to zero like a "random walk" and $\sigma_t$ is set as the maximum value.

2.2. Observation Model

The observation model measures the weights of particles based on a predefined likelihood function. Here it is defined as

$$P(Z_t \mid X_t) = P(Z_t^{int} \mid X_t)\, P(Z_t^{mot} \mid X_t)^{O_{t-1}}\, P(Z_t^{trj} \mid X_t)^{1 - O_{t-1}}, \tag{4}$$

where $Z_t = \{Z_t^{int}, Z_t^{mot}, Z_t^{trj}\}$ and the intensity measurement $Z_t^{int}$ is assumed to be independent from either the motion measurement $Z_t^{mot}$ or the trajectory measurement $Z_t^{trj}$; $O_t = 0$ if the object is occluded, and 1 otherwise. When the object is visible, trajectory constraints are not enforced, which avoids violating the temporal Markov chain assumption; on the other hand, when the object is occluded or motion estimation fails, the trajectory smoothness takes the place of motion continuity in the observation likelihood. Details of each likelihood component are given below.

The intensity measurement is computed with the similarity between the target model (template) and the candidate particle. A simple metric is the sum-of-squared-differences (SSD) for each particle as

$$Z_t^{int} = \arg\min_{X_t \in Neib} \sum_{\chi \in W} [T(\chi) - I(\chi + X_t)]^2, \tag{5}$$

where $W$ is the object window, $Neib$ is a small neighborhood around $X_t$, $T$ is the object template and $I$ is the image at the current time.

This metric cannot be used directly for intensity likelihood in the case of highly cluttered background. Instead, the correlation surface [9] can better measure the uncertainty and generate a reasonable estimate.

The SSD-based correlation surface for each particle in its support area $Neib$ is defined as

$$r(X_t) = \sum_{\chi \in W} [T(\chi) - I(\chi + X_t)]^2, \quad X_t \in Neib. \tag{6}$$

Compared with the fixed size of the correlation surface in [1, 9], the surface size in our proposal varies (from 3x3 to 11x11 pixels in the examples) proportionally to the motion estimation error given in formula (3) as well, similar to the process noise variance, which provides a flexible measure of the ambiguity in template matching.

Inspired by [7, 11], we assume having detected $J$ candidates from the correlation surface inside $Neib$. As a result, $J+1$ hypotheses can be defined as:

$$H_0 = \{c_j = C : j = 1, \dots, J\},$$

$$H_j = \{c_j = T,\ c_i = C : i = 1, \dots, J,\ i \neq j\}, \quad j = 1, \dots, J,$$

where $c_j = T$ means the $j$th candidate is associated with the true match, $c_j = C$ otherwise. Hypothesis $H_0$ means that none of the candidates is associated with the true match.

The clutter is assumed to be uniformly distributed over $Neib$ and hence the true match-oriented measurement is Gaussian [...]

where $(\Delta x_t, \Delta y_t)$ is the particle's position change with respect to $(x_{t-1}, y_{t-1})$, and $(\Delta\bar{x}, \Delta\bar{y})$ is the average object speed in past history, i.e.

$$\Delta\bar{x} = \sum_{s=t-k}^{t-1} (x_s - x_{s-1}) / k, \quad \Delta\bar{y} = \sum_{s=t-k}^{t-1} (y_s - y_{s-1}) / k \quad (k = 10).$$

Hence the motion likelihood is calculated as

$$P(Z_t^{mot} \mid X_t) = \frac{1}{\sqrt{2\pi}\,\sigma_{mot}} \exp\left(-\frac{d_{mot}^2}{2\sigma_{mot}^2}\right). \tag{9}$$

This component accounts for constraints from motion continuity of the object. In comparison to [10], our motion likelihood takes into account the recent motion history of the object.

The trajectory likelihood is estimated from the particle's closeness to a trajectory that is obtained from past positions of the object. Let us denote the trajectory function in a polynomial form

$$y = \sum_{i=0}^{m} a_i x^i, \tag{10}$$

where $a_i$ are the polynomial coefficients and $m$ is the order of the polynomial function (for the examples in this paper, $m = 2$). Only past "visible" object positions are used for trajectory fitting. A forgotten factor $F = f^{t_o}$ is defined, where $f$ is the forgotten ratio ($0 < f < 1$) [...]

[...] the estimated trajectory to keep its motion smoothness. Illustrated in Fig. 1: given the two last reliable positions $X_j$ and $X_i$ at frames $j$ and $i$ respectively ($i > j$), the predicted one $\hat{X}_{cur}$ is calculated as

$$\hat{X}_{cur} = X_i + (X_i - X_j)(cur - i)/(i - j).$$

Its projection $\tilde{X}_{cur}$ is defined as the point on the trajectory closest to $\hat{X}_{cur}$. Hence, we refine the object position as

$$X_{cur} = (1 - f^{t_o})\,\hat{X}_{cur} + \tilde{X}_{cur}\, f^{t_o}. \tag{12}$$

If $O_{cur} = 1$, we employ the template update approach [8] in a conservative way to cope with the appearance variation, i.e. the "drifting" artifact in tracking.
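The position refinement used while the object is occluded, that is, linear extrapolation from the two last reliable positions, projection onto the fitted polynomial trajectory, and blending by the forgotten factor as in (12), can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the closest point on the curve is approximated by dense sampling (the paper does not say how the projection is computed), and t_o is read here as the number of frames elapsed since the object was last visible.

```python
def predict_position(x_i, x_j, i, j, cur):
    """Extrapolate linearly from positions x_j (frame j) and x_i (frame i), i > j."""
    scale = (cur - i) / (i - j)
    return tuple(a + (a - b) * scale for a, b in zip(x_i, x_j))


def project_onto_trajectory(point, coeffs, half_width=20.0, samples=4001):
    """Approximate the closest point on y = sum(a_k * x**k) by dense sampling in x.

    half_width and samples are illustrative choices, not values from the paper.
    """
    px, py = point
    best_d2, best_xy = None, None
    for n in range(samples):
        x = px - half_width + 2.0 * half_width * n / (samples - 1)
        y = sum(a * x ** k for k, a in enumerate(coeffs))
        d2 = (x - px) ** 2 + (y - py) ** 2
        if best_d2 is None or d2 < best_d2:
            best_d2, best_xy = d2, (x, y)
    return best_xy


def refine_position(predicted, projected, f, t_o):
    """Blend the two points with weight f**t_o on the projection, as in (12)."""
    w = f ** t_o
    return tuple((1.0 - w) * p + w * q for p, q in zip(predicted, projected))


# Example: last reliable positions at frames 3 and 5, current frame 6,
# and an assumed second-order trajectory y = 0.1 * x**2.
predicted = predict_position((4.0, 2.0), (2.0, 1.0), i=5, j=3, cur=6)
projected = project_onto_trajectory(predicted, coeffs=[0.0, 0.0, 0.1])
refined = refine_position(predicted, projected, f=0.8, t_o=2)
```

Because f is below 1, the weight f**t_o on the projected point shrinks as t_o grows, so under this reading the fitted trajectory is trusted less the longer the object stays hidden.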

2.4. The outline of the proposed method

Fig. 2 shows the framework of our proposed approach for small object tracking.

With the particle set $\{(X_{t-1}^{(i)}, \pi_{t-1}^{(i)}) \mid i = 1, \dots, N\}$ at time $t-1$, we proceed at time $t$ as follows:

Prediction: If $O_{t-1} = 1$, estimate motion $V_t$ and prediction error $\sigma_t$; otherwise $V_t = 0$, $\sigma_t = \sigma_{\max}$. For $i = 1, \dots, N$, simulate $X_t^{(i)} \sim N(X_{t-1}^{(i)} + V_t, \sigma_t)$;

Updating: For $i = 1, \dots, N$, $\pi_t^{(i)} = P(Z_t \mid X_t^{(i)})$ by (4), consisting of the intensity, motion and trajectory likelihood terms by (7), (9) and (11).

Resample (if necessary): with the particle weight set $\{\pi_t^{(i)} \mid i = 1, \dots, N\}$, run residual resampling (its virtue lies in its insensitivity to the particle order compared with other techniques). Replace $\{(X_t^{(i)}, \pi_t^{(i)}) \mid i = 1, \dots, N\}$ by $\{(X_t^{(i)}, 1/N) \mid i = 1, \dots, N\}$.

Estimate: If $O_{t-1} = 1$, output the average of all particles; otherwise, select the particles (one or more) with the maximum weight and output the average of them. Detect occlusion for $O_t$ setting. If $O_t = 1$, handle drifting by template update. Otherwise, project the estimated position onto the trajectory by (12).

Fig. 2 Particle filter-based small object tracking

Fig. 3: Failure of the TM tracker (frames 35, 43)

Fig. 4: Failure of the KLT tracker (frames 35, 36)
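One iteration of the outline in Section 2.4 might be sketched as below. It is a minimal illustration, not the authors' Visual C++ implementation: the function names are hypothetical, the likelihood is passed in as a user-supplied callable standing in for (4), sigma is treated as a standard deviation, resampling is run at every step, and the estimate is taken from the weighted particles before resampling. Motion estimation, occlusion detection and template update are left out.

```python
import random


def residual_resample(weights, rng):
    """Return particle indices drawn by residual resampling."""
    n = len(weights)
    total = sum(weights)
    expected = [n * w / total for w in weights]
    indices, residuals = [], []
    for i, e in enumerate(expected):
        copies = int(e)                  # deterministic part: floor(N * w_i)
        indices.extend([i] * copies)
        residuals.append(e - copies)     # leftover fractional mass
    missing = n - len(indices)
    if missing > 0:                      # random part, drawn from the residuals
        indices.extend(rng.choices(range(n), weights=residuals, k=missing))
    return indices


def filter_step(particles, motion, sigma, sigma_max, likelihood, visible_prev, rng):
    """One predict / update / estimate / resample cycle on 2-D positions."""
    # Prediction: fall back to a random walk with maximum spread when the
    # object was occluded (or motion estimation failed) at the previous step.
    if not visible_prev:
        motion, sigma = (0.0, 0.0), sigma_max
    predicted = [(x + motion[0] + rng.gauss(0.0, sigma),
                  y + motion[1] + rng.gauss(0.0, sigma)) for x, y in particles]

    # Updating: weight each particle by the supplied likelihood and normalize.
    weights = [likelihood(p) for p in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]

    # Estimate: weighted mean if visible, else mean of the maximum-weight particles.
    if visible_prev:
        estimate = (sum(w * p[0] for w, p in zip(weights, predicted)),
                    sum(w * p[1] for w, p in zip(weights, predicted)))
    else:
        top = max(weights)
        best = [p for w, p in zip(weights, predicted) if w == top]
        estimate = (sum(p[0] for p in best) / len(best),
                    sum(p[1] for p in best) / len(best))

    # Resample: after this step every particle implicitly carries weight 1/N.
    resampled = [predicted[i] for i in residual_resample(weights, rng)]
    return resampled, estimate
```

Residual resampling first copies each particle floor(N * w) times and only draws the remaining slots at random, which keeps the number of copies of a high-weight particle close to its expected value.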

3. Experimental Results

We have implemented the proposed tracking method in Visual C++ and it runs at about 15fps on a 3.2GHz PC platform. The video from broadcast soccer game signals in the following examples is 360x240, 30Hz and lasts 190 frames. The tracker is initialized manually. The template size for the soccer ball is 5x5. The number of particles is 200. In all the examples in this paper, the yellow ellipse shows the ball position and size (for clarity, a zoomed portion from the pink area of the frame is shown on the top left corner of each image).

To illustrate the performance of the proposed method, we ran two classic tracking methods for comparison: one is an optical flow-based tracker with template update [8] (a KLT tracker) and the other is a traditional template matching tracker (the TM tracker) updated by an IIR filter (the update ratio is 0.15). As shown in Fig. 3, the template matching (TM) tracker first falls in a region on the player jersey and then drifts away (frame 43). The estimated normalized cross correlation is still high (0.86) on frame 43. In Fig. 4, we see that the optical flow-based (KLT) tracker drifts away on frame 36 due to partial occlusion by a player jersey area.

Tracking results of the proposed algorithm are given in Figs. 5-9. In this video, occlusion of the ball by the player occurs twice (frames 36-40, 147-153) and the field mark lines merge with the ball twice (frames 136-140, 175-176). Our proposal handles both cases successfully.

Fig. 5 Tracking results (frames 34, 41, 146, 153)

For the first case, Fig. 5 shows that our method still finds the ball when it reappears from occlusion by players, the most complicated case in ambiguity. Furthermore, Fig. 6 shows the resampled particles (ellipses in blue) when the ball is physically occluded by players (ellipse in black means the final estimate with the lowest confidence). We can see that more diversified particles are retained to wait for the ball to reappear.

The second case is illustrated in Fig. 7, where our tracker follows the ball when it has left the field lines for the clean grass field. Since field lines look similar in color to the ball, template matching becomes more ambiguous when the ball approaches them. Likewise, Fig. 8 shows the resampled particles when the ball falls onto the field line. This case is regarded as virtual occlusion since its uncertainty is similar to a real occlusion. After resampling, particles close to the field lines are retained which will be propagated to detect the ball that goes into the clean grass area again.

The tracking error is given in Fig. 9 where the ground truth is generated manually (when the ball is occluded, its real position has to be obtained by interpolation). Actually the several peaks in this error curve indicate the time periods when the ball is occluded by the player or merges with the field line.
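The location error plotted in Fig. 9 could be computed along the following lines. This is a sketch under assumptions: occluded frames are marked with None in the manually labeled ground truth, the gaps are filled by linear interpolation between the nearest visible frames (the paper only says "interpolation"), and the error is the Euclidean distance in pixels. The function names are hypothetical.

```python
import math


def interpolate_ground_truth(truth):
    """Fill None entries (occluded frames) linearly between visible neighbours."""
    filled = list(truth)
    known = [k for k, p in enumerate(truth) if p is not None]
    for a, b in zip(known, known[1:]):
        for k in range(a + 1, b):
            s = (k - a) / (b - a)
            filled[k] = tuple(pa + (pb - pa) * s
                              for pa, pb in zip(truth[a], truth[b]))
    return filled


def location_errors(estimates, truth):
    """Per-frame Euclidean error in pixels; frames without ground truth are skipped."""
    filled = interpolate_ground_truth(truth)
    return [math.hypot(e[0] - g[0], e[1] - g[1])
            for e, g in zip(estimates, filled) if g is not None]


# Example: the ball is hidden in frames 1 and 2.
truth = [(0.0, 0.0), None, None, (3.0, 3.0)]
estimates = [(0.0, 0.0), (1.0, 2.0), (2.0, 2.0), (3.0, 7.0)]
errors = location_errors(estimates, truth)
```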

Fig. 6 Ball's occlusion by player (frames 39, 147)

Fig. 7 Tracking results (frames 134, 141, 174, 177)

Fig. 8 Ball's merging with the line (frames 138, 176)

Fig. 9 Tracking accuracy in ball localization (vertical axis corresponds to location error in pixels).

4. Conclusions

An adaptive particle filter for small object tracking which can effectively handle cluttered background and occlusion has been proposed. The adaptive motion model is applied to get better proposal distributions with varied diversity of particles. To further filter out visual distracters, motion continuity and trajectory smoothness are combined with the template correlation in the observation likelihood. Experimental results on soccer ball tracking show that our approach can deal with challenging situations with success, e.g. ball merging with field lines or occlusions.

In future work, a small object detector [3] will contribute information into the particle filter-based tracker to handle motion blur and long-duration occlusions. Besides, other interacting objects in the neighborhood, such as player/referee, need to be tracked at the same time, expanding the system towards multiple object tracking.

5. References

[1] E. Arnaud, E. Memin, B. Cernuschi-Frias, Conditional filters for image sequence based tracking application to point tracking, IEEE T-IP, 14(1):63-79, 2005.

[2] Y. Gong, L. T. Sin, C. H. Chuan, H. Zhang, and M. Sakauchi, Automatic parsing of TV soccer programs, Proc. Multimedia Computing & Systems, pp167-174, 1995.

[3] Y. Huang, J. Llach, S. Bhagavathy, Players and Ball Detection in Soccer Videos Based on Color Segmentation and Shape Analysis, Int. Workshop on Multimedia Content Analysis and Mining (MCAM'07), June, 2007.

[4] Y. Huang, J. Llach, Variable Number of Informative Particles for Object Tracking, IEEE ICME'07, July, 2007.

[5] Y. Huang, T. S. Huang, H. Niemann, Segmentation-based Object Tracking Using Image Warping and Kalman Filtering, IEEE ICIP'02, Rochester city, US, Sept. 2002.

[6] Y. Huang, T. S. Huang, H. Niemann, Region-based Method for Model-free Object Tracking, IAPR ICPR'02, Quebec city, Canada, Aug., 2002.

[7] M. Isard and A. Blake, Condensation - Conditional density propagation for visual tracking, IJCV, 29(1), 1998.

[8] I. Matthews, S. Baker, and T. Ishikawa, The Template Update Problem, IEEE T-PAMI, 26(6), pp810-815, 2004.

[9] K. Nickels, S. Hutchinson, Estimating uncertainty in SSD-based feature tracker, Image & Vision Computing, 20(1), pp47-58, 2002.

[10] J. Odobez, D. Perez, S. Ba, Embedding motion in model-based stochastic tracking, IEEE T-IP, 15(11), 2006.

[11] P. Perez, J. Vermaak, and A. Blake, Data fusion for visual tracking with particle filters, Proc. IEEE, 92(3), 2004.

[12] Y. Rui, Y. Chen, Better Proposal Distributions: Object Tracking Using Unscented Particle Filter, IEEE CVPR 2001.

[13] X. Yu, C. Xu, Q. Tian, and H. W. Leong, A ball tracking framework for broadcast soccer video, ICME'03.

[14] S. Zhou, R. Chellappa, and B. Moghaddam, Appearance tracking using adaptive models in a particle filter, ACCV, 2004.
