Tracking the Small Object through Clutter with Adaptive Particle Filter@
Yu Huang, Joan Llach
Thomson Corporate Research, Princeton, New Jersey, USA
**.*******@*****.***, ****.*****@*******.***
Abstract

Cluttered background and occlusion cause large ambiguity in the tracking of video objects. When the object is small (like a soccer ball in broadcast game video signals), the ambiguity gets even more severe. In this paper, we propose an adaptive particle filter with effective proposal distribution to handle these situations. In the proposed tracking approach, motion estimation is embedded into the state transition to tackle abrupt motion changes and generate good proposal distributions. We also propose a mixture model to account for multiple hypotheses in the template correlation surface when estimating the appearance likelihood. In addition, motion continuity and trajectory smoothness are combined with template correlation in the observation likelihood to further filter out visual distracters. As an example of small object tracking, promising results of the ball tracking (as small as 30 pixels) in soccer game videos are presented to illustrate that the proposed scheme handles the cluttered background and occlusion effectively.

1. Introduction

Visual tracking is a crucial element of many computer vision systems. A lot of applications such as visual surveillance, smart rooms, video compression and vision-based interfaces often require a visual tracker to be robust in complex environments and efficient in computation [1, 4-7, 10, 12, 14].
Recently, particle filters (PF) have gained more attention in visual tracking [1, 4, 7, 10, 12, 14]. The efficiency and accuracy of the particle filter depends on two key factors: how a swarm of particles are generated by a proposal distribution and how these particles are weighted to approximate the real posterior state distribution. The pioneering work of Condensation [7] uses the state transition prior as the proposal distribution. This type of particle filter is prone to be distracted by background clutter because the state transition does not take into account the most recent observation.
As an alternative, the unscented particle filter was used in [12] to generate importance densities. However, this approach needs to convert likelihood evaluations into state space measurements. Furthermore, it is still likely to fail in the presence of abrupt motion changes. An adaptive particle filter was proposed in [14], where adaptive state transition utilizing the new observation data on the affine flow constraints is realized and the diversity of particles is adapted based on motion estimation errors.
However, how to estimate the particle weight from the current observation data to differentiate the object from the cluttered background is more critical. Histogram and contour have been proved to be robust features in object tracking through clutter [4, 7, 11, 12], but they may not characterize well the appearance of small objects. The intensity-based appearance model is used in [14] and a mixture model is employed to handle its variation. Nonetheless, cluttered background is not coped with explicitly in [14].
Likewise, in [1] motion estimation is embedded into the state transition, too. Its likelihood function accounts for uncertainty in template matching based on correlation surface (with a fixed size) [9]. Despite that, no motion or trajectory information is involved into the likelihood calculation even though motion continuity and trajectory smoothness are helpful to filter out the visual distracters in complicated cluttered background. Moreover, this approach doesn't take into account multiple candidates [11] in the correlation surface.
In comparison, in [10] not only patch correlation (normalized cross correlation), shape and color information, but also motion measurements are introduced into the likelihood function. Even so, no template is adapted in this method. In contrast, "naive update" [8] was performed. For this reason, it cannot handle severe occlusion or "drifting" artifacts explicitly.

1.1 Ball tracking in soccer game video
In this paper, our experiments focus on the soccer ball in game videos, an instance for small object tracking. Actually soccer video analysis is receiving increasing attention from researchers [2, 3, 13] as a convergence of computer vision and multimedia technologies, mainly motivated by applications such as event analysis, automatic indexing and object-based encoding. In particular, significant research work aims at obtaining the ball's position since it plays an important role in detecting key events and improving object-based compression performance.
However, detecting and tracking the ball in sports video from broadcast signals is a really challenging problem. The ball may look very similar in appearance to other regions of the
@
Yu Huang is now working at Futurewei Technologies Inc., NJ, USA
image; for example, portions of the players' jerseys could trigger false alarms. It is also frequently merged with field lines or occluded by the players and, additionally, the ball moves fast most of the time and is quite small (less than 30 pixels in size) when the camera is capturing a wide view of the playfield.

1.2 Overview of our approach
In this paper, we propose an adaptive particle filter with effective proposal distribution to deal with severe cluttered background in small object tracking. Our proposal is inspired by [14] with combination of work in both [1] and [10]. To start with, motion estimation is embedded into state transition to tackle abrupt motion changes and generate a good proposal distribution. Then, we propose a mixture model to account for multiple hypotheses in the template correlation surface for estimating the appearance likelihood. Different from [1] and [10], we utilize the explicit motion measures through both the dynamic model and the likelihood function.
The paper is organized as follows. Section 2 discusses the proposed tracking algorithm, an adaptive particle filter. In Section 3, results using broadcast soccer videos are shown. Finally, conclusions are drawn in Section 4.

2. The Proposed Tracking Approach

Particle filter is a state space method for implementing a recursive Bayesian filter by Monte Carlo simulations. The key idea is to approximate the posterior probability distribution by a weighted particle set. Each particle represents one hypothetical state of the object, with a corresponding discrete sampling probability (weight). The mean state of an object is estimated at each time step by weighted average of all the particles. Usually resampling is used to alleviate particles' degeneracy.
The efficiency and accuracy of a particle filter for tracking relies on the definition of a good proposal distribution and an effective observation model for particle weights. Below we will give details on these issues in our proposed algorithm.
We define the object's state vector as $X = (x, y)$, where $(x, y)$ is the window center of the object. The state space model for object tracking is formulated as
$$X_{t+1} = f(X_t, \mu_t), \qquad (1)$$
$$Z_t = g(X_t, \xi_t), \qquad (2)$$
where $X_t$ represents the object state vector, $Z_t$ is the observation vector, $f$ and $g$ are the dynamic model and the observation model, respectively, and $\mu_t$ and $\xi_t$ represent the process and observation noise, respectively.

2.1. Dynamic Model
The dynamic model characterizes the object state change between frames. Similar to [1, 10, 14], we directly obtain the apparent motion of the object by a hierarchical estimation framework [5, 6]. Since the most recent observation is used for state transition, a better proposal distribution is generated.

2.1.1. Adaptive Motion Model
Let us denote the estimated motion for the object as $V_t$. Accordingly, the dynamic model in (1) can be reformulated as
$$X_{t+1} = X_t + V_t + \mu_t, \qquad (3)$$
with $\mu_t$ denoting the state prediction error.

2.1.2. Adaptive Process Noise Variance
To vary the diversity of particles, $\mu_t \in [\mu_{\min}, \mu_{\max}]$ is proportional to the motion estimation error, i.e. a defined residual measure $u I_x + v I_y + I_t$, where $I_x$, $I_y$, $I_t$ are partial derivatives of the intensity function $I$ with respect to $x$, $y$ and $t$. If motion estimation fails (by thresholding the average of absolute difference), motion is set zero like a "random walk" and $\mu_t$ is set as the maximum value.

2.2. Observation Model
The observation model measures the weights of particles based on a predefined likelihood function. Here it is defined as
$$P(Z_t \mid X_t) = P(Z_t^{int} \mid X_t)\, P(Z_t^{mot} \mid X_t)^{O_{t-1}}\, P(Z_t^{trj} \mid X_t)^{1 - O_{t-1}}, \qquad (4)$$
where $Z_t = \{Z_t^{int}, Z_t^{mot}, Z_t^{trj}\}$ and the intensity measurement $Z_t^{int}$ is assumed to be independent from either the motion measurement $Z_t^{mot}$ or the trajectory measurement $Z_t^{trj}$, $O_t = 0$ if the object is occluded, and 1 otherwise. When the object is visible trajectory constraints are not enforced, which avoids violating the temporal Markov chain assumption; on the other hand, when the object is occluded or motion estimation fails, the trajectory smoothness takes the place of motion continuity in the observation likelihood. Details of each likelihood component are given below.
The intensity measurement is computed with the similarity between the target model (template) and the candidate particle. A simple metric is the sum-of-squared-differences (SSD) for each particle as
$$Z_t = \arg\min_{X_t \in \mathrm{Neib}} \sum_{\chi \in W} [T(\chi) - I(\chi + X_t)]^2, \qquad (5)$$
where $W$ is the object window, Neib is a small neighborhood around $X_t$, $T$ is the object template and $I$ is the image in the current time.
This metric cannot be used directly for intensity likelihood in the case of highly cluttered background. Instead, the correlation surface [9] can better measure the uncertainty and generate a reasonable estimate.
The SSD-based correlation surface for each particle in its support area Neib is defined as
$$r(X_t) = \sum_{\chi \in W} [T(\chi) - I(\chi + X_t)]^2, \quad X_t \in \mathrm{Neib}. \qquad (6)$$
Compared with the fixed size of the correlation surface in [1, 9], the surface size in our proposal varies (from 3x3 to 11x11 pixels in the examples) proportionally to the motion estimation error as well given in formula (3), similar to the process noise variance, which provides a flexible measure of the ambiguity in template matching.
Inspired by [7, 11], we assume having detected J candidates from the correlation surface inside Neib. As a result, J+1 hypothesis can be defined as:
$$H_0 = \{c_j = C : j = 1, \ldots, J\},$$
$$H_j = \{c_j = T,\ c_i = C : i = 1, \ldots, J,\ i \neq j\}, \quad j = 1, \ldots, J,$$
where $c_j = T$ means the jth candidate is associated with the true match, $c_j = C$ otherwise. Hypothesis $H_0$ means that none of the candidates is associated with the true match.
The clutter is assumed to be uniformly distributed over Neib and hence the true match-oriented measurement is Gaussian [...]
where $(\Delta x_t, \Delta y_t)$ is the particle's position change with respect to $(x_{t-1}, y_{t-1})$, and $(\overline{\Delta x}, \overline{\Delta y})$ is the average object speed in past history, i.e.
$$\overline{\Delta x} = \sum_{s=t-k}^{t-1} (x_s - x_{s-1}) / k, \quad \overline{\Delta y} = \sum_{s=t-k}^{t-1} (y_s - y_{s-1}) / k. \quad (k = 10)$$
Hence the motion likelihood is calculated as
$$P(Z_t^{mot} \mid X_t) = \frac{1}{\sqrt{2\pi}\,\sigma_{mot}} \exp\left(-\frac{d_{mot}^2}{2\sigma_{mot}^2}\right). \qquad (9)$$
This component accounts for constraints from motion continuity of the object. In comparison to [10], our motion likelihood takes into account the recent motion history of the object.
The trajectory likelihood is estimated from the particle's closeness to a trajectory that is obtained from past positions of the object. Let us denote the trajectory function in a polynomial form
$$y = \sum_{i=0}^{m} a_i x^i, \qquad (10)$$
where $a_i$ are the polynomial coefficients and $m$ is the order of the polynomial function (for the examples in this paper, m=2). Only past "visible" object positions are used for trajectory fitting. A forgotten factor $F = f^{t_o}$ is defined, where $f$ is the forgotten ratio, (0 [...] 1 [...]
the estimated trajectory to keep its motion smoothness. Illustrated in Fig. 1: given the two last reliable positions $X_j$ and $X_i$ at frames j and i respectively (i>j), the predicted one $\hat{X}_{cur}$ is calculated as
$$\hat{X}_{cur} = X_i + (X_i - X_j)(cur - i)/(i - j).$$
Its projection $\tilde{X}_{cur}$ is defined as the point on the trajectory closest to $\hat{X}_{cur}$. Hence, we refine the object position as
$$X_{cur} = (1 - f^{t_o})\,\hat{X}_{cur} + \tilde{X}_{cur}\, f^{t_o}. \qquad (12)$$
If $O_{cur} = 1$, we employ the template update approach [8] in a conservative way to cope with the appearance variation, i.e. the "drifting" artifact in tracking.
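The motion and trajectory terms above can be illustrated with a minimal Python sketch. It covers the average speed over the last k frames, the Gaussian motion likelihood of (9), the polynomial fit of (10), the linear prediction from the two last reliable positions, and its projection onto the fitted curve. Several points are assumptions rather than the method as implemented in the paper: the definition of d_mot is not recoverable from the text, so the Euclidean distance between the particle's displacement and the average speed is used; the projection is found by dense sampling around the predicted point; all function and parameter names are hypothetical.

```python
import numpy as np

def average_speed(history, k=10):
    """Mean per-frame displacement over (at most) the last k steps."""
    pts = np.asarray(history, dtype=float)
    steps = np.diff(pts[-(k + 1):], axis=0)  # x_s - x_{s-1}
    return steps.mean(axis=0)

def motion_likelihood(particle, prev_pos, avg_speed, sigma_mot):
    """Gaussian likelihood as in (9); the form of d_mot^2 is an ASSUMPTION."""
    delta = np.asarray(particle, float) - np.asarray(prev_pos, float)
    d2 = float(np.sum((delta - avg_speed) ** 2))
    return np.exp(-d2 / (2.0 * sigma_mot**2)) / (np.sqrt(2.0 * np.pi) * sigma_mot)

def fit_trajectory(visible_positions, order=2):
    """Least-squares polynomial y = sum a_i x^i (coefficients, highest power first)."""
    pts = np.asarray(visible_positions, dtype=float)
    return np.polyfit(pts[:, 0], pts[:, 1], order)

def extrapolate(x_j, x_i, j, i, cur):
    """Linear prediction from the two last reliable positions (frames j < i)."""
    x_j, x_i = np.asarray(x_j, float), np.asarray(x_i, float)
    return x_i + (x_i - x_j) * (cur - i) / (i - j)

def project_onto_trajectory(point, coeffs, half_width=5.0, samples=2001):
    """Closest point on the fitted curve, found by dense sampling (a simple stand-in)."""
    xs = np.linspace(point[0] - half_width, point[0] + half_width, samples)
    curve = np.stack([xs, np.polyval(coeffs, xs)], axis=1)
    return curve[np.argmin(np.linalg.norm(curve - point, axis=1))]
```

For example, fitting the positions (0, 0), (1, 1), (2, 4), (3, 9) gives the parabola y = x^2, and the prediction for frame 4 from frames 2 and 3 is (4, 14), which the projection step pulls back onto the curve.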
2.4. The outline of the proposed method
Fig. 2 shows the framework of our proposed approach for small object tracking.

With the particle set $\{(X_{t-1}^{(i)}, \pi_{t-1}^{(i)}) \mid i = 1, \ldots, N\}$ at time t-1, we proceed at time t as follows:
Prediction: If $O_{t-1} = 1$, estimate motion $V_t$ and prediction error $\mu_t$; otherwise $V_t = 0$, $\mu_t = \mu_{\max}$. For $i = 1, \ldots, N$, simulate $X_t^{(i)} \sim N(X_{t-1}^{(i)} + V_t, \mu_t)$;
Updating: For $i = 1, \ldots, N$, $\pi_t^{(i)} = P(Z_t \mid X_t^{(i)})$ by (4), consisting of the intensity, motion and trajectory likelihood terms by (7), (9) and (11).
Resample (if necessary): with the particle weight set $\{\pi_t^{(i)} \mid i = 1, \ldots, N\}$, run residual resampling (its virtue lies in insensitivity to the particle order compared with other techniques). Replace $\{(X_t^{(i)}, \pi_t^{(i)}) \mid i = 1, \ldots, N\}$ by $\{(\tilde{X}_t^{(i)}, 1/N) \mid i = 1, \ldots, N\}$.
Estimate: If $O_{t-1} = 1$, output the average of all particles; otherwise, select the particles (one or more) with the maximum weight and output the average of them. Detect occlusion for $O_t$ setting. If $O_t = 1$, handle drifting by template update. Otherwise, project the estimated position onto the trajectory by (12).
Fig. 2 Particle filter-based small object tracking

Fig. 3: Failure of the TM tracker (frames 35, 43)
Fig. 4: Failure of the KLT tracker (frames 35, 36)
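The prediction, updating, resampling and estimation steps of Fig. 2 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' Visual C++ implementation: the likelihood is passed in as a generic callable standing in for (4), resampling is done at every step rather than only when necessary, the "maximum weight" estimate uses the weights computed before resampling, and occlusion detection, template update and trajectory projection are left out. The names `pf_step`, `noise_t` and `noise_max` are hypothetical; `noise_t` plays the role of the prediction error term.

```python
import numpy as np

def residual_resample(weights, rng):
    """Return particle indices drawn by residual resampling."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    counts = np.floor(n * w).astype(int)      # deterministic copies
    n_rest = n - counts.sum()
    if n_rest > 0:                            # draw the remainder from the residuals
        residual = n * w - counts
        counts = counts + rng.multinomial(n_rest, residual / residual.sum())
    return np.repeat(np.arange(n), counts)

def pf_step(particles, likelihood, rng, visible_prev, v_t, noise_t, noise_max):
    """One prediction / update / resample / estimate cycle (simplified)."""
    if not visible_prev:                      # occluded: random walk, widest spread
        v_t, noise_t = np.zeros(2), noise_max
    # Prediction: shift by the estimated motion and diffuse.
    predicted = particles + v_t + rng.normal(0.0, noise_t, size=particles.shape)
    # Updating: weight each particle by the observation likelihood.
    weights = np.array([likelihood(p) for p in predicted], dtype=float)
    weights /= weights.sum()
    # Resample: equal-weight particle set.
    resampled = predicted[residual_resample(weights, rng)]
    # Estimate: mean of all particles if visible, else mean of the best ones.
    if visible_prev:
        estimate = resampled.mean(axis=0)
    else:
        estimate = predicted[np.isclose(weights, weights.max())].mean(axis=0)
    return resampled, estimate
```

A caller would loop over frames, supplying the motion estimate and noise level for each frame and feeding the returned particle set into the next call.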
3. Experimental Results
We have implemented the proposed tracking method in Visual C++ and it runs at about 15fps on a 3.2GHz PC platform. The video from broadcast soccer game signals in the following examples is 360x240, 30Hz and lasts 190 frames. The tracker is initialized manually. The template size for the soccer ball is 5x5. The number of particles is 200. In all the examples in this paper, the yellow ellipse shows the ball position and size (for clarity, a zoomed portion from the pink area of the frame is shown on the top left corner of each image).
To illustrate the performance of the proposed method, we ran two classic tracking methods for comparison: one is an optical flow-based tracker with template update [8] (a KLT tracker) and the other is a traditional template matching tracker (the TM tracker) updated by an IIR filter (the update ratio is 0.15). As shown in Fig. 3, the template matching (TM) tracker first falls in a region on the player jersey and then drifts away (frame 43). The estimated normalized cross correlation is still high (0.86) on frame 43. In Fig. 4, we see that the optical flow-based (KLT) tracker drifts away on frame 36 due to partial occlusion by a player jersey area.
Tracking results of the proposed algorithm are given in Fig. 5-9. In this video, occlusion of the ball by the player occurs twice (frames 36-40, 147-153) and the field mark lines merge with the ball twice (frames 136-140, 175-176). Our proposal handles both cases successfully.

Fig. 5 Tracking results (frames 34, 41, 146, 153)

For the first case, Fig. 5 shows that our method still finds the ball when it reappears from occlusion by players, the most complicated case in ambiguity. Furthermore, Fig. 6 shows the resampled particles (ellipses in blue) when the ball is physically occluded by players (ellipse in black means the final estimate with the lowest confidence). We can see that more diversified particles are retained to wait for the ball to reappear.
The second case is illustrated in Fig. 7, where our tracker follows the ball when it has left the field lines for the clean grass field. Since field lines look similar in color to the ball, template matching becomes more ambiguous when the ball approaches them. Likewise, Fig. 8 shows the resampled particles when the ball falls onto the field line. This case is regarded as virtual occlusion since its uncertainty is similar to a real occlusion. After resampling, particles close to the field lines are retained which will be propagated to detect the ball that goes into the clean grass area again.
The tracking error is given in Fig. 9 where the ground truth is generated manually (when the ball is occluded, its real position has to be obtained by interpolation). The several peaks in this error curve indicate the time periods when the ball is occluded by the player or merges with the field line.
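As an illustration of how a per-frame location error against manually labeled ground truth could be computed, the following Python sketch fills in occluded frames by interpolation and measures the error in pixels. Two choices are assumptions, since the text does not specify them: the interpolation is linear between the nearest labeled frames, and the error is the Euclidean distance. The function names are hypothetical.

```python
import numpy as np

def interpolate_ground_truth(labeled_frames, labeled_xy, all_frames):
    """Linearly interpolate (x, y) for frames without a manual label."""
    labeled_xy = np.asarray(labeled_xy, dtype=float)
    gx = np.interp(all_frames, labeled_frames, labeled_xy[:, 0])
    gy = np.interp(all_frames, labeled_frames, labeled_xy[:, 1])
    return np.stack([gx, gy], axis=1)

def location_error(estimates, ground_truth):
    """Per-frame Euclidean distance in pixels between estimate and ground truth."""
    diff = np.asarray(estimates, float) - np.asarray(ground_truth, float)
    return np.linalg.norm(diff, axis=1)
```

With labels only at frames 35 and 41, for instance, the positions for frames 36 to 40 are taken on the straight line between the two labeled points before the error is evaluated.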
Fig. 6 Ball's occlusion by player (frames 39, 147)
Fig. 7 Tracking results (frames 134, 141, 174, 177)
Fig. 8 Ball's merging with the line (frames 138, 176)
Fig. 9 Tracking accuracy in ball localization (vertical axis corresponds to location error in pixels).

4. Conclusions

An adaptive particle filter for small object tracking which can effectively handle cluttered background and occlusion has been proposed. The adaptive motion model is applied to get better proposal distributions with varied diversity of particles. To further filter out visual distracters, motion continuity and trajectory smoothness are combined with the template correlation in the observation likelihood. Experimental results from applying our approach to soccer ball tracking show that it can deal successfully with challenging situations, e.g. ball merging with field lines or occlusions.
In future work, a small object detector [3] will contribute information into the particle filter-based tracker to handle motion blur and long-duration occlusions. Besides, other interacting objects in the neighborhood, such as players and the referee, need to be tracked at the same time, expanding the system towards multiple object tracking.

5. References
[1] E. Arnaud, E. Memin, B. Cernuschi-Frias, Conditional filters for image sequence based tracking application to point tracking, IEEE T-IP, 14(1):63-79, 2005.
[2] Y. Gong, L. T. Sin, C. H. Chuan, H. Zhang, and M. Sakauchi, Automatic parsing of TV soccer programs, Proc. Multimedia Computing & Systems, pp. 167-174, 1995.
[3] Y. Huang, J. Llach, S. Bhagavathy, Players and Ball Detection in Soccer Videos Based on Color Segmentation and Shape Analysis, Int. Workshop on Multimedia Content Analysis and Mining (MCAM'07), June 2007.
[4] Y. Huang, J. Llach, Variable Number of Informative Particles for Object Tracking, IEEE ICME'07, July 2007.
[5] Y. Huang, T. S. Huang, H. Niemann, Segmentation-based Object Tracking Using Image Warping and Kalman Filtering, IEEE ICIP'02, Rochester city, US, Sept. 2002.
[6] Y. Huang, T. S. Huang, H. Niemann, Region-based Method for Model-free Object Tracking, IAPR ICPR'02, Quebec city, Canada, Aug. 2002.
[7] M. Isard and A. Blake, Condensation - Conditional density propagation for visual tracking, IJCV, 29(1), 1998.
[8] I. Matthews, S. Baker, and T. Ishikawa, The Template Update Problem, IEEE T-PAMI, 26(6), pp. 810-815, 2004.
[9] K. Nickels, S. Hutchinson, Estimating uncertainty in SSD-based feature tracker, Image & Vision Computing, 20(1), pp. 47-58, 2002.
[10] J. Odobez, D. Perez, S. Ba, Embedding motion in model-based stochastic tracking, IEEE T-IP, 15(11), 2006.
[11] P. Perez, J. Vermaak, and A. Blake, Data fusion for visual tracking with particle filters, Proc. IEEE, 92(3), 2004.
[12] Y. Rui, Y. Chen, Better Proposal Distributions: Object Tracking Using Unscented Particle Filter, IEEE CVPR 2001.
[13] X. Yu, C. Xu, Q. Tian, and H. W. Leong, A ball tracking framework for broadcast soccer video, ICME'03.
[14] S. Zhou, R. Chellappa, and B. Moghaddam, Appearance tracking using adaptive models in a particle filter, ACCV, 2004.