
Appeared in the Proceedings of the International Workshop on Synthetic-Natural Hybrid Coding and Three Dimensional Imaging (IWSNHC3DI'97), pp. 192-194, September 5-9, 1997, Rodos-Palace, Rhodes, Greece.

Coding of Facial Image Sequences by Model-Based Optical Flow

Malcolm Davis
Texas Instruments
8330 LBJ Freeway, MS 8374
Dallas, Texas 75243
*******.*****@**.***

Mihran Tuceryan
Indiana Univ. Purdue Univ. Indianapolis
Dept. of Computer and Information Science
Indianapolis, Indiana 46202-5132
********@**.*****.***

ABSTRACT

A model-based method for estimating the shape and motion of 3D objects appearing in a video is described. This technique is used for model-based video coding (video compression). The method is based on a new variant of optical flow and uses 3D computer graphics to represent and display an object. Though the algorithm is general, this work concentrates on videos depicting the human head and face because of their relevance to videotelephony and teleconferencing. Rigid body motion of the head and facial expressions (opening the mouth) are accommodated. Results obtained from videos of a moving person are described.

1. INTRODUCTION

The concept behind model-based video coding (video compression) is that models of 3D objects and their motion require less information to transmit than videos of those objects. As illustrated in Figure 1, this type of coder analyzes the video to obtain values for the parameters of these models and estimates of the 3D motion of modeled objects. These parameter values and motion estimates are transmitted, and a video display of the modeled objects and their motion is synthesized using 3D computer graphics.

This work concentrates on model-based coding of heads and faces because such techniques are most relevant for teleconferencing and visual communication. Inspired by the work of Waters [2] and Tang [3], a model of the human head and face has been developed which uses an approximate model of facial musculature to animate facial expressions. A model-based formulation of optical flow, derived from the work of DeCarlo and Metaxas [4, 5], provides estimates of the motion of the head model.

1.1. Model-Based Motion Estimation

In 3D model-based video coding, the 3D motion of the object depicted in the video must be determined. In a few methods for recognition, tracking, or coding of facial image sequences, optical flow is mapped directly onto the parameters of a 3D model in order to determine the motion of the modeled object. Most methods are specific to a particular motion model (e.g., rigid motion) and an assumed object model (e.g., a triangular mesh). It is possible, however, to modify the optical flow implementation (which usually assumes that the motion is planar) so that virtually any motion and object model can be accommodated in a regular and automatic manner [4, 5].

Parametric Representations: First, consider a 3D object, such as a human face, that appears in a video. This object can be represented by a 3D vector function,

$$\vec{s}(\vec{u};\vec{q}_s) = \left[\, s_1(\vec{u};\vec{q}_s),\; s_2(\vec{u};\vec{q}_s),\; s_3(\vec{u};\vec{q}_s) \,\right]^T$$

which associates each value of the vector $\vec{u}$ with a point on the surface of the object. The column vector, $\vec{q}_s$, contains values that control the shape of the object. The function $\vec{s}(\vec{u};\vec{q}_s)$ is called a parametric representation of the object, and the elements of $\vec{u}$ are the domain of this representation.

Figure 1: A block diagram of a model-based video coding system (from [1]). (Diagram labels: ENCODER: Input Image, Analysis, Image Source model; Analysis Data; DECODER: Synthesis, Image Source model, Output Image.)

As an example, suppose that $\vec{s}(\vec{u};\vec{q}_s)$ is a generic model of an average human face (instead of an ellipsoid). In this case, $\vec{q}_s$ might contain parameters which would (1) adapt the shape of this generic model to the shape of an individual's face and (2) deform the face appropriately for facial expressions such as smiles, frowns, and raising the eyebrows.
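To make the notation concrete, the sketch below implements a parametric representation $\vec{s}(\vec{u};\vec{q}_s)$ for the ellipsoid mentioned above rather than for a face model. It is an illustration under stated assumptions, not the authors' code: the domain $\vec{u} = (\theta, \phi)$ (latitude and longitude), the shape vector $\vec{q}_s = (a, b, c)$ (semi-axes), and the function name `surface_point` are hypothetical choices; a face model would carry many more shape and expression parameters.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def surface_point(u: Sequence[float], q_s: Sequence[float]) -> Vec3:
    """Hypothetical parametric representation s(u; q_s) of an ellipsoid.

    u   = (theta, phi): the domain; latitude and longitude in radians.
    q_s = (a, b, c):    shape parameters; semi-axes along x, y, and z.
    """
    theta, phi = u
    a, b, c = q_s
    return (
        a * math.cos(theta) * math.cos(phi),
        b * math.cos(theta) * math.sin(phi),
        c * math.sin(theta),
    )


if __name__ == "__main__":
    q_s = (0.08, 0.10, 0.12)  # illustrative, roughly head-sized semi-axes
    p = surface_point((0.3, 1.2), q_s)
    # Every generated point satisfies (x/a)^2 + (y/b)^2 + (z/c)^2 = 1.
    print(p, sum((x / k) ** 2 for x, k in zip(p, q_s)))
```

Changing $\vec{q}_s$ reshapes the surface while a given $\vec{u}$ keeps indexing the corresponding point (for instance, $\theta = \phi = 0$ always lies on the positive $x$ semi-axis), which is what allows shape to be adjusted without re-parameterizing the surface.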

Figure 2: A computer graphics representation of a face as a 3D triangular mesh drawn: (a) as a wireframe (each line is an edge of a triangle); (b) as solid shapes with shading added; (c) with texture mapping overlaid.

Coordinate Transformations: The motion of a 3D object can be represented by a coordinate transformation which changes over time. A coordinate transformation maps (or transforms) each 3D coordinate location to another coordinate location. Examples of coordinate transformations include rotation, translation, perspective projection, and deformations such as scaling, bending, and twisting. A coordinate transformation is represented as a vector function, $\vec{y} = \vec{r}(\vec{s};\vec{q}_r)$, which transforms (maps) the coordinate location, $\vec{s}$, into a new location, $\vec{y}$. The transformation has parameter values (e.g., rotation angles), which comprise the elements of the vector, $\vec{q}_r$.
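A minimal sketch of a coordinate transformation $\vec{y} = \vec{r}(\vec{s};\vec{q}_r)$ built from three of the transformations listed above: a rotation, a translation, and a perspective projection. The parameter layout $\vec{q}_r = (\text{yaw}, t_x, t_y, t_z)$, the single rotation axis, the pinhole focal length `focal`, and the name `transform_point` are assumptions made for illustration; the paper does not specify them, and its transformation also includes facial deformation.

```python
import math
from typing import Sequence, Tuple


def transform_point(s: Sequence[float], q_r: Sequence[float], focal: float = 1.0) -> Tuple[float, float]:
    """Hypothetical y = r(s; q_r): rotate, translate, then project to the image.

    s     = (x, y, z): a point in object coordinates.
    q_r   = (yaw, tx, ty, tz): rotation about the vertical axis (radians)
            followed by a translation into camera coordinates.
    focal : assumed focal length of a pinhole camera.
    """
    yaw, tx, ty, tz = q_r
    x, y, z = s
    # Rotation about the y (vertical) axis, e.g. a head turning left or right.
    xr = math.cos(yaw) * x + math.sin(yaw) * z
    yr = y
    zr = -math.sin(yaw) * x + math.cos(yaw) * z
    # Translation into camera coordinates (the camera looks along +z).
    xc, yc, zc = xr + tx, yr + ty, zr + tz
    if zc <= 0:
        raise ValueError("point is behind the camera")
    # Perspective projection onto the image plane.
    return (focal * xc / zc, focal * yc / zc)


if __name__ == "__main__":
    # A point 1 unit right of the head centre, with the head 2 units from the camera.
    print(transform_point((1.0, 0.0, 0.0), (0.0, 0.0, 0.0, 2.0)))  # prints (0.5, 0.0)
```

Composing this with a surface representation gives the mapping from the domain to image coordinates, $\vec{r}(\vec{s}(\vec{u};\vec{q}_s);\vec{q}_r)$; the derivative of that composition with respect to all parameters is what the Jacobian in Equation (1) describes.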

Figure 3: Customization of a generic face model to conform to a particular individual: (a) a set of facial features is detected; (b) the corresponding locations of these features on the generic face model; (c) the generic face model is warped (deformed) to bring the two sets of features into approximate alignment. (Panels show numbered facial feature points.)

Model-Based Optical Flow: The well-known planar formulation of optical flow is based on the gradient constraint equation:

$$\nabla I(\vec{x},t)^T \dot{\vec{x}} + I_t(\vec{x},t) = 0$$

where $\nabla I(\vec{x},t)$ is the gradient of $I(\vec{x},t)$ with respect to the image coordinates $\vec{x}$, and $I_t(\vec{x},t) = \partial I(\vec{x},t)/\partial t$. This equation can be extended to encompass arbitrary motions, with the result [4]:

$$\nabla I(\vec{x},t)^T L(\vec{u};\vec{q})\,\dot{\vec{q}} + I_t(\vec{x},t) = 0 \qquad (1)$$

where $L(\vec{u};\vec{q})$ is the Jacobian matrix of the coordinate transformation from object coordinates to camera coordinates, including animation or deformation of the object, with respect to the parameters of the transformation and the model, $\vec{q} = [\vec{q}_r^T\;\vec{q}_s^T]^T$. This matrix, $L(\vec{u};\vec{q})$, is used to transform time derivatives of $\vec{q}$ into time derivatives of $\vec{x}$ (by the chain rule): $\dot{\vec{x}} = L(\vec{u};\vec{q})\,\dot{\vec{q}}$.

Equation (1) represents the fundamental principle for estimating 3D motion from optical flow. As in the typical application of optical flow, values for the spatial and temporal derivatives of the image, $\nabla I(\vec{x},t)$ and $I_t(\vec{x},t)$, are obtained using derivative filters (with Gaussian kernels). It is assumed that the Jacobian matrix and the transformation are known in advance (e.g., the object is a face and the transformation is a combination of rotation, translation, facial expressions, and perspective projection). Only $\dot{\vec{q}}$ remains undetermined.

Least squares is used to solve for $\dot{\vec{q}}$ from the spatial and temporal derivatives of the image at a set of points, $\vec{x}_i$, $i = 1, 2, 3, \ldots, N$. The resulting value for $\dot{\vec{q}}$ is the 3D estimate of the motion occurring in the video at time $t$, expressed as the rate of change in the object position parameters, $\dot{\vec{q}}$. The object (e.g., a face) position parameters, $\vec{q}$, at time $t$ can be determined by numerically integrating $\dot{\vec{q}}$. Presently, this algorithm makes use of the Euler method.

The Face Model: The 3D computer graphics representation of the face is a 3D triangular mesh with texture mapping. The model is depicted in Figure 2. This 3D model of a typical human head and face is customized to fit the shape of the individual depicted in the video, as illustrated in Figure 3. The jaw can rotate, i.e., the mouth can open and close. By a method analogous to that of Waters [2], major muscles of the face are approximated by a set of "actuators" that can contract and relax. These actuators are anchored to fixed locations (bone) at one end and, at the other end, are attached to vertices of the triangular mesh (skin) through a simulated flexible medium. By appropriate activation of groups of "muscles", the face can be made to smile, frown, raise an eyebrow, and so on.
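The least-squares step and the Euler integration described under Model-Based Optical Flow can be sketched as follows. Each sample point $\vec{x}_i$ contributes one row, $\nabla I(\vec{x}_i,t)^T L(\vec{u}_i;\vec{q})$, to a linear system whose right-hand side is $-I_t(\vec{x}_i,t)$, exactly as in Equation (1); the system is solved here through the normal equations. This is a dependency-free illustration, not the authors' implementation: the image derivatives and the $2 \times n$ Jacobians are assumed to be supplied already (the paper obtains the derivatives with Gaussian derivative filters), and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def solve_linear(m: Matrix, v: List[float]) -> List[float]:
    """Solve m x = v by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [v[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: the points do not constrain every parameter")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def estimate_q_dot(
    grads: Sequence[Sequence[float]],
    jacobians: Sequence[Matrix],
    temporal: Sequence[float],
) -> List[float]:
    """Least-squares estimate of q_dot from Eq. (1) at N sample points.

    grads[i]:     spatial image gradient at x_i (length 2)
    jacobians[i]: L(u_i; q) at x_i, a 2 x n matrix (image x/y by parameter)
    temporal[i]:  temporal derivative I_t at x_i
    """
    n = len(jacobians[0][0])
    # One row per point: grad I^T L, a 1 x n row vector.
    rows = [
        [sum(g[k] * jac[k][j] for k in range(len(g))) for j in range(n)]
        for g, jac in zip(grads, jacobians)
    ]
    # Eq. (1) rearranged: (grad I^T L) q_dot = -I_t.
    rhs = [-it for it in temporal]
    # Normal equations: (A^T A) q_dot = A^T b.
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(n)] for i in range(n)]
    atb = [sum(row[i] * b for row, b in zip(rows, rhs)) for i in range(n)]
    return solve_linear(ata, atb)


def euler_step(q: Sequence[float], q_dot: Sequence[float], dt: float) -> List[float]:
    """One explicit Euler step: q(t + dt) = q(t) + dt * q_dot(t)."""
    return [qi + dt * qdi for qi, qdi in zip(q, q_dot)]


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    true_q_dot = [0.5, -1.0, 0.25]
    grads = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(20)]
    jacs = [[[rng.gauss(0, 1) for _ in range(3)] for _ in range(2)] for _ in range(20)]
    # Synthesize I_t so that Eq. (1) holds exactly for true_q_dot.
    temporal = [
        -sum(sum(g[k] * jac[k][j] for k in range(2)) * true_q_dot[j] for j in range(3))
        for g, jac in zip(grads, jacs)
    ]
    q_dot = estimate_q_dot(grads, jacs, temporal)
    print(q_dot)                               # close to true_q_dot
    print(euler_step([0.0, 0.0, 0.0], q_dot, 0.1))  # parameters after one step of dt = 0.1
```

The normal equations keep the sketch free of dependencies; a QR- or SVD-based solver (for example `numpy.linalg.lstsq`) is numerically more robust when many parameters are estimated at once.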

The Initial Pose: The position (pose) of the face in the first frame of the video is needed to initialize the algorithm. This initial pose is determined by using an optimization algorithm (gradient descent) to minimize the mean squared error between feature locations on the actual face and the modeled face.

Model-Based Coding: In model-based video coding, the values of the parameters defining the current face shape and orientation, $\dot{\vec{q}}$ or $\vec{q}$, are encoded and transmitted for each frame in the video. Other information, such as the customization of the shape of the head model for the individual depicted in the video, is transmitted only at the beginning of communication.
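The initial-pose step can be illustrated with a small gradient-descent fit. As a simplifying assumption, the sketch estimates a 2D similarity transform (scale, in-plane rotation, and translation, packed as hypothetical parameters $(a, b, t_x, t_y)$) that maps model feature locations onto detected image feature locations by minimizing their mean squared error. The paper's pose is a 3D pose of the face model, and its parameterization, step size, and stopping rule are not given; the names, learning rate, and iteration count below are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def apply_pose(p: Sequence[float], m: Point) -> Point:
    """Similarity transform with p = (a, b, tx, ty), a = s*cos(angle), b = s*sin(angle)."""
    a, b, tx, ty = p
    return (a * m[0] - b * m[1] + tx, b * m[0] + a * m[1] + ty)


def mean_squared_error(p, model_pts, image_pts) -> float:
    """Mean squared distance between transformed model features and image features."""
    total = 0.0
    for m, f in zip(model_pts, image_pts):
        x, y = apply_pose(p, m)
        total += (x - f[0]) ** 2 + (y - f[1]) ** 2
    return total / len(model_pts)


def fit_initial_pose(model_pts, image_pts, lr=0.1, iters=500, p0=(1.0, 0.0, 0.0, 0.0)) -> List[float]:
    """Plain gradient descent on the mean squared feature-location error."""
    p = list(p0)
    n = len(model_pts)
    for _ in range(iters):
        grad = [0.0, 0.0, 0.0, 0.0]
        for m, f in zip(model_pts, image_pts):
            x, y = apply_pose(p, m)
            rx, ry = x - f[0], y - f[1]  # residual for this feature
            grad[0] += 2.0 * (rx * m[0] + ry * m[1]) / n
            grad[1] += 2.0 * (-rx * m[1] + ry * m[0]) / n
            grad[2] += 2.0 * rx / n
            grad[3] += 2.0 * ry / n
        p = [pi - lr * gi for pi, gi in zip(p, grad)]
    return p


if __name__ == "__main__":
    import math

    # Hypothetical model feature locations (e.g. eye and mouth corners).
    model = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]
    true_pose = (2.0 * math.cos(math.pi / 6), 2.0 * math.sin(math.pi / 6), 3.0, -1.0)
    detected = [apply_pose(true_pose, m) for m in model]
    pose = fit_initial_pose(model, detected)
    print(pose, mean_squared_error(pose, model, detected))
```

Because this simplified model is linear in its parameters, the error is a convex quadratic and gradient descent converges for a small enough step; a full 3D pose with perspective projection makes the problem nonlinear, so a reasonable starting guess matters more.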

2. RESULTS

The motion estimation algorithm described in this paper has been applied to video sequences depicting the head and shoulders. The motion of the head appearing in one video is tracked and used to create a second video depicting the computer-generated face model as it follows the motions in the first video. An example of rigid motion estimation is illustrated in Figure 4. In Figure 4(a), 5 frames extracted from a 100-frame video of M.T. are shown. The same frames from the computer-generated video are displayed in Figure 4(b). The fact that the video is computer generated is more apparent in Figure 4(c), where texture mapping has been disabled. A unique feature of this type of video coding is the ability for the person to appear differently at the receiver (decoder) than he does at the transmitter (encoder). This feature is illustrated in Figure 4(d), where a graphics model of S.K.'s head moves in synchronization with the video of M.T.

Figure 4: Model-based motion estimation of M.T.: (a) frames extracted from a video of M.T.; (b) the same five frames from a video where the computer-generated head image follows the motion of M.T.'s head; (c) the video of (b) with the texture map removed; (d) a computer-generated video where S.K.'s head follows the motion of M.T.'s head.

The tracking of more complex motion is illustrated in Figure 5, which depicts several frames extracted from a video of M.D. as he turns his head and simultaneously opens his mouth.

Figure 5: Model-based motion estimation of M.D.: (a) frames extracted from a video of M.D.; (b) the same five frames from a video where the computer-generated head image follows the motion of M.D.'s head.

It has been indicated that about 68 parameters are needed to encode facial expressions and head motion. Using this value, the estimated transmission bit rate of the video sequences in Figure 4 and Figure 5 is about 6,800 bits/sec (10 bits/parameter × 68 parameters/frame × 10 frames/sec) without any encoding of the parameter values. Applying a coding scheme, like arithmetic coding, to the parameter values would significantly reduce even this low rate.
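The rate estimate above is simple arithmetic; the sketch below reproduces it together with one possible way of producing 10-bit parameter values. The uniform quantizer, the per-parameter range `[lo, hi]`, and the entropy helper are assumptions for illustration: the paper does not say how parameters are quantized, and zeroth-order entropy is only a rough guide to what an entropy coder such as arithmetic coding might achieve.

```python
import math
from collections import Counter
from typing import Iterable


def raw_bit_rate(bits_per_param: int, params_per_frame: int, frames_per_sec: float) -> float:
    """Uncoded rate in bits/sec: bits/parameter x parameters/frame x frames/sec."""
    return bits_per_param * params_per_frame * frames_per_sec


def quantize(value: float, lo: float, hi: float, bits: int = 10) -> int:
    """Map a value in [lo, hi] to an integer code in [0, 2**bits - 1] (uniform quantizer)."""
    levels = (1 << bits) - 1
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * levels)


def dequantize(code: int, lo: float, hi: float, bits: int = 10) -> float:
    """Reconstruct the value represented by a quantizer code at the decoder."""
    levels = (1 << bits) - 1
    return lo + code / levels * (hi - lo)


def empirical_entropy(codes: Iterable[int]) -> float:
    """Zeroth-order entropy, in bits/symbol, of a sequence of codes."""
    counts = Counter(codes)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(raw_bit_rate(10, 68, 10))  # prints 6800 (bits/sec)
```

If parameters change slowly from frame to frame, coding frame-to-frame differences instead of absolute codes would typically concentrate the code distribution and lower its entropy, which is one reason an entropy coder can reduce the raw rate.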

Acknowledgement

The authors are grateful to Scott King for his contributions to the development of the face model and several handy software tools. Doug DeCarlo's frank discussions of his research are appreciated. Bruce Flinchbaugh proofread the manuscript.

3. REFERENCES

[1] K. Aizawa and T. S. Huang, "Model-based image coding: Advanced video coding techniques for very low bit-rate applications," Proceedings of the IEEE, vol. 83, pp. 259-271, Feb. 1995.

[2] K. Waters, "A muscle model for animating three-dimensional facial expression," Computer Graphics, vol. 21, pp. 17-24, July 1987.

[3] L.-A. Tang, Human Face Modeling, Analysis, and Synthesis. PhD thesis, Electrical Engineering Department, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1996.

[4] D. DeCarlo and D. Metaxas, "The integration of optical flow and deformable models with applications to human face shape and motion estimation," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (San Francisco, CA), pp. 231-237, IEEE Computer Society Press, June 18-20, 1996.

[5] D. Metaxas and D. DeCarlo, "Deformable model-based face shape and motion estimation," in Proceedings of the Second International Conference on Automatic Face and Gesture Recognition (Killington, VT), pp. 146-150, IEEE Computer Society Press, Oct. 14-16, 1996.
