Process It

Location:

India

Posted:

November 21, 2012

Contact this candidate

Resume:

Motion Estimation Algorithm in Video Coding

Vibha Bafna1 and M.M. Mushrif2

ECE Dept, G.H.Raisoni College of Engineering, Nagpur, India

*********@******.***

ECE Dept, Yaswantrao College of Engineering, Nagpur, India

*************@*****.***

Abstract. This paper is a review of the block matching algorithms used for the

motion estimation in video compression to remove the temporal redundancy

(i.e. interprediction). It implements and compares three different types of block

matching algorithms that range from the very basic Exhaustive Search to the

fast adaptive algorithms like Adaptive Rood Pattern Search. It can be used with

common video coding standards such as H.263 and H.264.

1 Introduction

DATA compression is the process of reducing the redundancy in data representation

in order to achieve savings in storage and communication costs. Due to limited

channel bandwidth and stringent requirements of real-time video playback, video

coding is an indispensable process, for many visual communication applications and

always requires a very high compression ratio. The large amount of temporal

correlation called temporal redundancy from the compression viewpoint, between

adjacent frames in a video sequence, requires to be properly identified and eliminated

to achieve this objective. An effective and popular method to reduce the temporal

redundancy, called block-matching motion estimation (BMME), has been widely

adopted in various video coding standards, such as H.261, H.263, H.264, and in any

motion-compensated video coding technique. Therefore, fast and accurate block-

based search technique is highly desirable to assure much reduced processing delay

while maintaining good reconstructed image quality. In either standard the basic flow

of the compression decompression process is largely the same and is shown in Fig. 1.

In Fig.1 the encoding side estimates the motion in the current frame with respect to

a previous frame. A motion compensated image for the current frame is then created

that is built of blocks of image from the previous frame. The motion vectors for

blocks used for motion estimation are transmitted, as well as the difference of the

compensated image with the current frame (residue) is also JPEG encoded and sent.

The encoded image that is sent is then decoded at the encoder and used as a reference

frame for the subsequent frames.

The decoder reverses the process and creates a full frame. The whole idea behind

motion estimation based video compression is to save on bits by sending JPEG

encoded difference images which have less energy and can be highly compressed as

compared to sending a full frame that is JPEG encoded. It should be noted that the

B. Apolloni et al. (Eds.): KES 2007/WIRN 2007, Part I, LNAI 4692, pp. 485 492, 2007.

Springer-Verlag Berlin Heidelberg 2007

486 V. Bafna and M.M. Mushrif

first frame is always sent full, and so are some other frames that might occur at some

regular interval (like every 7th frame). The standards do not specify this and this

might change with every video being sent based on the dynamics of the video.

Current Image Image Decoded

frame encoder Decoder

Fig. 1. Video compression process flow in H.26x

This paper implements and evaluates the fundamental block matching algorithms.

The algorithms that have been implemented are Exhaustive Search (ES), Three Step

Search (TSS), and Adaptive Rood Pattern Search (ARPS).

2 Methodology

Motion estimation[1](ME) is the most time-consuming process. The supposition

behind motion estimation is that the patterns corresponding to objects and background

in a frame of video sequence move within the frame to form corresponding objects on

the subsequent frame. For block matching the current frame is divided into a macro

blocks that are then compared with corresponding block and its adjacent neighbors in

the previous frame to create a vector that stipulates the movement of a macro block

from one location to another in the previous frame. The search area for a good macro

block match is p pixels on all fours sides of the corresponding macro block in

previous frame. Larger motions require a larger p, and the larger the p (search

parameter) the more computationally expensive the process of motion estimation

becomes. The idea is represented in Fig 2.

The matching of one macro block with another is based on the output of a cost

function. The macro block that results in the least cost is the one that matches the

closest to current block. There are various cost functions, of which the most popular

and less computationally expensive is Mean Squared Error (MSE) given by equation

(1). Another cost function is Mean Absolute Difference (MAD) given by equation

(2).Sum of absolute error is given by equation (3).

Motion Estimation Algorithm in Video Coding 487

N-1 N-1

MSE = 1 (Cij-Rij) 2

(1)

N2 i=0 j=0

N-1 N-1

MAD = 1 (Cij-Rij) (2)

N2 i=0 j=0

N-1 N-1

(Cij-Rij)

SAE = (3)

i=0 j=0

where N is the side of the macro bock, Cij and Rij are the pixels being compared in

current macro block and reference. Peak-Signal-to-Noise-Ratio (PSNR) given by

equation (4) characterizes the motion compensated image that is created by using

motion vectors and macro clocks from the reference frame.

PSNR = 10 Log10 [(Peak to peak value of original data)2]

(4)

MSE

3 Design Approach: Algorithms

3.1 Exhaustive Search (ES)

ES[1&2] algorithm, also known as Full Search (raster scan), is the most

computationally expensive block matching algorithm of all. This algorithm calculates

the cost function at each possible location in the search window. As a result of which

it finds the best possible match and gives the highest PSNR amongst any block

matching algorithm. The obvious disadvantage to ES is that the larger the search

window gets the more computations it requires.

Search Block

16 p=7

Current

16 Macro

Block

p=7

Fig. 2. Block Matching a macro block of side 16 pixels and a search parameter p of size 7

pixels

488 V. Bafna and M.M. Mushrif

3.2 Three Step Search (TSS)

TSS[2&5] is one of the earliest fast block matching algorithms. The general idea is

represented in Figure 3. It starts with the search location at the center and sets the

step size S = 4, for a usual search parameter value of 7. It then searches at eight

locations +/- S pixels around location (0, 0). From these nine locations searched so far

it picks the one giving least cost and makes it the new search origin. It then sets the

new step size S = S/2, and repeats similar search for two more iterations until S = 1.

At that point it finds the location with the least cost function and the macro block at

that location is the best match. The calculated motion vector is then saved for

transmission. It gives a flat reduction in computation by a factor of 9. So that for

p = 7, ES will compute cost for 225 macro blocks whereas TSS computes cost for 25

macro blocks.

Legend First step Second step Third Step

Fig. 3. Three Step Search procedure. The motion vector is (5, -3).

3.3 Diamond Search (DS)

The Diamond Search[4] algorithm employs two search patterns as shown in Fig. 4.

The first pattern, called large diamond search pattern (LDSP), comprises nine

checking points from which eight points surround the center one to compose a

diamond shape (_). The second pattern consisting of five checking points forms a

smaller diamond shape,called small diamond search pattern (SDSP).In the searching

procedure of the DS algorithm, LDSP is repeatedly used until the step in which the

minimum block distortion (MBD) occurs at the center point. The search pattern is

then switched from LDSP to SDSP as reaching to the final search stage. As the search

pattern is neither too small nor too big and the fact that there is no limit to the number

of steps, this algorithm can find global minimum very accurately. Among the five

checking points in SDSP, the position yielding the MBD provides the motion vector of the best

matching block.

Motion Estimation Algorithm in Video Coding 489

The DS algorithm is summarized as follows.

Step 1) The initial LDSP is centered at the origin of the search window, and the 9

checking points of LDSP are tested. If the MBD point calculated is located at the

center position, go to Step 3; otherwise, go to Step 2.

Step 2) The MBD point found in the previous search step is re-positioned as the

center point to form a new LDSP. If the new MBD point obtained is located at the

center position, go to Step 3; otherwise, recursively repeat this step.

Step 3) Switch the search pattern from LDSP to SDSP. The MBD point found in this

step is the final solution of the motion vector which points to the best matching block.

(a) (b) (c)

Fig. 4. (a) The corner point LDSP->LDSP. (b) The edge point LDSP->LDSP. (c) The centre

point LDSP->SDSP.

3.4 Adaptive Rood Pattern Search (ARPS)

ARPS[2] algorithm makes use of the fact that the general motion in a frame is usually

coherent, i.e. if the macro blocks around the current macro block moved in a

particular direction then there is a high probability that the current macro block will

also have a similar motion vector [3]. This algorithm uses the motion vector of the

macro block to its immediate left to predict its own motion vector. An example is

shown in Fig. 5. The predicted motion vector points to (3, -2). In addition to checking

the location pointed by the predicted motion vector, it also checks at a rood pattern

distributed points, as shown in Fig 5 where they are at a step size of S = Max ( X ,

Y ). It directly puts the search in an area where there is a high probability of finding a

good matching block. The point that has the least weight becomes the origin for

subsequent search steps, and the search pattern is changed to SDSP. The procedure

keeps on doing SDSP until least weighted point is found to be at the center of the

SDSP. The main advantage of this algorithm over Diamond Search is if the predicted

motion vector is (0, 0), it does not waste computational time in doing LDSP, it rather

directly starts using SDSP. Furthermore, if the predicted motion vector is far away

from the center, then again ARPS save on computations by directly jumping to that

vicinity and using SDSP, where as DS[4] takes its time doing LDSP. Thus ARPS

about two to three times faster than that of the diamond search (DS), and our method

490 V. Bafna and M.M. Mushrif

even achieves higher peak signal-to-noise ratio (PSNR) particularly for those video

sequences containing large and/or complex motion contents. Care has to be taken to

not repeat the computations at points that were checked earlier. For macro blocks in

the first column of the frame, rood pattern step size is fixed at 2 pixels.

Predicted

Step Size

Fig. 5. Adaptive Rood Pattern: The predicted motion vector is (3,-2), and the step size S = Max

( 3-2 ) = 3

4 Results

Foreman video sequence with a distance of 2 between current frame and reference

frame was used to generate the frame-by-frame results of the algorithms. Fig.6 shows

reference frame, current frame, compensated frame & residue after applying TSS

Reference frame Current frame

Compensated frame Residual frame

Fig. 6. Three Step search algorithm to foreman.avi

Motion Estimation Algorithm in Video Coding 491

Computations performance for Foreman Sequence

200

Exhau Sear

180 Three step sear

Adap rood sear

0 *-**-**-**-**-**

Frame number

Fig. 7. Computation performance of foreman.avi for three algorithm

PSNR performance for Foreman Sequence

Exhau Sear

83 Three step sear

Adap rood sear

0 *-**-**-**-**-**

Frame number

Fig. 8. PSNR performance of foreman.avi sequence for three algorithm

algorithm. Fig.7 shows a plot of the average number of searches required per macro

block for the Foreman sequence using the 3 fast block matching algorithms. The

PSNR comparison of the compensated images generated using the algorithms is

shown in Fig 8. The results are extremely similar to the results of [2] and [3].

Full Search (ES) is guaranteed to find minimum MAD in search window but it is

computationally intensive since the energy measure is calculated at every one of the 255

(2p+1)2 locations. The TSS is simpler than Exhaustive search as only 25 searches

492 V. Bafna and M.M. Mushrif

compared with 255 searches. But TSS do not perform as well as Full Search(ES). ARPS

come close to the PSNR results of ES as well as computations are 2 less computation

compared to TSS.

5 Conclusion

Block matching techniques are the most popular and efficient of the various motion

estimation techniques. Residue can be used to further improve our algorithm. In

future fractional pixel motion estimation algorithm can be used to enhance the coding

efficiency.

In the entire motion based video compression process motion estimation is the

most computationally expensive and time-consuming process.

References

1. Richardson, I.E.G.: H.264 and MPEG-4 VIDEO COMPRESSION, Video Coding for Next

Generation Multimedia. Ch.3,5, & 6. John Wiley & Sons Ltd, West Sussex, England (2003)

2. Block Matching Algorithms For Motion Estimation Aroh Barjatya, Student Member, IEEE,

DIP 6620 Spring, Paper (2004)

3. Nie, Y., Ma, K.-K.: Adaptive Rood Pattern Search for Fast Block-Matching Motion

Estimation. IEEE Trans. Image Processing 11(12), 1442 1448 (2002)

4. Zhu, S., Ma, K.-K.: A New Diamond Search Algorithm for Fast Block-Matching Motion

Estimation. IEEE Trans. Image Processing 9(2), 287 290 (2000)

5. Wang, H., Mersereau, R.: Fast Algorithms for the Estimation of Motion Vectors. IEEE

Transactions on Image Processing 8(3), 435 438 (1999)

Contact this candidate