Motion Estimation Algorithm in Video Coding
Vibha Bafna1 and M.M. Mushrif2
1
ECE Dept, G.H.Raisoni College of Engineering, Nagpur, India
*********@******.***
2
ECE Dept, Yaswantrao College of Engineering, Nagpur, India
*************@*****.***
Abstract. This paper is a review of the block matching algorithms used for the
motion estimation in video compression to remove the temporal redundancy
(i.e. interprediction). It implements and compares three different types of block
matching algorithms that range from the very basic Exhaustive Search to the
fast adaptive algorithms like Adaptive Rood Pattern Search. It can be used with
common video coding standards such as H.263 and H.264.
1 Introduction
DATA compression is the process of reducing the redundancy in data representation
in order to achieve savings in storage and communication costs. Due to limited
channel bandwidth and stringent requirements of real-time video playback, video
coding is an indispensable process, for many visual communication applications and
always requires a very high compression ratio. The large amount of temporal
correlation called temporal redundancy from the compression viewpoint, between
adjacent frames in a video sequence, requires to be properly identified and eliminated
to achieve this objective. An effective and popular method to reduce the temporal
redundancy, called block-matching motion estimation (BMME), has been widely
adopted in various video coding standards, such as H.261, H.263, H.264, and in any
motion-compensated video coding technique. Therefore, fast and accurate block-
based search technique is highly desirable to assure much reduced processing delay
while maintaining good reconstructed image quality. In either standard the basic flow
of the compression decompression process is largely the same and is shown in Fig. 1.
In Fig.1 the encoding side estimates the motion in the current frame with respect to
a previous frame. A motion compensated image for the current frame is then created
that is built of blocks of image from the previous frame. The motion vectors for
blocks used for motion estimation are transmitted, as well as the difference of the
compensated image with the current frame (residue) is also JPEG encoded and sent.
The encoded image that is sent is then decoded at the encoder and used as a reference
frame for the subsequent frames.
The decoder reverses the process and creates a full frame. The whole idea behind
motion estimation based video compression is to save on bits by sending JPEG
encoded difference images which have less energy and can be highly compressed as
compared to sending a full frame that is JPEG encoded. It should be noted that the
B. Apolloni et al. (Eds.): KES 2007/WIRN 2007, Part I, LNAI 4692, pp. 485 492, 2007.
Springer-Verlag Berlin Heidelberg 2007
486 V. Bafna and M.M. Mushrif
first frame is always sent full, and so are some other frames that might occur at some
regular interval (like every 7th frame). The standards do not specify this and this
might change with every video being sent based on the dynamics of the video.
+
+
Current Image Image Decoded
frame encoder Decoder
Fig. 1. Video compression process flow in H.26x
This paper implements and evaluates the fundamental block matching algorithms.
The algorithms that have been implemented are Exhaustive Search (ES), Three Step
Search (TSS), and Adaptive Rood Pattern Search (ARPS).
2 Methodology
Motion estimation[1](ME) is the most time-consuming process. The supposition
behind motion estimation is that the patterns corresponding to objects and background
in a frame of video sequence move within the frame to form corresponding objects on
the subsequent frame. For block matching the current frame is divided into a macro
blocks that are then compared with corresponding block and its adjacent neighbors in
the previous frame to create a vector that stipulates the movement of a macro block
from one location to another in the previous frame. The search area for a good macro
block match is p pixels on all fours sides of the corresponding macro block in
previous frame. Larger motions require a larger p, and the larger the p (search
parameter) the more computationally expensive the process of motion estimation
becomes. The idea is represented in Fig 2.
The matching of one macro block with another is based on the output of a cost
function. The macro block that results in the least cost is the one that matches the
closest to current block. There are various cost functions, of which the most popular
and less computationally expensive is Mean Squared Error (MSE) given by equation
(1). Another cost function is Mean Absolute Difference (MAD) given by equation
(2).Sum of absolute error is given by equation (3).
Motion Estimation Algorithm in Video Coding 487
N-1 N-1
MSE = 1 (Cij-Rij) 2
(1)
N2 i=0 j=0
N-1 N-1
MAD = 1 (Cij-Rij) (2)
N2 i=0 j=0
N-1 N-1
(Cij-Rij)
SAE = (3)
i=0 j=0
where N is the side of the macro bock, Cij and Rij are the pixels being compared in
current macro block and reference. Peak-Signal-to-Noise-Ratio (PSNR) given by
equation (4) characterizes the motion compensated image that is created by using
motion vectors and macro clocks from the reference frame.
PSNR = 10 Log10 [(Peak to peak value of original data)2]
(4)
MSE
3 Design Approach: Algorithms
3.1 Exhaustive Search (ES)
ES[1&2] algorithm, also known as Full Search (raster scan), is the most
computationally expensive block matching algorithm of all. This algorithm calculates
the cost function at each possible location in the search window. As a result of which
it finds the best possible match and gives the highest PSNR amongst any block
matching algorithm. The obvious disadvantage to ES is that the larger the search
window gets the more computations it requires.
Search Block
16
16 p=7
Current
16 Macro
Block
p=7
Fig. 2. Block Matching a macro block of side 16 pixels and a search parameter p of size 7
pixels
488 V. Bafna and M.M. Mushrif
3.2 Three Step Search (TSS)
TSS[2&5] is one of the earliest fast block matching algorithms. The general idea is
represented in Figure 3. It starts with the search location at the center and sets the
step size S = 4, for a usual search parameter value of 7. It then searches at eight
locations +/- S pixels around location (0, 0). From these nine locations searched so far
it picks the one giving least cost and makes it the new search origin. It then sets the
new step size S = S/2, and repeats similar search for two more iterations until S = 1.
At that point it finds the location with the least cost function and the macro block at
that location is the best match. The calculated motion vector is then saved for
transmission. It gives a flat reduction in computation by a factor of 9. So that for
p = 7, ES will compute cost for 225 macro blocks whereas TSS computes cost for 25
macro blocks.
Legend First step Second step Third Step
--
Fig. 3. Three Step Search procedure. The motion vector is (5, -3).
3.3 Diamond Search (DS)
The Diamond Search[4] algorithm employs two search patterns as shown in Fig. 4.
The first pattern, called large diamond search pattern (LDSP), comprises nine
checking points from which eight points surround the center one to compose a
diamond shape (_). The second pattern consisting of five checking points forms a
smaller diamond shape,called small diamond search pattern (SDSP).In the searching
procedure of the DS algorithm, LDSP is repeatedly used until the step in which the
minimum block distortion (MBD) occurs at the center point. The search pattern is
then switched from LDSP to SDSP as reaching to the final search stage. As the search
pattern is neither too small nor too big and the fact that there is no limit to the number
of steps, this algorithm can find global minimum very accurately. Among the five
checking points in SDSP, the position yielding the MBD provides the motion vector of the best
matching block.
Motion Estimation Algorithm in Video Coding 489
The DS algorithm is summarized as follows.
Step 1) The initial LDSP is centered at the origin of the search window, and the 9
checking points of LDSP are tested. If the MBD point calculated is located at the
center position, go to Step 3; otherwise, go to Step 2.
Step 2) The MBD point found in the previous search step is re-positioned as the
center point to form a new LDSP. If the new MBD point obtained is located at the
center position, go to Step 3; otherwise, recursively repeat this step.
Step 3) Switch the search pattern from LDSP to SDSP. The MBD point found in this
step is the final solution of the motion vector which points to the best matching block.
(a) (b) (c)
Fig. 4. (a) The corner point LDSP->LDSP. (b) The edge point LDSP->LDSP. (c) The centre
point LDSP->SDSP.
3.4 Adaptive Rood Pattern Search (ARPS)
ARPS[2] algorithm makes use of the fact that the general motion in a frame is usually
coherent, i.e. if the macro blocks around the current macro block moved in a
particular direction then there is a high probability that the current macro block will
also have a similar motion vector [3]. This algorithm uses the motion vector of the
macro block to its immediate left to predict its own motion vector. An example is
shown in Fig. 5. The predicted motion vector points to (3, -2). In addition to checking
the location pointed by the predicted motion vector, it also checks at a rood pattern
distributed points, as shown in Fig 5 where they are at a step size of S = Max ( X ,
Y ). It directly puts the search in an area where there is a high probability of finding a
good matching block. The point that has the least weight becomes the origin for
subsequent search steps, and the search pattern is changed to SDSP. The procedure
keeps on doing SDSP until least weighted point is found to be at the center of the
SDSP. The main advantage of this algorithm over Diamond Search is if the predicted
motion vector is (0, 0), it does not waste computational time in doing LDSP, it rather
directly starts using SDSP. Furthermore, if the predicted motion vector is far away
from the center, then again ARPS save on computations by directly jumping to that
vicinity and using SDSP, where as DS[4] takes its time doing LDSP. Thus ARPS
about two to three times faster than that of the diamond search (DS), and our method
490 V. Bafna and M.M. Mushrif
even achieves higher peak signal-to-noise ratio (PSNR) particularly for those video
sequences containing large and/or complex motion contents. Care has to be taken to
not repeat the computations at points that were checked earlier. For macro blocks in
the first column of the frame, rood pattern step size is fixed at 2 pixels.
Predicted
Step Size
Fig. 5. Adaptive Rood Pattern: The predicted motion vector is (3,-2), and the step size S = Max
( 3-2 ) = 3
4 Results
Foreman video sequence with a distance of 2 between current frame and reference
frame was used to generate the frame-by-frame results of the algorithms. Fig.6 shows
reference frame, current frame, compensated frame & residue after applying TSS
Reference frame Current frame
Compensated frame Residual frame
Fig. 6. Three Step search algorithm to foreman.avi
Motion Estimation Algorithm in Video Coding 491
Computations performance for Foreman Sequence
200
Exhau Sear
180 Three step sear
Adap rood sear
Frame number
Fig. 7. Computation performance of foreman.avi for three algorithm
PSNR performance for Foreman Sequence
84
Exhau Sear
83 Three step sear
Adap rood sear
Frame number
Fig. 8. PSNR performance of foreman.avi sequence for three algorithm
algorithm. Fig.7 shows a plot of the average number of searches required per macro
block for the Foreman sequence using the 3 fast block matching algorithms. The
PSNR comparison of the compensated images generated using the algorithms is
shown in Fig 8. The results are extremely similar to the results of [2] and [3].
Full Search (ES) is guaranteed to find minimum MAD in search window but it is
computationally intensive since the energy measure is calculated at every one of the 255
(2p+1)2 locations. The TSS is simpler than Exhaustive search as only 25 searches
492 V. Bafna and M.M. Mushrif
compared with 255 searches. But TSS do not perform as well as Full Search(ES). ARPS
come close to the PSNR results of ES as well as computations are 2 less computation
compared to TSS.
5 Conclusion
Block matching techniques are the most popular and efficient of the various motion
estimation techniques. Residue can be used to further improve our algorithm. In
future fractional pixel motion estimation algorithm can be used to enhance the coding
efficiency.
In the entire motion based video compression process motion estimation is the
most computationally expensive and time-consuming process.
References
1. Richardson, I.E.G.: H.264 and MPEG-4 VIDEO COMPRESSION, Video Coding for Next
Generation Multimedia. Ch.3,5, & 6. John Wiley & Sons Ltd, West Sussex, England (2003)
2. Block Matching Algorithms For Motion Estimation Aroh Barjatya, Student Member, IEEE,
DIP 6620 Spring, Paper (2004)
3. Nie, Y., Ma, K.-K.: Adaptive Rood Pattern Search for Fast Block-Matching Motion
Estimation. IEEE Trans. Image Processing 11(12), 1442 1448 (2002)
4. Zhu, S., Ma, K.-K.: A New Diamond Search Algorithm for Fast Block-Matching Motion
Estimation. IEEE Trans. Image Processing 9(2), 287 290 (2000)
5. Wang, H., Mersereau, R.: Fast Algorithms for the Estimation of Motion Vectors. IEEE
Transactions on Image Processing 8(3), 435 438 (1999)