Motion compensation

From Wikipedia, the free encyclopedia

In video compression, Motion compensation describes a picture in terms of where each section of that picture came from, in a previous picture. This is often employed in video compression. It can be also used for deinterlacing. A video sequence consists of a number of pictures - usually called frames. Subsequent frames are very similar, thus containing a lot of redundancy. Removing this redundancy helps achieve the goal of better compression ratios.

A first approach would be to simply subtract a reference frame from a given frame. The difference is then called residual and usually contains less energy (or information) than the original frame. The residual can be encoded at a lower bit-rate with the same quality. The decoder can reconstruct the original frame by adding the reference frame again.

A more sophisticated approach is to approximate the motion of the whole scene and the objects of a video sequence. The motion is described by some parameters that have to be encoded in the bit-stream. The pixels of the predicted frame are approximated by appropriately translated pixels of the reference frame. This gives much better residuals than a simple subtraction. However, the bit-rate occupied by the parameters of the motion model must not become too large.

Usually, the frames are processed in groups. One frame (usually the first) is encoded without motion compensation just as a normal image. This frame is called I-frame (intra-coded frame, MPEG terminology) or I-picture. The other frames are called P-frames or P-pictures and are predicted from the I-frame or P-frame that comes (temporally) immediately before it. The prediction schemes are, for instance, described as IPPPP, meaning that a group consists of one I-frame followed by four P-frames.

Frames can also be predicted from future frames. The future frames then need to be encoded before the predicted frames and thus, the encoding order does not necessarily match the real frame order. Such frames are usually predicted from two directions, i.e. from the I- or P-frames that immediately precede or follow the predicted frame. These bidirectionally predicted frames are called B-frames. A coding scheme could, for instance, be IBBPBBPBBPBB.

Contents

[edit] Global motion compensation

In global motion compensation, the motion model basically reflects camera motions such as dolly (forward, backwards), track (left, right), boom (up, down), pan (left, right), tilt (up, down) and roll (along the view axis). It works best for still scenes without moving objects. There are several advantages of global motion compensation:

  • It models precisely the major part of motion usually found in video sequences with just a few parameters. The share in bit-rate of these parameters is negligible.
  • It does not partition the frames. This avoids artifacts at partition borders.
  • A straight line (in the time direction) of pixels with equal spatial positions in the frame corresponds to a continuously moving point in the real scene. Other MC schemes introduce discontinuities in the time direction.

However, moving objects are not sufficiently represented by global motion compensation. Thus, other methods are preferable.

[edit] Block Motion Compensation

In block motion compensation (BMC), the frames are partitioned in blocks of pixels (e.g. macroblocks of 16×16 pixels in MPEG). Each block is predicted from a block of equal size in the reference frame. The blocks are not transformed in any way apart from being shifted to the position of the predicted block. This shift is represented by a motion vector.

The motion vectors are the parameters of this motion model and have to be encoded into the bit-stream. As the motion vectors are not always independent (e.g. if two neighbouring blocks belong to the same moving object), they are usually encoded differentially to save bit-rate. This means that the difference of the motion vector and the neighbouring motion vector(s) encoded before is encoded. (The result of this differencing process is mathematically equivalent to a global motion compensation capable of panning.) An entropy codec can exploit the resulting statistical distribution of the motion vectors (around the zero vector).

It is possible to shift blocks by non-integer vectors, which is called sub-pixel precision. This is done by interpolating the pixel's values. Usually, the precision of the motion vectors is increased by one bit: half-pixel precision. The computational expense of sub-pixel precision is much higher, due to the interpolation required.

The main disadvantage of block motion compensation is that it introduces discontinuities at the block borders (blocking artifacts). These artifacts the form of sharp horizontal and vertical edges which are easily spotted by the human eye and produce ringing effects (large coefficients in high frequency sub-bands) in the Fourier-related transform used for transform coding of the residual frames.

Block motion compensation divides up the current frame into non-overlapping blocks, and the motion compensation vector tells where those blocks come from (a common misconception is that the previous frame is divided up into non-overlapping blocks, and the motion compensation vectors tell where those blocks move to). The source blocks typically overlap in the source frame. Some video compression algorithms assemble the current frame out of pieces of several different previously-transmitted frames.

[edit] Variable Block-Size Motion Compensation

Variable block-size motion compensation (VBSMC) is the use of BMC with the ability for the encoder to dynamically select the size of the blocks. When coding video, the use of larger blocks can reduce the number of bits needed to represent the motion vectors, while the use of smaller blocks can result in a smaller amount of prediction residual information to encode. Older designs such as H.261 and MPEG-1 video typically use a fixed block size, while newer ones such as H.263, MPEG-4 Part 2, H.264/MPEG-4 AVC, and VC-1 give the encoder the ability to dynamically choose what block size will be used to represent the motion.

[edit] Overlapped Block Motion Compensation

Overlapped block motion compensation (OBMC) is a good solution to these problems because it not only increases prediction accuracy but also avoids blocking artifacts. When using OBMC, blocks are typically twice as big in each dimension and overlap quadrant-wise with all 8 neighbouring blocks. Thus, each pixel belongs to 4 blocks. In such a scheme, there are 4 predictions for each pixel which are summed up to a weighted mean. For this purpose, blocks are associated with a window function that has the property that the sum of 4 overlapped windows is equal to 1 everywhere.

Studies of methods for reducing the complexity of OBMC have shown that the contribution to the window function is smallest for the diagonally-adjacent block. Reducing the weight for this contribution to zero and increasing the other weights by an equal amount leads to a substantial reduction in complexity without a large penalty in quality. In such a scheme, each pixel then belongs to 3 blocks rather than 4, and rather than using 8 neighboring blocks, only 4 are used for each block to be compensated. Such a scheme is found in the H.263 Annex F Advanced Prediction mode

[edit] Motion Estimation

Motion estimation (BME, OBME) is the process of finding optimal or near-optimal motion vectors. The amount of prediction error for a block is often measured using the mean squared error (MSE) or sum-of-absolute-differences (SAD) between the predicted and actual pixel values over all pixels of the motion-compensated region.

To find optimal motion vectors, one basically has to calculate the block prediction error for each motion vector within a certain search range and pick the one that has the best compromise between the amount of error and the number of bits needed for motion vector data. The motion estimation technique of simply exhaustively testing all possible motion representations to perform such an optimization is called full search. A faster method, which is sub-optimal with respect to rate-distortion, is to use a coarse search grid for a first approximation and to refine the grid in the surrounding of this approximation in further steps. One common method is the 3-step search, which uses search grids of 3×3 motion vectors and 3 refinement steps to get an overall search range of 15×15 pixel.

For OBME, the pixel-wise prediction errors of a block and its overlapping neighbouring blocks have to be weighted and summed according to the window function before being squared. As in the process of successively finding/refining motion vectors some neighbouring MVs are not known yet, the corresponding prediction errors can be ignored (not added) as a sub-optimal solution.

The major disadvantages of OBMC are increased computational complexity of OBME, and the fact that prediction errors and, thus, also the optimal motion vectors depend on neighbouring blocks/motion vectors. Therefore, there is no algorithm with polynomial computational complexity that guarantees optimal motion vectors. However, there are near-optimal iterative and non-iterative methods with acceptable computational complexity.

[edit] External links

In other languages