Motion estimation
From Wikipedia, the free encyclopedia
Motion estimation is the process of determining motion vectors that describe the transformation from one 2D image to another, usually between adjacent frames in a video sequence. It is an ill-posed problem because the motion occurs in three dimensions, whereas the images are projections of the 3D scene onto a 2D plane. The motion vectors may relate to the whole image (global motion estimation) or to specific parts, such as rectangular blocks, arbitrarily shaped patches or even individual pixels. The motion vectors may be represented by a translational model or by many other models that can approximate the motion of a real video camera, such as rotation and translation in all three dimensions and zoom.
Closely related to motion estimation is optical flow, where the vectors correspond to the perceived movement of pixels. In motion estimation an exact 1:1 correspondence of pixel positions is not a requirement.
Applying the motion vectors to an image to synthesise the transformation to the next image is called motion compensation. The combination of motion estimation and motion compensation is a key part of video compression as used by MPEG-1, MPEG-2 and MPEG-4 as well as many other video codecs.
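As a minimal sketch of motion compensation, the Python below predicts a frame by copying displaced blocks from a reference frame and then computes the residual that an encoder would still have to store. It assumes square blocks, integer translational vectors and edge clamping; these are illustrative choices, not the behaviour of any particular codec, and the names `motion_compensate` and `residual` are hypothetical.

```python
def motion_compensate(reference, vectors, block_size):
    """Predict a frame by copying displaced blocks from `reference`.

    reference  : list of rows of pixel intensities (height x width)
    vectors    : dict mapping (block_row, block_col) -> (dy, dx); each pixel of
                 that block is copied from the reference at offset (dy, dx)
    block_size : side length of the square blocks (an assumption of this sketch)
    """
    height, width = len(reference), len(reference[0])
    predicted = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Blocks without a vector are treated as stationary.
            dy, dx = vectors.get((y // block_size, x // block_size), (0, 0))
            # Clamp so that vectors pointing outside the frame reuse the
            # nearest edge pixel (one of several possible conventions).
            src_y = min(max(y + dy, 0), height - 1)
            src_x = min(max(x + dx, 0), width - 1)
            predicted[y][x] = reference[src_y][src_x]
    return predicted


def residual(current, predicted):
    """Per-pixel difference between the actual frame and its prediction."""
    return [[c - p for c, p in zip(cur_row, pred_row)]
            for cur_row, pred_row in zip(current, predicted)]
```

When the vectors describe the motion well, the residual is mostly zero and therefore cheap to encode, which is why estimation and compensation are used together in compression.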
Algorithms
The methods for finding motion vectors can be categorised into pixel-based methods ("direct") and feature-based methods ("indirect"). A famous debate resulted in two papers, one from each of the opposing camps, that attempted to establish a conclusion[1][2].
Direct Methods
- Block matching
- Phase correlation and frequency domain methods
- Pixel recursive algorithms
- MAP/MRF type "Bayesian" estimators
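To illustrate the first item in the list, here is a minimal sketch of exhaustive (full-search) block matching in Python. It assumes greyscale frames stored as lists of rows, integer displacements, and the sum of absolute differences as the matching cost. The function names and the `search_range` parameter are hypothetical, and practical encoders typically use faster search patterns than this brute-force scan.

```python
def block_sad(current, reference, cur_top, cur_left, ref_top, ref_left, size):
    """Sum of absolute differences between two size x size blocks."""
    total = 0
    for i in range(size):
        for j in range(size):
            total += abs(current[cur_top + i][cur_left + j]
                         - reference[ref_top + i][ref_left + j])
    return total


def block_match(current, reference, top, left, size, search_range):
    """Find the displacement (dy, dx) into `reference` that best matches the
    block of `current` whose top-left corner is (top, left).

    Every integer offset within +/- search_range is tried; candidates that
    would fall outside the reference frame are skipped.
    Returns ((dy, dx), cost) for the lowest-cost candidate.
    """
    height, width = len(reference), len(reference[0])
    best_vector, best_cost = (0, 0), None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            ref_top, ref_left = top + dy, left + dx
            if (ref_top < 0 or ref_left < 0
                    or ref_top + size > height or ref_left + size > width):
                continue
            cost = block_sad(current, reference, top, left,
                             ref_top, ref_left, size)
            if best_cost is None or cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost
```

The returned vector points from the block in the current frame to its best match in the reference frame, so content that moved down and to the right between frames yields negative components.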
Evaluation Metrics
Direct methods can use several evaluation metrics to measure how well a candidate motion vector matches the image data, including:
- Mean Squared Error (MSE)
- Sum of Absolute Differences (SAD)
- Mean Absolute Difference (MAD)
- Sum of Squared Errors (SSE)
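The listed metrics differ only in whether the pixel differences are squared or taken in absolute value, and in whether the total is normalised by the number of pixels. A minimal Python sketch, assuming two equally sized blocks stored as lists of rows (the function names are illustrative):

```python
def _differences(block_a, block_b):
    """Flat list of per-pixel differences between two equally sized blocks."""
    return [a - b
            for row_a, row_b in zip(block_a, block_b)
            for a, b in zip(row_a, row_b)]


def sad(block_a, block_b):
    """Sum of Absolute Differences."""
    return sum(abs(d) for d in _differences(block_a, block_b))


def sse(block_a, block_b):
    """Sum of Squared Errors."""
    return sum(d * d for d in _differences(block_a, block_b))


def mad(block_a, block_b):
    """Mean Absolute Difference: SAD divided by the number of pixels."""
    return sad(block_a, block_b) / len(_differences(block_a, block_b))


def mse(block_a, block_b):
    """Mean Squared Error: SSE divided by the number of pixels."""
    return sse(block_a, block_b) / len(_differences(block_a, block_b))
```

For a fixed block size, SAD and MAD rank candidates identically, as do SSE and MSE, so the unnormalised sums are enough when the metric is only used to pick the best candidate.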
Indirect Methods
Indirect methods use features, such as Harris corners, and match corresponding features between frames, usually with a statistical function applied over a local or global area. The purpose of the statistical function is to remove matches that do not correspond to the actual motion.
Statistical functions that have been successfully used include RANSAC.
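As a rough illustration of how RANSAC rejects bad matches, the Python sketch below fits a purely translational global motion to a list of feature matches. The translation-only model, the `threshold` and `iterations` defaults and the function name `ransac_translation` are assumptions made for this example; real systems often fit richer models such as affine transforms or homographies.

```python
import random


def ransac_translation(matches, threshold=1.0, iterations=100, seed=0):
    """Estimate a global translation from feature matches with RANSAC.

    matches : list of ((x0, y0), (x1, y1)) pairs giving a feature's position
              in the first and second frame
    Returns ((dx, dy), inliers), or (None, []) if `matches` is empty.
    """
    if not matches:
        return None, []
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_inliers = []
    for _ in range(iterations):
        # A translation is fully determined by a single match, so the
        # minimal random sample has size one.
        (x0, y0), (x1, y1) = rng.choice(matches)
        dx, dy = x1 - x0, y1 - y0
        # Consensus set: matches whose displacement agrees with the hypothesis.
        inliers = [m for m in matches
                   if abs((m[1][0] - m[0][0]) - dx) <= threshold
                   and abs((m[1][1] - m[0][1]) - dy) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refine the model by averaging the displacement over the best consensus set.
    count = len(best_inliers)
    mean_dx = sum(m[1][0] - m[0][0] for m in best_inliers) / count
    mean_dy = sum(m[1][1] - m[0][1] for m in best_inliers) / count
    return (mean_dx, mean_dy), best_inliers
```

Matches that fall outside the returned inlier set are the ones treated as not corresponding to the actual motion.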