Structure from motion

From Wikipedia, the free encyclopedia

In computer vision, structure from motion refers to the process of building a 3D model from video of a moving rigid object. Algorithmically, it is very similar to stereo vision, in which a 3D model is built from two simultaneous images of the same object. In both cases, multiple images of the same object are taken, and corresponding features across the images are used to compute 3D locations. In structure from motion the images are taken at different points in time, whereas in stereo vision they are taken at the same time from different points in space.

This is related to the kinetic depth effect in perception, whereby subjects viewing the shadow cast by a rotating wire frame or other structure perceive the full three-dimensional structure of the object, whereas subjects viewing the shadow of a static object perceive only its two-dimensional projection.

Colloquially, the term structure from motion is sometimes used for any 3D reconstruction built from 2D images of a rigid (or static) object. Under this broader usage, structure from motion overlaps significantly with stereo vision.

In either setting, the 3D shape of the object is reconstructed by identifying corresponding points in different images. The 3D locations of these points can then be computed, provided that information about the camera is known, such as its internal parameters and its position and orientation for each image (this information is commonly referred to as the camera calibration).
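
As an illustration of how a 3D location can be recovered from one pair of corresponding points, the following is a minimal sketch of linear triangulation using NumPy. It is one common approach and not the only one. The intrinsic matrix K, the two camera poses, the test point, and the helper names (projection_matrix, project, triangulate) are all assumptions made up for this example; the sketch also assumes noise-free correspondences and an ideal pinhole camera.

```python
import numpy as np

def projection_matrix(K, R, t):
    """Build the 3x4 pinhole camera matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point X to (u, v) pixel coordinates with camera P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear triangulation of one point seen in two calibrated views.

    x1 and x2 are the (u, v) pixel coordinates of the same feature
    in the first and second image.
    """
    # Each view contributes two linear constraints on the homogeneous point.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector with the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]

# Hypothetical calibration: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])

# First camera at the origin; second camera shifted 0.5 units along x
# (for a camera centre C and rotation R, the translation is t = -R C).
P1 = projection_matrix(K, np.eye(3), np.zeros(3))
P2 = projection_matrix(K, np.eye(3), np.array([-0.5, 0.0, 0.0]))

# Synthetic ground-truth point and its two image observations.
X_true = np.array([0.2, -0.1, 4.0])
x1 = project(P1, X_true)
x2 = project(P2, X_true)

X_est = triangulate(P1, P2, x1, x2)
print(X_est)
```

The same routine applies whether the two views come from two cameras at one instant (stereo vision) or from one camera observing the object at two different times (structure from motion); only the way the camera matrices are obtained differs. With real, noisy correspondences the recovered point is only a least-squares estimate in an algebraic sense, and it is usually refined further.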