Articulated body pose estimation

From Wikipedia, the free encyclopedia

Articulated body pose estimation in computer vision is the study of algorithms and systems that recover the pose of an articulated^[1] body, which consists of joints and rigid parts, using image-based observations. It is one of the most enduring problems in computer vision because of the complexity of the models that relate observation to pose, and because of the variety of situations in which such a system would be useful.^[2]^[3]

The field is motivated by the desire to develop accurate, tether-less, vision-based articulated body pose estimation systems. The bodies in question may be the human body, the hand, or even other creatures. Such a system has several foreseeable applications, including:

marker-less motion capture for human-computer interfaces
physiotherapy
3D animation
ergonomics studies
robot control
visual surveillance

One of the major difficulties in recovering pose from images is the high number of degrees of freedom (DOF) in the movement to be recovered. Any rigid object requires 6 DOF to fully describe its pose. Each additional rigid object connected to it adds at least 1 DOF. A human body contains no fewer than 10 large body parts, equating to more than 20 DOF. The difficulty is compounded by the problem of self-occlusion, where body parts occlude each other depending on the configuration. Other challenges include varying illumination, which affects appearance; varying subject attire or body type; the required camera configuration; and the required computation time.
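The DOF arithmetic above can be illustrated with a minimal sketch. The body model below is a hypothetical simplification: the part names and per-joint DOF values are assumptions chosen for illustration, not anatomical data from the article. Note that the lower bound of 1 DOF per connected part gives only 6 + 9 = 15 DOF for 10 parts; a count above 20 arises once some joints, such as shoulders and hips, are modelled with more than 1 DOF.

```python
# Hypothetical simplified body model: (part, parent, joint_dof).
# The joint DOF values are illustrative assumptions, not measured data.
BODY_PARTS = [
    ("torso", None, 6),  # root rigid body: 3 translation + 3 rotation
    ("head", "torso", 3),
    ("left_upper_arm", "torso", 3),
    ("left_forearm", "left_upper_arm", 1),
    ("right_upper_arm", "torso", 3),
    ("right_forearm", "right_upper_arm", 1),
    ("left_thigh", "torso", 3),
    ("left_shin", "left_thigh", 1),
    ("right_thigh", "torso", 3),
    ("right_shin", "right_thigh", 1),
]


def total_dof(parts):
    """Sum the DOF of a kinematic tree: 6 for the root, joint DOF per child."""
    names = {name for name, _, _ in parts}
    total = 0
    for name, parent, dof in parts:
        if parent is None:
            if dof != 6:
                raise ValueError(f"root part {name!r} must have 6 DOF")
        else:
            if parent not in names:
                raise ValueError(f"unknown parent {parent!r} for {name!r}")
            if dof < 1:
                raise ValueError(f"joint for {name!r} needs at least 1 DOF")
        total += dof
    return total


def minimum_dof(num_parts):
    """Lower bound: 6 DOF for the root plus at least 1 per additional part."""
    return 6 + (num_parts - 1)


if __name__ == "__main__":
    print("parts:", len(BODY_PARTS))
    print("lower bound:", minimum_dof(len(BODY_PARTS)))
    print("this model:", total_dof(BODY_PARTS))
```

Under these assumed joint types, the ten-part model has 25 DOF, which is consistent with the "more than 20" figure in the text.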

A typical system takes a model-based approach: an observation is made and provided as input to the model, which generates pose estimates. With regard to the observation, different kinds of sensors have been explored:

Visible wavelength imagery
Long-wave thermal infrared imagery
Time-of-flight imagery
Laser range scanner imagery

The various sensors produce intermediate representations that are used directly by the model. These representations include:

Image Appearance
Voxel (volume element) reconstruction
3D surface point cloud
3D surface mesh
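
As a rough illustration of the model-based approach, the sketch below fits a toy model to an observation. Everything in it is an assumption made for the example: the "body" is a planar two-link arm, the intermediate representation is reduced to the observed 2D positions of the elbow and wrist, and the pose is found by brute-force search over a grid of joint angles. Real systems use far richer models, observations, and optimizers.

```python
import math


def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """Return (elbow, wrist) 2D positions of a planar two-link arm at the origin."""
    elbow = (l1 * math.cos(theta1), l1 * math.sin(theta1))
    wrist = (
        elbow[0] + l2 * math.cos(theta1 + theta2),
        elbow[1] + l2 * math.sin(theta1 + theta2),
    )
    return elbow, wrist


def observation_error(pose, observed):
    """Sum of squared distances between model-predicted and observed points."""
    predicted = forward_kinematics(*pose)
    return sum(
        (px - ox) ** 2 + (py - oy) ** 2
        for (px, py), (ox, oy) in zip(predicted, observed)
    )


def estimate_pose(observed, steps=72):
    """Search a regular grid of joint angles in [-pi, pi) for the best fit."""
    angles = [2 * math.pi * i / steps - math.pi for i in range(steps)]
    candidates = ((t1, t2) for t1 in angles for t2 in angles)
    return min(candidates, key=lambda pose: observation_error(pose, observed))


if __name__ == "__main__":
    # Synthesize an observation from a known pose, then try to recover it.
    true_pose = (math.radians(30), math.radians(45))
    observed = forward_kinematics(*true_pose)
    theta1, theta2 = estimate_pose(observed)
    print(math.degrees(theta1), math.degrees(theta2))
```

The grid search is used here only because it is easy to verify. Its cost grows exponentially with the number of DOF, which is one reason the high dimensionality discussed earlier makes the full problem hard.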

Related Technologies

A commercially successful but specialized computer vision-based articulated body pose estimation technique is optical motion capture. This approach involves placing markers at strategic locations on the individual to capture the 6 degrees-of-freedom pose of each body part.