Motion capture
Motion capture, also known as motion tracking or mocap, is a technique of digitally recording movement for entertainment, sports and medical applications.
Methods and Systems
Motion tracking or motion capture started as an analysis tool in biomechanics research, and expanded into education, training, sports and recently computer animation for cinema and video games as the technology has matured. A performer wears markers near each joint to identify the motion by the positions or angles between the markers. Acoustic, inertial, LED, magnetic or reflective markers, or combinations of any of these, are tracked, optimally at least two times the rate of the desired motion, to submillimeter positions. The motion capture computer software records the positions, angles, velocities, accelerations and impulses, providing an accurate digital representation of the motion.
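As a rough illustration of this last step, velocities and accelerations can be derived from the sampled marker positions by numerical differentiation. The Python sketch below is not tied to any particular vendor's software; the sample rate and trajectory are made-up example values:

    # Minimal sketch: deriving velocity and acceleration from sampled 3D
    # marker positions by finite differences (example values, not vendor code).
    import numpy as np

    def derive_motion(positions, sample_rate_hz):
        """positions: (N, 3) array of marker positions in metres."""
        dt = 1.0 / sample_rate_hz
        velocity = np.gradient(positions, dt, axis=0)      # m/s
        acceleration = np.gradient(velocity, dt, axis=0)   # m/s^2
        return velocity, acceleration

    # A marker moving along x at a constant 1 m/s, sampled at 120 Hz
    # (comfortably above twice the rate of the motion being captured).
    t = np.arange(0, 1, 1 / 120.0)
    positions = np.stack([t, np.zeros_like(t), np.zeros_like(t)], axis=1)
    vel, acc = derive_motion(positions, 120.0)
    print(vel[1])  # approximately [1.0, 0.0, 0.0]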
In entertainment applications this can reduce the cost of animation, which otherwise requires the animator to draw each frame or, with more sophisticated software, key frames which the software interpolates. Motion capture saves time and creates more natural movements than manual animation, but is limited to motions that are anatomically possible. Some applications require movements that real actors cannot perform, such as animated superhero martial arts or exaggerated stretching and squashing.
In biomechanics, sports and training, real time data can provide the necessary information to diagnose problems or suggest ways to improve performance, requiring motion capture technology to capture motions up to 140 miles per hour for a golf swing.
Optical systems triangulate the 3D position of a marker from two or more cameras calibrated to provide overlapping projections. Tracking a large number of markers, capturing multiple performers, or expanding the capture area is accomplished by adding more cameras. These systems produce data with 3 degrees of freedom for each marker, and rotational information must be inferred from the relative orientation of three or more markers; for instance, shoulder, elbow and wrist markers provide the angle of the elbow. A newer technique discussed below uses higher resolution linear detectors to derive the one dimensional positions, requiring more sensors and more computation, but providing higher resolutions (sub-millimeter down to 10 micrometers, time averaged) and speeds than are possible using area arrays.
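To make the triangulation step concrete, the following is a minimal sketch of the standard direct linear transform for two calibrated cameras; the projection matrices and the test point are illustrative values, not a real calibration:

    # Sketch of two-camera triangulation via the direct linear transform (DLT).
    import numpy as np

    def triangulate(P1, P2, uv1, uv2):
        """P1, P2: 3x4 camera projection matrices; uv1, uv2: 2D marker centroids.
        Returns the least-squares 3D point seen by both cameras."""
        A = np.vstack([
            uv1[0] * P1[2] - P1[0],
            uv1[1] * P1[2] - P1[1],
            uv2[0] * P2[2] - P2[0],
            uv2[1] * P2[2] - P2[1],
        ])
        _, _, vt = np.linalg.svd(A)
        X = vt[-1]
        return X[:3] / X[3]  # de-homogenise

    # Two toy cameras, the second shifted along x; recover a known 3D point.
    P1 = np.hstack([np.eye(3), [[0.0], [0.0], [0.0]]])
    P2 = np.hstack([np.eye(3), [[-1.0], [0.0], [0.0]]])
    X_true = np.array([0.2, 0.1, 2.0, 1.0])
    uv1 = (P1 @ X_true)[:2] / (P1 @ X_true)[2]
    uv2 = (P2 @ X_true)[:2] / (P2 @ X_true)[2]
    print(triangulate(P1, P2, uv1, uv2))  # ~[0.2, 0.1, 2.0]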
Passive optical systems use reflective markers illuminated by strobes mounted around each camera lens; the 3D position of each marker is triangulated from its 2D location in two or more camera views. Data can be cleaned up with the aid of kinematic constraints and predictive gap-filling algorithms. Passive systems typically use sensors where the camera captures an image of the scene, reduces it to bright spots and finds the centroid of each spot. Typically, 1.3 megapixel sensors run at frame rates of up to 600,000,000 pixels per second divided by the resolution in use, so at 1.3 megapixels they can operate at about 500 frames per second.
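The reduce-to-bright-spots-and-find-the-centroid step might look like the sketch below; in real systems this runs on the camera itself, and the threshold value here is an assumption:

    # Illustrative centroid extraction: threshold the image, group bright pixels
    # into spots, and take the intensity-weighted centre of each spot.
    import numpy as np
    from scipy import ndimage

    def marker_centroids(image, threshold=200):
        """image: 2D array of pixel intensities. Returns (row, col) centroids."""
        bright = image > threshold                 # keep only the bright pixels
        labels, n = ndimage.label(bright)          # group them into spots
        return ndimage.center_of_mass(image, labels, range(1, n + 1))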
A high speed 4 megapixel sensor costs around $1,000 USD and can run at 640,000,000 pixels per second divided by the applied resolution. By decreasing the resolution to 640 x 480, these cameras can sample at about 2,000 frames per second, but they then trade spatial resolution for temporal resolution, causing blurring or jitter that requires heavy filtering to correct. At full resolution they run about 160 frames per second, but typically are run at 100 to 120 frames per second. A $100, low speed 4 megapixel detector has a bandwidth of about 40,000,000 pixels per second and is unsuitable for motion capture, since the resulting motion blur introduces errors that filtering cannot satisfactorily correct. With about 200 LED strobes synchronized to the CMOS sensor, the ease of combining a hundred dollars' worth of LEDs with a $1,000 sensor has made these systems very popular.
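The frame-rate figures above follow from dividing the sensor's pixel throughput by the number of pixels read out per frame, for example:

    # Worked version of the bandwidth arithmetic in the paragraph above.
    pixel_throughput = 640_000_000          # pixels per second (high speed 4 MP sensor)
    print(pixel_throughput / 4_000_000)     # ~160 fps at full resolution
    print(pixel_throughput / (640 * 480))   # ~2,080 fps at 640 x 480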
Professional vendors have sophisticated constraint software to reduce problems from marker swapping, since all markers appear identical. Unlike active marker systems and magnetic systems, passive systems do not require the user to wear wires or electronic equipment. Passive markers are usually spheres or hemispheres of plastic or foam, 3 to 25 mm in diameter, covered with special retroreflective tape. Actors must be careful not to touch the markers or get them dirty, as this changes their reflective properties and causes errors. The markers must stand clear of the surface of the body, making them prone to being knocked off by any contact. This type of system can capture large numbers of markers at frame rates as high as 2,000 fps with high 3D accuracy.
Active marker systems have an advantage over passive systems in that there is no doubt about which marker is which. Because the markers are typically illuminated one at a time, the overall update rate drops as the marker count increases; 5,000 frames per second divided by 100 markers would provide updates of 50 hertz per marker. As a result, these systems are popular in the biomechanics market.
Higher resolution active marker systems show more subtle movements by providing marker IDs in real time, modulating the output of each LED to differentiate the markers (U.S. Patent 6,324,296 by PhaseSpace) and allowing 32 markers to be on at the same time, eliminating marker swapping and providing much cleaner data than older technologies. Smart LEDs allow motion capture outdoors in direct sunlight, while providing 3,600 × 3,600 (roughly 12 megapixel) resolution at 480 frames per second. The advantage of using active markers is that intelligent processing allows the higher speed and higher resolution of optical systems at a lower price. This higher accuracy and resolution requires more processing than older passive technologies, but the additional processing is done at the camera to improve resolution via subpixel or centroid processing, providing both high resolution and high speed. By using newer processing and technology, these motion capture systems are about one third the cost of older systems.
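A hedged sketch of the modulation idea: each LED blinks a unique on/off code across successive frames, so its ID can be read back from the images. The encoding below is illustrative only and is not the actual PhaseSpace scheme:

    # Toy marker-ID modulation: encode an ID as a blink pattern and decode it.
    def id_to_pattern(marker_id, bits=8):
        """Return the on/off pattern (list of 0/1) that encodes marker_id."""
        return [(marker_id >> b) & 1 for b in reversed(range(bits))]

    def pattern_to_id(pattern):
        """Decode a blink pattern observed over len(pattern) frames back to an ID."""
        value = 0
        for bit in pattern:
            value = (value << 1) | bit
        return value

    assert pattern_to_id(id_to_pattern(42)) == 42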
Magnetic systems calculate position and orientation from the relative magnetic flux of three orthogonal coils on both the transmitter and each receiver. The relative intensity of the voltage or current of the three coils allows these systems to calculate both range and orientation by meticulously mapping the tracking volume. Since the sensor output is 6DOF, useful results can be obtained with two-thirds the number of markers required in optical systems; one on the upper arm and one on the lower arm give elbow position and angle. The markers are not occluded by nonmetallic objects but are susceptible to magnetic and electrical interference from metal objects in the environment, such as rebar (steel reinforcing bars in concrete) or wiring, which affect the magnetic field, and from electrical sources such as monitors, lights, cables and computers. The sensor response is nonlinear, especially toward the edges of the capture area. The wiring from the sensors tends to preclude extreme performance movements, and the capture volumes for magnetic systems are dramatically smaller than those of optical systems. Magnetic systems are divided into “AC” and “DC” systems: DC systems use square pulses, while AC systems use sine waves.
Mechanical motion capture systems directly track body joint angles and are often referred to as exo-skeleton motion capture systems, due to the way the sensors are attached to the body. A performer attaches the skeletal-like structure to their body and as they move so do the articulated mechanical parts, measuring the performer’s relative motion. Mechanical motion capture systems are real-time, relatively low-cost, free-of-occlusion, and wireless (untethered) systems that have unlimited capture volume. Typically, they are rigid structures of jointed, straight metal or plastic rods linked together with potentiometers that articulate at the joints of the body.
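To show how joint-angle readings become limb positions, here is a simple planar forward-kinematics sketch; the segment lengths and angles are made-up example values, not vendor code:

    # Planar forward kinematics: joint angles plus segment lengths give positions.
    import math

    def forward_kinematics(segment_lengths, joint_angles):
        """Each joint angle is relative to the previous segment (radians).
        Returns the (x, y) position of each joint, starting at the origin."""
        x = y = heading = 0.0
        points = [(x, y)]
        for length, angle in zip(segment_lengths, joint_angles):
            heading += angle
            x += length * math.cos(heading)
            y += length * math.sin(heading)
            points.append((x, y))
        return points

    # Upper arm 0.30 m, forearm 0.25 m; shoulder at 45 degrees, elbow bent 30 degrees.
    print(forward_kinematics([0.30, 0.25], [math.radians(45), math.radians(30)]))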
The procedure
In the motion capture session, the movements of one or more actors are sampled many times per second. High resolution optical motion capture systems can be used to sample body, facial and finger movement at the same time.
A motion capture session records only the movements of the actor, not his visual appearance. These movements are recorded as animation data which are mapped to a 3D model (human, giant robot, etc.) created by a computer artist, to move the model the same way. This is comparable to the older technique of rotoscope where the visual appearance of the motion of an actor was filmed, then the film used as a guide for the frame by frame motion of a hand-drawn animated character.
If desired, a camera can pan, tilt, or dolly around the stage while the actor is performing and the motion capture system can capture the camera and props as well. This allows the computer generated characters, images and sets, to have the same perspective as the video images from the camera. A computer processes the data and displays the movements of the actor, as inferred from the 3D position of each marker. If desired, a virtual or real camera can be tracked as well, providing the desired camera positions in terms of objects in the set.
A related technique, match moving, can derive 3D camera movement from a single 2D image sequence without the use of photogrammetry, but it is often ambiguous below centimeter resolution, owing to the inability to distinguish pose and scale characteristics from a single vantage point. One might extrapolate that future technology could include full-frame imaging from many camera angles to record the exact position of every part of the actor’s body, clothing, and hair for the entire duration of the session, resulting in a higher resolution of detail than is possible today.
After processing, the software exports animation data, which computer animators can associate with a 3D model and then manipulate using normal computer animation software. If the actor’s performance was good and the software processing was accurate, this manipulation is limited to placing the actor in the scene that the animator has created and controlling the 3D model’s interaction with objects.
Advantages
Mocap offers several advantages over traditional computer animation of a 3D model:
- Mocap can take far fewer man-hours of work to animate a character for complex human movements.
- Mocap can capture secondary animation that traditional animators might not have had the skill or time to create. For example, a quick movement of the head by the actor might cause his hip to twist slightly. This nuance might be understood by a traditional animator but be too time consuming and difficult to represent accurately, whereas it is captured accurately by mocap. Incidentally, one of the hallmarks of rotoscope in traditional animation is just such secondary “business.”
- Mocap can accurately capture difficult-to-model physical movement. For example, if the mocap actor does a backflip while holding nunchaku by the chain, both sticks of the nunchucks will be captured by the cameras moving in a realistic fashion. A traditional animator might not be able to physically simulate the movement of the sticks adequately due to other motions by the actor. Secondary motion such as the ripple of a body as an actor is punched or is punching requires both higher speed and higher resolution as well as more markers.
- Mocap technology allows one actor to play multiple roles within a single film.
Disadvantages
On the negative side, mocap data requires special programs and time to manipulate once captured and processed, and if the data is wrong, it is often easier to throw it away and reshoot the scene rather than trying to manipulate the data. Many systems allow real time viewing of the data to decide if the take needs to be redone.
Another important point is that while it is common and comparatively easy to mocap a human actor in order to animate a biped model, applying motion capture to animals like horses can be difficult.
Motion capture equipment costs range from about fifty thousand dollars for an eight-camera active marker system to millions of dollars for a passive marker system, covering the digital video cameras, lights, software, and staff needed to run a mocap studio, and this technology investment can become obsolete every few years as better software and techniques are invented.
Computer models that have a cartoony design will "break" when realistic human movement is applied to them unless the motion is correctly retargeted, which is an ongoing task in computer animation. For example, if a cartoony character has large, over-sized hands, these will intersect strangely with any other body part when the human actor brings them too close to his body.
Although motion capture produces "realistic" movement, hand animation often allows for stronger applications of traditional techniques like squash and stretch, secondary motion, and anticipation, creating characters with greater impact and personality.
Applications
Video games use motion capture for football, baseball and basketball players or the combat moves of a martial artist.
Movies use motion capture for CG effects, in some cases replacing traditional cel animation, and for completely computer-generated creatures, such as Gollum, Jar-Jar Binks, and King Kong.
Virtual reality and augmented reality require real time input of the user’s position and interaction with their environment, demanding more precision and speed than older motion capture systems could provide. Noise and errors from low resolution or low speed systems, and overly smoothed and filtered data with long latency, contribute to “simulator sickness”, where the lag and the mismatch between visual and vestibular cues and the computer generated images cause nausea and discomfort.
High-speed, high-resolution active marker systems can provide smooth data at low latency, allowing real time visualization in virtual and augmented reality systems. The remaining challenge, now almost within reach of powerful graphics cards, is mapping the images correctly to the real perspectives to prevent image mismatch.
Motion capture technology is frequently used in digital puppetry systems to aid in the performance of computer generated characters in real-time.
Related techniques
Facial motion capture is utilized to record the complex movements in a human face, especially while speaking with emotion. This is generally performed with an optical setup using multiple cameras arranged in a hemisphere at close range, with small markers glued or taped to the actor’s face.
Performance capture is a further development of these techniques, in which both body motions and facial movements are recorded. This technique was used in the making of The Polar Express.
Inertial systems use devices such as accelerometers and gyroscopes, which measure accelerations and angular rates that are integrated into positions and angles. They are often used in conjunction with other systems that provide corrective updates and a global reference, since inertial sensors only measure relative changes, not absolute position.
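A one-axis dead-reckoning sketch with made-up sensor readings illustrates why such a reference is needed: even a small constant bias, once integrated, drifts without bound:

    # Integrate angular rate and acceleration to show inertial drift.
    def integrate(samples, dt, initial=0.0):
        """Cumulatively integrate a list of sensor samples with time step dt."""
        value, out = initial, []
        for s in samples:
            value += s * dt
            out.append(value)
        return out

    dt = 0.01                               # 100 Hz sampling
    gyro = [0.5] * 100                      # rad/s about one axis for one second
    angle = integrate(gyro, dt)             # ends near 0.5 rad, as expected
    accel_bias = [0.02] * 100               # a small constant bias in m/s^2
    velocity = integrate(accel_bias, dt)    # drifts to 0.02 m/s after one second
    position = integrate(velocity, dt)      # and keeps growing without correction
    print(angle[-1], velocity[-1], position[-1])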
RF (radio frequency) positioning systems are becoming more viable as higher frequency RF devices allow greater precision than older RF technologies. The speed of light is about 30 centimeters per nanosecond (billionth of a second), so a 10 gigahertz (billion cycles per second) RF signal enables an accuracy of about 3 centimeters. By measuring amplitude to a quarter wavelength, it is possible to improve the resolution to about 8 mm. To achieve the resolution of optical systems, frequencies of 50 gigahertz or higher are needed, which are nearly as restricted to line of sight and as easy to block as optical systems. Multipath and reradiation of the signal are likely to cause additional problems, but these technologies will be well suited to tracking larger volumes with reasonable accuracy, since the required resolution at 100 meter distances is not likely to be as high.
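The wavelength arithmetic above can be checked directly:

    # Wavelength and quarter-wavelength resolution for the frequencies mentioned.
    c = 3.0e8  # speed of light in m/s (about 30 cm per nanosecond)
    for freq_hz in (10e9, 50e9):
        wavelength = c / freq_hz
        print(freq_hz / 1e9, "GHz:", wavelength * 100, "cm wavelength,",
              wavelength / 4 * 1000, "mm at a quarter wavelength")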
An alternative approach was developed where the actor is given an unlimited walking area through the use of a rotating sphere, similar to a hamster ball, which contains internal sensors recording the angular movements, removing the need for external cameras and other equipment. Even though this technology could potentially lead to much lower costs for mocap, the basic sphere is only capable of recording a single continuous direction. Additional sensors worn on the person would be needed to record anything more.
Programs
- Imocap: by ILM, facial motion capture used in Pirates of the Caribbean 2, white dots on face
- Mova Contour system: facial motion capture, phosphorescent green makeup worn by the actor
- Image Metrics: facial motion capture, no makeup, no spandex suit, no special hardware
See also
- Animation
- Rotoscope
- Match moving, also known as “motion tracking”
External links
- VirtualCinematography.org—Several papers on Universal Capture use in Matrix films
- Introduction in human motion analysis for motion capture
- CGW Article on motion capture