Computer facial animation
From Wikipedia, the free encyclopedia
Computer facial animation is primarily an area of computer graphics that encapsulates models and techniques for generating and animating images of the human head and face. Due to its subject and output type, it is also related to many other scientific and artistic fields, from psychology to traditional animation. The importance of human faces in verbal and non-verbal communication, together with advances in computer graphics hardware and software, has generated considerable scientific, technological, and artistic interest in computer facial animation.
Although development of computer graphics methods for facial animation started in the early 1970s, the major achievements in this field are more recent and have occurred since the late 1980s.
Computer facial animation includes a variety of techniques, from morphing to three-dimensional modeling and rendering. It has become well known and popular through animated feature films and computer games, but its applications include many more areas such as communication, education, scientific simulation, and agent-based systems (for example, online customer service representatives).
History
Human facial expression has been the subject of scientific investigation for more than one hundred years. The study of facial movements and expressions started from a biological point of view. After some older investigations, e.g. by John Bulwer in the late 1640s, Charles Darwin’s book The Expression of the Emotions in Man and Animals can be considered a major point of departure for modern research in behavioural biology.
More recently, one of the most important attempts to describe facial activities (movements) was the Facial Action Coding System (FACS). Introduced by Ekman and Friesen in 1978, FACS defines 46 basic facial Action Units (AUs). A major group of these Action Units represents primitive movements of facial muscles in actions such as raising brows, winking, and talking. Eight AUs are for 3D head movements, i.e. turning and tilting left and right and going up, down, forward and backward. FACS has been successfully used for describing desired movements of synthetic faces and also in tracking facial activities.
Computer-based facial expression modeling and animation is not a new endeavor. The earliest work with computer-based facial representation was done in the early 1970s. The first three-dimensional facial animation was created by Parke in 1972. In 1973, Gillenson developed an interactive system to assemble and edit line-drawn facial images. In 1974, Parke developed a parameterized three-dimensional facial model.
The early 1980s saw the development of the first physically-based muscle-controlled face model by Platt and the development of techniques for facial caricatures by Brennan. In 1985, the short animated film "Tony de Peltrie" was a landmark for facial animation: in it, for the first time, computer facial expression and speech animation were a fundamental part of telling the story.
The late 1980s saw the development of a new muscle-based model by Waters, the development of an abstract muscle action model by Magnenat-Thalmann and colleagues, and approaches to automatic speech synchronization by Lewis and by Hill. The 1990s saw increasing activity in the development of facial animation techniques and the use of computer facial animation as a key storytelling component, as illustrated in animated films such as Toy Story, Antz, Shrek, and Monsters, Inc., and computer games such as The Sims. Casper (1995) was a milestone in this period, being the first movie with a lead actor produced exclusively using digital facial animation (Toy Story was released later the same year).
The sophistication of the films increased after 2000. In The Matrix Reloaded and The Matrix Revolutions, dense optical flow from several high-definition cameras was used to capture realistic facial movement at every point on the face. The Polar Express used a large Vicon system to capture upward of 150 points. Although these systems are automated, a large amount of manual clean-up effort is still needed to make the data usable. Another milestone in facial animation was reached by The Lord of the Rings, where a character-specific shape-based system was developed. Mark Sagar pioneered the use of FACS in entertainment facial animation, and FACS-based systems developed by Sagar were used on Monster House, King Kong, and other films.
Face Robot, developed in 2006, was the first commercial software to deal with the problem of facial animation. Face Robot approaches the problem using a non-linear solver. It can be procedurally applied to a human face, and the animation can be retargeted across faces. It can also be directly manipulated, hand animated, or driven by motion capture data.
Techniques
2D
Two-dimensional methods for facial animation are based on applying image transformations to existing photographs. The most common technique in 2D facial animation is morphing and its variations. Morphing takes a pair of images (a morph source and a morph target) and creates a series of in-between images that show a transition from source to target (interpolation). The morph source and morph target images are the animation keyframes. In the case of facial animation, they can be visemes. A set of such images allows a talking head to be animated, as shown in the top row of the 2D facial animation figure. A more complicated situation arises when only one image (e.g. a rest position of the face) exists. In such cases, image processing techniques can be used to first create the morph target (see the bottom row of the figure).
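The interpolation step described above can be sketched in a few lines. The example below is only an illustration under simplifying assumptions: it performs a plain cross-dissolve (a per-pixel linear blend) between two same-sized grayscale images stored as nested lists, and it omits the geometric warping that full morphing normally adds. The names cross_dissolve and in_betweens are hypothetical and do not come from any particular system.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def cross_dissolve(source: Image, target: Image, t: float) -> Image:
    """Blend two same-sized images; t=0 gives the source, t=1 the target."""
    if len(source) != len(target) or any(
        len(rs) != len(rt) for rs, rt in zip(source, target)
    ):
        raise ValueError("source and target must have the same dimensions")
    return [
        [(1.0 - t) * s + t * g for s, g in zip(row_s, row_t)]
        for row_s, row_t in zip(source, target)
    ]


def in_betweens(source: Image, target: Image, steps: int) -> List[Image]:
    """Return `steps` frames running from source to target, both included."""
    if steps < 2:
        raise ValueError("need at least two frames (source and target)")
    return [cross_dissolve(source, target, i / (steps - 1)) for i in range(steps)]
```

Calling in_betweens(viseme_a, viseme_b, 10) would give ten frames that fade from one keyframe to the other; a practical morph would also move facial features along corresponding positions before blending.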
3D
Three-dimensional head models provide the most powerful means of generating computer facial animation. One of the earliest works on computerized head models for graphics and animation was done by Parke. The model was a mesh of 3D points controlled by a set of conformation and expression parameters. The former group controls the relative location of facial feature points such as eye and lip corners. Changing these parameters can re-shape a base model to create new heads. The latter group of parameters (expression) represents facial actions that can be performed on the face, such as stretching lips or closing eyes. This model was extended by other researchers to include more facial features and add more flexibility. Different methods for initializing such a “generic” model based on individual (3D or 2D) data have been proposed and successfully implemented. Parameterized models are effective because they use a limited number of parameters, each associated with the main facial feature points. The MPEG-4 standard defines a minimum set of parameters for facial animation [1].
Animation is done by changing parameters over time. Facial animation is approached in different ways; traditional techniques include
- shapes/morph targets,
- bones/cages,
- skeleton-muscle systems,
- motion capture on points on the face and
- knowledge-based solver deformations.
1. Shape-based systems offer fast playback as well as a high degree of fidelity of expressions. The technique involves modelling portions of the face mesh to approximate expressions and visemes and then blending the different sub-meshes, known as morph targets or shapes. Perhaps the most accomplished character using this technique was Gollum, from The Lord of the Rings. The drawbacks of this technique are that the shapes involve intensive manual labor, are specific to each character, and must be animated by slider parameter tables.
2. Skeletal muscle systems, or physically-based head models, form another approach to modeling the head and face. Here the physical and anatomical characteristics of bones, tissues, and skin are simulated to provide a realistic appearance (e.g. spring-like elasticity). Such methods can be very powerful for creating realism, but the complexity of facial structures makes them computationally expensive and difficult to create. Considering the effectiveness of parameterized models for communicative purposes (as explained in the next section), it may be argued that physically-based models are not a very efficient choice in many applications. This does not deny the advantages of physically-based models or the fact that they can even be used within the context of parameterized models to provide local details when needed. Waters, Terzopoulos, Kahler, and Seidel (among others) have developed physically-based facial animation systems.
3. 'Envelope Bones' or 'Cages' are commonly used in games. They produce simple and fast models, but are poorly suited to portraying subtlety.
4. Motion capture uses cameras placed around a subject. The subject is generally fitted either with reflectors (passive motion capture) or sources (active motion capture) that precisely determine the subject's position in space. The data recorded by the cameras is then digitized and converted into a three-dimensional computer model of the subject. Until recently, the size of the detectors/sources used by motion capture systems made the technology inappropriate for facial capture. However, miniaturization and other advancements by companies such as PhaseSpace Inc. have made motion capture a viable tool for computer facial animation. Facial motion capture was used extensively in The Polar Express, where hundreds of motion points were captured. The film was very accomplished, but while it attempted to recreate realism, it was criticised for having fallen into the 'uncanny valley', the realm where animation realism is sufficient for human recognition but fails to convey the emotional message. The main difficulties of motion capture are the quality of the data, which may include vibration, and the retargeting of the geometry of the points.
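5. Deformation solvers, as used in Face Robot, deform the face with a non-linear solver. The result can be directly manipulated, hand animated, or driven by motion capture data.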
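As an illustration of the shape-based technique in item 1, the blending of morph targets is commonly written as the base mesh plus a weighted sum of each target's offset from the base. The sketch below assumes that formulation, with every target sharing the base mesh's vertex order; the function name blend_shapes and the shape names "smile" and "jaw_open" are hypothetical and not taken from any production system.

```python
from typing import Dict, List, Tuple

Vertex = Tuple[float, float, float]
Mesh = List[Vertex]


def blend_shapes(base: Mesh, targets: Dict[str, Mesh], weights: Dict[str, float]) -> Mesh:
    """Add each target's weighted offset from the base mesh to the base."""
    result = [list(v) for v in base]
    for name, weight in weights.items():
        target = targets[name]
        if len(target) != len(base):
            raise ValueError(f"target {name!r} has a different vertex count")
        for i, (b, t) in enumerate(zip(base, target)):
            for axis in range(3):
                # Offset of this target from the base, scaled by its slider weight.
                result[i][axis] += weight * (t[axis] - b[axis])
    return [tuple(v) for v in result]


# Hypothetical two-vertex "face" with two morph targets.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
targets = {
    "smile": [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
    "jaw_open": [(0.0, 0.0, 0.0), (1.0, -2.0, 0.0)],
}
blended = blend_shapes(base, targets, {"smile": 0.5, "jaw_open": 0.25})
```

Each weight plays the role of one slider; animating the face then amounts to changing the weights over time.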
Face Animation Languages
Many face animation languages are used to describe the content of facial animation. They can be used as input to compatible "player" software, which then creates the requested actions. Face animation languages are closely related to other multimedia presentation languages such as SMIL and VRML. Due to the popularity and effectiveness of XML as a data representation mechanism, most face animation languages are XML-based. For instance, this is a sample from Virtual Human Markup Language (VHML):
<vhml>
  <person disposition="angry">
    First I speak with an angry voice and look very angry,
    <surprised intensity="50">
      but suddenly I change to look more surprised.
    </surprised>
  </person>
</vhml>
More advanced languages allow decision-making, event handling, and parallel and sequential actions. The following is an example from Face Modeling Language (FML):
<fml>
  <act>
    <par>
      <hdmv type="yaw" value="15" begin="0" end="2000" />
      <expr type="joy" value="-60" begin="0" end="2000" />
    </par>
    <excl event_name="kbd" event_value="" repeat="kbd;F3_up" >
      <hdmv type="yaw" value="40" begin="0" end="2000" event_value="F1_up" />
      <hdmv type="yaw" value="-40" begin="0" end="2000" event_value="F2_up" />
    </excl>
  </act>
</fml>
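Because these languages are XML-based, a "player" can read them with any standard XML parser. The following is a minimal sketch, not an actual FML player: it uses Python's standard xml.etree.ElementTree module to list the actions inside the par block of the sample above. The function name list_parallel_actions is hypothetical, the attributes are read exactly as they appear in the sample, and the units of begin and end are not stated in the source, so none are assumed here.

```python
import xml.etree.ElementTree as ET

# The <par> portion of the FML sample above.
FML_SAMPLE = """
<fml>
  <act>
    <par>
      <hdmv type="yaw" value="15" begin="0" end="2000" />
      <expr type="joy" value="-60" begin="0" end="2000" />
    </par>
  </act>
</fml>
"""


def list_parallel_actions(fml_text):
    """Return (tag, type, value, begin, end) for each action inside <par> blocks."""
    root = ET.fromstring(fml_text.strip())
    actions = []
    for par in root.iter("par"):
        for action in par:
            actions.append((
                action.tag,
                action.get("type"),
                int(action.get("value")),
                int(action.get("begin")),
                int(action.get("end")),
            ))
    return actions


for entry in list_parallel_actions(FML_SAMPLE):
    print(entry)
```

A real player would go on to schedule these actions on a face model and handle the event-driven excl block; this sketch stops at extracting the data.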