Viseme

A viseme is the basic unit of speech in the visual domain, corresponding to the phoneme, which is the basic unit of speech in the acoustic domain. It describes the particular facial and oral movements that accompany the articulation of phonemes.

Phonemes and visemes do not correspond one-to-one: several phonemes often share the same viseme because they look the same on the face when produced. Examples are /k/, /g/, /ŋ/ (viseme: /k/) and /ʧ/, /ʃ/, /ʤ/, /ʒ/ (viseme: /ch/). Conversely, some sounds that are hard to distinguish acoustically are clearly distinguished by the face (Chen 2001). This is illustrated by words being misheard more often on the telephone than in person. Some linguists have argued that speech is best understood as bimodal (aural and visual), and that comprehension can be compromised if one of these two domains is absent (McGurk and MacDonald 1976). The comprehension of speech from visemes alone is known as speechreading or "lip reading".
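
The many-to-one relationship can be sketched as a lookup table. The following Python example is only an illustration: the table contains just the two groups named above, and the names PHONEME_TO_VISEME, to_visemes and visually_confusable are hypothetical, not taken from any standard or library. Actual viseme inventories are larger and differ between studies.

```python
# Illustrative many-to-one phoneme-to-viseme table.
# Assumption: only the two groups named in the article are included.
PHONEME_TO_VISEME = {
    "k": "k", "g": "k", "ŋ": "k",
    "ʧ": "ch", "ʃ": "ch", "ʤ": "ch", "ʒ": "ch",
}


def to_visemes(phonemes):
    """Map a sequence of phoneme symbols to viseme labels.

    Phonemes missing from the table are passed through unchanged,
    a simplification made for this sketch.
    """
    return [PHONEME_TO_VISEME.get(p, p) for p in phonemes]


def visually_confusable(a, b):
    """Return True if two phoneme sequences yield the same viseme sequence."""
    return to_visemes(a) == to_visemes(b)


# /k/, /g/ and /ŋ/ differ acoustically but collapse to one viseme.
print(to_visemes(["k", "g", "ŋ"]))        # ['k', 'k', 'k']
print(visually_confusable(["ʃ"], ["ʒ"]))  # True
print(visually_confusable(["k"], ["ʃ"]))  # False
```

Because the mapping discards information, it cannot be inverted: a viseme sequence alone does not determine which phonemes were spoken.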

Applications of the study of visemes include speech processing, speech recognition, and computer facial animation.

References

  • Chen, T. (1998, May). Audio-visual integration in multi-modal communication. Proceedings of the IEEE 86, 837–852.
  • Chen, T. (2001). Audiovisual speech processing. IEEE Signal Processing Magazine, 9–31.
  • McGurk, H. and J. MacDonald (1976, December). Hearing lips and seeing voices. Nature, 746–748.
  • Lucey, P., T. Martin and S. Sridharan (2004, December). Confusability of phonemes grouped according to their viseme classes in noisy environments. Tenth Australian International Conference on Speech Science & Technology, Macquarie University, Sydney, 8–10 December 2004.