XHTML+Voice
XHTML+Voice (commonly X+V) is an XML language for describing multimodal user interfaces. The two essential modalities are visual and auditory. Visual interaction is defined with XHTML, as in most current web pages. Auditory components are defined by a subset of VoiceXML. The voice and visual components of an X+V document are tied together through a combination of ECMAScript (JavaScript) and XML Events.
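As a rough illustration of how these pieces fit together, the sketch below is a PHP function that returns a minimal X+V page: an XHTML body, a VoiceXML form in the head, and a pair of XML Events attributes that run the form when the page loads. The function name xvHelloWorld and the sayHello id are made up for this example, and the DOCTYPE declaration is omitted for brevity.

```php
<?php
// Hypothetical helper: returns a minimal X+V "hello world" document.
function xvHelloWorld(): string
{
    // Nowdoc: the markup is returned verbatim, with no PHP interpolation.
    return <<<'XML'
<?xml version="1.0"?>
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:vxml="http://www.w3.org/2001/vxml"
      xmlns:ev="http://www.w3.org/2001/xml-events">
  <head>
    <title>Hello</title>
    <!-- Auditory part: a VoiceXML form that speaks a prompt. -->
    <vxml:form id="sayHello">
      <vxml:block>Hello, world.</vxml:block>
    </vxml:form>
  </head>
  <!-- XML Events: run the form when the load event fires. -->
  <body ev:event="load" ev:handler="#sayHello">
    <p>Hello, world.</p>
  </body>
</html>
XML;
}
```

A script would typically echo the returned string after sending a suitable Content-Type header.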
Voice input
Voice input, or speech recognition, is based on grammars that define the set of possible input text. In contrast to the probabilistic approach employed by popular software packages such as Dragon NaturallySpeaking, the grammar-based approach provides the recognizer with important contextual information that significantly boosts recognition accuracy. Supported grammar formats include the JSpeech Grammar Format (JSGF).
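To make "the set of possible input text" concrete, the following sketch builds a small JSGF grammar from a list of phrases. The helper buildJsgfGrammar and the drink vocabulary are invented for illustration, and the phrases are assumed to be plain words containing no JSGF special characters.

```php
<?php
// Hypothetical helper: build a one-rule JSGF grammar whose public rule
// accepts exactly one of the given phrases.
// Assumption: $phrases contains plain words only, so no escaping is done.
function buildJsgfGrammar(string $name, string $rule, array $phrases): string
{
    return "#JSGF V1.0;\n"
        . "grammar {$name};\n"
        . "public <{$rule}> = " . implode(' | ', $phrases) . ";\n";
}

echo buildJsgfGrammar('drinks', 'drink', ['coffee', 'tea', 'orange juice']);
// Prints:
// #JSGF V1.0;
// grammar drinks;
// public <drink> = coffee | tea | orange juice;
```

With this grammar the recognizer only has to decide among three phrases, which is the kind of contextual constraint the grammar-based approach relies on.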
Voice output
Voice output, or speech synthesis, can read any string at virtually any time. Pitch, volume, and other characteristics can be customized using CSS and Speech Synthesis Markup Language (SSML); however, the Opera web browser does not currently support all of these features.
MIME types
The currently recommended MIME type for X+V documents is application/xv+xml. The previously recommended type, application/xhtml+voice+xml, is the one the Opera browser uses. Opera will also interpret X+V documents served as text/xml. Since most web servers associate the .xml extension with text/xml, giving static X+V files an .xml extension is a fairly safe way of making them browsable.
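For pages generated by a script rather than served as static files, the Content-Type has to be set explicitly. The sketch below shows one possible way to choose among the three types mentioned above. The function name chooseXvContentType and the preference order are assumptions made for this example, not something any X+V specification prescribes.

```php
<?php
// Hypothetical helper: pick a Content-Type for an X+V response.
// Assumption: prefer whichever X+V type the client names in its Accept
// header, and otherwise fall back to text/xml.
function chooseXvContentType(string $accept): string
{
    $candidates = ['application/xv+xml', 'application/xhtml+voice+xml'];
    foreach ($candidates as $type) {
        // stripos() is used because MIME type names are case-insensitive.
        if (stripos($accept, $type) !== false) {
            return $type;
        }
    }
    return 'text/xml';
}

// Typical use, before any markup is sent (requires PHP 7 or later):
// header('Content-Type: ' . chooseXvContentType($_SERVER['HTTP_ACCEPT'] ?? ''));
```

Falling back to text/xml mirrors the static-file advice: a browser that does not advertise either X+V type still receives the document as XML.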
X+V-enabled browsers
The most commonly used X+V browser is the Opera browser. Users of the Opera browser can enable X+V support through steps described at http://www.opera.com/voice/. Voice is not yet supported in Opera Mini or on platforms other than Windows.
Detecting support for X+V is best done on the server by checking the HTTP "Accept" request header for the MIME type application/xhtml+voice+xml. The following PHP script echoes "true" if and only if the requesting browser advertises support for XHTML+Voice:
<?php
/*
 * The following script echoes "true" if and only if the requesting
 * browser supports XHTML+Voice.
 */

//
// Determine whether browser is sending Accept header.
//
if (isset($_SERVER['HTTP_ACCEPT'])) {
    $accept = $_SERVER['HTTP_ACCEPT'];
    // If they omit the MIME type from Accept then assume no support.
    if (strpos($accept, 'application/xhtml+voice+xml') === false) {
        echo 'false';
    } else {
        echo 'true';
    }
} else {
    echo 'false';
}
?>
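A plain substring check also reports "true" when a client lists the type with a quality value of zero, which in HTTP means "not acceptable". The sketch below is a slightly stricter variant that parses each entry of the header. The helper acceptsMimeType is hypothetical, and ignoring wildcard ranges is a design choice of this example rather than a requirement.

```php
<?php
// Hypothetical helper: report whether an Accept header lists $mime
// explicitly with a non-zero q-value. Wildcard ranges are ignored on
// purpose, since a client that sends only a catch-all range has not
// claimed X+V support.
function acceptsMimeType(string $accept, string $mime): bool
{
    foreach (explode(',', $accept) as $range) {
        // Split "type/subtype;q=0.5" into the type and its parameters.
        $parts = array_map('trim', explode(';', $range));
        $type = strtolower(array_shift($parts));
        if ($type !== strtolower($mime)) {
            continue;
        }
        foreach ($parts as $param) {
            // A q-value of 0 marks the type as not acceptable.
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                return (float) $m[1] > 0.0;
            }
        }
        return true; // Listed without a q parameter: q defaults to 1.
    }
    return false;
}

// Typical use (requires PHP 7 or later):
// echo acceptsMimeType($_SERVER['HTTP_ACCEPT'] ?? '',
//     'application/xhtml+voice+xml') ? 'true' : 'false';
```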
Related technology
Speech Application Language Tags (SALT) is a very similar format developed by Microsoft in 2001 to compete with VoiceXML and XHTML+Voice. SALT also provides multimodal support, including grammar-based recognition and synthesized speech output. The main difference lies in who supports each format. Many companies, in particular IBM and Opera Software, support VoiceXML and XHTML+Voice by providing various development tools. SALT is supported almost exclusively by Microsoft, through products such as the Microsoft Speech Application SDK and Microsoft Speech Server.