VoiceXML

From Wikipedia, the free encyclopedia

VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer. It allows voice applications to be developed and deployed in an analogous way to HTML for visual applications. Just as HTML documents are interpreted by a visual web browser, VoiceXML documents are interpreted by a voice browser. A common architecture is to deploy banks of voice browsers attached to the public switched telephone network (PSTN) so that users can use a telephone to interact with voice applications.

Many commercial VoiceXML applications have been deployed, processing many millions of telephone calls per day. These applications include: order inquiry, package tracking, driving directions, emergency notification, wake-up, flight tracking, voice access to email, customer relationship management, prescription refilling, audio newsmagazines, voice dialing, real-estate information and national directory assistance applications.

VoiceXML has tags that instruct the voice browser to provide speech synthesis, automatic speech recognition, dialog management, and soundfile playback. The following is an example of a VoiceXML document:

<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        Hello world!
      </prompt>
    </block>
  </form>
</vxml>

When interpreted by a VoiceXML interpreter this will output "Hello world" with synthesized speech.

Typically, HTTP is used as the transport protocol for fetching VoiceXML pages. Some applications may use static VoiceXML pages, while others rely on dynamic VoiceXML page generation using an application server like Tomcat, Weblogic, IIS, or WebSphere. In a well-architected web application, the voice interface and the visual interface share the same back-end business logic.

Historically, VoiceXML platform vendors have implemented the standard in different ways, and added proprietary features. But the VoiceXML 2.0 standard, adopted as a W3C Recommendation 16 March 2004, clarified most areas of difference. The VoiceXML Forum, an industry group promoting the use of the standard, provides a conformance testing process that certifies vendors implementations as conformant.

Two closely related W3C standards used with VoiceXML are the Speech Synthesis Markup Language (SSML) and the Speech Recognition Grammar Specification (SRGS). SSML is used to decorate textual prompts with information on how best to render them in synthetic speech, for example which speech synthesizer voice to use, and when to speak louder. SRGS is used to tell the speech recognizer what sentence patterns it should expect to hear.

The Call Control eXtensible Markup Language (CCXML) is a complementary W3C standard. A CCXML interpreter is used on some VoiceXML platforms to handle the initial call setup between the caller and the voice browser, and to provide telephony services like call transfer and disconnect for the voice browser. CCXML can also be used in non-VoiceXML contexts.

1 Future Versions of the standard
2 See also
3 Official Websites
4 External link

[edit] Future Versions of the standard

VoiceXML 2.1 is currently a candidate recommendation with only small a set of additional features, designed to be fully backward compatible with the actual 2.0 version.
VoiceXML 3.0 will be the next major release of VoiceXML, with new major features. It will use a new XML statechart description language called SCXML (currently in draft).