VoiceXML

From Wikipedia, the free encyclopedia

VoiceXML (VXML) is a digital document standard for specifying interactive media and voice dialogs between humans and computers. It is used for developing audio and voice response applications, such as banking systems and automated customer service portals. These applications are developed and deployed in a manner analogous to visual web applications: just as a web browser renders Hypertext Markup Language (HTML) pages delivered by a web server, a voice browser interprets VoiceXML documents delivered by a server. In common deployment architectures, banks of voice browsers are attached to the public switched telephone network (PSTN), and users interact with them over the telephone.

The VoiceXML document format is based on Extensible Markup Language (XML). It is a standard developed by the World Wide Web Consortium (W3C).

Usage

VoiceXML applications are used in many industries and segments of commerce. Typical applications include order inquiry, package tracking, driving directions, emergency notification, wake-up calls, flight tracking, voice access to email, customer relationship management, prescription refilling, audio news magazines, voice dialing, real-estate information, and national directory assistance.

VoiceXML has tags that instruct the voice browser to provide speech synthesis, automatic speech recognition, dialog management, and audio playback. The following is an example of a VoiceXML document:

<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>
        Hello world!
      </prompt>
    </block>
  </form>
</vxml>

When interpreted by a VoiceXML interpreter, this document speaks "Hello world!" using synthesized speech.
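The hello-world document only produces output. The following slightly larger sketch also exercises speech recognition and dialog management. A <field> collects a spoken answer against an inline SRGS grammar. A <nomatch> handler re-prompts when the input is not recognized, and <filled> reads the result back. The form id, field name, prompts, and vocabulary are illustrative assumptions, and exact behavior can vary between platforms.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Illustrative form; the ids, prompts and words are assumptions -->
  <form id="drinks">
    <field name="choice">
      <prompt>Would you like coffee or tea?</prompt>
      <!-- Inline SRGS grammar: the recognizer expects only these two words -->
      <grammar mode="voice" xml:lang="en-US" version="1.0" root="drink">
        <rule id="drink" scope="public">
          <one-of>
            <item>coffee</item>
            <item>tea</item>
          </one-of>
        </rule>
      </grammar>
      <!-- Unrecognized input: apologize, then replay the field's prompt -->
      <nomatch>
        <prompt>Sorry, I did not understand.</prompt>
        <reprompt/>
      </nomatch>
      <!-- Runs once the field has been filled with a recognized value -->
      <filled>
        <prompt>You chose <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The grammar has no semantic tags, so the field's value should simply be the recognized word ("coffee" or "tea").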

Typically, HTTP is used as the transport protocol for fetching VoiceXML pages. Some applications use static VoiceXML pages, while others generate VoiceXML pages dynamically using an application server such as Tomcat, WebLogic, IIS, or WebSphere.
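The sketch below shows how a page can pass data to an application server over HTTP. It collects a value and uses <submit> to POST it. The server's response is expected to be the next VoiceXML document, and the server is free to generate that document dynamically. The URL http://example.com/vxml/balance and the field name are hypothetical. The builtin type="digits" grammar is an optional feature in VoiceXML 2.0, so a given platform may not support it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="lookup">
    <!-- type="digits" selects a builtin grammar (platform support is optional) -->
    <field name="account" type="digits">
      <prompt>Please say your account number.</prompt>
      <filled>
        <!-- POST the field to a hypothetical server-side URL; the HTTP
             response is interpreted as the next VoiceXML document -->
        <submit next="http://example.com/vxml/balance"
                method="post" namelist="account"/>
      </filled>
    </field>
  </form>
</vxml>
```

On the server side, the resource at that URL could be any component hosted on an application server. It would read the posted account value and write back a VoiceXML page built for that caller.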

Historically, VoiceXML platform vendors implemented the standard in different ways and added proprietary features. The VoiceXML 2.0 standard, adopted as a W3C Recommendation on 16 March 2004, clarified most of these areas of difference. The VoiceXML Forum, an industry group promoting the use of the standard, provides a conformance testing process that certifies vendors' implementations as conformant.

History

AT&T Corporation, IBM, Lucent, and Motorola formed the VoiceXML Forum in March 1999 to develop a standard markup language for specifying voice dialogs. By September 1999 the Forum had released VoiceXML 0.9 for member comment, and in March 2000 it published VoiceXML 1.0. Soon afterwards, the Forum turned control of the standard over to the W3C.^[1] The W3C produced several intermediate versions of VoiceXML 2.0, which reached the final "Recommendation" stage in March 2004.^[2]

VoiceXML 2.1 added a relatively small set of additional features to VoiceXML 2.0, based on feedback from implementations of the 2.0 standard. It is backward compatible with VoiceXML 2.0 and reached W3C Recommendation status in June 2007.^[3]

Future versions of the standard

VoiceXML 3.0 was planned as the next major release of VoiceXML, with major new features, including a new XML statechart description language called SCXML.
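SCXML (State Chart XML) describes behavior as a set of states and the event-driven transitions between them. The following is a minimal sketch of a statechart for the lifecycle of a call. The state ids and the event names (call.connected, dialog.done, call.disconnected) are hypothetical. They are not events defined by any particular VoiceXML or SCXML platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="idle">
  <!-- Waiting for a call; the event names here are hypothetical -->
  <state id="idle">
    <transition event="call.connected" target="inDialog"/>
  </state>
  <!-- Either the dialog finishing or the caller hanging up ends the call -->
  <state id="inDialog">
    <transition event="dialog.done" target="hangup"/>
    <transition event="call.disconnected" target="hangup"/>
  </state>
  <!-- Reaching a final state terminates the state machine -->
  <final id="hangup"/>
</scxml>
```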

Related standards

The W3C's Speech Interface Framework also defines the following standards, which are closely associated with VoiceXML.

SRGS and SISR

The Speech Recognition Grammar Specification (SRGS) is used to tell the speech recognizer what sentence patterns it should expect to hear: these patterns are called grammars. Once the speech recognizer determines the most likely sentence it heard, it needs to extract the semantic meaning from that sentence and return it to the VoiceXML interpreter. This semantic interpretation is specified via the Semantic Interpretation for Speech Recognition (SISR) standard. SISR is used inside SRGS to specify the semantic results associated with the grammars, i.e., the set of ECMAScript assignments that create the semantic structure returned by the speech recognizer.
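The following is a minimal sketch of an SRGS grammar in XML form with SISR tags. The attribute tag-format="semantics/1.0" selects SISR. Each <tag> holds ECMAScript that assigns to the rule variable out, and rules.size and rules.drink read the results of the referenced rules. Under these assumptions, the utterance "I would like a large coffee" should produce a semantic result equivalent to { size: "L", drink: "coffee" }. The rule names, words, and values are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
         xml:lang="en-US" mode="voice" root="order"
         tag-format="semantics/1.0">
  <!-- Top-level rule: optional lead-in words, then a size and a drink -->
  <rule id="order" scope="public">
    <item repeat="0-1">I would like a</item>
    <ruleref uri="#size"/>
    <tag>out.size = rules.size;</tag>
    <ruleref uri="#drink"/>
    <tag>out.drink = rules.drink;</tag>
  </rule>
  <!-- Each alternative maps the spoken words to a normalized value -->
  <rule id="size">
    <one-of>
      <item>small <tag>out = "S";</tag></item>
      <item>large <tag>out = "L";</tag></item>
    </one-of>
  </rule>
  <rule id="drink">
    <one-of>
      <item>coffee <tag>out = "coffee";</tag></item>
      <item>tea <tag>out = "tea";</tag></item>
    </one-of>
  </rule>
</grammar>
```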

SSML

The Speech Synthesis Markup Language (SSML) is used to decorate textual prompts with information on how best to render them in synthetic speech, for example which speech synthesizer voice to use or when to speak louder or softer.
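The following is a minimal SSML sketch. <voice> requests a voice by its characteristics, <break> inserts a pause, and <prosody volume="..."> makes passages louder or softer. The spoken text is illustrative. When SSML is used within VoiceXML, the same markup typically appears inside a <prompt> element rather than in a standalone <speak> document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- Request a female voice; how closely this is honored depends on
       the voices installed in the synthesizer -->
  <voice gender="female">
    Your package has shipped.
    <!-- Half-second pause -->
    <break time="500ms"/>
    <!-- Louder, then softer, than the default volume -->
    <prosody volume="loud">Please keep your tracking number.</prosody>
    <prosody volume="soft">Thank you for calling.</prosody>
  </voice>
</speak>
```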

PLS

The Pronunciation Lexicon Specification (PLS) is used to define how words are pronounced. The resulting pronunciation information is meant to be used by both speech recognizers and speech synthesizers in voice browsing applications.
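The following is a minimal PLS sketch, assuming IPA as the phonetic alphabet. The first entry gives an explicit pronunciation. The second gives an alias whose pronunciation is used for the written form. SSML and SRGS documents can refer to such a lexicon through their own <lexicon> element. The entries shown are illustrative.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <!-- Explicit pronunciation, written in IPA -->
  <lexeme>
    <grapheme>tomato</grapheme>
    <phoneme>təˈmeɪtoʊ</phoneme>
  </lexeme>
  <!-- Acronym pronounced through a textual alias -->
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>
```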

CCXML

The Call Control eXtensible Markup Language (CCXML) is a complementary W3C standard. A CCXML interpreter is used on some VoiceXML platforms to handle the initial call setup between the caller and the voice browser, and to provide telephony services like call transfer and disconnect to the voice browser. CCXML can also be used in non-VoiceXML contexts.
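The following is a minimal CCXML 1.0 sketch of the call setup described above. The document accepts an incoming call, starts a VoiceXML dialog on it, and disconnects when the dialog exits. The dialog URL is hypothetical. The src attribute of <dialogstart> takes an ECMAScript expression, which is why the URL is wrapped in an extra pair of quotes.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<ccxml version="1.0" xmlns="http://www.w3.org/2002/09/ccxml">
  <eventprocessor>
    <!-- An incoming call is being offered: answer it -->
    <transition event="connection.alerting">
      <accept/>
    </transition>
    <!-- Once connected, run a (hypothetical) VoiceXML dialog on the call -->
    <transition event="connection.connected">
      <dialogstart src="'http://example.com/vxml/welcome.vxml'"/>
    </transition>
    <!-- When the dialog finishes, hang up the call -->
    <transition event="dialog.exit">
      <disconnect/>
    </transition>
    <!-- The call has ended: terminate this CCXML session -->
    <transition event="connection.disconnected">
      <exit/>
    </transition>
  </eventprocessor>
</ccxml>
```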

MSML, MSCML, MediaCTRL

In media server applications, several call legs often need to interact with each other, for example in a multi-party conference. VoiceXML was found to have deficiencies for this application, so companies designed scripting languages specifically for this environment. The Media Server Markup Language (MSML) was Convedia's solution, and the Media Server Control Markup Language (MSCML) was Snowshore's. Snowshore is now owned by Dialogic, and Convedia by Radisys. These languages also contain 'hooks' so that external scripts (such as VoiceXML) can run on call legs where IVR functionality is required.

An IETF working group called mediactrl ("media control") has been working on a successor to these scripting systems, with the aim of producing an open and widely adopted standard.^[4]

References

[1] VoiceXML Forum, Tutorial on VoiceXML, 2003.
[2] Ephraim Schwartz, "W3C Recommends VoiceXML 2.0", InfoWorld, March 17, 2004.
[3] Voice Extensible Markup Language (VoiceXML) 2.1, http://www.w3.org/TR/voicexml21
[4] Burger, Dawkins, mediactrl charter.

External links

W3C's Voice Browser Working Group, Official VoiceXML Standards
VoiceXML Forum, VoiceXML Trademark Holder
DMOZ Open Directory Listing - VoiceXML
VoiceXML tutorials

This article is issued from Wikipedia. The text is available under the Creative Commons Attribution/Share Alike; additional terms may apply for the media files.