Microsoft Speech Server


Microsoft Speech Server is a product from Microsoft designed for authoring and deploying interactive voice response (IVR) applications that incorporate speech recognition, speech synthesis and DTMF input. The product also has limited support for multimodal applications running in Internet Explorer (IE) on Windows and on Pocket PC devices.

The first version of the server was released in 2004 as Microsoft Speech Server 2004, which supported applications developed for U.S. English-speaking users. A later release, Microsoft Speech Server 2004 R2, released in 2005, added support for North American Spanish and Canadian French as well as additional features and fixes. In August 2006, Microsoft announced that Speech Server 2007, originally slated for release in May 2007, had been merged with the Microsoft Office Live Communications Server product line[1]. In some form, Speech Server may be available as a separate product until the end of 2007.

Applications are authored using Speech Application Language Tags (SALT) markup, which is embedded in an HTML document. Authoring tools supplied with the server allow the use of ASP.NET dialog controls, which automatically generate HTML pages containing SALT tags.

These pages are published on a standard Web server. A second server, the Telephony Server, downloads and processes the relevant HTML pages while a telephone call is being handled. Doing so involves communicating with a third server, the Speech Server, where the actual speech recognition and synthesis take place (internally using server versions of SAPI and Microsoft speech engines). Input audio, recognition grammars and synthesis requests are sent to the Speech Server box, and recognition results and synthesized audio are sent back to the Telephony Server. (These servers do not all have to run on separate physical hardware.)
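
The division of labour between the three servers can be illustrated with a small simulation. The following Python sketch is illustrative only and does not use any Speech Server API: the class names, the method names and the page URL are hypothetical, and recognition and synthesis are replaced by trivial text operations.

```python
from dataclasses import dataclass, field


@dataclass
class WebServer:
    """Hypothetical stand-in for the standard Web server hosting the pages."""
    pages: dict

    def fetch(self, url: str) -> str:
        return self.pages[url]


class SpeechServer:
    """Hypothetical stand-in for the box that does recognition and synthesis."""

    def recognize(self, audio: bytes, grammar: list) -> str:
        # Toy "recognition": treat the audio bytes as text and accept it only
        # if it matches a phrase in the grammar. Real engines process audio.
        heard = audio.decode("utf-8").strip().lower()
        return heard if heard in grammar else ""

    def synthesize(self, text: str) -> bytes:
        # Toy "synthesis": return the text as bytes instead of real audio.
        return text.encode("utf-8")


@dataclass
class TelephonyServer:
    """Hypothetical stand-in for the server that handles the telephone call."""
    web: WebServer
    speech: SpeechServer
    log: list = field(default_factory=list)

    def handle_call(self, url: str, caller_audio: bytes, grammar: list) -> bytes:
        # 1. Download the page for this call from the Web server.
        page = self.web.fetch(url)
        self.log.append(f"downloaded {url} ({len(page)} characters)")
        # 2. Send input audio and the grammar to the Speech Server.
        result = self.speech.recognize(caller_audio, grammar)
        self.log.append(f"recognized {result!r}")
        # 3. Send a synthesis request and receive audio for the caller.
        text = f"You said {result}" if result else "Sorry, please repeat"
        return self.speech.synthesize(text)
```

In this sketch, `TelephonyServer(WebServer({"/menu.aspx": "<html>...</html>"}), SpeechServer()).handle_call("/menu.aspx", b"Sales", ["sales", "support"])` returns the bytes of "You said sales". Because the three roles are separate objects, they could equally be hosted on one machine or on several, as noted above.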

A Pocket PC application is also written as a web page, which the user opens in the device's browser. The SALT markup on the page is processed by a browser add-on, which communicates directly with the Speech Server box for recognition and synthesis. Compressed audio is streamed between the Pocket PC and the Speech Server; the Telephony Server is not used in this scenario. Only the SALT markup on the web page can be controlled by voice; no general speech control of the Pocket PC is provided.

Similarly, a SALT application can be run from within IE on a standard Windows PC. The page is downloaded and the SALT markup is processed by a browser add-on, and recognition and synthesis are done locally using built-in speech engines. The Speech Server and Telephony Server are not used in this scenario.
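
The three deployment scenarios described above differ mainly in which components take part. The following Python sketch summarizes them as a lookup table. The scenario keys and the `uses` helper are hypothetical names chosen for illustration, and the component lists are only a restatement of the descriptions above.

```python
# Components involved in each scenario, as described in the text above.
SCENARIOS = {
    "telephone call": {"Web server", "Telephony Server", "Speech Server"},
    "Pocket PC browser": {"Web server", "browser add-on", "Speech Server"},
    "desktop IE": {"Web server", "browser add-on", "local speech engines"},
}


def uses(scenario: str, component: str) -> bool:
    """Return True if the named component takes part in the scenario."""
    return component in SCENARIOS[scenario]
```

For example, `uses("Pocket PC browser", "Telephony Server")` is false, while `uses("Pocket PC browser", "Speech Server")` is true.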

