List of speech recognition software

From Wikipedia, the free encyclopedia


Modern speech recognition software lets a single computer user dictate text or speak commands to the computer, largely (though not entirely) bypassing the keyboard and mouse.

The idea has appeared in science fiction for decades, often in the form of computers that have no keyboard or mouse at all. Such fictional computers typically keep up with any speaking rate, regardless of who is speaking, what language is spoken, or how many people are talking at once. In other words, they hear the way a multilingual person does.

Attempts to develop usable speech recognition software began in the mid-20th century and proved far more difficult than expected. The task also turned out to demand so much computing power that only relatively modern computers can perform it in real time (that is, as fast as a person speaks).

The first commercially practical products became available around 1990. One example, the Voice Navigator, was a standalone computer dedicated entirely to speech recognition: the task consumed all of the machine's computing power, and it sent its output to a second computer. These early systems were not particularly accurate and could understand only one person; to work for someone else, the machine itself (not the operator) had to be retrained. Despite these limitations, they produced text so rapidly that, even after time spent on corrections, a person with disabilities could accomplish considerably more work with the machine than without it. For people with physical disabilities, the ability to simply talk to a computer can be invaluable. Consider, for instance, an author with Parkinson's disease who can barely control his hands but can still write an article by dictating it.

In other scenarios, too, the benefits easily outweigh the equipment's deficiencies.

For example, in a facility where corrosive materials or high-voltage equipment are handled, the heavy protective gloves required for the work typically make using a keyboard impractical.

Most modern telephones now include voice dialing. Because voice dialing has much simpler requirements, needing to recognize only a small set of names and commands, it works without training the computer for a specific user.

As of 2008, the state of the art is that a properly trained computer, operated by a healthy adult with no speech impediments and running on an Intel Core Duo 1.5 GHz CPU or faster, can achieve approximately 99% accuracy while transcribing up to about 150 words per minute, using most of the available computing power. Superficially, this might sound very good. Note, however, that a very stable voice is required: an otherwise successful operator who develops a bad head cold may suddenly find that the machine does not understand him at all, even though most human listeners would have no trouble understanding him.
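To put the 99% figure in perspective, the following minimal Python sketch estimates how many words a dictating user would still have to correct. It assumes that "99% accuracy" means each transcribed word has the same 1% chance of being misrecognized; the function name expected_errors is hypothetical and used only for illustration.

```python
def expected_errors(words_per_minute: float, accuracy: float, minutes: float = 1.0) -> float:
    """Expected number of misrecognized words.

    Assumption (illustrative only): every word has the same probability
    (1 - accuracy) of being misrecognized, so by linearity of expectation
    the expected error count is total words * (1 - accuracy).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    total_words = words_per_minute * minutes
    return total_words * (1.0 - accuracy)


# Figures quoted above: about 150 words per minute at about 99% accuracy.
per_minute = expected_errors(150, 0.99)
per_hour = expected_errors(150, 0.99, minutes=60)
print(f"{per_minute:.1f} errors per minute, {per_hour:.0f} per hour")
# prints: 1.5 errors per minute, 90 per hour
```

Under this assumption, a user dictating at full speed would still need to correct one or two words every minute, or roughly 90 words in an hour of continuous dictation.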

Similarly, the machines are not yet sophisticated enough to properly process a child's voice. Obstacles include the fact that most children do not yet fully understand how language is used (for example, how to construct a complete sentence), and that their voices change continuously as they grow. Even human listeners often have to ask a parent what a young child said.

Both proprietary and open source systems are now on the market, with development focused largely on serving the legal and medical markets.

Free and open source software

CMU Sphinx - open source under a BSD license
HTK - copyrighted by Microsoft, but the license permits modifying the software for the licensee's internal use
ISIP Toolkit
Julius - BSD-style license
Simon - GPL license
VoxForge - open source, GPL

Proprietary software

CSLU Toolkit
Dragon NaturallySpeaking - from Nuance Communications; the continuous-speech successor to the older DragonDictate product, and apparently the focus of Nuance's current development effort.
eScription's Medical Transcription ASR system
HandHeld Speech Voice LookUp - provides speech recognition software for small embedded systems, including Pocket PC devices.
IBM ViaVoice - the versions for Linux, Mac OS, and Windows were licensed to Nuance Communications (formerly ScanSoft) a few years ago, while IBM retains control and development of the versions for embedded processors. Functionality is similar to Dragon NaturallySpeaking. ViaVoice has been available for Linux and Mac OS X, although these versions are no longer maintained; the Nuance website lists which legacy systems can run the final versions. It is unclear whether the Windows version will be updated beyond Windows XP; so far, Nuance does not list Vista as a recommended system.
MacSpeech Dictate - Mac OS X speech recognition using the Dragon NaturallySpeaking engine. It replaces MacSpeech's earlier iListen product, which was based on Philips speech technology.
Microsoft Speech API - speech recognition functionality included in Microsoft Office and on Tablet PCs running Microsoft Windows XP Tablet PC Edition. It can also be downloaded as part of the Speech SDK 5.1 for Windows applications, but because the SDK is aimed at developers building speech applications, it has no user interface and is therefore unsuitable for end users. Windows Vista includes version 8.0 of the Microsoft speech recognition engine, along with an entirely new end-user feature called Windows Speech Recognition.
Philips SpeechMagic - according to Frost & Sullivan, the market leader in the medical industry; a recognition engine that can run either as a stand-alone product or integrated into other applications.[1][2]
Proteus Conversational Interface
Quack.com (acquired by AOL)
SpeechWorks
teliSpeech
Tellme Networks