NaturallySpeaking
From Wikipedia, the free encyclopedia
Screenshot caption: A sample dictation in DragonPad, the included text editor.
NaturallySpeaking | |
---|---|
Developer: | Nuance Communications |
Latest release: | 9.0 / 2 July 2006 |
OS: | Microsoft Windows |
Use: | Voice recognition |
License: | Proprietary |
Website: | Nuance Communications website |
- For brevity, the name NaturallySpeaking is used throughout most of this article.
Dragon NaturallySpeaking is a speech recognition software package produced by Nuance Communications for Windows PCs.
NaturallySpeaking runs on top of other software. As words are spoken, the dictation appears temporarily in a floating Results Box; when the speaker pauses for breath, the program transcribes the words at the cursor location.
Like other speech recognition software, NaturallySpeaking has three primary areas of functionality: dictation, in which spoken language is transcribed to written text; command and control, in which spoken language is recognized as a command to operate widgets (controls); and text-to-speech, in which written text is converted to a synthesized audio stream. The software must be trained for approximately 10 minutes to recognize the user's voice. Nuance claims that dictating a 900-word essay with NaturallySpeaking takes 6 minutes (about 150 words per minute), whereas typing the same essay at 40 words per minute takes about 22 minutes.
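The arithmetic behind the speed claim can be checked with a short sketch. The function names below are illustrative only and are not part of any Nuance product; the calculation also assumes a steady rate and ignores time spent correcting recognition errors or typos.

```python
def minutes_to_write(word_count: int, words_per_minute: float) -> float:
    """Return the minutes needed to produce word_count words at a steady rate."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


def implied_rate(word_count: int, minutes: float) -> float:
    """Return the words-per-minute rate implied by a word count and a duration."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return word_count / minutes


ESSAY_WORDS = 900  # essay length used in the claim

# Typing at 40 words per minute: 900 / 40 = 22.5 minutes.
typing_minutes = minutes_to_write(ESSAY_WORDS, 40)

# Dictating the essay in 6 minutes implies 900 / 6 = 150 words per minute.
dictation_wpm = implied_rate(ESSAY_WORDS, 6)
```

On these figures the claimed speed-up is a factor of 3.75, but it holds only if the dictated text needs no correction.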
Common user profiles
- Health-care industry - This is likely to be the most profitable sector for speech recognition vendors. The high cost of labor, the specialized multisyllabic vocabulary of medicine, the formalized input, and the need to access a computer while using the hands for other tasks make speech recognition a compelling tool for health care.
- Legal industry - The needs are similar to those of health care.
- Accessibility - Speech recognition is the most effective means of using a computer for those with limited or no ability to use their hands. Many people start using speech recognition after suffering the symptoms of repetitive strain injury (RSI), although voice strain and RSI of the vocal cords are possible side effects.
Accuracy
Initially, accuracy rates of 80-85% can reasonably be expected.[citation needed] According to Nuance Communications, an expert NaturallySpeaking user can expect 98-99% recognition accuracy, but such claims of almost perfect accuracy have never been independently substantiated. Moreover, the program itself carefully avoids reporting recognition rates, whereas DragonDictate provided recognition statistics. The 98-99% figures are unlikely to be true; speech recognition for transcription works far better when applied to broadcast news (read by journalists chosen for their diction) than when applied to speech produced by ordinary people in casual circumstances. Anecdotal evidence points to an accuracy of about 95% for most users.[citation needed]
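Because the program does not report recognition rates, a user who wants a figure has to measure it. One common approach, sketched here as an assumption and not as the method Nuance uses for its own figures, is the word error rate: the word-level edit distance between a proofread reference text and the transcript, divided by the number of reference words. The function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the texts, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the transcript
                curr[j - 1] + 1,     # extra word inserted in the transcript
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Accuracy as one minus the word error rate (negative if errors exceed the word count)."""
    return 1.0 - word_error_rate(reference, hypothesis)


# Illustrative sentences, not real recognition output: one wrong word out of five.
sample_wer = word_error_rate("the quick brown fox jumps",
                             "the quick brown box jumps")
```

This simple version treats punctuation attached to a word as part of the word, so both texts should be normalized the same way before comparison.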
Highest accuracy is achieved with, in approximate order of effectiveness:
- A quality input signal.
- A powerful computer system.
- Adding phrases to NaturallySpeaking's vocabulary with the Vocabulary Editor.
- Using the Acoustic Optimizer.
- Correcting NaturallySpeaking's misinterpretations.
- Feeding NaturallySpeaking many proofread documents.
- Training.
Any noise in the path from the larynx to the sound chip can reduce signal quality. Common causes include poor-quality microphones, too much ambient noise around the speaker, and excessive noise inside the computer case. The integrated sound cards in laptops and in many Dell, Compaq, and Hewlett-Packard desktops do not have dedicated shielding. Noise-canceling microphones are often considered preferable, and many inexpensive microphones offer excellent performance.
Speech recognition is a processing-intensive task. NaturallySpeaking will run on a system that only meets Nuance's stated requirements, but it is more effective on stronger equipment. On weaker systems, speech interpretation is generally slower and less accurate, and some tasks that take seconds on strong systems, such as saving user files and opening the Command Browser, can take minutes. In practice, about four times the stated requirements for memory, processor, and free hard drive space is regularly needed. However, properly configuring the computer may be more important.
NaturallySpeaking learns from corrections. Including adjacent words (context) when correcting a misinterpretation helps it distinguish similar sounds.
Versions and editions
Version | Release date | Editions |
---|---|---|
1.0 | June 1997 | |
2.0 | November 1997 | |
3.0 | October 1998 | |
4.0 | September 1999 | |
5.0 | July 2001 | |
5.5 | | |
6.0 | November 15, 2001 | |
7.0 | March 2003 | Essentials, Standard, Preferred, Professional, Legal, Medical |
8.0 | November 2004 | Standard, Preferred, Professional, Legal, Medical |
9.0 | July 2006 | Standard, Preferred, Professional, Legal, Medical |
The Professional edition, and the related Legal and Medical editions, which come with specialized vocabularies, allow the user to create commands. Commands, also called macros, are programmed instructions for repetitive tasks. The Preferred edition, which as of 2006 costs roughly a fifth of the price of the Professional edition, allows only macros with the single action of pasting some text or graphics into a document. The cheapest edition, Standard, has no programmable features and allows only transcription.
Total command-and-control requires a lot of research and support, and even Nuance has chosen not to go down that road. Nuance provides the tools to create commands, but charges for command support. This has led to a prevalence of value-added resellers (VARs), who develop commands to solve problems such as reducing a repetitive series of actions to a few spoken words.
NaturallySpeaking can be extended by other programs. NatLink, for instance, is a tool that allows NaturallySpeaking to interact with the Python programming language.
Ownership history
NaturallySpeaking has passed through many hands and evolved considerably since its beginnings in the early 1980s as a research prototype called DRAGON. Departing from the conventional wisdom in AI, Dr. James Baker was a pioneer of hidden Markov models, a statistical method, for the automatic recognition of speech. Dr. Janet Baker, his wife, had developed an expert system named HEARSAY. After funding was cut by ARPA, the Bakers decided to commercialize DRAGON, and they founded Dragon Systems in 1982. Their first product, DragonDictate, was sold for a number of years. In the early 1990s, the program was sold to consumers; a single-user license was available for $5000, but the price dropped to a fraction of that over a few years. DragonDictate was based on a trigram model and relied on hardware that was not yet powerful enough to address the difficult problem of word segmentation, the determination of word boundaries in the continuous signal that constitutes the human voice. Thus, with this discrete speech recognition engine, users had to pronounce one word at a time, each clearly separated from the next by a small pause. In 1997, advances in hardware technology allowed continuous speech recognition in real time, and NaturallySpeaking was launched as the first available continuous dictation system.
Along with competitors in the speech recognition industry, the founders enthusiastically promoted the notion that speech input was the natural modality that would eventually supersede more "primitive" methods such as keyboards. Trying to reach a mass market, vendors dropped prices to levels that were unsustainable. The software was (and some say still is) too finicky and cumbersome to use, frustrating users with the endless need to correct recognition errors. The dictation system bubble burst in 2001, when ScanSoft Inc. bought the rights to the Dragon products as part of the spectacular bankruptcy of Lernout & Hauspie, which had bought out a faltering Dragon Systems in 2000.
ScanSoft bought Nuance Communications in 2005 and changed the name of the newly combined entity to Nuance. The move reflects the company's drive to expand further into the enterprise speech arena.
Features missing since DragonDictate
Later versions of NaturallySpeaking include a feature to ignore some types of external noise: the Nothing But Speech (NBS) technology, originally ported over from the L&H product Voice Xpress. Although individual noises cannot be trained as they could with DragonDictate, NBS runs in the background in NaturallySpeaking 8 to suppress noise.
Before version 8, it was impossible to have several language versions of Dragon NaturallySpeaking (DNS), for example German and French, installed on one system, although all non-English versions of DNS also contained the functionality to dictate in English. This problem has been rectified in DNS 8 Preferred: all languages can coexist and function fully on a single installation.
Alternatives
ViaVoice and iListen
The main standalone competitor to NaturallySpeaking is IBM ViaVoice, which was licensed to Nuance (formerly ScanSoft) a few years ago. Control and development remain in the hands of IBM. Its functionality is similar to that of NaturallySpeaking. Unlike NaturallySpeaking, it is available on Linux and Mac OS X, but these versions are no longer maintained. iListen is the leading OS X speech recognition program, but it is generally regarded as inferior to NaturallySpeaking.
Microsoft Speech API (SAPI) in Office, Tablet PCs, and Windows Vista
Speech recognition functionality built on Microsoft's Speech API (SAPI) 5.1 is included free in Microsoft Office and on all Tablet PCs running Microsoft Windows XP Tablet PC Edition. It may also be downloaded as part of the Speech SDK 5.1 for Windows® applications, but because that package is aimed at developers building speech applications, it lacks any user interface and is thus unsuitable for end users.
Windows Vista will include version 7 of the Microsoft speech recognition engine along with an improved and expanded speech-recognition interface.
Reference
- Newquist, H. P. The Brain Makers, 1994, Sams Publishing, ISBN 0-672-30412-0
External links
- Nuance
- All about Dragon NaturallySpeaking speechwiki.org
- http://www.voicerecognition.com/voice_article.html
- Dragon on Linux http://appdb.winehq.org/appview.php?versionId=3227
Forums
- KnowBrainer: command suite and NaturallySpeaking technical support
- SpeechTechnology.com: Gold VAR Dragon Support
- VoiceRecognition.com: Forum
- SpeechComputing: Forums and blogs about voice recognition