Talk:Speech Application Programming Interface

From Wikipedia, the free encyclopedia

This page is starting to feel complete. Some sections, e.g. on SAPI versions 1 through 4, are not done yet, and it has had only minimal proof-reading. Dave w74 09:43, 9 February 2006 (UTC)

Okay this has most of the content I think is necessary and I believe this is all technically accurate. I've also done some basic proof-reading and clean-up. Dave w74 10:31, 10 February 2006 (UTC)

SAPI isn't Microsoft exclusive

I had a problem with it when SAPI only mentioned the Apache definition, but didn't bother to comment because of the footnote about Microsoft's SAPI as an alternate meaning. I would think that accepting that the term is ambiguous and having a second page entry for the topic would be better, but frankly, I like this less. It ONLY discusses Microsoft's SAPI, and in more detail than is likely necessary. The problem is that Microsoft SAPI isn't the only SAPI engine available. There is a whole family of them, including the IBM one used in ViaVoice, and they are all called SAPIs, that being the general industry term for the engine type, just as TAPI generally refers to telephony engines, regardless of maker. To focus on the Microsoft SAPI as being exclusive doesn't feel like a 'general encyclopedia' entry to me but rather an advertisement for Microsoft, just as the last version of the other entry for SAPI looked like an advertisement for Apache. If SAPI needs a more detailed explanation than just 'what is a SAPI', it should come from picking it apart into its layers, not from focusing on any specific version of a SAPI.

I'm not a SAPI coder, but I've been researching them in an effort to cobble together a very specialised SAPI that I'd rather not share details on at this time, so I have a pretty intimate knowledge of how they work. Frankly, the Microsoft version of a SAPI isn't my preferred one among the existing engines out there, and it isn't portable between different OSes the way the ViaVoice SAPI is.

Splitting a SAPI into levels can be done two ways. The general levels are high and low, in the same fashion as all coding lingo. High-level SAPI use means using the SAPI for all it can do and trusting the engine and your XML database to do all the actual work for you. Low-level SAPI use means getting into the guts of the engine and doing most of the actual work (defining the voice, laying out the allophony tables, etc.) yourself, in your code, while needing only the most basic abilities of the engine (usually for a custom speech engine).

Splitting the SAPI into technical levels gives 4 common levels, which follow the order in which the SAPI processes sound for speech recognition. Creating speech is much simpler and doesn't use all 4 levels.

In the typical method of SAPI operation I am familiar with, the levels that process sound are:

Level 1: Determine the timing of the speaker... the process of trying to figure out when one word ends and the next begins, and starting to at least make some sense of what is being said.

Level 2: Process the word into a usable phonetic code that can be cross-checked with the database of words.

Level 3: Determine the word(s) being said.

Level 4: In the case of multiple possible words with that pronunciation, attempt to determine which word it might be in the context of surrounding words.
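
To make the four levels above concrete, here is a toy sketch in Python. It is only an illustration of the ordering described here, not how any real speech engine works: the input is assumed to be already reduced to phoneme labels (with None marking silence), and the names LEXICON, BIGRAMS, segment, phonetic_code, candidates, pick_in_context and recognise are all made up for this example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy input: each frame is a phoneme label, None marks silence.
Frame = Optional[str]

# Hypothetical pronunciation lexicon: phonetic code -> candidate words.
LEXICON: Dict[str, List[str]] = {
    "T-UW": ["two", "to", "too"],
    "W-ER-D-Z": ["words"],
    "S-EY": ["say"],
}

# Hypothetical context table: (previous word, candidate) -> score.
BIGRAMS: Dict[Tuple[str, str], int] = {
    ("say", "two"): 5,
    ("say", "to"): 1,
    ("two", "words"): 4,
}


def segment(frames: List[Frame]) -> List[List[str]]:
    """Level 1: split the frame stream into word-sized chunks at silences."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for frame in frames:
        if frame is None:
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(frame)
    if current:
        chunks.append(current)
    return chunks


def phonetic_code(chunk: List[str]) -> str:
    """Level 2: turn a chunk into a key that can be checked against the lexicon."""
    return "-".join(chunk)


def candidates(code: str) -> List[str]:
    """Level 3: look up the word(s) that match the phonetic code."""
    return LEXICON.get(code, ["<unknown>"])


def pick_in_context(previous: Optional[str], options: List[str]) -> str:
    """Level 4: among same-sounding words, prefer the one that best follows the previous word."""
    if previous is None:
        return options[0]
    return max(options, key=lambda word: BIGRAMS.get((previous, word), 0))


def recognise(frames: List[Frame]) -> List[str]:
    """Run the four levels in order over a stream of frames."""
    words: List[str] = []
    for chunk in segment(frames):
        options = candidates(phonetic_code(chunk))
        words.append(pick_in_context(words[-1] if words else None, options))
    return words
```

For example, recognise(["S", "EY", None, "T", "UW", None, "W", "ER", "D", "Z"]) goes through all four levels and uses the context table to choose between "two", "to" and "too" at the last step.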

This is a very crude anatomy of a SAPI, and some do parts of it better than others, but really, this all comes down to my not appreciating seeing advertisements for specific products in an encyclopedia... I'd have to double-check my SAPI history, but I'm not even sure Microsoft can take credit for making the first SAPI... They just happened to make the first freely usable SAPI exclusive to their (most popular) OS.

Interesting comments. I agree with several things you say: Dave w74 21:03, 21 March 2006 (UTC)
  • Yes I agree this page could be named better to make it clear we are talking about a Microsoft API. I propose moving this page to "Microsoft Speech Application Programming Interface". The stuff I wrote on this page was absolutely not intended to imply that this is the only Speech API in the world, or that it was the first, or that it was the best.
  • Then in the SAPI disambiguation page one could add references to pages about other Speech APIs (if they exist). However, to be clear, I think the abbreviation "SAPI" almost always refers to the Microsoft API. Other APIs tend to have slightly different abbreviations, such as JSAPI for Java Speech API and SRAPI for Speech Recognition API. But I think it would be reasonable if it helps avoid confusion.
  • I think you're proposing a page about Speech APIs in general. I'm not sure this is necessary - I don't think there's a "Telephony API" page, or a "Mail API" page, so why add one for this? I also think it would be hard to discuss Speech APIs in general - there are so many kinds (recognition, synthesis, desktop, telephony, speaker verification etc.). But if you think it adds value, go for it ... Dave w74 21:03, 21 March 2006 (UTC)
Just switching it to MS SAPI or Microsoft SAPI would be good enough, but SAPI alone is a generic term used by multiple products. Even the most common speech engine used in automated, voice-recognising phone systems (more and more common with customer support these days) calls its product a SAPI, and it does both ends of the job. I am pretty certain (but could be wrong) that the ViaVoice engine can do both, though the actual application they publish for speech recognition only does one. There are also SAPIs in other OSes. I will grant that YES, when most people say SAPI, they mean Microsoft SAPI, but when most people say computer they mean a Winbox too, and I don't see anyone putting up an entry on computers that focuses on Windows. Though I use Windows a lot, I am actually partial to Linux or Solaris, myself. As for a general SAPI entry for Speech Application Programming Interface, I don't feel quite qualified to give it the history it would deserve. As far as I am aware, the term SAPI for Speech Application Programming Interface was first coined in the mid-80s to describe the program end of a hardware device that could let older 8-bit computers use a VOX chip to talk. I can't find any older history on it, but there may be some. - Original commenter - 12:53, 22 March 2006 (UTC)
If you can locate a "Speech Application Programming Interface" that goes by that very specific name in common usage, that isn't the API discussed in this article, please feel free to link it here in the talk section, and we'll sort it out. The proper capitalisation of the name is no accident; it really is the only API out there in the world that goes by that exact name. Renaming this article to "Microsoft SAPI" is a very bad idea, as it reduces the readability of the article's title -- the fact that it's a speech API is far more important than the fact that Microsoft produced it. The only circumstance by which this article should change names is if there is a specific disambiguation need. Right now, there simply isn't. Warrens 14:20, 22 March 2006 (UTC)

101 Things to do with Microsoft Sam

For a list of fun things to do with Microsoft Sam, see User:Martinultima/101 Things To Do With Microsoft Sam. SpongeSebastian 04:58, 17 August 2006 (UTC)

Why merge with Microsoft Sam???

I don't understand why Microsoft Sam was deleted and now redirects here. I thought the Microsoft Sam page was absolutely fine. Now we have weird sections on this page with people pointing out that certain words sound funny, which really doesn't seem to fit well with the rest of the content.

I agree that getting a perfect arrangement of pages related to Microsoft Speech is a bit tricky, but this move seems counter-productive to me. Dave w74 02:24, 17 October 2006 (UTC)

I also removed the Easter Egg section. This wasn't a deliberate joke by Microsoft, it's just a bug or limitation in the TTS voice, and as such I don't think it's notable. No TTS engine pronounces every word or phrase perfectly - it's just a fact of the technology. User:Martinultima/101 Things To Do With Microsoft Sam is a great place for this kind of fun stuff.