Wikipedia talk:Pronunciation (simple guide to markup, American)

Please leave all commentary and suggestions in the space below the introduction, in Further discussion.

From VfD

See Wikipedia:Votes for deletion/Pronunciation (simple guide to markup, American).

Note: The pages were moved via a copy and paste move. I've merged them; to see the old talk page, however, see Wikipedia talk:Pronunciation (simple guide to markup, American)/oldtalk. - Ta bu shi da yu 15:11, 6 Nov 2004 (UTC)

Version 0.90 Introduction

This issue has been raised on the policy thinktank list.

The need

In my view, Wikipedia needs a simple guide to pronunciation. This need came to my attention when I began the List of heteronyms. Heteronyms are words that are spelled the same but mean different things when pronounced differently. Examples from that list:

Heteronym	Form 1	Form 2
abstract	AABstraakt- not concrete	aabSTRAAKT- to generalize
bass	BAYS- deep musical tones	BAAS- a fish
bow	BOH- an ornate knot or a weapon used to fire arrows	BAHW- to pay one's respects by bending at the waist
conflict	kuhnFLIHKT- to clash	KAHNflihkt- a clash
conscript	KAHNskrihpt- one forced into a task	kuhnSKRIHPT- to force into a task

Pronunciation was indicated with Wikipedia:Pronunciation (simple guide to markup, American) (called Simple Guide herein). The need for some kind of pronunciation guide is obvious in lists like this. In addition, however, without a Simple Guide some are tempted to use ad hoc devices, as in the article:

Ayn Rand (Ayn rhymes with "mine"), Alissa (Alice) Zinovievna Rosenbaum...

With a Simple Guide, that could be written:

Ayn Rand (AHYN RAAND), Alissa (Alice) Zinovievna Rosenbaum...

Just as important, using a redirect the author of this line could have easily linked the pronunciation markup to the Simple Guide, something like this:

'''Ayn Rand''' ([[WSPGA | AHYN RAAND]]), Alissa (Alice) Zinovievna Rosenbaum...

Within seconds the Simple Guide loads, and even an unfamiliar user can quickly ascertain the pronunciation. The posited alternatives, SAMPA and IPA, have large pages and are extremely cumbersome to use.

The alternatives

X-SAMPA and IPA have their virtues. They are comprehensive, standard and international. But using them for indicating all pronunciation is overkill, the linguistic equivalent of using a 4GHz computer instead of a pencil to print Post-it notes.

One contributor marked the above-mentioned heteronym list using IPA. Examples:

Heteronym	Form 1	Form 2
abstract	/ˈæb.stɹækt/- not concrete	/æb.ˈstɹækt/- to generalize
aggregate	/ˈæ.ɡɹə.ɡeɪt/- to amass	/ˈæ.ɡɹə.ɡət/- composite
discard	/dɪs.ˈkɑɹd/- to dispose of something	/ˈdɪs.kɑɹd/- an item disposed of
incense	/ˈɪn.sɛns/- burned aromatic	/ɪn.ˈsɛns/- to make angry

Assuming your computer can read the codes and display properly, you can compare for ease of use through the eyes of the average reader.

Just as important, when writing an article, simply entering the IPA codes is much more work:

Instead of AAGruhgayt we must enter

{{IPA |/ˈæ.ɡɹə.ɡeɪt/}}

Instead of AAGruhguht we must enter

{{IPA |/ˈæ.ɡɹə.ɡət/}}

Design criteria

The Simple Guide I propose has drawn from many other schemes, and is intended to be:

Simple to learn
Simple to use for writers—no special characters
Simple to use for readers
Mostly intuitive without sacrificing simplicity
Adequately comprehensive without sacrificing simplicity

Simplicity is obviously the principal criterion. To achieve simplicity, some accuracy and freedom from ambiguity must be traded off. Nevertheless, I estimate that 999 words out of 1000 ordinary English words can be very closely approximated using the Simple Guide.

Response to objections

Con: This is not an encyclopedia article.
Pro: Agreed. A Simple Guide (this or another) would be meta-content, similar to Policy pages.

Con: We don't need Yet Another Pronunciation Scheme. We already have IPA.
Pro: Wikipedia needs a Simple Guide. IPA is complex. It has its place, but so does a Simple Guide.

Con: IPA is international. A Simple Guide would not be. This is only useful for American pronunciation.
Pro: The Simple Guide can be used as a foundation to construct – in a very short time – a British guide, an Australian guide, etc. (Even better, it is probably possible, by carefully selecting sample words, to integrate the British, American and other English dialect versions. The early draft is "American" because that's what I know.)

Con: The Simple Guide is non-standard.
Pro: If you know of a "standard" which better meets this need, and is also available for Wikipedia use, by all means suggest it. Otherwise, let's fill a demonstrable need with a decent solution and be done with it.

Con: A Simple Guide would be biased toward particular pronunciations.
Pro: If choosing the predominantly-used pronunciation(s) is "bias", then yes, bias is essential. Very few reference works list every extant pronunciation by every English dialect. In pronouncing past, for example, Merriam-Webster ignores both an eastern and southern dialect of American English. The preponderant pronunciation is the exactly right choice for some articles, comparative dialect studies the right choice for others. Furthermore, many lesser-used dialects can be easily represented with a Simple Guide. Those less susceptible, like Cajun, might require IPA to achieve more accuracy where necessary.

Con: Your Simple Guide is not really intuitive. I didn't know what to make of the aa and uu symbols.
Pro: It's more accurate to say that for some users a couple of symbols are somewhat less intuitive than the others. That's unavoidable, given the nature of expressing the complex with just a few symbols. But the most obvious alternatives introduced ambiguity and problems of their own. Besides, most other schemes use symbols like ä and &, which are even less intuitive. All suggestions for improvement are most welcome, of course.

Con: How do you transcribe words like marry and merry and entrepreneur using this scheme?
Pro: Using the most common American pronunciations, according to Merriam-Webster, marry would be MAARee, merry would be MEHRee, and entrepreneur would be AHNtruhpruhNUHR. Alternate pronunciations are as easily indicated using the Simple Guide.

Con: How do you mark secondary stress?
Pro: An example is entrepreneur above. I prefer CAPS to other marks, just to keep things simple. Entrepreneur is probably pronounced ten different ways in the U.S., with heavier emphasis on the first or last syllable being among the variants. If the need to be that precise exists, use IPA; if not, use the Simple Guide. I prefer the caps scheme (AHNtruhpruhNUHR) to something like "ahn-truh-pruh-'nuhr or Merriam-Webster's "änn-tr&-p(r)&-'n&r. Simple is good.

Con: This scheme will not do some foreign words/sounds used in American (or whatever) speech.
Pro: You got me there. Bring out the nukes (IPA, SAMPA) for that.

Con: But IPA is really easy to use.
Pro: If IPA is "easy", the Simple Guide is for morons. We morons need love too, my friend.

Con: IPA makes it possible for people in other countries to learn English pronunciation.
Pro: You mean a particular English pronunciation, don't you? Fine. If that's what an article needs, that's what an article needs. Other articles, like List of heteronyms, are better served with a Simple Guide.

Con: The Simple Guide is biased toward American English.
Pro: An American Guide would be, just as a British Guide would be British. With some help from experts in dialect, however, it's probably possible to fashion a guide with example words which are common to over 90% of English speakers using many dialects.

Con: A Simple Guide would "assume a knowledge of English."
Pro: Of course. So does the page you're reading. So does the English-language Wikipedia. A tool, any tool, must assume certain knowledge to build more.

Con: Wouldn't a Simple Guide allow article writers to select a regional pronunciation?
Pro: Of course. Would this be any less true for IPA, if article writers actually USED it to indicate pronunciation? Furthermore, we use regional spellings in Wikipedia—you write humour and I write humor. We survive those differences and we would survive differences in pronunciation, actually being enriched by inclusion and cultural variety.

Conclusion

Wikipedia should adopt a simple means of indicating English pronunciation.
In my view, the advantages far outweigh the concerns.
The absence of a guide leaves a hole in many articles, and prompts nonuniform ad hoc solutions.

Any suggestions for improving the proposed guide are most welcome.--NathanHawking 02:05, 2004 Nov 3 (UTC)

Further discussion

Discussion from original page

I have taken the liberty of pasting in this discussion from the talk page of the original article within Wikipedia proper, which now seems destined for deletion. Most of it seems to have been incorporated into the FAQ above, but I think Nohat's list of words might make a useful ad-hoc touchstone. Pnot 03:57, 3 Nov 2004 (UTC)

Looks good to me, Pnot. Thanks. --NathanHawking 04:56, 2004 Nov 3 (UTC)

How do you transcribe the following words using this scheme?

her
err
air
marry
merry
Mary
butter
button
prism
sink
single
finger
forest
roses
Rosa's
for
poor
entrepreneur
lure

Also, how do you mark secondary stress? Nohat 09:35, 1 Nov 2004 (UTC)

Any SIMPLE phonetic markup system will be unable to map all the nuances of possible pronunciations of all words. Trade-offs, remember? That's mostly irrelevant, though, since pronunciations vary widely anyway, and even a dictionary with an extensive phonetic table can only approximate a selected "standard" pronunciation. Look up words on Merriam-Webster--their recorded pronunciation often fails to match their hypothetical written one.

A good example is their entry for record. [1] Their phonetic indication is ri-'kord, the equivalent of Wikipedia:Pronunciation (simple guide to markup, American)'s rihKOHRD. But play the recorded version of M-W's record pronunciation and it's much closer to reeKOHRD, which happens to be the way I pronounce the word.

I've made no provision for secondary stress, to keep things simple. A word like entrepreneur, however, could be adequately rendered AHNtruhpruhNUHR, close enough to M-W's "änn-tr&-p(r)&-'n&r. Once again, simplicity and ease of use are more important than scholarly accuracy. Once again, the word entrepreneur is pronounced in a dozen different ways across America. Close enough, for a List of heteronyms and the like, is good enough.

As for your list, I leave most of them as an exercise for you, but will do:

her HUHR or HR (M-W: 'h&r)
err EHR (M-W: 'er)
air EHR (M-W: 'er)
marry MAARee (M-W: 'mar-E)
merry MEHRee (M-W: 'mer-E)

As you know, most of these have two pronunciations, but I only selected one to illustrate. I estimate that 99.99% of "standard" American English words can be closely approximated using this table. As an exercise, I also tried it out tonight on a southern U.S. dialect and it worked well there too.--NathanHawking 10:52, 2004 Nov 1 (UTC)
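The CAPS-for-stress convention used in the transcriptions above is regular enough to be read mechanically. What follows is only a minimal sketch, not part of the proposal: the function name `stress_runs` is hypothetical, and it assumes a respelling consists solely of ASCII letters, with capitals marking stress. It splits a respelling into runs of stressed and unstressed letters. Note that it recovers runs rather than syllables, and that it cannot tell primary from secondary stress, which matches the "no provision for secondary stress" trade-off described above.

```python
import re

def stress_runs(respelling):
    """Split a respelling into (text, is_stressed) runs based on letter case.

    Assumption: capitals mark stress, lowercase marks no stress.
    Adjacent unstressed syllables come back merged into a single run.
    """
    runs = []
    for match in re.finditer(r"[A-Z]+|[a-z]+", respelling):
        chunk = match.group(0)
        runs.append((chunk.lower(), chunk.isupper()))
    return runs

print(stress_runs("AHNtruhpruhNUHR"))
# [('ahn', True), ('truhpruh', False), ('nuhr', True)]
```

Both capitalised runs in AHNtruhpruhNUHR come back flagged the same way, so a reader cannot learn from the markup alone which of the two carries the heavier stress.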

Another proposal

The author of Wikipedia:English_phonetic_spelling offered, a while back, a proposal which is not as simple as this one, but whose choice of symbols is worthy of study. --NathanHawking 02:54, 2004 Nov 3 (UTC)

After a little study of Wikipedia:English_phonetic_spelling, I notice what may illustrate one of the problems one encounters when developing a system like this, ambiguities arising from the juxtaposition of symbols. In that proposal, for example:

a is the sound in cot, i the sound in fit, yet ai is the sound of size. To pronounce size, his/her proposal would use /saiz/. But is /saiz/:

s+ai+z with the ai symbol and a long i? Or
s+a+i+z with individual a and i symbols (sounding the cot and fit vowels mashed together, closer to the way a North Carolinian would say size)?

Such problems are impossible to avoid completely; at best, one can hope to minimize them. My own proposal could benefit from similar scrutiny, to see if that minimization of ambiguity is optimal.--NathanHawking 03:20, 2004 Nov 3 (UTC)
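The juxtaposition ambiguity described above can be checked mechanically for any candidate symbol set. The sketch below is illustrative only: the function name `segmentations` is hypothetical, and the symbol list is an assumed fragment containing just the symbols needed for the /saiz/ example, not the full inventory of either proposal. It enumerates every way a transcription can be split into known symbols; more than one result means the transcription is ambiguous.

```python
def segmentations(text, symbols):
    """Return every way to split `text` into a sequence of known symbols."""
    if not text:
        return [[]]
    results = []
    for symbol in symbols:
        if text.startswith(symbol):
            # Consume this symbol, then segment whatever is left.
            for rest in segmentations(text[len(symbol):], symbols):
                results.append([symbol] + rest)
    return results

# Assumed fragment of a symbol inventory, just large enough for the example.
SYMBOLS = ["s", "z", "a", "i", "ai"]

for parse in segmentations("saiz", SYMBOLS):
    print("+".join(parse))
# s+a+i+z
# s+ai+z
```

Running the same check over a list of sample words would be one way to give a proposed symbol set the scrutiny suggested above, since any word with two or more segmentations flags a pair of symbols worth reconsidering.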

Disagreement

For what it's worth, I strongly disagree with this approach to pronunciation. I know this form of ad-hoc pronunciation is popular in American encyclopedias, etc., but I believe this approach is misguided. There is an internationally agreed phonetic alphabet that works in all languages (within reason) and is value-free. This system is the opposite. I truly hope this will not cause a proliferation of these "pronunciations" throughout Wikipedia - especially as I have removed them whenever I've discovered them as being essentially worthless. A British version is even more error-prone and loaded than an American one, as British accents and pronunciations vary so widely. You may think the same could be said of IPA, but IPA permits an absolute pronunciation to be given to a word in ANY ACCENT. This system could conceivably do that, but only if a "standard accent" is agreed upon in the first place. There is no such thing, and so this ad-hoc system is self-referential and ultimately conveys almost no useful information at all, if you don't already know how to pronounce English (and if you do, it's redundant). I think there needs to be considerably more discussion before anything like this is deployed - I for one will be seriously turned off WP if it gains currency. Graham 05:14, 3 Nov 2004 (UTC)

Thanks, Graham, for your opinion. Let's not confuse two issues. Yes, there is a standard. But it is very cumbersome, absurdly so for many applications.

IPA ALLOWS for pronunciation of any accent, but are you seriously suggesting that a reference work list all possible pronunciations for every entry? The serious flaw in your reasoning is that given your premises and implied conclusion, no dictionary would ever list a "standard" or suggested pronunciation. Clearly that would be an absurd outcome.

It would be no less absurd on Wikipedia. In compiling a List of heteronyms, for example, words which are spelled the same and pronounced differently, it would be absurd merely to SAY they're different but not to suggest HOW they're pronounced. If a pronunciation must be given to avoid such absurdity, which one would be selected?

All pronunciations in all dialects?
None?

Those are equally absurd.

Consider the simple word "past". Merriam-Webster suggests a single pronunciation, 'past, where the a is pronounced like the one in ash. There are numerous ways Americans say the word, however, and they include, using my suggested markup:

PAAST (The equivalent of M-W's version)
PAHST (New England)
PAHyuhst (Southern U.S.)

Dictionaries have to make choices, and I believe that Wikipedia must as well, for some articles, or be crippled as a serious reference. ... --NathanHawking 08:54, 2004 Nov 3 (UTC)

Actually I'm encouraged that there is already this discussion and a structure for debate in place. I wasn't aware of it, hence my adding the comment in the other place, which I just happened to notice by accident. I appreciate you moving my comment and response here where it can serve more usefully, I hope. I agree with your points - IPA is cumbersome, and almost certainly less intuitive for the average reader, though obviously with familiarity it's just like reading anything. Now I've had a chance to think about it a bit more, I'm going to change tack a little, or maybe clarify my position anyway. I don't think the argument is about IPA versus ad-hoc; rather, it's about any scheme versus none at all. Is there actually any strong evidence that this is something that users feel is lacking? My view is that the actual words are their own pronunciation - you just need to learn them.

Whenever I have encountered an ad-hoc pronunciation inserted into an article, a couple of things strike me:

1. It has the tone of talking down to the reader, as if it is assumed they are too stupid to know how to pronounce it. For whatever reason I don't get the same feeling when I read an IPA pronunciation, but maybe that's just me. So that's a lowest common denominator sort of argument I guess.

2. Very often, the ad-hoc pronunciations lead one to an American accent pronunciation.

3. Unless you already are pretty familiar with the English language, the ad-hoc pronunciations are very often simply meaningless, though of course you would have to wonder why such a person would be reading the English WP ;-) (conversely, when encountering an ad-hoc pronunciation for a foreign word, the American/English "translation" is often embarrassingly wide of the true pronunciation, a drawback that IPA doesn't have).

4. I agree with your point about how to deal with a variety of accents and versions of a pronunciation - in fact I was not suggesting that. Even with IPA one would have to establish a "standard" accent as a base. But I wonder how you're going to address this with this scheme also. In fact the problem is magnified by the fact that the English WP is also the American, Australian, and default WP - you're surely not suggesting encumbering articles with a huge list of alternatives in every opening paragraph? I'm also wondering how pervasive you intend this to be. In some US encyclopedias I've seen, they put a pronunciation on every article entry, even when the pronunciation is obvious. I wouldn't like to see WP "dumb down" to that degree, but where would you draw the line?

5. Perhaps technology can come to the rescue here - if such pronunciations are necessary at all (I don't think they are, but that's only my opinion - your option of "none" sounds OK to me!) perhaps they could be hidden for those that don't want them - maybe a user preference? Alternatively some kind of additional "helper" page or pop-up - I'm just throwing a few ideas into the ring here. In fact that approach could be quite useful - it would allow multiple pronunciations to be listed, including a mix of accents, IPA too, without cluttering an article. For those who don't need them at all, they aren't even there. Is this even technically feasible? A good idea, bad idea? Sorry for the lengthy comment, but I do feel quite strongly about this! Graham 10:19, 3 Nov 2004 (UTC)

I took the liberty of placing some paragraph breaks in your post for clarity and adding to your numbers. Responding to:

Preface: As for need, I've offered my own experience and the examples. "...the actual words are their own pronunciation - you just need to learn them" is fine for ordinary language in everyday use. But there are times when we wish to indicate the predominant pronunciation--I gave several examples.

1: I don't feel 'talked down to' when Merriam-Webster tells me the dominant pronunciation of a word. I feel informed. Sometimes I even alter the way I say a word.

2: Wikipedia is pluralistic about spelling, and to a lesser degree, punctuation. There's no reason it can't be pluralistic about pronunciation as well. If the author of an article has a need to indicate pronunciation, he or she can use a British pronunciation as easily as the spelling humour or realise. Even there, which British or U.K. pronunciation will be chosen? Complaining about an "American accent" is a slippery slope, because there are many American accents, and if "equal time" is demanded, cannot disparate British or U.K. dialects voice the same complaint? Slippery slope. Common sense dictates author choice, pluralism and predominant dialects.

3: I don't think pronunciations in dictionaries are "meaningless", nor do I think they are without value in some Wikipedia contexts.

4: I'm advocating a Simple Guide to indicating pronunciation. I have no present position on how extensively it might be used, only that sometimes it should be used. It's a tool; people will sort out how and where it's used in an organic way.

5: I'd rather not try to anticipate every way such a tool can be overused or misused. Any tool can be abused, but we still need tools. This one is no exception, in my view. Thanks. --NathanHawking 11:07, 2004 Nov 3 (UTC)

I also strongly disagree with usage of this. IPA is a standard, it's more flexible than this, and it doesn't assume knowledge of English like this does. Again, I appreciate that you're attempting to contribute, but feel that the encyclopedia would be better off without this, and don't expect to see it used anywhere. There are other sites where people can download soundbites of any word they want in the English language. Let people use that if they can't understand IPA. We shouldn't need to learn a different IPA-replacement for every encyclopedia in the world. --Improv 15:54, 3 Nov 2004 (UTC)

I find the "there are other sites where people can get ___________" line of thinking problematic. Exactly the same logic could have been used when Wikipedians wanted more tools for expressing mathematical formulae: let them go to the Wolfram site, or whatever. Faced with a use-IPA-or-nothing policy, some article writers will simply opt for nothing. That seems disabling, not enabling, disempowering, not empowering. The Simple Guide would be a tool, filling a gap, with a very short learning curve. --NathanHawking 20:49, 2004 Nov 3 (UTC)

Hiding pronunciations

Hi Nathan, thanks for reformatting my text - I was under pressure from the wife to finish it and do something "useful"! :) A couple of points - I am not saying pronunciations in dictionaries are "meaningless" at all - most dictionaries that are serious reference works use IPA, which is most definitely not meaningless. I haven't spent much time over at Wiktionary, but I think they use IPA there too.

By meaningless I was referring to ad-hoc pronunciations, since they are self-referential in my view. By this I mean that you cannot explain the way an ad-hoc pronunciation is itself pronounced without using the self-same pronunciation! You have to resort to IPA to get an "absolute value" for it. OK, you can show how certain syllables are pronounced by example, but is the example absolute? Very doubtful - you'd have to show the pronunciation of the example, and that would degenerate to another ad-hoc pronunciation and so on ad infinitum. The only way it can work is to tie it to agreed IPA sound values, (which are based solely on vocal tract configurations), so there is an absolute IPA "anchor" to the system as a whole. But if you need to do that, then why not just use IPA in the first place? Most serious dictionaries do not use these ad-hoc schemes, because they are insufficiently rigorous. I'm sorry if this casts Merriam-Webster in a poor light, but I prefer the OED any day (are you actually saying M-W uses ad-hoc, or are you using its examples as a basis? - I don't have M-W so I'm unclear about that). Perhaps some other dictionaries aimed at children or new readers might also use ad-hoc schemes - I don't think I've seen this in the UK though, it could be more of a US thing. I would tend to view the introduction of an ad-hoc scheme throughout WP as an "americanisation" of it (rightly or wrongly), and we really need more of that, right? :) However, WP is not a dictionary, as we all know, so maybe we're not comparing apples with apples.

I fully agree with your "slippery slope" argument - by the way, I realise the accent problem is just as much a problem with the US version as it is with a British one and others. However, even if you can settle on a small set of "standard" base accents for your pronunciation guides, there will always be those who will argue that they are wrong - in fact the opportunity for petty squabbles is enormous, lord knows it's bad enough! The same would apply to IPA too, which is why overall I'm in favour of not doing this at all. I would be much more accepting of a "hidden pronunciation" approach though (since I could personally ignore it!), if there is any technical basis for doing it. As mentioned, this could be in a variety of forms - it could even just be a separate link to an associated wiki page reserved solely for pronunciation, just as talk pages attach to each article ( i.e. a [[Pronunciation:<article name>]] for each page). I imagine that would be very feasible, though would require some (maybe significant) back-end changes that would have to mean a lot more people would need to get involved in this discussion. As another possibility, I saw a very neat javascript the other day that allowed you to create arbitrary pop-up text boxes for any link with further explanatory text of your choice in it - very standard nice code, supported by all the current browsers - something like that would be neat, but again there would probably be other issues that would come into it. I guess the main problem with any of these approaches is that it would need changes to the wiki engine. Making a scheme that fits the current wiki and pleases everyone will be a tough job. I would like to know your views on this. Graham 23:15, 3 Nov 2004 (UTC)

Please see my discussion above about why "knowledge of English is required" is only a pseudo-problem. Of course dictionaries use English words as examples of how to pronounce the sounds of unknown words. Despite this 'self-referentiality', millions of people look up words and learn how to pronounce them every day—all in terms of words they already know. Everything we learn is described in terms of what we already know.

Similarly, people program in high-level Basic or Pascal, etc., without knowing low-level Assembler. Insisting that what people really need from dictionaries is description of how to hold their mouth, lips, tongue, etc., is like insisting everyone know CPU opcodes or Assembler.

Hiding pronunciations, if they were truly to become as onerous for the Wikipedia community as you fear, should be no more technically burdensome than the Hide Table of Contents feature. But I see no need to solve a problem which doesn't exist yet, when adopting a Simple Guide to indicating pronunciation is a solution to a problem which presently exists.

Insisting that all conceivable problems be addressed before anything changes is a sure route to paralysis. --NathanHawking 23:46, 2004 Nov 3 (UTC)

I'm not insisting that people need to learn how to hold their tongue, etc - in reality most dictionaries explain IPA in terms of everyday simple words that most of us can agree on, just as you advocate for ad-hoc. The difference is that the ad-hoc system "floats around" in its own little world that cannot be traced back to absolute phonemic values. IPA is traceable.

If a technical solution is ultimately the way to go, it should be addressed sooner rather than later, since the work of removing all the ad-hoc pronunciations from pages to whatever the technical solution is could end up being a lot of work. It would be better to have a structure in place first, both technical and linguistic.

I don't think it implies paralysis, though personally I would actually prefer that as an outcome rather than see a half-baked scheme proliferate throughout WP. Graham 23:59, 3 Nov 2004 (UTC)

I think we have a fundamental misunderstanding. You seem to be referring to my proposal as ad hoc, when it is in fact quite the opposite. Absence of a Simple Guide encourages people to indicate phonetics in off-the-cuff ways; that's ad hoc. Constructing a phonetic markup guide based upon the preponderant pronunciations of example words as listed in dictionaries is no more ad hoc than the dictionaries themselves.

As to the technical issue, your concerns might have more substance if carefully spelled out. I suggest:

Very carefully define what you think the potential problem is.
Present a BRIEF summary under its own heading.
Include some brief and specific proposals as to solutions.

We'll go from there. --NathanHawking 03:06, 2004 Nov 4 (UTC)

When I use the term ad-hoc, I'm referring to all schemes of this type. I believe these are referred to as ad-hoc by linguists regardless of how carefully they have been formulated, because they are all unofficial schemes that suffer from the deficiencies mentioned. So I'm not using it in any pejorative sense - I'm simply using the usual accepted term. However, if I've got that wrong, I'm sure some of the linguists here will put me straight.

With regard to the other points, I don't think I can add much more. There are others here far more qualified to discuss specific difficulties with the proposals - I'm sorry if this sounds like a cop-out, but as I said my interest in linguistics is purely casual - I don't feel qualified to push my argument further. If I'm on the right track then actual linguists will probably take up these points. If I'm wrong, then I'm wrong. My specific proposal as to a solution is, briefly, to drop it altogether. As others have said, this should not be taken as a lack of appreciation for your aims and effort - clearly we all have the general improvement of WP as a motivation, and that is very much appreciated. However, my view is that this particular proposal is misguided, even while I applaud the sincerity and motivation behind it. Graham 22:44, 4 Nov 2004 (UTC)

Assuming knowledge of English

It has been noted that a Simple Guide would "assume a knowledge of English."

True, but is that really a problem? The English-language Wikipedia is in English. A Simple Guide to indicating pronunciation would be based upon this premise as well: one must have a working knowledge of English to use the tool to acquire still more knowledge, in this case preponderant pronunciation. --NathanHawking 20:49, 2004 Nov 3 (UTC)

Do we need it?

I feel pronunciation guides better fit the Wiktionary project. It's more of a dictionary thing and therefore doesn't really fit an encyclopedia. Besides, there are a number of different American pronunciations of any one word. Are you planning to transcribe every dialect? [[User:MacGyverMagic|Mgm|^(talk)]] 22:13, Nov 3, 2004 (UTC)

Yes, we need it. Clearly there are times when Wikipedia needs to describe pronunciation. See the examples above. To relegate this solely to Wiktionary is to leave a conceptual hole in Wikipedia.

The "many dialects" argument has also been addressed several times above. How many reference works "transcribe every dialect"? Few in the extreme. Why would we expect or demand this from Wikipedia? Is "all or none" really sensible? --NathanHawking 23:14, 2004 Nov 3 (UTC)

An example of why it doesn't work

I hope, Nathan, you'll forgive this, but I actually only just got around to reading the rest of this page :$ - shoot first and ask questions later, mea culpa. You have already obviously anticipated a number of arguments for and against. However, I stick to my guns. I'm a Brit. My own accent is said by most to be fairly neutral, even though I'm a geordie by birth, an accent which is both extremely broad, and extremely fashionable - to my annoyance I no longer speak with this accent! Anyway, I digress. You have a table of examples above, I'll repaste it here. I'll try and show you how your examples are pronounced by me.

Heteronym	Form 1	Form 2
abstract	AABstraakt- not concrete	aabSTRAAKT- to generalize
bass	BAYS- deep musical tones	BAAS- a fish
bow	BOH- an ornate knot or a weapon used to fire arrows	BAHW- to pay one's respects by bending at the waist
conflict	kuhnFLIHKT- to clash	KAHNflihkt- a clash
conscript	KAHNskrihpt- one forced into a task	kuhnSKRIHPT- to force into a task

In general, I see AA as "ar", as in the word 'are'. So AABstraakt "sounds like" ARBstrarkt.

BAYS - the final 's' sounds like a 'z' to me, so this "sounds like" bays or baize, not base.
BAAS - barse, rhymes with parse
BOH - bor... with a very short 'o' and silent aspirant on the end
kuhnFLIHKT - coonflict
KAHNflihkt - carnflict
KAHNskrihpt - carnscript

I realise these are partially because of UK vs. American accent differences, but some I feel are due to flaws in the system itself. But in any case, they mislead as much if not more than they enlighten. That's my point, in a nutshell. Graham 23:51, 3 Nov 2004 (UTC)

"Doesn't work" is greatly overstating the matter, I think, but I'm glad to hear your perception of the symbol selection. I'm open to change and refinement. A few notes and questions, in response to each comment:

"BAYS - the final 's' sounds like a 'z' to me, so this "sounds like" bays or baize, not base."

Any phonetic system can mislead on occasion, if the user doesn't become familiar with it. Reading BAYS as bays or base is not unlike the possibility of reading Merriam-Webster's 'bAs as either base or bass (the fish). With a linkified pronunciation which takes one to the table, people soon get onto the idea that the symbols may well be different than a literal reading, and words predisposed to literal misreadings would not be all that common anyway.

"In general, I see AA as 'ar', as in the word 'are'. So AABstraakt "sounds like" ARBstrarkt. ... BAAS - barse, rhymes with parse."

aa is a relatively rare combination in English. What common words containing aa would predispose that reading for your dialect?
What symbol, if not aa, could be used for the vowel in the American pronunciation of Spam and preclude the reading with the r?

"BOH - bor... with a very short 'o' and silent aspirant on the end."

What common words containing oh would predispose the reading with the r for your dialect?
What symbol, if not oh, could be used for the vowel in the American pronunciation of float and preclude that reading?

"KAHNflihkt - carnflict. KAHNskrihpt - carnscript."

(You do love those Rs, don't you?) What common words containing ah would predispose that reading for your dialect?
What symbol could be used for the first vowel in the American father which would NOT create the reading with the r?

"kuhnFLIHKT - coonflict."

(What? No R? Heh.) What common words would predispose that reading of uh for your dialect?
What symbol, if not uh, could be used for the vowel in the fun and preclude misreading?

I'll be most interested in your answers. At best, I might get some clues about how one might internationalize a Simple Guide. At worst, it might become clearer that regionalization is necessary. --NathanHawking 02:44, 2004 Nov 4 (UTC)

Oh yes, we're quite partial to our r's (though many claim we don't know them from our elbows ;). By the way, you're obviously much better at some of the finer formatting thingys than I am, so feel free to reformat this. I may not be the best person to take up these points, since my interest in linguistics is only casual, and I'm obviously biased against a scheme of this type. But I'll have a go - maybe someone else can pick up here too.

aa is a relatively rare combination in English. What common words containing aa would predispose that reading for your dialect?

It's uncommon in English, but very common in Dutch, one of its sister languages. Dutch words are familiar enough that the Dutch 'aa' will tend to be read whenever this combination is seen, e.g. aardvark, Transvaal.

What common words containing oh would predispose the reading with the r for your dialect?

Actually this one was a tough one to put across. The 'r' is hardly there. I read BOH more as in the British accented version of "boss", but without the s. A British reading of 'float' would produce something like FLAOWT - there's a definite diphthong in there. But I'm having difficulty here - the only way I am able to define precisely what pronunciation I'm talking about is to use IPA - /fləʊt/

What symbol could be used for the first vowel in the American father which would NOT create the reading with the r?

The only SYMBOL I am able to come up with is the IPA /aː/ or possibly /ɑː/, depending on your accent. I cannot think of an ad-hoc sequence that would work unambiguously - in fact I'd go as far as to say such a symbol doesn't exist. (Which is why IPA was invented).

"kuhnFLIHKT - coonflict."

(What? No R? Heh.) What common words would predispose that reading of uh for your dialect?

This could be another bit of interference, this time from German. "Kuhn" would be pronounced "coon" in German. Again, there are sufficiently close ties between English and German, and enough familiarity, that "kuhn" would be seen this way by many (though definitely not all) British English speakers. The only unambiguous way to represent this short o (as in cot, shot, hot) is to use the IPA /ɒ/ : /kɒt, ʃɒt, hɒt/. The American accent would be something like /kɑːt, ʃɑːt, hɑːt/.

I think I've made my point now - I'd like to see others taking part in this, on both sides. Graham 04:29, 4 Nov 2004 (UTC)

For whatever it's worth, my intuitions coincide largely with Graham's (I'm a UKian too, though my intuitions are also influenced by other languages I've learned.) I find it very hard to see AA as anything but "aardvark", but I've often seen it used for the sound in "paw" too (just do a google for "aaland"). I think this kind of anecdotal information is pretty useless, though: everyone's intuitions differ, any system needs to be learned, and a huge empirical study would be required to determine the most common intuitions. And it would still only be intuitive for a minority. Pnot 21:19, 4 Nov 2004 (UTC)

Nohat's take

I think there are two questions here:

Should Wikipedia adopt an official or even semi-official system for marking the pronunciation of English words as a supplement or alternative to IPA?
If so, should that system be the one proposed here?

I think the answer to both questions is "no".

It is true that IPA is a somewhat abstruse system and in its goal to be suitable for the phonetic representation of any language, it has understandably become nonideal for the phonetic representation of any one language. Nevertheless, it is an international standard method for representing phonetics in a way that is not dependent upon any knowledge of any language described. As Graham has astutely noted, it is defined purely in terms of vocal tract configurations. This means the IPA can be used to describe the phonetics of any known (spoken) language. Not only that, it has a long and colorful history, dating back to the 19th century, and is used around the world by linguists as their primary tool for describing the phonetics of languages. IPA is the clear choice for representing pronunciations in any scholarly work, which is something Wikipedia strives to be.

However, there is a gap between the number of people who know about pronouncing English and the number of people who know how to use the IPA. Ideally, we would like people who know about pronouncing English to be able to contribute even if they don't already know IPA. So we do two things: encourage those people to learn enough IPA to be able to contribute using IPA, and allow them to contribute their knowledge using whatever other method they like.

The primary benefit of the wiki system is that anyone can edit. Editors shouldn't feel like they can't contribute if they don't know IPA. They can (and do) just enter pronunciation information however they want. Then, someone who knows IPA comes along and converts it to IPA. It's a great system and it works. For example, people add entries to pages containing pronunciation, either describing the pronunciation in prose or using the system in their dictionary. Sometimes they'll leave a note on a talk page that they don't know IPA and hope someone will fix it. When this happens, I or someone else will come along and change the pronunciation to IPA, sometimes adding SAMPA for those people whose browsers don't correctly display the Unicode IPA. Voilà! Someone who doesn't know IPA contributes to Wikipedia, nobody is forced to learn anything they don't want to, and we didn't have to come up with our own system for marking pronunciation. They might even learn a little IPA in the process.

If we devise a system that is as easy-to-use as possible, we might surmise that we'd have solved this problem. The system will be so easy to use that anyone can use it. The problem is that as you make a system easier to use, it becomes less useful on two fronts. First, as Graham has pointed out, a simple, intuitive system is only intuitive for some subset of users, and as you make it simpler and more intuitive, the number of people for whom it is simple and intuitive shrinks. For the rest, it's unintuitive and has to be learned. Second, as you eliminate the transcription system's ability to encode subtleties of the pronunciation, the system becomes useful in fewer and fewer instances, and some other system that can encode those subtleties has to be used.

If you compound this with the fact that any formalized system will have to be learned - it may be easier than IPA, but it still requires learning - you can come to no other conclusion than that such a system is of diminishing utility. Not only that, you make Wikipedia less accessible because readers will encounter two separate systems for marking pronunciations, one of which is entirely unique to Wikipedia.

So we should stick with IPA because we avoid all these problems. Further, there are already thousands of people around the world who know IPA and can contribute using IPA without learning anything new other than perhaps the method for entering IPA. When the English Wikipedia converts to UTF-8, we won't even have to enter IPA using numeric entities anymore - we can use the Unicode values, which can be entered using a tool for writing IPA, some of which are as simple as point and click [2]. On the other hand, if we adopt some new system, nobody will already know it. Everyone will have to learn it if they want to contribute using that system. Why use a system that nobody knows when we could use a system that lots of people already know?
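
To illustrate the numeric-entity workaround mentioned above, here is a minimal Python sketch. It assumes nothing about MediaWiki itself: it only turns an IPA string into the decimal character references one has to type on a wiki that is not yet UTF-8, and back again. The function name `to_numeric_entities` is made up for this illustration.

```python
import html


def to_numeric_entities(text):
    """Replace every non-ASCII character with a decimal character reference.

    Hypothetical helper for illustration only; it is not part of MediaWiki.
    """
    return "".join(c if ord(c) < 128 else f"&#{ord(c)};" for c in text)


encoded = to_numeric_entities("fləʊt")
print(encoded)                 # fl&#601;&#650;t
print(html.unescape(encoded))  # fləʊt
```

ASCII letters pass through untouched, so only the handful of special IPA letters in a typical English transcription end up as entities.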

As for the particular system suggested here, I think it suffers from a few problems besides those already mentioned:

Every vowel sound is written using two letters. This makes transcriptions in this system unnecessarily long.
Vowels with R are not as simple as they seem. Simply using an ordinary vowel symbol and adding R is often unintuitive because the presence of R dramatically changes the pronunciation of the preceding vowel.
"uh" for schwa is confusing. So is conflating schwa and the vowel sound in the word cut, as they are two distinct sounds. (Merriam-Webster does this and I think the clarity of their pronunciations suffers because of it.)
I think we need a separate symbol for the vowel sound of "her" which bears no phonetic relationship to either schwa or the vowel sound of cut. The sound of "fur" is not the sound of "fuh" plus R.
t~h for eth seems extraordinarily arbitrary and there doesn't seem to be any motivation for it at all in the other symbols.
There is no syllable separator symbol. Long strings of unstressed syllables are hard to visually parse, particularly when di- and trigraphs are used to represent single sounds.
There is no way to mark secondary stress. Secondary stress is critical to the correct and unambiguous pronunciation of English words.

Overall I think this system suffers from not being informed by in-depth knowledge about how English phonology works. I could suggest particular ways to improve these flaws but I won't, because I don't think we need a system like this and I don't think any number of improvements (save making it congruent with IPA) would make it satisfactory, and if it were congruent with IPA we might as well use IPA.

I think we should continue to use IPA and let people who don't know IPA either learn it, or cope with not knowing it by relying on those of us who do. The general approach on Wikipedia is to solve problems with a human solution first and create a technical solution only if that fails. I and others who know IPA will continue to be happy to help people put their pronunciations in IPA. Let's focus our efforts on making IPA as accessible as possible both to readers and to editors rather than divide our efforts into two competing systems.

Note about when to use pronunciations on Wikipedia: In general I have added information about pronunciation to two classes of articles. The first should be uncontroversial: articles that are about pronunciation in some important way, like list of words of disputed pronunciation and list of English homographs. The second is articles where I thought the pronunciation was non-obvious or otherwise interesting, like San Jose, Illinois and clitoris. I have been pretty tolerant of ad-hoc pronunciations on e.g. clitoris, and no one has complained about including IPA. In fact, I have written a patch for MediaWiki which can render IPA into multiple pronunciation schemes, including X-SAMPA, Kirshenbaum, and a modified version of the system used by Charles Harrington Elster in his Big Book of Beastly Mispronunciations. (Note that while I disagree with Elster on many of his conclusions - FLAK-sid for flaccid, indeed! - his system is one of the better ad-hoc systems I have encountered.) This system would require an editor to enter only IPA or X-SAMPA and the pronunciations would be rendered on the page using multiple pronunciation schemes that can be selected in user preferences. It will probably be a while before the patch gets into the code running on the Wikipedia proper.

I don't think every article needs a pronunciation—that's what Wiktionary is for. However, I do think that there will continue to be a need for perhaps large quantities of pronunciations on articles in the first class, and I think those pronunciations should match the already existing articles and use IPA. Nohat 10:32, 4 Nov 2004 (UTC)

Pnot's take

My criticisms:

Coverage: you seem to acknowledge that your scheme is incomplete, and that this is acceptable because you estimate that "999 words out of 1000 ordinary English words" or "99.99% of "standard" American English words" can be represented. I don't know how you arrived at these (somewhat different!) estimates, but even if one of them is correct:
- The words most in need of pronunciation detail are those which are not standard or ordinary!
- If we use your scheme just for "ordinary American" words, then we still need to use another scheme for ones that can't be represented. So users will have to learn a different encoding anyway. Presumably your initiation of an encoding for British English is supposed to address this... but then you'll end up with conflicting schemes for all kinds of English dialects, and that's before we start on foreign words and names! Even if any one of these schemes counts as "simple", they will form a huge and unwieldy aggregate. You won't be able to trust a pronunciation until you've looked up which "simple" encoding it uses.
- Wikipedia is an encyclopaedia, not a dictionary. It contains many foreign words and names, which would not be representable in your system. Once again, we'll end up having to use the IPA for many things anyway, forcing people to learn multiple schemes.
- IPA solves this neatly: one sound, one symbol, one system, no matter what the language.

Learning and intuitiveness:
- At present, nobody but you knows your encoding. I estimate that tens of millions of people know the IPA, or at least those parts of it relevant to English. In my view that's a very strong argument against forcing them all to learn a non-standard scheme in addition.
- You claim that it's intuitive, but I believe this is wishful thinking: every pronunciation encoding strives to be as intuitive as possible, but everyone's intuition differs. Without some kind of vast empirical study, we can't know what most people find intuitive, so it would be foolish to base any arguments on our subjective opinions.
- Your encoding is defined in terms of words which the user must already know, so is of little use to non-native English speakers who don't know the defining words. There's no way for me to find out the pronunciation of "router" if I don't know the pronunciation of "flout", "noun" or "sound". In contrast, there are already IPA tables for virtually every language, allowing non-native speakers to look up a sound by comparison with their native tongue.

Ease of input:
- True, keyboards do not contain many of the IPA symbols. But most computers are able to display them and ~~input them~~ it will soon be possible to input them using some kind of a character selector. This is slower than using a keyboard, but people aren't going to be typing whole articles in a phonetic alphabet: usually it will be used for one or two words in an article. (edit: as Nohat said, English Wikipedia isn't in UTF-8 yet so the numeric codes are required for the time being.)
- For cases where the IPA really is impossible (an author using an ancient computer, perhaps) I would prefer the use of X-SAMPA, for the following reasons:
  - Like your scheme, X-SAMPA can be written using any keyboard and displayed on any computer.
  - X-SAMPA is designed to be as close as possible to IPA, making it easier for IPA users to learn. And there are a lot of IPA users.
  - There is a perfect mapping between X-SAMPA and IPA. So if someone writes a pronunciation in X-SAMPA, it can be easily and unambiguously turned into IPA by another editor, or by a program. ~~(I would be happy to contribute my programming expertise to Wikipedia for the implementation of such a system -- should be fairly simple.)~~ edit: looks like Nohat's way ahead of me here!
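
As a rough illustration of the "turned into IPA by a program" point, here is a minimal Python sketch. It is not Nohat's patch and not a complete converter: the table covers only a small subset of X-SAMPA, roughly what is needed for broad English transcriptions, and the names `XSAMPA_TO_IPA` and `xsampa_to_ipa` are invented for this example. Symbols are matched longest-first so that two-character codes such as `r\` are not split.

```python
# Partial X-SAMPA -> IPA table (an illustrative subset, not the full standard).
XSAMPA_TO_IPA = {
    "r\\": "ɹ",
    "{": "æ", "@": "ə", "3": "ɜ", "A": "ɑ", "Q": "ɒ", "O": "ɔ", "V": "ʌ",
    "I": "ɪ", "E": "ɛ", "U": "ʊ",
    "S": "ʃ", "Z": "ʒ", "T": "θ", "D": "ð", "N": "ŋ", "g": "ɡ",
    ":": "ː", '"': "ˈ", "%": "ˌ",
}


def xsampa_to_ipa(text):
    """Convert an X-SAMPA string to IPA using longest-match lookup.

    Characters with no entry in the table (plain lowercase letters,
    for instance) are copied through unchanged.
    """
    keys = sorted(XSAMPA_TO_IPA, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(XSAMPA_TO_IPA[key])
                i += len(key)
                break
        else:
            # No table entry starts here: keep the character as it is.
            out.append(text[i])
            i += 1
    return "".join(out)


print(xsampa_to_ipa("fl@Ut"))       # fləʊt
print(xsampa_to_ipa('"kQnflIkt'))   # ˈkɒnflɪkt
```

Because the conversion is a plain table lookup, an editor could type the ASCII form and leave the conversion to software or to another editor.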

Stability:
- Your scheme appears to be under development (the pre-1.0 version numbering and the content of the original article's talk page). Reasonable enough, since it's a new scheme and doubtless people will find ways to make improvements. But if we start using it before it's stable, any changes will have to be propagated through all the articles using it. But there's no way to test it without using it. We'd need some kind of sandbox and volunteer testing corps, I think.
- IPA and X-SAMPA are already stable.

Current problems with IPA / X-SAMPA:
- Currently, the IPA and X-SAMPA pages are large and unwieldy by comparison with yours. We need a trimmed-down reference specifically for English-speakers writing or reading English pronunciations.
- It would also be useful to have readily accessible links to IPA / X-SAMPA pages on other-language Wikipediae, to make it easy for non-native English speakers to look up English pronunciations.
- Again, I would be happy to help with such efforts.

Apologies:
- I apologise if this seems rather harsh. Please be assured that I appreciate the effort you have put into your system, but I genuinely believe it to be a bad idea for the reasons outlined above.
- I also apologise for posting this screed in one chunk: I wrote it last night and didn't have time to post it before going home. Overnight, the page has grown greatly, but I'm afraid I don't have the time to try to integrate this with the rest of the discussion. Some of my points appear to have been made already by Graham and Nohat, but I believe there might still be some value left in my ravings.

Pnot 20:38, 4 Nov 2004 (UTC)

Jallan's take

I mostly agree with both Nohat and Pnot. The system is actually quite a normal one, with most of the forms being ones often used to indicate ad hoc pronunciations. I've seen similar systems used to indicate pronunciations of Biblical names on the web and for simple pronunciation indications in non-scholarly dialect phrase books. The only real oddities, to my eyes, are aa for [æ] and t~h for [ð]. Why? There's no reason any more to limit oneself to pure ASCII for computer communications. Even if there were, why not ae and dh, the latter often used as a rendering for [ð], for example by J.R.R. Tolkien in names like Caradhras and Caras Galadhon, in some systems of transliterations from Semitic languages, and by W. H. Auden and some others in translations of eddic poems, where Odhin appears instead of the more common Odin as an English rendering of Norse Óðinn and so forth. I get the impression you are not very linguistically aware or you would have used the normal dh 26-letter Latin alphabet kludge.

I suspect you are unaware of how many such systems are out there like this, ad hoc systems, often used only in one book each, all slightly different from one another, but all touting their supposed simplicity. But they are all somewhat different and all very limited. That's fine, if all you are doing is providing the standard English pronunciations of Biblical names.

But such systems look like the ASCII kludges that they are. Supposed intuitiveness is overshadowed by their ugliness and lack of scope. IT IS LIKE LIMITING ONESELF TO UPPERCASE ONLY. One is quickly fatigued by an overabundance of h. I find IPA far more readable. Of course, I know IPA. But then IPA is a system that should be learned. And there would be nothing wrong with an IPA guide page giving the most common IPA symbols covering English phonemes used in normal varieties of English with examples. IPA is harder to learn than your system, but not very much harder, and far more worth the learning. People really use it in the real world.

IPA mixed with simple ad hoc respellings does the job for simple indications of pronunciations. IPA is not overkill when used intelligently without making distinctions not relevant to one's purpose. The heteronym chart does not require or benefit from full indications of pronunciations. The differences should appear something like:

Heteronym	Form 1	Form 2
abstract	**ab**-stract – not concrete	ab-**stract** – to generalize
aggregate	aggre-"GATE" – to amass	aggre-"GET" – composite
discard	dis-**card** – to dispose of something	**dis**-card – an item disposed of
incense	**in**-cense – burned aromatic	in-**cense** – to make angry

Other methods like this are often used, for example ab-strAct or ab-STRACT or ab-stráct or ab-'stract. A note explaining the method used in a particular case may be useful, but such methods are intuitive enough as to really need no explanation. I know IPA quite well and often use it. In this chart the aggregate samples might reasonably be rendered as aggre-[ɡeɪt] and aggre-[ɡət]. And there would be nothing wrong with using an IPA stress mark with the other forms. But the rest of the IPA usage and American markup usage only obscures. One does not understand heteronyms any better for seeing the supposed exact or approximate pronunciation for the whole word in another spelling, especially when in many cases that spelling must contain irrelevant information about a particular accent. You seem about to duplicate the information in another table for a British accent. One could do the same for an Oxford English accent, or a Texas accent, or a Yorkshire accent, or an Australian accent, or a Scottish accent. That would provide no new information on heteronyms. But an ad hoc system using commonly understood methods, perhaps using some IPA and perhaps not, can cover all the information for all pronunciations of English in a single chart by focusing only on the differences between the two forms and nothing else. It is as though one insisted that the difference between resume and resumé required an entire phonetic writing of both words. No. Just marking the difference is enough.

Linguists use ad hoc methods constantly to focus on particular points of interest, as well as using full IPA, or a specially tailored IPA, or only a few IPA special letters, often for different purposes in different parts of the same book. They also often use minimized (broad) phonemic representations of English and other languages. They sometimes use a single symbol for more than one phoneme when it fits the requirements of their discussion. IPA is a scalable system and a customizable system.

American English markup is cumbersome because it is not scalable or flexible. It is also not markup. I don't see why that word is being used. Also, Wikipedia policies do not permit publication of original research or advocacy of one's own original research within Wikipedia. If one cannot get people outside of Wikipedia to adopt a particular system of indicating pronunciation, then why should it be used within Wikipedia in preference to systems that have been widely adopted or sensible ad hoc explanations of pronunciation?

One should be able to point to at least one dictionary that uses the system, or one pronunciation phrase book of American English, or something of that kind. One should be able to find some indication that there are a large number of people who think this system is better than similar idiosyncratic systems in use in current American dictionaries or phrase books. Otherwise, Wikipedia is better off using systems that have already established their usefulness, in part just because they have established it already, and in part because Wikipedia would die quickly if it encouraged all the thousands of spelling reformers and creators of new systems of writing, of phonetic transcriptions, of pictorial writing systems and so forth to use Wikipedia as a springboard for innumerable competing creations of that kind.

Jallan 04:13, 6 Nov 2004 (UTC)

jguk's take

If this is rolled out generally it will only lead to more edit wars. Are we meant to include all American, British, Irish, South African, Canadian, Indian, Pakistani, Australian, New Zealand, etc. pronunciations (bearing in mind that there are different accents within all the forms of English mentioned there), or show a bias in favour of one form?

Is there a need? Generally no. Sometimes words have unusual pronunciations, but the way to explain these in non-IPA terms that are understandable by a majority of readers will change from article to article.

I also ask myself whether I, as a speaker of British English, think I can understand the mark-up you propose. I'm sorry, I cannot.

I appreciate that you have put a lot of time and thought into this proposal, but I'm going to have to oppose it. jguk 23:01, 6 Nov 2004 (UTC)

In conclusion

I think the discussion on this proposal seems to have come to an end. But as a footnote:

there has been a bit of an effort recently to put IPA pronunciations into articles, replacing both non-standard schemes and SAMPA. I think we can (almost) now say that IPA is the de facto standard (as well as the recommended standard) for pronunciation guides within Wikipedia.

several contributions above have mentioned the lack of a short and simple key to IPA for English spellings. This now exists at IPA chart for English. rossb 09:58, 22 Apr 2005 (UTC)