Wikipedia talk:Use diacritics

Motivation

A clear policy statement on this issue seems to be necessary, to avoid continuous repetitions of the type of discussion exemplified at WP:Requested moves/Tennis. This is only a stub of a proposal so far; I hope others will help develop it (or I will sometime when I'm wider awake). I believe the proposed statement actually reflects current practice, and certainly should continue to be the practice, since a serious reference work exists to provide serious information. —Preceding unsigned comment added by Kotniski (talk • contribs) 05:12, 8 June 2008

Discussion (no votes yet please!!)

  • This proposal doesn't really change much; see Wikipedia:Naming conventions (use English) for the use of diacritics or special characters. Ironholds 05:15, 8 June 2008 (UTC)
    • Though people seem to quote WP:UE as an argument against diacritics. Whatever the present situation is, it seems that a more explicit statement is required (see WP:Requested moves/Tennis for an example of the confusion as to what current policy is).--Kotniski (talk) 05:20, 8 June 2008 (UTC)
      • Your policy doesn't seem to be really specific, though. It comes out as: you either can or can't use diacritics, although we prefer you do. There's no hard yes/no; it's optional, and in a world where diacritics on a QWERTY keyboard require fiddling with the Alt key and numeric pad for hours on end, I can't see it being taken up on a large scale. That being said, I think it's a nice idea, although it could be quite difficult, e.g. Irish names with 4/5 diacritics in the full thing. You'd also (most likely) have to have redirects for most of the articles with this policy in place so users don't have to type it all out when searching, although I'm sure that already exists for many articles. Ironholds 05:25, 8 June 2008 (UTC)
        • It's easy enough to enter diacritics using the insertion fields beneath the edit box. Also, on non-dodgy operating systems it can be a lot easier - in fact, using my nice OS I can get all of the common Western European ones [é, ü, ß, å, ñ, ç, etc] using only two keys. --Neil (talk) 16:12, 8 June 2008 (UTC)
  • I think instead that the current version of conventions, Wikipedia:Naming conflicts - see the section on proper nouns, where it is written that "If a native name has a common English-language equivalent, the English version takes precedence", should be adopted as policy. Following this, it should then be clear when to use diacritics or not: They are used in article names if the subject is commonly described with diacritics in English, and not if they are not (and, of course, in any article lead, the original spelling is provided). This appears suitable for this place, the English wikipedia. --HJensen, talk 06:38, 8 June 2008 (UTC)
    • The point is that pretty much all diacriticed words can appear without diacritics in English. There are some sources (including many good ones) that simply don't use diacritics, or use only a small subset of them. There will also always be plenty of good sources that do use diacritics. So to say that one or other form is correct and the other incorrect (or "not English") is generally going to be unsupportable. We are perfectly entitled to decide which of these styles is appropriate to WP. (Of course, we don't want to punish anyone for using either form, but in order to avoid pointless edit wars and things like the tennis-player debate, we should have it set down explicitly which style is preferred.)--Kotniski (talk) 08:02, 8 June 2008 (UTC)
      • I do not purport that one or other form is "correct." The keyword in the quoted text is "common", which is completely different. And a common English-language equivalent can then be determined from case to case. --HJensen, talk 13:14, 8 June 2008 (UTC)
        • But the "equivalents" are just the names without the diacritics, not actual English names. And shouldn't Wikipedia be consistent with/without diacritic use? BalkanFever 13:33, 8 June 2008 (UTC)
          • They aren't "without the diacritics", they are "as used in reliable sources". By policy, we are mindless sheep following English usage. If the sources consistently omit the diacritics, so do we. Somedumbyankee (talk) 18:37, 8 June 2008 (UTC)
        • This is the crucial point: "common English-language equivalent" means Venice, not Venezia; Bucharest, not Bucureşti; it shouldn't mean "if diacritics are commonly stripped, keep them that way". Wikipedia is not a newspaper; there is a reason why Britannica and National Geographic retain diacritic characters in Latin-based alphabets. GregorB (talk) 16:53, 8 June 2008 (UTC)
            • No, it also means Meissen, but Göttingen; whichever English actually commonly uses. Meißen is as unEnglish and confusing as Bucureşti. Septentrionalis PMAnderson 23:05, 8 June 2008 (UTC)

Use them when the person is not known/is known with diacritics in the English world. Don't use them when the person is widely known in English without them. It doesn't seem that complex to me when you remove the WP:IDHT from it. Narson (talk) 13:39, 8 June 2008 (UTC)

(Dedent) This page/policy needs to have two sections, Article Name and Name usage. For the name usage both the Diacriticed word and the English equivalent need to be listed in bold in the article introduction. The rest of the article will use the page name. For the page name either the Diacriticed word or the English equivalent should be the page name, the other should redirect to the proper page. --Lemmey talk 13:44, 8 June 2008 (UTC)

  • In the interests of searching, redirects and linking and so on: the rule could be that page titles would be determined on a case-to-case basis on whichever is most well-known. The page itself would then contain the phrase with diacritics. Ironholds 14:01, 8 June 2008 (UTC)
  • The proposal on this page should be incorporated into WP:UE#Modified letters, since as stated this proposal contradicts that page. The 2005 Irish precedent appears to exactly follow this argument. Geoffrey Keating is a good example of what has worked previously. Somedumbyankee (talk) 15:35, 8 June 2008 (UTC)
    • Good point. Indeed, WP:UE#Modified letters states "Wikipedia does not decide what characters are to be used in the name of an article's subject; English usage does", which is perfectly consistent with my quote above from Wikipedia:Naming_conflict#Proper_nouns: "If a native name has a common English-language equivalent, the English version takes precedence". Wikipedia should therefore not decide whether some person's name is commonly spelled with or without diacritics. Common usage should decide (and newspapers play a defining part in that usage, like it or not). --HJensen, talk 18:40, 8 June 2008 (UTC)
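
As an illustration of the redirect idea raised in this thread (a title with diacritics plus a plain-letter redirect, or the reverse), here is a minimal Python sketch. It is only an assumption about how a plain form could be derived, not a description of how MediaWiki or any Wikipedia bot works: the names `FALLBACKS`, `strip_diacritics` and `redirect_pair` are hypothetical, and the fallback table is a small hand-picked sample. Letters such as Đ and ł have no Unicode decomposition, so plain mark-stripping leaves them untouched unless a table like this is supplied.

```python
import unicodedata

# Hypothetical fallback table: these letters have no canonical Unicode
# decomposition, so removing combining marks alone would not change them.
# The chosen replacements are illustrative, not an agreed convention.
FALLBACKS = {"Đ": "Dj", "đ": "dj", "Ł": "L", "ł": "l", "ß": "ss"}


def strip_diacritics(title: str) -> str:
    """Return a plain-letter form of a title (illustrative sketch)."""
    # Step 1: replace letters that cannot be decomposed.
    replaced = "".join(FALLBACKS.get(ch, ch) for ch in title)
    # Step 2: split each remaining letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", replaced)
    # Step 3: drop the combining marks (non-zero combining class).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def redirect_pair(title: str):
    """Return (plain_title, title) if a redirect would be needed, else None."""
    plain = strip_diacritics(title)
    return (plain, title) if plain != title else None
```

Under these assumptions, `strip_diacritics("Đoković")` returns `"Djokovic"` and `redirect_pair("Turin")` returns `None`, since that title has nothing to strip. Which of the two forms becomes the article title and which the redirect is exactly the question under discussion; the sketch only shows that one can be derived mechanically from the other, but not the reverse.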

There is no need for this guideline. The current WP:UE guideline is neither overtly hostile nor overtly friendly to accent marks. The WP:UE guideline follows Wikipedia policies (WP:V, WP:NOR and WP:Naming conventions): "Use the most commonly used English version of the name of the subject as the title of the article, as you would find it in verifiable reliable sources" and "Wikipedia does not decide what characters are to be used in the name of an article's subject; English usage does. Wikipedia has no rule that titles must be written in certain characters, or that certain characters may not be used. Follow the general usage in English verifiable reliable sources in each case, whatever characters may or may not be used in them." As for names within a page, there is a section in the MOS that covers it; see Wikipedia:MOS#Foreign terms.

For the majority of foreign names the current guidelines are consistent with Wikipedia policy, but there are two special categories in WP:UE where reliable third-party English-language sources may not be enough to determine what to use in English. The first is "Divided usage": in these cases, if it cannot be agreed what is best, then it is a good idea to put the page up for WP:RM to decide the issue (as the use of accent marks is a contentious issue). The second is "No established usage in English": the suggestion is to use the name in the local language. --Philip Baird Shearer (talk) 19:25, 8 June 2008 (UTC)

The "need" for this guideline (or one like it) is amply illustrated by the length of the tennis-player discussion (and other similar discussions which break out from time to time). If our current guidelines are leading to massive confusion then they are not doing their job. Imagine how much better WP would be if all the effort spent arguing were being put into editing articles.--Kotniski (talk) 08:46, 9 June 2008 (UTC)
Perhaps sources that systematically strip diacritics aren't reliable. Perhaps those who don't (Britannica and National Geographic - I've bored even myself mentioning them) are. GregorB (talk) 19:52, 8 June 2008 (UTC)
If sources are used in an article then they are reliable (otherwise they should not be used). What is or is not a reliable source is decided at policy level, such as WP:SOURCES, not at guideline level. --Philip Baird Shearer (talk) 20:08, 8 June 2008 (UTC)
OK, I'll be more direct now: why not adopt what Britannica and National Geographic are doing? They don't "source" spelling (it makes no sense, IMO); they say this: for Latin-based alphabets, use anglicized form if it exists (as in Warsaw rather than Warszawa); if not, use the original spelling. Apparently, they don't consider stripping of diacritics as "Anglicization" and don't "source" it; it makes no sense to source sloppiness. GregorB (talk) 20:17, 8 June 2008 (UTC)
We are doing exactly that, it's mostly that people are responding with WP:IDHT when someone suggests that there is an existing anglicized form. Somedumbyankee (talk) 20:20, 8 June 2008 (UTC)
GregorB Wikipedia is not Britannica and National Geographic, we have our own policies and guidelines which cover this issue. Another international publication that has a policy on this is the Economist. Presumably the editor of the Economist, not unreasonably, considers that educated English people (at least those target readers of the Economist) should be aware of the usage in the other major world languages that are written in a modern Latin alphabet (but not all accent marks in all languages): "Put the accents and cedillas on French names and words, umlauts on German ones, accents and tildes on Spanish ones, and accents, cedillas and tildes on Portuguese ones: Françoise de Panafieu, Wolfgang Schäuble, Federico Peña. Leave the accents off other foreign names. Any foreign word in italics should, however, be given its proper accents."[1]. Is the Economist any less of a reliable source than National Geographic? Now rather than follow any one style guide, it has been agreed in Wikipedia Policy that "Generally, article naming should prefer what the greatest number of English speakers would most easily recognize, with a reasonable minimum of ambiguity, while at the same time making linking to those articles easy and second nature."(WP:NC) and that we determine what that name is by using WP:SOURCES. It is really that simple. If we were to go with this proposed guideline, then we are going against Wikipedia policies (because they say use common English names and use reliable sources to determine what that is). The current WP:UE rule of using whatever reliable sources use, is a simple rule and a reasonable compromise given the difference of opinions on the use of accent marks in article titles there is between Wikipedia editors. --Philip Baird Shearer (talk) 21:03, 8 June 2008 (UTC)

Could it be that the "generosity" of The Economist towards French, German, Spanish and Portuguese spellings while ignoring others is simply based on their strong focus on economy? Does that overwrite general purpose encyclopedias' usage? Squash Racket (talk) 05:35, 12 June 2008 (UTC)

I am sure we can make up lots of reasons why the Economist style guide is as it is, but that is not relevant. What is relevant is that when Economist correspondents follow the style guide, they are not being lazy when they do not use East European accent marks. Yes, we should follow usage in encyclopaedias and other reliable sources; that is what Wikipedia policies and guidelines state we should do. For each page name, if reliable sources for that page use accent marks, so should we, and if they do not, then neither should we. --Philip Baird Shearer (talk) 09:57, 12 June 2008 (UTC)
And if some do and some don't? How do we count them? Weight them? Is there really any point in doing such research for each individual name, if what is going to come out (even assuming agreement can be reached in each case) is going to be inconsistency and confusion within Wikipedia? Remember that making a good encyclopedia is more important than following guidelines, and in any case we are discussing a proposed change to the guidelines, so stating that the current guidelines say something different is a bit of a non-argument.--Kotniski (talk) 10:14, 12 June 2008 (UTC)
In that case we should be going against current Wikipedia policies. This all started with the names of tennis players. I'll be blunt: I don't care whether atptennis.com spells it "Đoković" or "Djokovic", as they are not arbiters on matters of spelling. Britannica would choose "Đoković" per policy; I haven't yet heard a principled argument as to why Britannica's policy is not good. GregorB (talk) 21:19, 8 June 2008 (UTC)
A most unlikely, and unsourced, claim. The Britannica uses Tudjman; I don't see why they would not spell Djokovic similarly. Septentrionalis PMAnderson 21:42, 8 June 2008 (UTC)
GregorB, I notice that you have now dropped National Geographic. Was that because of the Economist? Are you sure about Britannica? What happens with your proposal if Britannica does not have an article on a person, or if they do not use accent marks, as with Lech Walesa and Gdansk? It does not seem that they always use accent marks when the local spelling does, which is what you have suggested. BTW, does this mean you would support a move of Lech Wałęsa to Lech Walesa and Gdańsk to Gdansk? --Philip Baird Shearer (talk) 21:44, 8 June 2008 (UTC)
I've dropped NG because strictly speaking it's not a reference work, and neither is the Economist. The fact that NG is "stringent" about spelling despite being "just a magazine" and not a reference work only strengthens my argument. As for Britannica, I see now that what they do is slightly odd: Tudjman and Walesa, yet Priština (exactly opposite to what Wikipedia does at the moment). Different criteria for place names and personal names? (I wouldn't support that, whatever the criteria might be.) Still: if they sourced their spelling, it would certainly be Pristina then. GregorB (talk) 22:51, 8 June 2008 (UTC)
The Britannica Book of the Year uses Djokovic. Septentrionalis PMAnderson 22:57, 8 June 2008 (UTC)
Obviously, I have misconstrued Britannica's style guide because it appears to be inconsistent, as per my comment above. E.g. Britannica uses Pavel Josef Šafarík. It appears that Czech and Slovakian are on the with-diacritics list in their style guide, but Croatian and Serbian aren't. (Same with NG, incidentally.) This is again odd, as it would imply that "Š" can appear through Czech, but cannot appear through Croatian, although it is the same character pronounced in the same way. GregorB (talk) 23:23, 8 June 2008 (UTC)

One argument put forward for using accent marks is that it aids pronunciation. I think this is not as strong an argument as some proponents of accent marks like to suggest. Most accent marks are of little use to the average English reader: even if someone knows how to read one set of accent marks (because they learnt them when they learnt that language at school), most native English-speaking people are unlikely to have learnt more than one foreign language at school, so they are unlikely to be able to read other accent marks. As pronunciation is a common problem in English -- for example, how many English speakers know how to pronounce Mousehole or Southwark? -- we solve that problem by including IPA in articles where a pronunciation guide is useful (/ˈmaʊzəl/, and /ˈsʌðək/, locally also /sʌvək/), which seems a better way to go and less of a problem than lots of accent marks that are meaningless to most native English-language speakers and are not all pronounced consistently between languages. I notice that this approach is used in the Lech Wałęsa article despite the accent marks in the article name. --Philip Baird Shearer (talk) 21:44, 8 June 2008 (UTC)

As a proponent of diacritics (more general than accent marks, I believe), I'd agree fully that "aiding pronunciation" is a weak argument. GregorB (talk) 22:51, 8 June 2008 (UTC)
I would guess that at least as many of our readers understand European diacritics as understand IPA. Also the IPA representation generally appears only in the home article - you don't get IPA representations of a name in every article in which it appears. And the fact that English pronunciation is itself frequently difficult is no reason to make pronunciation of foreign names harder than it needs to be.--Kotniski (talk) 08:29, 9 June 2008 (UTC)
Understand the diacritics of any one European language than understand IPA? Possibly, although it will depend on the language. Understand all of them, as is being called for? Most unlikely. IPA has its weaknesses too, but the best solution is to include the unEnglish form and IPA once, in the lead. Septentrionalis PMAnderson 15:49, 10 June 2008 (UTC)
In Hungarian diacritics help pronunciation. For example Zsuzsa Körmöczy becomes Zsuzsa Kormoczy when Anglicised. Squash Racket (talk) 05:59, 12 June 2008 (UTC)

I also don't believe that inconsistencies in e.g. Britannica's style are any excuse for our not trying to be consistent ourselves. The point is that both styles are acceptable in good English (as is shown by the many good sources which use one or the other, or a mix of both), it can be clearly seen that the with-diacritic form is more useful to many readers and no less useful to the rest (at least, I haven't noticed any attempt to refute that claim yet - apologies if I've missed something), so we will be improving the quality of our encyclopedia by adopting the with-diacritic style as our standard. In fact perhaps just maintaining the quality of the encyclopedia and avoiding long arguments, since in my experience the with-diacritic style is indeed the one which we currently prefer, and attempts to remove diacritics (like the tennis example) tend to fail. --Kotniski (talk) 08:39, 9 June 2008 (UTC)

Also, using diacritics (for a person's name, at least) is correct. We are an encyclopaedia (better than Britannica et al.), so just because one convention is more common in reliable sources, it doesn't mean we have to blindly use it. Especially considering the sources' reliability has nothing to do with diacritic use - they are reliable for their information. As I stated above, a name without diacritics is not an English language equivalent. Geoffrey Keating is the English equivalent of Seathrún Céitinn, whereas Seathrun Ceitinn is not. There is no German or Slovak equivalent, so the interwikis use his Irish name. (Portuguese interwiki seems to be an anomaly). BalkanFever 09:13, 9 June 2008 (UTC)
Here we come to the fundamental falsehood. Using diacritics is correct if and only if the diacritics are used in English. In many cases, this would impose diacritics where nobody uses them, including the person concerned and his wife. Septentrionalis PMAnderson 15:49, 10 June 2008 (UTC)
Another point - while using diacritics is not necessarily helpful, what does omitting them do? Using diacritics can be helpful to some people, but omitting diacritics isn't helpful to anyone. There is absolutely no benefit in not using diacritics. It causes more confusion than having them. For example: Đoković shows how to pronounce the name if you know the orthography: /dʑokovi/. Djokovic makes people who don't know the orthography think it's /dʒokovik/, and it makes those who do know the orthography think it's /dʑokovits/. (Don't worry about /dʑ/ and /dʒ/ or đ and dj, focus on c and ć) BalkanFever 09:13, 9 June 2008 (UTC)
That's nice, but what would people think if we write Djokovich? How could they possibly confuse that? /dʑokovi/ seems like the only way to read it. As for blindly using what is used in reliable sources - let me remind you that you're the one who renames articles based on what reliable sources (youtube etc) use. Some time ago the article Samuil of Bulgaria was renamed to Samuel of Bulgaria despite the fact that the vote was against such a move and most sources call him Samuil. So, I don't think Wikipedia blindly follows sources, but rather what is probably common sense. I mean, the wiki in English is the wiki written in English. We have to make sure English speakers know what they're reading. So a compromise on the side of the slim minority that knows what Đ or ć means (even though it's different in some languages and you'd have to see where the person comes from before trying to read his name) would be pretty unfair to the majority that would prefer a more readable (whatever that means) version. --Laveol T 11:47, 9 June 2008 (UTC)
Tell me, should I ignore the fact that you seem to be blindly reverting most of my contributions, and now have "coincidentally" disagreed with me in a page you've never seen before? BalkanFever 11:55, 9 June 2008 (UTC)
Nope, actually I've been following the page for some time ;) As you see, I didn't say anything to your previous comment as I did not disagree with it. Ok, we'll try to keep things not that personal here (as I don't know who went through all Slavic mythology articles and reverted me - see, I didn't even say anything about the towns in Greece :)) --Laveol T 12:20, 9 June 2008 (UTC)
Lol, actually I went through my watchlist backlog (I had one mythology article there) and then I saw that the name sections were screwed up in general, not just the ones you added to :). But yes, you did kick it all off. BalkanFever 12:38, 9 June 2008 (UTC)
I'll answer you anyway. Neither Samuil nor Samuel have diacritics, so it's not relevant. And Samuel is the English language equivalent of Samuil anyway (right now in my spellchecker Samuil is being underlined as incorrect) so it's following WP:UE, and it's perfectly fine. Omitting diacritics from a name, however, is not an English equivalent. It doesn't help English speakers at all - if you think about it, it can only misinform. "More readable"? If you don't know what it means, how will anyone else? And I'm dying to know, which articles did I rename according to youtube? BalkanFever 12:07, 9 June 2008 (UTC)
The Samuil part is only an example that Wikipedia does not blindly follow anything - as it's perfectly clear from the context that you chose to ignore. My spellchecker underlines the word Samuil, too, but it underlines the word spellchecker as well and words like neighbour because of the extra u. More readable had the stuff in brackets cause some names simply don't have a more readable version. How would you read a Hungarian name for example? My point is that diacritics can always be represented with letters that everyone understands and is able to read. You didn't say anything about the Djokovich case - don't you think a native speaker of English would find it more readable than Đoković? How's he supposed to know what these strange letters mean? --Laveol T 12:20, 9 June 2008 (UTC)
Simply because we don't use phonetic English spelling (IPA is much better for that). The argument for omitting diacritics has always been that the sources use it - and the sources don't use Djokovich. But my point is that it doesn't matter what a source chooses for the name of a person when it comes to use of diacritics: it doesn't change their name. A native speaker of English (like me, perhaps?) might not understand what the diacritics in Đoković mean, but how is he meant to know what the letters in Djokovic (specifically the c) stand for? Is it a /k/ like in English? Is it a /ts/ like in Serbian? Or is it č without the háček? Is it a ç without the cedilla? Actually, it's a ć without the acute accent. The diacritics can be represented by the basic letters, yes, but they shouldn't. Especially since the purpose of the diacritic is to differentiate from the basic letter. BalkanFever 12:38, 9 June 2008 (UTC)
English orthography only recognizes the basic letter. Insisting on the inclusion of Ð or ç is like insisting on the inclusion of Ж or Σ or の. None of these characters exist in the standard English alphabet. Somedumbyankee (talk) 12:56, 9 June 2008 (UTC)
I hear we use the Extended Latin alphabet here, which is why a while ago the software was changed in order to include diacritics in the titles (previously it couldn't). BalkanFever 13:00, 9 June 2008 (UTC)
It's supported because the scope of the project now includes topics that don't have common English names and using the diacritics makes the most sense. You can use cyrillic and greek and kana, but they clearly aren't English usage (try searching for π or BORДT), and the guideline says to use English (generally if there isn't any English usage, it's not notable, but there are exceptions). Somedumbyankee (talk) 13:13, 9 June 2008 (UTC)
Kotniski you wrote. "I also don't believe that inconsistencies in e.g. Britannica's style are any excuse for our not trying to be consistent ourselves." We have a simple consistent rule. Do what the reliable sources do. This is in line with all the policies and most guidelines (KISS).
But reliable sources do different things. Indeed it is far from clear to what extent any particular source is reliable. If your "simple consistent rule" actually worked, we wouldn't have these endless discussions. This isn't an issue that needs to be complicated; if we work on it we can surely reach consensus for a sensible and unambiguous guideline which would be easy to apply in 99% of cases.--Kotniski (talk) 13:56, 10 June 2008 (UTC)
It does work. Disruptive nationalists ignore it. Septentrionalis PMAnderson 15:49, 10 June 2008 (UTC)
BalkanFever, you wrote "Also, using diacritics (for a person's name, at least) is correct." Where did you get that idea from? AFAICT there is no such thing as correct usage in English, only usage (as is shown by the compilation of the Oxford English Dictionary). If we stick to using reliable English sources for the spelling of names (and presumably in most cases the sources used in an article are reliable sources), then we keep to Wikipedia policies and guidelines. If reliable English sources use accent marks, so should we, and if they do not, we should not. For example, the article Zürich is ünder a "ü" even though the English pronunciation of Zurich is not pronounced that way. The Zürich page has been there for a number of years, after starting out as Zurich, and it ought to be moved back, as in English it is commonly spelt with a "u" and not a "ü". The use of accent marks should not be a consideration for pronunciation (if it were, then we could go with the Economist guidelines [2]); what matters is that we follow the naming convention and other policies. --Philip Baird Shearer (talk) 13:40, 9 June 2008 (UTC)
I don't doubt that an average native speaker would find "Djokovic" more readable than "Đoković"; an average native speaker would also find Sports Illustrated more readable than Wikipedia. There isn't a single standard on matters of English style and usage (with respect to diacritics and otherwise): there is a continuum (and a trade-off) between maximum readability/familiarity/convenience, and maximum correctness. Everything between the extremes in that continuum is at least permissible. (And the argument "But this is not English!" does not hold.) Still, it is important to note that, as the standards go up (hopefully, in this example case, from popular weekly magazines to Wikipedia and similar reference works), the balance invariably shifts towards correctness - the question is only how far. For readability, familiarity and convenience, we might as well consult Sports Illustrated, but for correctness... And I don't think I'm being elitist here: this is some kind of encyclopedia after all. GregorB (talk) 13:19, 9 June 2008 (UTC)
Therefore the native spelling (and possibly a pronunciation guide) is required to be included in the article's lead. --HJensen, talk 18:31, 9 June 2008 (UTC)
GregorB, "Wikipedia is not a crystal ball. It is not our business to predict what term will be in use; but to observe what is and has been in use, and will therefore be familiar to our readers. If Torino ousts Turin, we should follow; but we should not leap to any conclusion until it does." (WP:UE, based on the WP:NOT policy). Use the common name (WP:NC) and keep it simple. --Philip Baird Shearer (talk) 13:55, 9 June 2008 (UTC)
I'm not saying that Wikipedia should "predict" anything, this is a distortion of my argument. That Turin should take precedence over Torino (or, say, Joan of Arc over Jeanne d'Arc) is also not disputed. Anglicized names such as these should be somehow sourced; obviously they didn't appear from thin air. My contention is this: "simply stripping diacritics" ≠ "Anglicization", ergo sourcing should not apply, etc. - my previous comment. GregorB (talk) 16:06, 9 June 2008 (UTC)
But quite frequently stripping diacritics is anglicization. There are three major entries under Dvorak, which are all originally the same Czech name: one, the composer Antonin, has come to be spelled with diacritics within the last fifty years (as has Šafarík); the others, the actress Ann Dvorak and the inventor of the Dvorak keyboard, were Americans and did not use them; nor are they now so spelt. The diacritics went the same way as the feminine Dvorakova. Septentrionalis PMAnderson 20:07, 9 June 2008 (UTC)
They may have legally changed their names, which is reason enough. There is a tennis example of this: Monica Seles. No contest there. GregorB (talk) 20:20, 9 June 2008 (UTC)
Which would require us to do original research to see if they have. Much simpler to observe that the sources for Ann Dvorak all spell her without diacritics. Septentrionalis PMAnderson 15:49, 10 June 2008 (UTC)
To PMAnderson, you mentioned one source using "Tudjman". The fact is that the "Dj" combination may still be used as an alternative to "đ" even when the rest of the letters are marked with diacritics. To everyone else, diacritics were originally dropped either because the original printing machines were not designed to recreate them, or because the original editor was too lazy, sloppy or just ignorant to take any notice. The fact is plain and simple: diacritics are additions - not letter replacements - they complement the grapheme, and as such, they cause no difficulty when reading. Ţō ṭáķè àñ éχãṃρłẽ, ωĥö ṝẽâłłγ ṣṭřúģģļèš ťó ŕéáđ ţħïš ??? Every character in that last sentence is alien to its plain counterpart among the 26 basic letters of English. The human brain copes with diacritics by ignoring them when it is unsure how the letter is supposed to be pronounced in the source language. As some have already pointed out, the names printed without diacritics are not transcribed into English, because if they were, he who transcribed would have to do an awful lot more to match the shape with the expected pronunciation. For example, the letter c appears three times in Croatian/Bosnian etc., twice containing a diacritic and the other time without. When it is without the diacritic, it represents the sound of the "zz" in pizza, and the closest you can get to it in English is to use "ts." The other two characters (č, ć) are post-alveolars, so a plain English c can never render the sounds of their Croatian counterparts. I won't go any deeper into this, but I will say one thing. Here on the free encyclopaedia, we can all write names as and how we choose. If someone should come along and amend a name by adding a diacritic, or moving a page to the relevant name involving the diacritic, it is primitive to revert it: it brings us backwards when our purpose is to be knowledgeable, and somewhat advanced.
I accept that no tennis lover can be familar with every language of the world. So if he/she wishes to use "Ivo Karlovic", then that is fine, nobody need take exception. If then one reader with a knowledge of the South Slavic written languages reads it and changes it to "Karlović", let us be grateful that he/she is aiming to improve the article quality by adding accuracy. There is yet one thing still unmentioned regarding foreign origin names and I now wish to raise it: we've discussed diacritics, and many feel that they are un-English. Then how does one react when they learn that certain features of people's names are infact, digraphs? Two letters side by side, devised to represent one single sound. "Dj" is an example of a digraph, so are Croatian/Serbian/Macedonian Lj, and Nj, just as Spanish LL and Polish Cz. Hungarian even has trigraphs, as in Dzs. If diacritics are alien, then how bad are multigraphs? Just because the average person is not aware of this does not change things, because he probably won't know how to pronounce it in the first place. Take a common Hungarian surname, such as "Kovács", take off the diacritic, you get "Kovacs". I challenge anybody with no knowledge of Hungarian to pronounce the consonants of that word as they would be in the source language. If the clever commentators (as is so often the case) try to make spectacles of themselves by showing off and giving a tennis player his/her "native sounding" name, then people will be thrown by the over-all spelling (why does Kovacs have a "C+S" if it is pronounced such and such). I ask that opponents of diacritics consider these things. Evlekis (talk) 13:28, 9 June 2008 (UTC)
Evlekis, why not simply stick to using the WP:NC policy and use the name as it is spelt in reliable sources? Then for most names we do not have to consern ourselves with whether to use accent marks or not and the policy is neither pro-diacritics or anti-diacritics instead "Wikipedia does not decide what characters are to be used in the name of an article's subject; English usage does. Wikipedia has no rule that titles must be written in certain characters, or that certain characters may not be used. Follow the general usage in English verifiable reliable sources in each case, whatever characters may or may not be used in them."(WP:UE) --Philip Baird Shearer (talk) 13:46, 9 June 2008 (UTC)
In nearly all cases, though, people can find reliable sources which use diacritics and other reliable sources that don't. Then there are endless disputes about which sources are reliable, whether their decision to (not) use diacritics was made for reasons which are relevant to us, etc. etc. Why can't we just agree to settle the issue once and for all? Either style is perfectly good English; there are good practical reasons why the encyclopedia will be that much better with the proposed style (which in practice is usually followed already); so let's just adopt that style as we have adopted many other conventions for the good of the project. (But for my view on the Dj thing, see below.)--Kotniski (talk) 16:39, 9 June 2008 (UTC)
Some guy wants to look up this awesome tennis player he saw on TV and he goes to wikipedia, but he can't find it because no one has put up the redirect yet. The overall sense of WP:Naming Conventions is that recognizability is more important than absolute accuracy. With redirects in place this may be a strange choice since people looking for Đoković under "D" will still find it (not true for a paper encyclopedia). Dead tree versions of this project have been considered, though, and where would this article go in an English paper version? I just don't see how it matters a whole lot either way for the online version, so I'm preferring to stand by what has already been decided. The alternative is to restart every single edit war that has happened over the previous incarnation of the guideline. Somedumbyankee (talk) 13:52, 9 June 2008 (UTC)
(ec) In my view Đ shouldn't be treated as just another diacritic, precisely because it is transcribed Dj instead of D, and thus hinders recognition of familiar names (and I presume that Dj is unambiguous for those who know the relevant languages anyway - correct me if I'm wrong). But this rather tricky point (trickiest when there are other diacritics like ć in the same word) shouldn't distract attention from the main issue - in the vast majority of cases, adding diacritics makes a familiar word no less recognisable even for people who are used to seeing it without them.--Kotniski (talk) 16:39, 9 June 2008 (UTC)
Searching Wikipedia through Google always works; Google apparently maps the diacritics to English letters and vice versa. Problem (or non-problem) of recognizability has been well illustrated by Evlekis in his comment above. Đoković will appear where DEFAULTSORT says: the practice thus far was to omit diacritics from sort keys (and rightly so), so no problem there, you'll find him under "D". Stare decisis? I'd say dura lex, sed lex. :-) And I foresee a lot of edit warring about it in the future. GregorB (talk) 16:23, 9 June 2008 (UTC)
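For anyone curious what "omitting diacritics from sort keys" looks like mechanically, here is a minimal Python sketch. It is only an illustration under stated assumptions, not how MediaWiki's DEFAULTSORT or Google's matching actually works: the function name plain_sort_key and the FALLBACK table are hypothetical. It also shows why Đ is a special case: most accented letters decompose into a base letter plus a combining mark, but Đ, ł and ß do not, so they need an explicit mapping (here Đ maps to plain D, matching the "you'll find him under D" remark above).

```python
import unicodedata

# Hypothetical fallback table (an assumption for illustration): these letters
# have no Unicode decomposition, so accent stripping alone leaves them unchanged.
FALLBACK = {
    "\u0110": "D",   # Đ
    "\u0111": "d",   # đ
    "\u0141": "L",   # Ł
    "\u0142": "l",   # ł
    "\u00df": "ss",  # ß
}


def plain_sort_key(name: str) -> str:
    """Return a diacritic-free form of name, usable as a sort key."""
    # Replace the letters that cannot be decomposed.
    substituted = "".join(FALLBACK.get(ch, ch) for ch in name)
    # Split remaining accented letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", substituted)
    # Drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for name in ["Đoković", "Ivanišević", "Wałęsa", "Dvořák", "Gauß"]:
        print(name, "->", plain_sort_key(name))
```

The article title can keep its diacritics while a key like this decides where it sorts, which is roughly the separation described in the comment above.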
The clearer we state it now, the easier it will be to deal with edit wars in the future (one side will have a clear guideline on their side).--Kotniski (talk) 16:39, 9 June 2008 (UTC)
There is a clear guideline which conforms with WP:NPOV and WP:NOR. Since there isn't much of a consensus to change it other than this WP:POVFORK of a guideline page, it remains. The guideline as is doesn't favor or oppose use of diacritics, it just says "Wikipedia is a follower and not a leader in all things, including English language usage." Somedumbyankee (talk) 23:44, 9 June 2008 (UTC)
Which is not a clear guideline at all. It remains deliberately unclear, I suspect, because of failure to obtain consensus in the past; and the fact of its lack of clarity leads to time being wasted on debates like the tennis player one. And while WP does not invent its own version of English, it does have the ability to make a reasoned choice between equally correct styles of English, and to strive for consistency within the project. (The POVFORK charge is nonsense, of course, since this is a proposal to change the wording of current guidelines - though not current practice in the vast majority of cases.)--Kotniski (talk) 07:18, 10 June 2008 (UTC)
If this is a proposal to change current guidelines, it should be on the page with the current guidelines instead of having its own page. The current guideline is plenty clear in that it states exactly what kind of evidence should be used to make these decisions. If the evidence is unclear, it falls back on native usage and the "correct" spelling. The result is not what some people want, so the response ends up WP:IDHT. Somedumbyankee (talk) 13:04, 10 June 2008 (UTC)
Well, the proposal is a bit vague as it stands, and anyway would affect several guideline pages, so I thought it would be better to start it off on a separate page and try and develop it into something concrete. Unfortunately people seem to be clinging rather unconvincingly to the status quo, claiming it is "clear" without addressing the fact that attempts to apply it often lead to massive and inconclusive discussions.--Kotniski (talk) 14:07, 10 June 2008 (UTC)

[edit] Arbitrary break

As I've stated in similar discussions in the past years, whenever the only purported difference between a "foreign" word and the "English" word lies in omitting diacritics, Wikipedia should not omit the diacritics,

  • as it does not really enhance readability to omit them,
  • as it is more correct to include diacritics,
  • as Wikipedia in my opinion should not strive to simplify its content, in general,
  • and so on and so forth.

I know that there's a faction of editors which agrees with me and an about equally large faction which vehemently disagrees, so I really don't see what we'll get out of this umpteenth repetition of this discussion...? —Nightstallion 22:39, 9 June 2008 (UTC)

Because the anti-diacritic guys say "use English" when most of the time it's not actually English. The anti-diacritic guys say do what the reliable sources do: why are the sources reliable? They are judged on their reliability by the accuracy of their information, by their neutrality (in some cases), not for their diacritic use. If there is such a thing as an authority on diacritics, it's not Britannica. It's not ATP. It's not a peer-reviewed paper on molecular biology. They don't seem to have any arguments. I'm basically summing up things that have been said by a number of guys here: purposefully omitting diacritics is going backwards. Why should the encyclopaedia stop educating? Because some guy doesn't like seeing ŵøřđş łīķè ←those? How about we just ignore all rules on this one and simply adopt and implement the guideline? BalkanFever 09:27, 10 June 2008 (UTC)
Well, it's pretty much implemented already (as I've already said, I believe it reflects widespread current practice). I don't think you can use WP:IAR to adopt new rules though (sounds a bit paradoxical, and anyway won't work).--Kotniski (talk) 14:07, 10 June 2008 (UTC)

To Philip Baird Shearer in response to me yesterday: you mentioned "reliable sources." Can I point out that this is a presentation debate rather than a content dispute whereby Source A is seen to be more reliable than Source B for such and such a reason whilst they give diametrically opposed accounts of a given scenario (such as a political incident). Perhaps a more appropriate term would be "reputable"; you can argue that the Mid-Afternoon Echo (fictional) is more reputable than Socialist Millionaire Weekly (also fictional), but again, it is one's own subjective verdict. I think what you will find is that a source will either use diacritics, or it will not. You won't for instance find the forms Slobodan Živojinović and Novak Djokovic on the same line. Such a finding would certainly move you closer to establishing that the latter really is the conventional English form, given its appearance alongside another name which includes diacritics; though I am sure that if it is omitted in one place, then it will be everywhere. And if diacritics are not used for Croatian/Serbian names, then they won't be used for any language. But going back to "reputable", yes I accept that it is a subjective term. But to ask you your view PBS, do you not consider an article which does contain diacritics as having the qualities of what you would call reputable? Do you not feel that omitting them is the hallmark of a lazy, arrogant, ignorant, "couldn't care less" attitude? In any case, I still do not see a problem with every day editors reintroducing the diacritics, as they may do in good faith. Evlekis (talk) 09:52, 10 June 2008 (UTC)

"Which they may do in good faith." Yes, they usually do, but only in the peculiar Wikipedian sense of good faith, roughly synonymous with "good intentions", which is compatible with ignorance, illiteracy, and collective self-pity. Septentrionalis PMAnderson 15:40, 10 June 2008 (UTC)
Evlekis, what is or is not a reliable source is described in WP:SOURCES, and to a large extent it depends on the subject of an article as to what is reliable. If one is looking at an historical figure then the reliable sources tend to be journals and books. But if one is looking at reliable sources for popular sports men and women it tends to be "magazines ... and books published by respected publishing houses; and mainstream newspapers". For example the article on the footballer Nikola Zigic should not have diacritics, because the vast majority of reliable English language sources do not use them. However the name for the article on the war criminal Zoran Žigić is more difficult to determine because the reliable sources are split (and include the ICTY trial transcripts), although the majority seem to favour "Zoran Zigic". "do you not consider an article which does contain diacritics as having the qualities of what you would call reputable?" The judgement on reliability has little to do with whether a source does or does not use diacritics. For example a contributor to an internet forum may or may not use diacritics, and the use or non-use will not make the forum any more of a reliable source. "Do you not feel that omitting them is the hallmark of a lazy, arrogant, ignorant, "couldn't care less" attitude?" No, for example I see nothing wrong with The Economist's approach [3], and I don't consider the correspondents of The Economist to be "lazy, arrogant, ignorant" or to have a "couldn't care less" attitude. --Philip Baird Shearer (talk) 13:03, 11 June 2008 (UTC)
Exactly, reliability has nothing to do with diacritic use, so it is not an argument that "most reliable sources don't use diacritics". BalkanFever 08:04, 12 June 2008 (UTC)
BalkanFever, I have difficulty understanding the point you are trying to make. When looking at reliable English language sources for different articles, in most cases there will be a clear indication in the sources if accent marks are used or if they are not used, in which case Wikipedia policies and guidelines indicate that we should follow the lead given by reliable sources (WP:V, WP:NOR, WP:NC and WP:UE). In some exceptional cases the person or place may be notable but there will be no English language reliable sources. In which case use the local spelling if it is in a Latin script (WP:UE#No established usage). Finally there will be some cases where the name appears in several different spellings and/or with or without accent marks and there is no clear common usage. In these cases, if there is no consensus on the correct spelling of the name to use, it may be necessary to use the WP:RM procedure to decide the issue.
BalkanFever, let me give you a non accent mark example which will help to clarify the issue for you. Should we name the article about the Prussian General Hans Joachim von Zieten or Hans Joachim von Ziethen or Johann Joachim von Ziethen? In such a case the name used is determined by looking at reliable English language sources and determining what is the most common. If not, how do you think we should determine the spelling of people's names? Exactly the same procedure is used for the use of accent marks; why do you think that accent marks should be an exception to the rule? --Philip Baird Shearer (talk) 09:32, 12 June 2008 (UTC)
If I can answer instead, accent marks should be treated differently because (a) the decision whether or not to use them in a particular source is likely to have been an editorial or stylistic one (possibly affected by technical restrictions which don't apply to us) rather than one of factual accuracy; (b) including accents makes a name no less recognisable to those who are used to seeing it without them, while dropping them reduces the encyclopedia's information content (see many arguments to that effect in this discussion and elsewhere).--Kotniski (talk) 09:42, 12 June 2008 (UTC)
Kotniski, to address your points, the first one is a guess, and if it were true then would we have to write général and hôtel, as they too are borrowed words? Or do you have some other criteria than using the content of reliable English sources to decide this issue? The second one is misleading, as we would not need national varieties of English unless people found spelling mistakes and some grammatical constructions grating. I suspect for many English speaking readers seeing Funny Foreign Squiggles on names that do not usually have them is as annoying as the word color spelt colour. And I suspect that as you are in favour of diacritics that you find it grating when they are not present. The current guideline, of following reliable sources, is consistent with Wikipedia's three major content policies, and the naming conventions (also a policy). It is also a reasonable compromise between the two poles of all or nothing when it comes to accent marks. --Philip Baird Shearer (talk) 10:49, 12 June 2008 (UTC)
I don't intend this proposal to apply to common English words like general and hotel, only to foreign names (people, places) - if that isn't clear from the wording as it stands then it certainly should be in any final version. Well, funny squiggles may annoy some and their absence may annoy others, true, but the difference between them and the "u" in colo(u)r is that they do actually add information, which is what we do. And having a situation where they are sometimes used and sometimes not, for reasons which will not be clear to the reader, is particularly likely to lead to misunderstandings. --Kotniski (talk) 11:00, 12 June 2008 (UTC)
We are talking about names of people here, not borrowed words. BalkanFever 10:58, 12 June 2008 (UTC)
Kotniski has pretty much summed it up. If you are going to base diacritics vs no diacritics in the title on the majority of reliable sources, then those sources used as evidence have to show that they use diacritics at some point. Otherwise, it can only be assumed that it is a technical restriction or stylistic decision. I already knew how non-diacritic procedure works, and I support using English in a case such as the Prussian guy (everyone here does). But to repeat myself for the umpteenth time, dropping diacritics is not using English. BalkanFever 10:54, 12 June 2008 (UTC)
BalkanFever, you write "But to repeat myself for the umpteenth time, dropping diacritics is not using English." Is this a personal opinion, or do you have an authoritative source that backs up the statement? If what you say is true, then why is it that many reliable English language sources drop accent marks on many names? And please do not put it down to laziness or ignorance, as we have discussed reliable sources that drop accent marks in some cases, either as a known editorial policy (such as in The Economist), or like Britannica (which uses some editorial criteria that remain opaque to us). The result is that both publications use Lech Walesa; are you really suggesting that they are not using English when they do so? Current Wikipedia content policies on this issue follow a policy of "English usage", a policy that is exemplified by the compilation of the Oxford English Dictionary. --Philip Baird Shearer (talk) 13:32, 12 June 2008 (UTC)
If they called him "Lewis Wales" or something that would be an English name. "Lech Walesa" is a Polish name without the diacritics, therefore an incorrectly spelt Polish name. He doesn't have an English name. Names of foreigners hardly translate. Sure, names of historical figures, but not names of contemporary sportspeople or politicians. BTW, can you tell me of a benefit of omitting the diacritics? Not "it follows the guidelines" but an actual benefit to the reader. BalkanFever 13:58, 12 June 2008 (UTC)
The benefit of using the common English spelling for "Lech Walesa" is that "Generally, article naming should prefer what the greatest number of English speakers would most easily recognize, with a reasonable minimum of ambiguity, while at the same time making linking to those articles easy and second nature." (WP:NC), and as I mentioned above the current policy is also a good compromise. BalkanFever, your edit to this proposed guideline is already moving in the direction of the current WP:UE guideline. How is "Where the person or place has a common English name: Bucharest over Bucureşti; Geoffrey Keating over Seathrún Céitinn" different from WP:UE's "Use the most commonly used English version of the name of the subject as the title of the article, as you would find it in verifiable reliable sources (for example other encyclopedias and reference works)"? --Philip Baird Shearer (talk) 18:51, 12 June 2008 (UTC)
This seems to be exactly the same question which I already answered above with (a)(b)(c) points. Basically, if you recognise Walesa you will also recognise Wałęsa, but if you recognise Bucharest you won't necessarily recognise Bucureşti.--Kotniski (talk) 21:08, 12 June 2008 (UTC)
The wording of the amendment says "Where the person or place has a common English name"; tack on "in verifiable reliable sources" and that is what WP:UE says. WP:KISS. --Philip Baird Shearer (talk) 22:51, 12 June 2008 (UTC)

[edit] Tudjman

To Evlekis: every English source I have ever seen, and I followed the Balkan Wars throughout his presidency, uses Tudjman — except Wikipedia; I mentioned the Britannica because, and only because, it was the source under discussion. The suggestion that we should use a form used only by an extreme minority is contrary to the clear purposes of our naming conventions: to be intelligible to English speakers. Septentrionalis PMAnderson 20:00, 9 June 2008 (UTC)

As a personal matter, even though I recognize Tuđman as the Croatian form, it is much harder to read and to recognize than the English version; this is why English sources don't use it. Septentrionalis PMAnderson 20:00, 9 June 2008 (UTC)
I did say, dj causes no problems, it is acceptable in Croatian writing alongside the other characters with their diacritics. Evlekis (talk) 09:29, 10 June 2008 (UTC)
Exactly my point - I'm from the Balkans, but if you show me Tuđman I'll just say "What the hell is that? Who's that guy and how am I supposed to read that"? But if you show me Tudjman I'll have no problem with it. --Laveol T 20:34, 9 June 2008 (UTC)
Dj is actually used in Croatian and Serbian though, as the less correct form, if that makes sense. What I mean is in Serbian Djoković is possible, yet still correct to an extent. But going back to the English: after you're told Tuđman is Tudjman, (you will be in the article lead) you will have learned something, no? Or will you forget what the đ is each time? Using the diacritics is educational, to a degree. I'll stop using đ/dj examples now, and move on to the more clear cut. Are you going to say that Ivanišević confuses you but Ivanisevic doesn't? If you have a problem, you automatically ignore the diacritics and focus on the letters, as shown by Evlekis' sentence. Anyone can see that š is plain s with a caron (háček). If they don't know what the caron represents, they can find out, or they ignore it. But at least they know that there is a caron there. There is no benefit to removing them. BalkanFever 08:53, 10 June 2008 (UTC)
I'm saying I have a problem with Ivanišević, but no such problems with Ivanishevich. It reads plain and simple. I told you the same about Djokovich, but you simply ignored it. And again - we have to make sure that the potential reader will actually be able to read the person's name. As I said there are tons of such diacritics that are used in different context in different languages. Am I supposed to be able to read in 30 languages only to understand how the hell I should read a name since it should be in English as this is (for the third time) the English-language Wikipedia? --Laveol T 20:05, 10 June 2008 (UTC)
But this is where you yourself are applying Romanisation of Bulgarian to a non-Bulgarian name. Many people born in America swap the č for ch and š for sh, and at that point that is their name. Charles Buchinsky/Buchinski is English for Karolis Bučinskis, and of course everyone here supports that. Karolis Bucinskis simply isn't. Everyone also supports his common English name Charles Bronson as the article title.
We should mention Tuđman once, saying that it is the Croatian spelling, as a potentially useful fact; we may even point out that the Croatian alphabet used to spell the sound dj, before Tudjman's birth, but this is probably supererogation. We should not confuse our readers in the hope of educating them; we also betray our mission by suggesting to speakers of third languages that Tuđman is common English usage, or will be intelligible to English speakers. Septentrionalis PMAnderson 15:35, 10 June 2008 (UTC)
As for Ivanišević, we should do what English does, whatever it has come to be. "Be not the first by which the new is tried, nor yet the last by which it is laid aside." Septentrionalis PMAnderson 15:35, 10 June 2008 (UTC)
I understand the issue here, and I must concede this is a good example. It is a bit of a corner case: here "Đ" is problematic, while, e.g. "Š" might not be (as in spelling of Šafárik, which can be sourced, as already noted). But I'm against "hybrid" solutions ("Djoković"), and I'd rather go without diacritics altogether than have some per case hodge-podge solution, whereby Đoković-the-famous-tennis-player is "Djokovic", while Đoković-the-hypothetical-not-so-famous-singer is "Đoković". Let's return to the National Geographic Society's style guide for a second. Taken literally, and applied to personal names, it in fact prescribes Tudjman and Djokovic and Šafárik: Slovak language is on the keep-the-diacritics list, while Croatian and Serbian aren't. I'd prefer this (mutatis mutandis, of course) to the current interpretation of the policy. GregorB (talk) 19:41, 10 June 2008 (UTC)
Hybrid solutions are (almost entirely) against present guidance. Very little English usage accepts only some diacritics and refuses others; there may be a German example. Septentrionalis PMAnderson 20:37, 10 June 2008 (UTC)
Exactly. Still, I thought: "Hey, Đ obviously freaks people out, just like ß does!". More about Gauss follows in the pop quiz section... GregorB (talk) 21:06, 10 June 2008 (UTC)


@Laveol: But this is where you yourself are applying what you are used to (Romanisation of Bulgarian) to a Slavic name. Actually, many people born in America swap the č for ch and š for sh, and at that point that is their name. Charles Buchinsky/Buchinski is English for Karolis Bučinskis, and of course everyone here supports that. Karolis Bucinskis simply isn't. Everyone also supports his common English name Charles Bronson as the location of the article. That is what the guideline is meant for: cases where they have a name in English that is common. "Ivanisevic" is not a name, it is a construct born out of laziness of the first guy to write it down and the limitations of the computer of the first guy to type it. "Ivanishevich" is definitely not a name. The argument about diacritics for pronunciation isn't the strongest anyway - IPA is for that. Bulgarian and Russian Romanisation systems adopted a system equating the Cyrillic letters with the Latin (specifically English) that represent the same (almost) sounds. Czech orthography, and later Gaj's Latin alphabet (Croatian, and Romanisation of Serbian) went the other way, using diacritics. Djokovich means Ђоковицх. Ivanishevich means /ivanishevitsh/ (scroll over individually). So unless the variants you mention actually exist (with Bulgarian and Russian they exist inherently, but not for others), then no. Otherwise I will let you decide what the ř in Antonín Dvořák should be rendered as all on your own. BalkanFever 08:01, 12 June 2008 (UTC)


Charles Bronson changed his name to that form himself! Legally, in his legal documents. And he himself used that name. He decided himself not to be Karolis Bučinskis. So that comparison with Tuđman makes no sense: Tuđman never signed himself as Tudjman! So that version of the name has no legal validity. --Anto (talk) 19:49, 12 June 2008 (UTC)

[edit] Clarity

I don't understand how the present guidance is unclear: use what your sources use, unless there is a clear demonstration that English usage as a whole differs. It may be useful to adopt that phrasing somewhere; but the intent is plain. Septentrionalis PMAnderson 20:37, 10 June 2008 (UTC)

It's not unclear as much as it is obscure. I was completely unaware that the titles like "Goran Ivanišević" go against the guidelines until the latest tennis renaming affair - and I have 54,000 edits, so I guess I would have noticed it by now. GregorB (talk) 23:50, 10 June 2008 (UTC)
It depends. If the sources say Ivanisevic and this appears to be English usage, then WP:UE supports that; on the other hand, if they say Ivanišević, WP:UE supports that. It's a question of fact, like his height and his scoring statistics. Septentrionalis PMAnderson 00:58, 11 June 2008 (UTC)
Yes, but off the top of my head I can't remember a single Croatian name with diacritics that would be supported by current WP:UE - yet they were all there with diacritics until recently. (Same with other languages/alphabets.) The guideline was apparently there all the time, but it wasn't enforced at all. GregorB (talk) 11:51, 11 June 2008 (UTC)
Are there any that made it through a formal (i.e. GA or FA) review process with article names with diacritics contrary to sources? Articles on wikipedia not following all applicable guidelines isn't surprising. Somedumbyankee (talk) 12:55, 11 June 2008 (UTC)
Almost certainly. Some articles get through FA and GA with no review of content at all. Septentrionalis PMAnderson 17:19, 11 June 2008 (UTC)
Diacritics are an issue of presentation, not content. I don't know, FA process is rather pedantic, and it's odd how no one (to my knowledge) raised the issue. If the guideline itself is clear (and I'm not saying it isn't), what caused this situation then? GregorB (talk) 18:07, 11 June 2008 (UTC)
There have always been the two schools: "All diacritics must be used because they're correct" and "All diacritics must go because none of them are English"; WP:UE contains an effort to decide between these on a case-by-case basis. Both schools ignore English usage in the effort to get their way; but the guideline usually prevails on the balance of power, and of arguments.
The tennis articles were proposed for moving by an editor who is personally of the diacritics-are-scum school, which is not yet represented in this discussion; but he's right insofar as he was moved to act by a bunch of names that he, an experienced tennis fan, had never seen before. He went too far in including Björn Borg, I think; but that is the sort of thing evidence should decide. (Not Swedish usage; English is not Croatian, and always has adapted proper names when it feels like it.) Septentrionalis PMAnderson 18:58, 11 June 2008 (UTC)

You are wrong on both counts. I've never said that "diacritics-are-scum." And I've never said that I've never seen tennis players names with diacritics. Thanks for completely misrepresenting my opinions without any basis in fact. Do you do this often? Tennis expert (talk) 04:09, 12 June 2008 (UTC)

All I know is what you have said. If you have said anything inconsistent with my interpretation, I have not seen it. Feel free to explain at length. Septentrionalis PMAnderson 18:28, 12 June 2008 (UTC)
How disingenuous of you. Either cite where I said the things you are accusing me of or strike them. Your imagination is running wild. Tennis expert (talk) 02:04, 13 June 2008 (UTC)

[edit] Pop quiz

So which of the following would you spell with a diacritic:

Replies? Septentrionalis PMAnderson 15:35, 10 June 2008 (UTC)
Are 1) and 2) trick questions? Because there are no diacritics in the titles of these two articles at German Wikipedia, and I have no reason to doubt Germans got it right.
(1) is; (2) is not: there is (minority) German usage for Göthe. But we have already had, even without this page, a good soul trying to clean up Emmy Noether on the assumption that the diacritic must be right. Talk:Emmy Noether contains much more. Septentrionalis PMAnderson 21:34, 10 June 2008 (UTC)
Gauss/Gauß is very tricky. I can observe three things: 1) speaking from experience, Croatian usage respects original spelling from all Latin-based alphabets (within technical limitations, and not too stringently applied in less formal writing), so umlauts are fine but "ß" is next to impossible to encounter in Croatian, which is rather illustrative, 2) interwiki links at Carl Friedrich Gauss are interesting - I don't see a particular logic there, and 3) even in German mainstream usage "ss" is acceptable, and there is a recent general trend of moving from "ß" to "ss" (I could be wrong here - correct me). I'd go with "Gauss". Rationale to follow, first I'd like to see other replies. GregorB (talk) 21:22, 10 June 2008 (UTC)
You are incorrect only in that the trend is not recent. English mathematical usage is invariable: Gauss, and we should go with it, rather than import other conventions. Septentrionalis PMAnderson 21:34, 10 June 2008 (UTC)

Gauß is indeed tricky insofar as he very often used his latinised name, which was also Gauss. —Nightstallion 08:33, 11 June 2008 (UTC)

It's not tricky at all... Looking at the sources in the article, it's very clear that articles in English would never use the ß. Articles auf Deutsch might have a problem, but here it is unambiguous. Somedumbyankee (talk) 13:40, 11 June 2008 (UTC)
Of course it's not tricky with a guideline like this, a guideline that says "do what others do". Imagine the following Wikipedia guideline. How does one write dates and date ranges? "Do what others are doing." Should italics be used for book titles? "Do what others are doing." You'll notice nothing is tricky, everything is clear, all answers are there. This is because this is not a guideline at all. Only when you want an actual guideline does it become tricky, because you have to decide what you want and devise the rules that will achieve that goal, not just copy and paste the end result. See also the example of Đoković-the-tennis-player and Đoković-the-singer. GregorB (talk) 18:24, 11 June 2008 (UTC)
Very well; let's consider that example: does English italicize book titles? Yes; look at the bibliography of any half-dozen well-printed English books: nine times out of ten all six of them will do so, and the remaining time there will be one exception. This is not hard; but it does require the ability and willingness to read English, and the patience to do what English does. We don't need to redesign the English language, and we don't need this kind of guideline. Septentrionalis PMAnderson 19:07, 11 June 2008 (UTC)
I don't understand your comment. The Wikipedia guideline on the question "should italics be used for book titles?" isn't "do what others are doing"; it is "yes". Is this the kind of guideline we "don't need"? GregorB (talk) 20:19, 11 June 2008 (UTC)
Going back a bit, "Of course it's not tricky with a guideline like this, a guideline that says "do what others do"." That is exactly the point. It usually provides a straight answer. It defers the decision to a reliable external source (same thing we do with all other facts). "Use italics for book titles" more or less defers to every English style guide I've ever seen (except when writing longhand, where underlining is used instead). If English grammar changed on how books were cited, we would probably change too. Somedumbyankee (talk) 13:06, 12 June 2008 (UTC)

Added exception

I have amended the proposal with an exception to address the Dj/ss issue (basically I'm saying we should use dj and ss rather than Đ and ß). The wording may obviously still need work. As to the general principle of the whole proposal, I still haven't seen any convincing reason given against it. The basic argument against it seems to be that we should "follow English usage"; but English usage is divided on this matter, and I see no reason not to adopt a consistent style which will help users of the encyclopedia (this has been argued many times without apparent refutation) and is pretty much in line with well-established practice on WP.--Kotniski (talk) 08:22, 12 June 2008 (UTC)

Is ß actually a diacritic? I think I read somewhere (probably WP:UE) that it's a foreign letter rather than a modification. BalkanFever 08:44, 12 June 2008 (UTC)
I think you're right, but it should still probably be mentioned in any guideline like the one being proposed, just to make things clear to people.--Kotniski (talk) 08:59, 12 June 2008 (UTC)

I vehemently disagree with that. If the common English usage is to change ß to ss, we should do it -- however, for names which are not commonly seen in English (a street name in a German-speaking city which is not regularly the subject of English-speaking people's attention, for instance), we should keep it to be "-straße", not "-strasse". It's a question of endonym vs exonym, and simply changing ß to ss does not spontaneously create an English exonym if there wasn't one before. —Nightstallion 10:29, 12 June 2008 (UTC)

But it could also be written with an -ss- in German, right? Which means it wouldn't be a totally original invention.--Kotniski (talk) 10:44, 12 June 2008 (UTC)
It could be, but only in the case of technical restrictions. Which don't apply to Wikipedia. —Nightstallion 12:05, 12 June 2008 (UTC)
If there is no English coverage of a German street, should we have an article on it? If there is, why should we not follow the usage of our sources? Septentrionalis PMAnderson 18:26, 12 June 2008 (UTC)
Per my recent comments, I'd generally support a consistent solution over individually sourced deviations from the original spelling. So, for ß: "it is always 'ss', unless there is a particular reason to keep it". I could support the same for Đ (always "dj"), but in this case, as previously noted, other diacritics should be stripped too (e.g. "Djoković" is out of the question - again, without a good reason to the contrary, and there isn't one). "ß" is otherwise a corner case for reasons already outlined in the pop quiz section, so even the per case decisions (as suggested by Nightstallion) could be OK. As I said, ß is tricky... GregorB (talk) 13:19, 12 June 2008 (UTC)
I oppose, in general, systematic solutions. The relatively narrow deviations from usage involved in WP:NCNT, which are in general justifiable as disambiguations, are hard enough to defend.
And how would this differ from what we now do? What "particular reason" would there be to keep ß except that English sources do? (In practice, however, we will always see the argument that we have a particular reason: it's right in German.) Septentrionalis PMAnderson 18:26, 12 June 2008 (UTC)
I don't know of any "particular reason" why ß should be kept; I can't think of a single such case, but it doesn't mean it's impossible. As for systematic solutions: for the third time I must bring up Đoković-the-tennis player and the hypothetical Đoković-the-not-so-famous-singer (no English sources on him); they should either both be listed as "Đoković" or both be listed as "Djokovic" (of course, the guideline has to choose the former or the latter; this is another issue). This produces consistent results. If the "obscure" guy isn't mentioned in English sources, you are not going against usage by spelling his name in either way, so there's no harm done. GregorB (talk) 21:04, 12 June 2008 (UTC)
If Đoković-the-tennis-player becomes well known and reliable English-language sources spell her name Dokovic-the-tennis-player, then following the current guidelines (WP:NC and WP:UE) we should use Dokovic-the-tennis-player. At the same time, if another person, Đoković-the-not-so-famous-singer, has no English sources on him, then his name should remain Đoković-the-not-so-famous-singer (WP:UE#No established usage). In reality it is much more likely to be a geographic location that is notable but with few or no reliable English sources, hence we have Guantanamo Bay Naval Base and Guantánamo Province. -- WP:KISS -- Philip Baird Shearer (talk) 23:18, 12 June 2008 (UTC)
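For anyone who wants to see what the systematic rule discussed in this section ("ß" always becomes "ss", "Đ" always becomes "dj", and all other diacritics are stripped along with it) would produce, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not part of the proposal: the function name strip_diacritics and the SPECIAL table are hypothetical, and the table covers only the two letters raised in this thread. It also shows why ß and Đ need separate treatment: Unicode gives neither of them a canonical decomposition, so simply removing combining marks leaves them unchanged.

```python
import unicodedata

# Hypothetical replacement table: only the letters raised in this discussion.
# ß and Đ/đ have no canonical Unicode decomposition, so they must be mapped explicitly.
SPECIAL = {"ß": "ss", "Đ": "Dj", "đ": "dj"}


def strip_diacritics(name: str) -> str:
    """Return `name` with ß/Đ transliterated and all combining marks removed."""
    # Step 1: replace the letters that are not base letter + combining mark.
    replaced = "".join(SPECIAL.get(ch, ch) for ch in name)
    # Step 2: decompose accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", replaced)
    # Step 3: drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for title in ["Đoković", "Straße", "Guantánamo", "Slobodan Milošević"]:
        print(title, "->", strip_diacritics(title))
```

Note that this mechanical rule yields "Djokovic", never the mixed form "Djoković", which matches the point made above that the other diacritics should go if Đ goes. Whether the stripped or the original form should be the article title is exactly what is in dispute here; the sketch only makes the consistent-stripping option concrete.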

Attempting to summarize the arguments

This argument is rapidly turning into soup, so I'm attempting to understand the whole of it. I'm leaving out arguments that make little sense to me (e.g. that National Geographic is an unreliable source). I'd also like to throw out that the current guideline is very rarely followed (e.g. Slobodan Milošević).

Arguments for:

  1. Using diacritics reflects the native spelling and aids pronunciation for those who know the underlying language.
  2. Including diacritics does not change whether the name can be understood by those who are not familiar with the underlying language.
  3. The current guidelines are unclear, and clear guidance will reduce disagreements.
  4. Including the diacritics is the correct spelling, and following WP:UE frequently gets in the way of making Wikipedia better.

Arguments against:

  1. Existing guidelines are not biased for or against use of diacritics and better reflect WP:NPOV policies.
  2. The existing guidelines rely on sources rather than the opinion of editors and potential WP:OR.
  3. Use of diacritics can cause problems depending on operating system and browser settings and they should only be used if necessary. (I, for one, am seeing a few squares where there should be characters in some of these discussions).
  4. Following common usage makes the article more accessible since English writing does not use any of these characters.

Meta-arguments:

  1. The relationship between this and existing policy is unclear. Is it designed to replace existing guidance?

Any significant arguments that I have misrepresented or outright missed? Somedumbyankee (talk) 02:55, 13 June 2008 (UTC)

I think that sums it up pretty well, nice work! I would rephrase "including the diacritics is the correct spelling", since it is hardly tenable that it is the only correct spelling, and I would also leave out (or strongly disagree with) the last of the arguments against, since "accessible" is unsupported and English writing often does use these characters. I would also add something about current practice, since diacritics do seem to be in very general use on Wikipedia, indicating that the current guidelines are probably not being applied in this matter (practice is one of the determinants of policy).--Kotniski (talk) 07:53, 13 June 2008 (UTC)
Aye, one should probably rephrase that pro point as "can be considered to be the most correct form of spelling". —Nightstallion 09:27, 13 June 2008 (UTC)
An excellent and much needed summary... One remark, though: I'd leave out WP:NPOV and WP:OR, as these guidelines don't apply here. NPOV and OR are problems with article content, not with Wikipedia guidelines. In a general sense, guidelines should be "neutral" and "not original", it's just that this is not the same neutrality and non-originality that applies to articles as prescribed by WP:NPOV and WP:OR. So the argument summary is essentially correct, only needs rephrasing. GregorB (talk) 08:30, 13 June 2008 (UTC)

Argument for: Using diacritics is in accordance with WP:UE. Guido den Broeder (talk) 09:29, 13 June 2008 (UTC)