Wikipedia talk:Naming conventions (use English)/diacritics
Archived from Wikipedia:Village_pump_(policy) dab 14:40, 20 Nov 2004 (UTC)
See also:
- 33 #Should non-English examples be used in Wikipedia articles?
- 41 #The use of English or foreign language names of organisations
- 74 #Transliteration
- Wikipedia talk:Naming conventions (use English)#Diacritic marks in article titles
What is our preferred way of spelling foreign names in article titles and text? I know that the name common in English is standard, but what are my options if the word is not current in English at all?
The question is mostly about how to handle diacritics: Many foreign languages will have scientific transliterations that include diacritics (or, if the language is written in the Latin alphabet, the native spelling itself will). Article titles only support a limited range of diacritics until we move to Unicode (for this we have the "wrongtitle" Template, see e.g. Panini (scholar)). In the case of Panini, should we give the scientific transliteration once and use Panini throughout the text, or should we use Pāṇini throughout?
The question is related to disputes on Talk:Zürich and to comments I made on Talk:Islam#transliteration.2C_capitalisation.2C_diacritics.
My own take is that full diacritics should be used for the article title if possible and at least for the first occurrence in the text. On the other hand, it seems good sense to use Qur'an rather than Qurʾān and Muhammad rather than Muḥammad, so while I think we need guidelines to some extent, it will often have to be judged case-by-case. dab 17:05, 7 Nov 2004 (UTC)
- What would your rule be if there is only one reference to the word in a subsidiary text which references the main page? Does one link it to Panini or Pāṇini? See comments below about searching text with "search engines".
- I am against using diacritics -- funny foreign lines, dots and squiggles over letters -- and against using foreign grammar rules to strip diacritics, e.g. the German grammar rules that turn a ü into "ue".
- Using Zurich as an example: I am not against the start of the main page saying
- Zurich (in German it is spelled Zürich, and spelled Zuerich in accordance with conversion of umlauts).
- But I am against using the foreign spelling of the word in the text.
- Keeping with Zurich for the moment, recently the Second Battle of Zurich was moved to Second Battle of Zürich and all references to Zurich in the text have been changed to Zürich, so a "search engine" search for pages with the text Zurich will not throw up the page! The person who changed it highlighted the change to the Zurich page and stated that he would not be happy for the word Zürich on the battle page to change unless the main Zurich page changed. I think that this falls under the law of unforeseen consequences.
- En passant, there are some foreign phrases which are commonly used in English; if they have diacritics they should stay. But words like Zurich should be stripped of their umlauts.
- Here are some reasons why I am against them, other than as information at the top of the main page:
- To strip them from the letters is an easy rule to follow. It saves learning dozens of different grammars to know what the likely funny foreign lines, dots, and squiggles are to be used on any particular place name.
- Most English language keyboards do not have the keys with funny foreign lines, dots, and squiggles. Accessing diacritics is more complicated than just typing the words in without them, and many (most) people using an English language keyboard do not know how to do it.
- On the Zurich page dab suggests that this was a radical position to take. On the contrary, forcing English people to learn about funny foreign lines, dots and squiggles above and below letters, in lots of different languages, is radical.
- If a search engine like Google is used to find a word with diacritics, it tends to throw up the foreign pages. So Wikipedia pages are lost in pages of foreign text. It is silly to expect users to put on extra filters to look for words which would be found without them if only the diacritics were not included in Wikipedia pages.
- If a search engine like Google is used without diacritics (the English language norm), then if all the words on a page are with foreign lines, dots, and squiggles, e.g. the current Second Battle of Zürich, the page will not show up.
- As English is the world's lingua franca, a considerable number of people using the English version of this Encyclopaedia will not have a European language as their first language, or even the Latin alphabet as their main character set. Why inflict an unnecessary level of complication on them by insisting that they know all about all extra European dots, lines and squiggles on letters which do not exist in English?
- When editing a page, words which appear like P&#257;&#7751;ini instead of Panini are more difficult to work with.
--Philip Baird Shearer 19:08, 9 Nov 2004 (UTC)
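The two conversions contrasted in the list above (plainly stripping the marks versus the German "ue" convention) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the German mapping table are hypothetical, not part of any Wikipedia tooling, and the stripping approach relies on Unicode decomposition from the standard `unicodedata` module.

```python
import unicodedata

# Hypothetical mapping for the German convention (ü -> ue and so on).
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue", "ß": "ss",
})

def strip_diacritics(text: str) -> str:
    """Decompose each character (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def german_transliterate(text: str) -> str:
    """Apply the German umlaut convention instead of stripping."""
    # Normalize to NFC first so that the precomposed keys in GERMAN_MAP match.
    return unicodedata.normalize("NFC", text).translate(GERMAN_MAP)

print(strip_diacritics("Zürich"))      # Zurich
print(german_transliterate("Zürich"))  # Zuerich
print(strip_diacritics("Pāṇini"))      # Panini
print(strip_diacritics("København"))   # København (ø has no decomposition)
```

Note that the stripping rule is not quite as uniform as it sounds: letters such as ø are separate letters rather than a base letter plus a mark, so a decomposition-based approach leaves them untouched and a hand-written mapping would be needed.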
- yeah, but the squiggles are there for a reason... and an encyclopedia is a place you come to to learn. dab
- but users don't need to type them. They just need browsers to display them. dab
- yeah, sorry, I am this user. Nobody forces you to learn them though. You can just 'strip' them in your mind if you like. It's not as easy to restitute them. dab
- WP is not striving for SEO, afaik... dab
- google is very good at handling diacritics. And there will be diacritic-less redirects, of course dab
- again, they do not need to know them. They are there for people who want to know. dab
- point. but then I don't think many users afraid of squiggles will feel called to edit the Panini page anyway ;) dab 20:05, 9 Nov 2004 (UTC)
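The edit-box form of Pāṇini mentioned above is a pair of decimal numeric character references. As a minimal sketch, the round trip between the two forms can be shown with Python's standard `html` module; the helper `to_char_refs` is a hypothetical name introduced here for illustration only.

```python
import html

raw = "P&#257;&#7751;ini"       # what the editor sees in the edit box
rendered = html.unescape(raw)    # what the reader sees on the page
print(rendered)                  # Pāṇini

def to_char_refs(text: str) -> str:
    """Re-encode every non-ASCII character as a decimal character reference."""
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)

print(to_char_refs(rendered))    # P&#257;&#7751;ini
```

The references are only a storage form for characters outside the page encoding; once pages are stored as Unicode, the edit box can show the characters themselves.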
I have moved your comments below mine because they broke my numbering. You made your assertions, I've made my points, you have replied. Enough said. (But just for clarity, could you replace "WP is not striving for SEO" with the same phrase without the acronyms, because I do not know what SEO is?) Philip Baird Shearer 17:54, 10 Nov 2004 (UTC)
- SEO. Search Engine Optimization. sorry. dab 16:02, 12 Nov 2004 (UTC)
This is an English-language encyclopaedia and it is general policy to use the forms of words used by speakers of the English language. If a word or name has diacritics in its native language, then by all means mention them in that word or person's article (in brackets at the beginning, and don't make a big deal of it), but if they aren't used when the word is written or the person is written about by English-speakers then they don't belong in article titles or the normal text of an article. Proteus (Talk) 18:19, 10 Nov 2004 (UTC)
- I think this is very typical American thinking: we might not know any German, but at least we can write the funny points and make it look more professional. These points have meaning for a German speaker (they change how the vowel is pronounced), but for an English speaker they are a meaningless addition. Just remove them and add some meaningful complexity in the contents. This is doubly true for Arabic names: no matter how funny a character you use, English speakers won't pronounce the pharyngeal letters correctly, so why not use normal letters and let the reader concentrate on something more meaningful? Gady 19:30, 11 Nov 2004 (UTC)
- how is it typically american to spell things correctly ;) ? Pseudo-professionality, see heavy metal umlaut — granted. But what if the people writing the article actually know what they are doing, and give the correct orthography? What about English speakers educated in, or learning, the language in question? Do you imply it is inconceivable that any American, or any English speaker at all, ever endeavor to learn Arabic or Chinese, and consequently would be interested in the correct spelling? I know no Chinese at all, but I am annoyed if the tone accents are missing, because that renders the transliteration worthless.
- Sorry, are your points about article titles exclusively, or also about article texts? Because if you argue about titles, and only about titles, I may concede that your view has some practical merit. But it is simply out of the question that an encyclopedia doesn't give a certain, objective, scientific, information, just because some people decide they are not interested. dab 15:25, 12 Nov 2004 (UTC)
- OK, I guess my tone was a little too anti-American. I am now spending a year in the states and it isn't doing me any good. Sorry.
- Now to your question: I meant titles and text. I suggest that e.g. Zurich is spelled like that throughout, with a single mention of the German spelling in the Zurich article itself.
- Now, the rest of your argument seems to say that we should give information about the original spelling (say, in Chinese) or an accurate scientific transliteration. Of course we should, but once. Using a scientific transliteration/original spelling throughout the text and/or in the title clutters the text and makes it harder to read without giving any value.
- BTW, there is a ridiculous proliferation of Tiberian Hebrew transliterations around. These do not merit even a single mention. But that's really an isolated case. Gady 16:05, 12 Nov 2004 (UTC)
I think that as a standard for place names Wiki should adopt the simple rule "strip the diacritics on foreign names unless they are very well known on that word in English." The articles can then start with an Anglicized version followed by the local version, e.g.:
- Zurich (German:Zürich)
- Cologne (German:Köln)
- Berlin (German:Berlin)
- Rome (Italian and Latin Roma)
- Copenhagen (Danish:København)
- Sao Tome Principe (Portuguese: São Tomé and Príncipe) ...somewhere in the text "São Tomé and Príncipe" literally translated into English is "Saint Thomas and Prince" but this is never used.
--Philip Baird Shearer 17:27, 12 Nov 2004 (UTC)
the issue of diacritics is confused here with current English vs. native forms:
- Berlin: no problem, English and native form are spelled the same.
- Rome vs. Roma: clearly not a case of diacritics, but of a form current in English
- Cologne, Copenhagen: also not an issue of diacritics, just like Rome:Roma, the native forms just happen to be spelled with non-ASCII characters.
- Zurich, Sao Tome Principe: these cases are different from the three cases referred to above. They are not English words, they are "stripped" native forms, originally for typographical reasons, and because there are no English forms. These are the only issue we argue about here.
dab 18:45, 12 Nov 2004 (UTC)
The rule that applies here is quite simple: don't make stuff up. If there's an accepted, well-known English spelling, then use it. If there isn't, use something that's common somewhere. Wikipedia is not a place to create neologisms, such as you would be doing by removing diacritics from words. A Japanese person might say it's simpler to replace all R's in English words with L's, since they sound the same to them anyway, but this creates a variety of ambiguities, some quite humorous, to anyone with a knowledge of English. There are perfectly good reasons to invent new mathematical notations, but we don't do that either. In short, it definitely has problems for editors, but that's nothing a little copy-paste and regular cleanup can't handle, and we shouldn't be inventing words solely for the sake of convenience. Deco 18:36, 12 Nov 2004 (UTC)
- agree, with the reservation that assessment of accepted, well-known may sometimes, unavoidably, lead to discussions. dab 18:45, 12 Nov 2004 (UTC)
- If the Japanese decide to do that on the Japanese version of Wikipedia then good luck to them. I am not inventing anything new, this is the way it has been done in English for generations and almost exclusively since the development of the typewriter. If there is a well known English name like Lisbon or Zurich then use it, if not then use the local name with the squiggles removed. As I have said to you before it saves having to learn lots of funny forign squiggles (which mean nothing to most English speakers) and foreign grammars to apply them. Philip Baird Shearer 19:04, 12 Nov 2004 (UTC)
- And while we're at it, let's take the dots off all the i's and j's, and replace all the letter m's with n (it looks the same, except for that funny vertical line in the middle). Regardless of whether a particular symbol has significance to you, a symbol with and without a particular diacritic mark are in many languages considered two very different letters; in French, for example, sucre and sucré are two different parts of speech. To call them "funny forign [sic] squiggles" is an insult to the cultures of these people. Although there are accepted English words and names which have been constructed in the way you suggest, there are many others that retain the marks.
- I don't suggest that editors should be forced to use correct marks, though. I think editors should write the name however they like, and later editors can repair inaccurate names the same way as any other misspelling. Deco 20:43, 12 Nov 2004 (UTC)
- (Putting my foreigner hat on for a while) as a foreigner I am not offended in the least if an English speaker is not interested in Hebrew letters. Of course they look funny! Just imagine if the Tel Aviv page would be written with תל אביב in every place it should say Tel Aviv. Would that make the page better? The Zurich case is only quantitatively different, not qualitatively. Gady 20:55, 12 Nov 2004 (UTC)
- Tel Aviv in particular falls under terms with accepted English names. In general, languages such as Chinese, Hebrew, and Greek that have fundamentally different alphabets also typically have internationally standardized romanization systems for use in producing English versions of names. I don't object to the use of such systems — they are widely used and accepted and a lot of work and thought was put into them. Deco 07:48, 13 Nov 2004 (UTC)
- I'd say that "Zurich" in particular is an official English spelling -- we even pronounce it according to that spelling. (CIA World Factbook waffles here -- the map has Zürich, but the text has Zurich.) To an English reader, "Zuerich" is just wrong, even though it's acceptable in German if it is impossible to put the umlaut on. Given the situation, I'd say give the native spelling once, then switch to the "stripped" English spelling (which should also be the title of the page -- redirecting from Zürich, of course). Then there are smaller places, like Bad Münstereifel or Altötting. For those, I'd leave the umlaut on throughout, since there is no "official" English spelling. Mpolo 08:15, Nov 13, 2004 (UTC)
- Bad Münstereifel or Altötting: precisely! All I argue for here is that the policy leave room for these! (Again, nobody is forced to learn anything by the presence of these signs. Being exposed to an umlauted character is not an equivalent of a German grammar lesson!) Zurich may be lumped with the Rome cases, no problem, or arguably with the Altötting cases, I don't care, as long as the existence of these two categories is recognized! dab 20:48, 13 Nov 2004 (UTC)
How is one supposed to know if the word is well known or not? Why not just use a simple rule: strip off all funny foreign dots ("Munstereifel", "Altotting")? Why complicate a simple rule from "always strip them off" to "only strip them off if they are famous"? How is one to know if they are famous or not? The problem is that people are forced to learn about them if they are writing pages and putting in links, or text searching. I say strip them and then it is simple to find them. So I say let's have a standard which would deal with these two like this:
- Munstereifel (German:Münstereifel)
- Altotting (German:Altötting)
--Philip Baird Shearer 22:45, 13 Nov 2004 (UTC)
El Niño is a very good example of what I mean. If you search on El Nino then the El Niño page only occurs because fortuitously one of the external links http://www.pmel.noaa.gov/tao/elnino/el-nino-story.htm has el-nino in it; otherwise that page would not show up. This page should be under the name without diacritics with a link to the name in the native language:
- El Nino (Spanish: El Niño)
It is a simple rule that is easy to follow with no need to understand "funny foreign squiggles" or grammar rules, no keyboard-related inconvenience, and it shows up in text searches. If you have not been convinced by now then you will not be. So I will not say any more. Philip Baird Shearer 14:33, 14 Nov 2004 (UTC)
- Stripping off diacritics from non-English names has been one practice. Fiction writers especially tend to strip diacritics as it seems odd when only the names in a story are marked with diacritics and other words are not. Lack of diacritics on typewriters and many newspaper printing systems encouraged this. On computers a plethora of different code pages inhibited use of anything beyond the basic invariant ASCII characters in data files and in data transmission and eventually in e-mail. Even three years ago use of extended characters on the web was felt to be somewhat daring. Some seem still unaware that it is possible. From Sample pages from The Cambridge Guide to English Usage, released in 2004, under the article accents and diacritics, concerning their disappearance on certain loanwords:
Their disappearance is helped by the fact that English typewriters and wordprocessors rarely have accents in their repertoire, neither does the internet.
But though the majority of older internet pages strip diacritics, other encyclopedias on the web don't follow this "simple rule", nor do millions of other English web pages, especially more recent ones. Stripping diacritics is not a norm for most non-fictional hardcopy works, at least for names pertaining to the major western European languages. The result is that names with diacritics and the same names without diacritics both occur on the web, and searching on both forms (at least in Google) is necessary to find all occurrences (though sometimes Google does properly match different spellings). I am at a loss how changing Björk to Bjork in Wikipedia would help anyone doing searches. It wouldn't. Setting Google to English sites only, the string Düsseldorf -Duesseldorf -Dusseldorf gets 2,190,000 hits, Dusseldorf -Düsseldorf -Duesseldorf gets 1,130,000 hits, and Duesseldorf -Düsseldorf -Dusseldorf gets 1,110,000 hits. Setting the spelling in Wikipedia to any one of these does not make text searches easier in Google. You will still have to use all forms to find all pages. Comparing El Nino and El Niño in the same way shows a small web preference for El Niño. Here again, a Wikipedia preference to one of these forms makes no difference worth considering. You must still search on all forms to find all pages. Stripping diacritics in Wikipedia doesn't help searches.
I can't understand Philip Baird Shearer's continued claim that using diacritics means one must understand them and also the grammar of foreign languages. All one need do is copy the diacritics or produce the diacritics when editing, and filter them out in one's mind if one wishes when reading them. No-one must understand them. But then the diacritics are there as added value for those who want to understand them and for those who already understand them and for search purposes. A large number of people have enjoyed the book The Lord of the Rings immensely without being in the least bothered by not fully understanding the diacritics on proper names, the grammar of Elvish, and the rules for pronouncing names in the Elvish languages and in Old English. It's not an issue. Monolingual English-speaking children come across the occasional name or word with diacritics when reading and aren't normally much hurt by such a thing. It's not a bad thing for even a monolingual child to know that cañon is canyon and that François is not pronounced as Frank-oyz.
As to this "simple rule" being easier, of course it is. It would also be easier for editors not to participate in Wikipedia at all. The point is not to do what is easiest, though ease is a good secondary consideration. The main point is to provide accurate information, including accurate information on forms normally used in English outside of Wikipedia, including standard forms of non-English names that are used in English (and to include when appropriate the native forms also when they differ). The common rule in English for academic work and for encyclopedias and guides and such when dealing with names from major western European languages is to retain diacritics except when the name has been already adopted firmly into English without them. If everyone, world-wide, spells names mostly with native spellings, then searching becomes easier rather than harder. And with increasing use of automatic translations, even those who know only English find it useful to search material on non-English pages.
While there is disagreement within and outside of Wikipedia about a small number of such names, in cases where both a native form and a traditional English form are in use, that doesn't invalidate the general practice. If there are diacritics on the native form of a foreign name, retaining them in English text provides useful information, especially for searching. A user can choose to strip them off for some purpose, including searching for a possible Anglicized form. But if the diacritics are already stripped off, the element of choice is gone and a user who does not already recognize the name cannot know what if any diacritics ought to appear.
Jallan 22:11, 14 Nov 2004 (UTC)
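The search problem argued over above comes down to whether the matching step treats "Zürich" and "Zurich" as the same string. The following is a minimal sketch of accent-insensitive matching, assuming a hypothetical search tool that folds both the query and the page text before comparing; it does not describe how Google or the Wikipedia search actually work, and the names `fold` and `matches` are invented for the illustration.

```python
import unicodedata

def fold(text: str) -> str:
    """Case-fold, decompose (NFD) and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def matches(query: str, page_text: str) -> bool:
    """True if the folded query occurs in the folded page text."""
    return fold(query) in fold(page_text)

page = "The Second Battle of Zürich"
print(page.find("Zurich"))        # -1: a plain substring search misses the page
print(matches("Zurich", page))    # True: folding both sides finds it
print(matches("Zuerich", page))   # False: the "ue" form needs its own rule
```

With folding on the search side, the spelling stored in the article no longer decides whether the page is found; without it, every variant has to be searched separately, as described above.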
- Yes, one can set Google to do what you suggest, and yes, then you can search the way you suggest, but that is not the default. With my suggestion there is no keyboard-related inconvenience. I am not suggesting that the word is not shown with its native form. In every case I have put the word using the English character set and then the word in its original language with that language's character set. With a word link El Niño, how is an English person supposed to know that it has "ñ" in the middle if they have only heard it spoken and don't know Spanish or that it is a Spanish word? Philip Baird Shearer 22:47, 14 Nov 2004 (UTC)
- So what if they don't? If a reader is looking for it, there would always be a redirect under the version with no accents. If the article already exists and an editor is expanding it, they can copy the correct name used by other editors. If they use the wrong name, without the accents, a later editor can fix it; changing "El Nino" to "El Niño" is no different than changing "neccessary" to "necessary". The whole point of Wikipedia is that no editor has to get it right on the first try; errors and ignorance are okay. Deco 23:02, 14 Nov 2004 (UTC)
- yes, but while we agree on "necessary", there is the danger that "El Nino" will be changed to and fro every couple of days, unless we have a clear policy how it should be spelled. dab 08:03, 15 Nov 2004 (UTC)
- that said, I don't get Philip's point at all. If you see "El Nino", and you don't know any Spanish, and you have never heard it pronounced, it's just a grapheme to you anyway. So how is "El Nino" different from "El Niño" if you have no clue how to deal with it anyway? The "El Niño" variant at least gives you a hint that it is a foreign word and may not be simply pronounced /ale nine-o/. dab 23:42, 15 Nov 2004 (UTC)
- I understand Philip's concern in that it is quite hard to get to an article exclusively written with "squiggly forign characters" like "El Niño" if there is no redirection page from its US-ASCII representation. So I am all for making redirects from all the English transliterations for the character-set-challenged to the proper article (as Wikipedists have been doing all the time - just try El Nino or Zurich...). But please do not try to strip articles from content, just because you don't know what they mean. The squiggly marks make a big difference - in pronunciation and often in meaning - to people who know how to read them. Please remember that the English language Wikipedia is not made exclusively for people who only speak English. - Marcika 00:13, 19 Nov 2004 (UTC)
- So where is the Wikipedia for people who only speak English?
- I think User:Gadykozma had it almost exactly right: the case of "Tel Aviv" versus "תל אביב" is basically only different from the "El Nino" versus "El Niño" case quantitatively, not qualitatively. The only difference is that, as User:Dcoetzee mentioned, many languages have formal transcription systems from their native script.
- Like it or not, if a word is usually written in English without diacritics, I think we should do so here. Yes, we absolutely should mention right at the top of the article what the correct original form is - which will fulfil our goal of educating people. However, to continue to use the non-English form throughout the article, out of some desire to be hyper-correct, is to ignore that this is the English Wikipedia. Noel (talk) 15:15, 19 Nov 2004 (UTC)
- "English" Wikipedia means that we explain the terms using the English language, not that the terms themselves must be English (what is the correct English title of Tryambakam, Shri Rudram Chamakam, or Shar-Kali-Sharri? These article titles are not English anyway. They still belong in an English 'pedia, because we explain them in English) dab 15:28, 19 Nov 2004 (UTC)
What is proposed is not that the names are translated into English, unless there are well known English translations; that would be silly, because English has always been a jackdaw language which takes any word it needs from anywhere. What is proposed is that only the 26 letters of the English alphabet are used, with a foreign character translation at the start of an article if it is appropriate. It is a simple rule to understand and implement. This means that the examples you gave, Tryambakam, Shri Rudram Chamakam and Shar-Kali-Sharri, are fine, but the article El Nino should be under "El Nino (Spanish: El Niño)" not under "El Niño (English: El Nino)"!
Additionally, if the name is used in another article, then at least one version of the name (preferably the first one) should be spelt using the 26 letters of the English alphabet. Then when a text search is run using no keyboard-related inconveniences all the articles in Wikipedia should show up in the search. (See Second Battle of Zürich for an example of where this is NOT done. The person who changed ALL references to Zürich from Zurich cited the current Zürich article to justify this!) Philip Baird Shearer 17:12, 19 Nov 2004 (UTC)
- I understand perfectly what you propose; it's just that I disagree:
- it's not how it's done at the moment: Catalhuyuk --> Çatalhöyük
- the restriction to ISO-Latin-1 characters only is purely technical and will be lifted soon
- there is no reason not to make use of Unicode titles once they are enabled, unless there is a common English spelling (the Cologne case is undisputed).
- dab 17:33, 19 Nov 2004 (UTC)
dab, the article you refer to was originally "Catalhoyuk was a major neolithic city located..."; someone with views like yours changed it! If you understood perfectly, why did you put up your last arguments about "Tryambakam..."? The problem is not purely technical -- have you not understood what I and others have written? To summarise, many English speaking people are not in the habit of using diacritics and many, like myself, think that it is better not to need to do so. You and I are not going to agree on this, but please add new arguments and don't go off on tangents like you did with Tryambakam. BTW as you live in Zurich, is English your mother tongue? If not, why do you argue so strongly for diacritics in English? Why not use the German Wikipedia, where I would imagine that they are compulsory and you can insist that all German pages use foreign diacritics? Or if you feel like crusading, how about going to http://fr.wikipedia.org/wiki/Zurich_(ville) and insisting that they use the German spelling? (The French page uses exactly what I am suggesting for here: "Zurich (Zürich en allemand)". They also use "El-Nino" at http://fr.wikipedia.org/wiki/Liste_des_catastrophes_naturelles.) I am sure they would appreciate your input :-O Philip Baird Shearer 18:17, 19 Nov 2004 (UTC)
- PBS, it seems like you are going off on tangents here.
- My Tryambakam example had the point of illustrating that your argument about "this is an English language encyclopedia" is beside the point here.
- this is not about German grammar either. I don't care about Zurich vs. Zürich, as I have repeatedly told you, in answer to your repeated protests that you do not want to learn German. It is for this reason that I provided Sanskrit, Turkish, Sumerian and other examples
- I do not blame you for not knowing German or French at all. But if you did, you would realize that French [u] (unlike English [u]) corresponds to German [ü] phonetically. For this reason, Zurich is indeed the common French spelling for the city, and I am open to the possibility that it is also the common English one (it was not me who opposed you on that account, remember?)
- let's just agree that we disagree, and try to bring to the point our differences so there can be a vote on the issue.
- at least acknowledge that I am not alone in disagreeing with you. I had no involvement with the creation of the widely-used Template:Wrongtitle, and according to you the effort towards allowing use of UTF-8 titles would be completely superfluous (Most wikis have been converted to support UTF-8 and partial conversion of this one is complete.)
- I do not feel I am taking a radical position on this at all (not to mention 'crusading'). My vote on Zurich was neutral, and you will be completely free to use Zurich (i.e. redirects) wherever you like in any case. I feel you are exaggerating my position before shooting it down.
- regarding article titles, the options are:
- allow only ascii titles (this is your position)
- allow only ISO-Latin-1 titles (this is the status quo, for technical reasons, and everyone(?) agrees that the Latin-1 set is arbitrary)
- allow UTF-8 titles when there is no commonly used English spelling in ascii. This is my position.
- clearly, I agree that this is not the simplest way to do it, because there will inevitably be arguments about what is "common use". I do, however, argue that this is the most proper policy.
- use the native spelling throughout: this is nobody's position, so there is no need for you to argue against that.
So the question at this moment is, do we go from Latin-1 towards "ascii only", or towards full UTF-8. dab 12:20, 20 Nov 2004 (UTC)
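The three title options listed above (ASCII only, ISO-Latin-1, full Unicode stored as UTF-8) can be told apart mechanically by checking which is the narrowest character set able to encode a given title. The sketch below is only an illustration; `narrowest_charset` is a hypothetical helper name, and it uses Python's built-in codecs rather than anything from the wiki software.

```python
import unicodedata

def narrowest_charset(title: str) -> str:
    """Return the narrowest of ascii, latin-1 or utf-8 that can encode the title."""
    # Compose first (NFC) so that "u" + combining diaeresis counts as "ü".
    composed = unicodedata.normalize("NFC", title)
    for name in ("ascii", "latin-1"):
        try:
            composed.encode(name)
            return name
        except UnicodeEncodeError:
            continue
    return "utf-8"

for title in ("Zurich", "Zürich", "Çatalhöyük", "Pāṇini"):
    print(title, "->", narrowest_charset(title))
# Zurich -> ascii
# Zürich -> latin-1
# Çatalhöyük -> latin-1
# Pāṇini -> utf-8
```

This shows why Zürich and Çatalhöyük are already possible as titles under the Latin-1 status quo while Pāṇini has to wait for Unicode titles.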
You are STILL misunderstanding what I am writing. It is NOT primarily a technical issue. It is whether the word is written in the 26 letters of the ENGLISH ALPHABET and not using funny foreign squiggles. Although I am all for having the format:
- English Alphabet ( language:Foreign Alphabet).
BTW Zurich is anglicized (As are all German two vowel place names, with more emphasis on the first vowel and less on the second than in German eg BERlin), just as "don" in LonDON is emphasised when speaking German. Like German, English, particularly in Britain, has heavy regional accents and dialects, so there is no standard way to pronounce most words including foreign ones. In south-east London it is pronounced "Saff-east Landan" which is as great or greater difference than between the way London is pronounced in high German and received pronunciation. Philip Baird Shearer 13:46, 21 Nov 2004 (UTC)
Just in order to collect everything on this topic as much as possible in one place, I paste a discussion below fetched from Wikipedia talk:Naming conventions (city names):
How to handle names with diacritics [section moved here]
Currently there is an on-going dispute over the name Talk:Zürich. Please see also:
I think that, as a standard for place names, Wiki should adopt the simple rule "strip the diacritics on foreign names unless they are very well known on that word in English." The articles can then start with an Anglicized version followed by the local version, eg:
- Zurich (German:Zürich)
- Cologne (German:Köln)
- Berlin (German:Berlin)
- Rome (Italian and Latin Roma)
- Copenhagen (Danish:København)
- Sao Tome and Principe (Portuguese: São Tomé e Príncipe) ...somewhere in the text: "São Tomé and Príncipe" literally translated into English is "Saint Thomas and Prince", but this is never used.
This is not a new rule, but something that has been done in English for generations, certainly since the advent of the typewriter and probably since before the printing press. --Philip Baird Shearer 19:08, 12 Nov 2004 (UTC)
That would work well in some languages, specifically German, and less well in many others, specifically Finnish and Scandinavian (on the other hand, you may say that these languages don't have diacritics but extended character sets). It would also introduce an unnecessary foundation for lots of disputes, and alienate foreigners. There was a reason to do so in printed works when there were no foreign types available. But that reason is obsolete in the computerized world.
--Ruhrjung 19:15, 2004 Nov 12 (UTC)
But it is not for several good reasons:
- Most English keyboards do not have more than 26 letters and forming any other letter is a pain.
- Unless a foreigner has an inferiority complex, why should he or she care how the English spell a foreign name? Speaking as an English speaker, I would not be alienated no matter how someone spelt or pronounced the city I live in. Why would a reasonable foreigner wish to impose on an English-speaking person with an English keyboard the extra steps to put in characters which they do not have easy access to? I have just asked my wife if she considers it an offence when people do not know the Gaelic or use the fada in the name of her home town. She replied, using Anglo-Saxon words, that she does not.
- Because of this, English-speaking people do not use search engines which automatically convert searches into other keys. Take German and Zurich as an example: the search using Zürich is full of German pages, so I then need to filter them with an English-only filter as well as work out how to get an umlaut. Why go through these two additional hoops when I can search quickly and easily using Zurich, which automatically tends to filter out German pages?
- What are the Scandinavian characters which are not in the Latin alphabet and how are they traditionally converted into Latin characters? Philip Baird Shearer 00:06, 13 Nov 2004 (UTC)
- When there are traditional English names for places, as is the case for large cities (Munich, Cologne etc.), these should be used when writing in English. By definition, these lack diacritics.
- Most smaller places, however, don't have such traditional names, as they are not often referred to in other languages. In such cases, as with personal names, the proper diacritics should be used, when there is no technical problem with that. That some English-speakers have problems writing foreign diacritics is not an argument for placing these subjects under the wrong heading, but for creating redirects or disambiguation pages when necessary. Writing as a Swede, I regard stripping diacritics from Swedish names and other words as a sign of sloppiness. Perhaps disrespect as well, especially in the case of personal names, but mostly just sloppiness. / up+land 14:05, 13 Nov 2004 (UTC)
Do many English speakers even know what diacritics are? Or how to use them? (I only just learned how to do æ & Æ.) While cultural sensitivity is nice, this is an English encyclopedia, so English/Anglo versions should be used. Obscure names might be worthy of diacritics, but Zurich should be left as Zurich. --ZayZayEM 14:22, 13 Nov 2004 (UTC)
- As I wrote above, I agree that English versions should be used when such exist (as with Zurich, for instance). In most cases, none do. Whether most English speakers know what diacritics are is irrelevant. Swedes usually make an effort to use Ü or Ç when writing German or French names, at least in texts intended for publication. I think English-speakers are capable of doing this with regard to other languages as well. It is not (primarily) a matter of cultural sensitivity, but of correctness. As I already pointed out, not making that minimal effort just looks uneducated and sloppy. Is that the impression you think Wikipedia should make? / up+land 14:52, 13 Nov 2004 (UTC)
There is no correct way! The customary way is to strip them in English. You think it looks uneducated and sloppy, but you are not a native English speaker, and although there are English speakers who would agree with you, to me it seems fine to strip all funny foreign squiggles and lines. To strip them off is a simple and practical rule. An English speaker, or someone using English as the lingua franca of the modern age, does not have to try to guess what the correct squiggle is in a specific foreign grammar, or how to make that character with a keyboard that does not have one, or what to use in a text search. Two examples from the Wikipedia:Village pump (policy)#Transliteration:
- Munstereifel (German:Münstereifel)
- Altotting (German:Altötting)
Nice and simple! --Philip Baird Shearer 23:05, 13 Nov 2004 (UTC)
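The "strip them" rule proposed above can be sketched mechanically. The following is a minimal Python illustration, assuming Unicode canonical decomposition (NFD) as the stripping method; the function name `strip_diacritics` is hypothetical and this is not a description of any Wikipedia software. It also bears on the earlier question about Scandinavian characters: letters such as ø and æ are not base letters plus marks in Unicode, so this method leaves them untouched.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks (accents, umlauts, rings, tildes) from text."""
    # NFD splits e.g. "ü" into "u" followed by a combining diaeresis.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every nonspacing mark (Unicode category "Mn") and keep the rest.
    kept = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", kept)


for name in ("Zürich", "Münstereifel", "Altötting", "København"):
    print(name, "->", strip_diacritics(name))
```

With this approach Zürich, Münstereifel and Altötting come out as Zurich, Munstereifel and Altotting, while København is returned unchanged because ø has no canonical decomposition. A complete "English alphabet only" rule would therefore need an additional, language-specific mapping for such letters.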
- I could just as easily claim that there is no "customary way" to do this in English, and could easily find a whole lot of published English texts which do use diacritics in foreign names or other words. I don't know if you are a monolingual English-speaker, but what you may not realize is that by not using the correct spelling for these names you would be introducing a completely unnecessary ambiguity and uncertainty and creating false homonyms which could easily have been avoided. Whether you like it or not, these "funny foreign squiggles and lines" have meaning. / up+land 01:58, 14 Nov 2004 (UTC)
Yes, I only speak one English language; I did not know there was more than one. Perhaps you can LEAD me to the others. Most English readers have no idea what funny foreign squiggles over and under words mean (some might know one foreign language, but very few would know more than two), so how does it introduce an "unnecessary ambiguity and uncertainty" to have a simple rule which says strip them? Philip Baird Shearer 12:29, 14 Nov 2004 (UTC)
- For instance, there is at present an article on Amal, the Lebanese Muslim militia, and a different article on Åmål, a town in Sweden. There is already a disambiguation on top of the Amal article to lead anyone right who may be looking for the Swedish town. In Swedish A and Å are two different vowels, one pronounced more or less like the English A in "large", the other like the A in "fall" (or the oa in "board", or whatever). Stripping the rings from on top of the A in this case would be like exchanging the large P and B in "Philip Baird Shearer" for an R with the argument that "Rhilip Raird Shearer" still looks more or less the same. By "stripping diacritics" you are actually exchanging one letter for another.
- Another example: Angstrom is a perfectly good English word spelled without diacritics and unambiguous in any context where it would be used. But it originates with a Swedish surname Ångström, as in the 19th century physicist and Uppsala professor Anders Jonas Ångström. In this case stripping diacritics would create ambiguity, as Angström or Ängström are also possible Swedish surnames, and the latter actually a quite common one, although more often spelled Engström. (And the ending "-strom" is also possible, had the name been of German origin.) An alternative might be using oe and ae to represent ö and ä, but this creates ambiguity of a different kind, at least with personal names which may in many cases intentionally be spelled in a way differing from standardized Swedish spelling (Wärn, Waern, Wærn or Wern – this is a real Swedish surname – are all pronounced the same, but for a particular person or family the distinction may be significant).
- In either case, stripping "funny foreign squiggles" creates an unnecessary loss of information. Anyone who doesn't care about this information can just happily ignore it. Personally, I don't speak Spanish, but I know that there is a difference in pronunciation between a Spanish n and a Spanish ñ. I don't see any reason why it would be advantageous to me to see "El Niño" spelled as "El Nino" and I don't see how my lack of knowledge of Spanish grammar would be a problem or why the tilde would disturb me if I didn't know the difference. Even in cases where I may not know or appreciate the difference, I don't see any reason why any "squiggles" should be stripped in writing, unless there is a real technical problem, not just a matter of keyboard-related inconvenience. I don't see my own ignorance or the fact that I may need to fire up the character map to find a certain foreign letter as a reason it shouldn't be there when I look up the word in an Encyclopædia. / up+land 13:26, 14 Nov 2004 (UTC)
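The loss of information described in the examples above (Amal and Åmål, and the Ångström family of surnames) can be made concrete. The following is a minimal Python sketch that groups names by their stripped form and reports which distinct spellings collapse into one. The helper names `strip_diacritics` and `collisions` are hypothetical, and NFD-based stripping is an assumption about how "strip the diacritics" would be implemented.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (assumed stripping rule)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def collisions(names):
    """Map each stripped spelling to the distinct original names that share it."""
    groups = defaultdict(set)
    for name in names:
        groups[strip_diacritics(name)].add(name)
    # Keep only stripped spellings shared by two or more different originals.
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}


names = ["Amal", "Åmål", "Ångström", "Angström", "Ängström", "Engström"]
for stripped, originals in collisions(names).items():
    print(stripped, "<-", ", ".join(originals))
```

In this sample, Amal and Åmål share the stripped form Amal, and Ångström, Angström and Ängström all become Angstrom, whereas Engström stays distinct as Engstrom.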
If a language did not have "P" and a "B" in their alphabet then I see no problem with "Rhilip Raird Shearer" (particularly as PH is a funny combination anyway which I would not expect people not familiar with the English alphabet or similar to know how to pronounce). Thank you for highlighting El Niño, as it is a very good example of what I mean. If you search on El Nino then the El Niño page only occurs because, fortuitously, one of the external links http://www.pmel.noaa.gov/tao/elnino/el-nino-story.htm has el-nino in it; otherwise that page would not show up. This page should be under the name without diacritics with a link to the name in the native language:
- El Nino (Spanish: El Niño)
It is a simple rule that is easy to follow, with no need to understand "funny foreign squiggles" or grammar rules, no keyboard-related inconvenience, and it shows up in text searches. If you have not been convinced by now then you will not be. So I will not say any more. Philip Baird Shearer 14:30, 14 Nov 2004 (UTC)
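The search problem raised above (a query for El Nino not finding a page titled El Niño) is a matching question that can be illustrated separately from the titling question. The following is a minimal Python sketch of a lookup that folds diacritics and case on both the stored titles and the query. All names here (`fold`, `build_index`, `lookup`) are hypothetical, and nothing in it describes how Wikipedia's search or any real search engine behaves.

```python
import unicodedata


def fold(text: str) -> str:
    """Reduce text to a comparison key: no combining marks, case-insensitive."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()


def build_index(titles):
    """Map each folded key to the list of original titles that produce it."""
    index = {}
    for title in titles:
        index.setdefault(fold(title), []).append(title)
    return index


def lookup(index, query):
    """Return every stored title whose folded form equals the folded query."""
    return index.get(fold(query), [])


index = build_index(["El Niño", "Zürich", "Åmål", "Amal"])
print(lookup(index, "El Nino"))
print(lookup(index, "Amal"))
```

In this sketch a query typed on a plain keyboard finds the title with diacritics, and an ambiguous query such as Amal returns both candidates, which is roughly the role that redirects and disambiguation notes play in the discussion above.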
Since this has ceased to be about just cities, I think that it is best if it is discussed in one place.
GOTO: Wikipedia:Village pump (policy)#Transliteration --Philip Baird Shearer 14:49, 14 Nov 2004 (UTC)