Template talk:Unicode Latin
A note on the fonts used in this template: the extended range and uncommon letters sections call the template {{Unicode fonts}} for the font family, to ensure proper display of the characters on user agents which cannot handle Unicode properly (e.g. Internet Explorer). The font family is followed by a trick which allows better browsers like Opera and Mozilla Firefox to do their own font matching.
See Template_talk:Unicode for more info. Jordi·✆ 08:56, 23 Mar 2005 (UTC)
- Update: it now uses {{unicode}} which invokes the CSS class "Unicode" and makes everything much easier and less complicated. HTH HAND —Phil | Talk 11:02, 20 October 2005 (UTC)
IPA letters
I have removed the recently added IPA letters from the template. The IPA alphabet is not an extended Latin alphabet, but a different alphabet which happens to be made up of letters originating from the Latin. This template should only list letters which are actually used in the standard orthography of languages, for those languages which use an extended Latin alphabet, and letters which historically formed a part of (a version of) the alphabet. Thus yogh is in (used in Middle English and Middle Scots), but Ezh is out (used only for IPA). Jordi·✆ 00:22, 13 Apr 2005 (UTC)
LL ll
Under "alphabet extensions" the characters "ĿL ŀl" link to Ll. But "ĿL ŀl" is a Catalan digraph used to indicate a geminate /ll/, as distinct from "LL ll" that indicates /ʎ/. --Angr/tɔk tə mi 21:14, 14 July 2005 (UTC)
- Quite correct. I don't think that ĿL/ŀl has an article, but it's obviously distinct from the Spanish double L. I am going to reintroduce LL/ll and redirect ĿL/ŀl to Middle dot. Jordi·✆ 22:19, 14 July 2005 (UTC)
Half r and mufi
The {{mufi}} around the half r makes sure a Unicode font which follows the Medieval Unicode Font Initiative is used for the character, as it is not included in Unicode. It only exists in the Private Use Area of MUFI fonts. Jordi·✆ 22:23, 14 July 2005 (UTC)
- Personally I'm extremely dubious about using characters that are not standard Unicode. They could end up displayed as anything, and the mufi template will only do anything about it if the user happens to have one of the fonts it lists. Do others agree or disagree? Plugwash 16:33, 30 October 2005 (UTC)
- I'm opposed to mixing private-use encoding into Wikipedia. It may be justified as a sample in a specific article, if accompanied by an explanatory note and an image for the benefit of the other 99% of readers. But it definitely doesn't belong in a template that appears in many articles.
- I'm also opposed to template:Mufi, which overrides my font choices. This template is only implemented for the sake of the broken font behaviour of MSIE/Windows, and is not necessary for other users.
- I've fixed Template:Mufi so if you have one of the fonts Cardo or LeedsUni installed (or—most probably—Alphabetum, which I don't have), it now should display the half r correctly as "" (start from MUFI for links to the fonts). However, I'm also very dubious about having a Private Use Area character in this template. -- j. 'mach' wust | ✍ 21:30, 8 November 2005 (UTC)
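For anyone who wants to check a template's text for Private Use Area characters such as the half r discussed above, here is a minimal sketch in Python. It uses only the standard `unicodedata` module. The function name `private_use_chars` is made up for illustration, and the sample string uses U+E000 as an arbitrary stand-in, not the actual MUFI code point.

```python
import unicodedata

def private_use_chars(text):
    """Return (index, code point label) for each private-use character in text."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"  # "Co" is the private-use category
    ]

# Hypothetical sample: U+E000 stands in for a private-use glyph.
sample = "Rr \ue000 Ss"
print(private_use_chars(sample))  # → [(3, 'U+E000')]
```

A check like this only says that a character is private use; what it looks like still depends entirely on the reader's font.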
moving å, æ and ø
I moved å, æ and ø to alphabetic extensions, since they are treated as such in the languages where they are primarily used, i.e. Danish, Norwegian and for å also Swedish.
Moreover, they are listed at the end of the alphabet as æ, ø and å, but I'll let that be. Though, maybe IJ should be listed as I+J as well? --Salleman 13:31, 19 July 2005 (UTC)
- No, IJ is not I+J. It is not a ligature just for display; it is considered a letter of its own, just like œ or æ are in some languages. See IJ#Sorting for details. ---moyogo 12:17, 15 February 2006 (UTC)
But by that reasoning, we should also include ñ, which is a separate letter in Spanish...
Alphabet extensions
Isn't the letter J technically an extension of the Roman alphabet? What's Þ doing there if J isn't?
- By that standard, G and W could properly be regarded as extensions of C and U, respectively.
Extensions versus modified letters
I, being a layman and all, assume that this division of the non-English graphemes into "extensions" and "modified letters" is based on how they are treated in the language-specific alphabets that use them. But I find that division somewhat arbitrary, since some graphemes might be treated and sorted differently in different languages, e.g. one language might consider a certain sign to constitute a letter in its own right while another language treats it as a variant of a proper letter. Moreover, is there any unambiguous definition of a diacritic? Do diacritics have to be visually separated from and smaller than the letter they modify? What about Å and ç? Just curious about these things. :} //Big Adamsky 08:30, 14 February 2006 (UTC)
Vietnamese
There are at least 20 more symbols in use in Vietnamese, combining one of the vowels (a ă â e ê i o ô ơ u ư y) with one of the five tone marks (acute accent, grave accent, tilde, dot below, hook). Should they be added as well? Some of them are already listed (since they're used for other purposes in other languages) such as ã and õ. DHN 23:17, 27 January 2006 (UTC)
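To get a feel for the size of this set, the vowel and tone-mark lists in the comment above can be combined mechanically. The following is a rough Python sketch, not a statement about what the template should contain: it pairs each of the 12 vowels with each of the 5 tone marks and composes the result with NFC normalization from the standard `unicodedata` module. The names `VOWELS`, `TONE_MARKS` and `toned_vowels` are illustrative only.

```python
import unicodedata

# The twelve vowels named above, normalized so each is a single code point.
VOWELS = unicodedata.normalize("NFC", "aăâeêioôơuưy")

# Combining marks for the five tone marks listed above.
TONE_MARKS = {
    "acute": "\u0301",
    "grave": "\u0300",
    "tilde": "\u0303",
    "dot below": "\u0323",
    "hook above": "\u0309",
}

def toned_vowels():
    """Return every vowel + tone-mark combination in composed (NFC) form."""
    return [
        unicodedata.normalize("NFC", vowel + mark)
        for vowel in VOWELS
        for mark in TONE_MARKS.values()
    ]

combos = toned_vowels()
print(len(combos))       # 12 vowels x 5 marks
print(" ".join(combos))
```

The list includes ã and õ, which the comment notes are already in the template for other languages, so the number of genuinely new entries would be somewhat lower than the full product.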
Ch
Ch is considered a single letter in the Czech alphabet as well as several others; see ch (digraph). Should it be included? TheGrappler 00:01, 4 April 2006 (UTC)
- It probably should be; if you look at the list, it makes sense. ---moyogo 07:45, 16 May 2006 (UTC)
The template is currently useless
It seems that people created wrong redirects from missing letters to diacritic articles (Ź now redirects to Acute accent, which doesn't make much sense). Then somebody "fixed" the redirects inside this template, making the matter worse. Should we just fix the template, or also get the redirects deleted? Zocky | picture popups 00:43, 23 April 2006 (UTC)
- "(Ź now redirects to Acute accent, which doesn't make much sense)." Why doesn't it make sense? "Ź" is not a letter; it is a letter with a diacritic mark above it, so it's not likely ever to have its own article. And È, for instance, is now a disambiguation page, making things even worse. I suggest that either the symbols that are not themselves letters be de-linked, or linked to a relevant article about the diacritic or other mark used in forming them. --Russ Blau (talk) 17:17, 14 June 2006 (UTC)
- Ź is a letter in Polish. --Ptcamn 02:15, 15 June 2006 (UTC)
- and Lower Sorbian.--Hello World! 15:16, 11 August 2006 (UTC)
accented characters?
What's the point of having accented characters? This table will end up being really, really big. ---moyogo 19:39, 15 May 2006 (UTC)
Remove non-Unicode characters
Under "alphabet extensions", the first item (Ɑɑ) starts with U+2C6D, which is not part of the Unicode standard, and does not appear to be even proposed. A later item (Ɽɽ) contains U+2C64, also not in the standard; it appears to be proposed for Latin Extended-C (PDF).
These characters can be mentioned or have samples in the appropriate articles, preferably along with images to show what they should look like. But this template which appears as a standard navigation element in dozens of articles should not have non-standard characters, which are practically guaranteed to always be broken (and technically, invalidate the HTML code).
I'm removing these. —Michael Z. 2006-05-18 04:22 Z
I added U+2C64 since it has been accepted in Unicode version 5.0. --Hello World! 15:22, 11 August 2006 (UTC)
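Whether a code point is "in the standard" depends on which Unicode version you ask, as the exchange above shows. One way to check against a particular version is sketched below in Python, using the Unicode data bundled with the interpreter. The helper name `code_point_status` is an illustrative assumption, and the answers reflect whatever `unicodedata.unidata_version` reports, so a release later than this discussion may well show both code points as assigned.

```python
import unicodedata

def code_point_status(cp):
    """Describe a code point according to Python's bundled Unicode data."""
    ch = chr(cp)
    category = unicodedata.category(ch)
    if category == "Cn":   # not assigned in this Unicode version
        return "unassigned"
    if category == "Co":   # Private Use Area
        return "private use"
    return unicodedata.name(ch, "assigned (unnamed)")

print("Unicode data version:", unicodedata.unidata_version)
for cp in (0x2C6D, 0x2C64):
    print(f"U+{cp:04X}: {code_point_status(cp)}")
```

This says nothing about font support; an assigned character can still show up as a missing glyph for most readers.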
Grid alignment
This could still use a bit of improvement, and there would be much less code if the styles were put into common.css, but doesn't it look neater? —Michael Z. 2006-05-18 05:54 Z
| Section | Characters |
|---|---|
| Latin alphabet | Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Ll Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz |
| Modified characters | Àà Áá Ââ Ää Ãã Āā Ąą Ăă Ǎǎ Ḅḅ Çç Ĉĉ Čč Ćć Đđ Ďď Èè Éé Êê Ëë Ęę Ēē Ĕĕ Ėė Ěě Ĝĝ Ğğ Ġġ Ģģ Ǧǧ Ĥĥ Ħħ Ìì Íí Îî Ïï Įį ı İ Ĩĩ Īī Ĭĭ Ĵĵ Ķķ Ǩǩ Ĺĺ Ļļ Ľľ Ŀŀ Łł Ńń Ņņ Ňň Òò Óó Ôô Öö Õõ Őő Ǫǫ Ōō Ŏŏ Ơơ Ŕŕ Ŗŗ Řř Śś Ŝŝ Şş Șș Šš Ťť Ŧŧ Ţţ Țț Ùù Úú Ûû Üü Ũũ Ūū Ŭŭ Ųų Ůů Űű Ưư Ŵŵ Ýý Ŷŷ Ÿÿ Źź Žž Żż |
| Alphabet extensions | Ȁȁ Ȃȃ Ææ Ǽǽ Ǣǣ Åå ɑ Ɓɓ Ċċ Ðð Ɖɖ Ɗɗ Ɛɛ Ȅȅ Ȇȇ Əə Ǝǝ Ƒƒ Ǥǥ Ǧǧ Ɠɠ Ƣƣ Ɣɣ Ƕƕ Ǐǐ Ȉȉ Ȋȋ Ǩǩ Ƙƙ ĸ Ññ Ɲɲ Ŋŋ Œœ Øø Ǿǿ Ǒǒ Ȍȍ Ȏȏ Ɔɔ Ȣȣ Ȑȑ Ȓȓ ɽ R̵r̵ ß Ʃʃ Ǔǔ Ȕȕ Ȗȗ Ƿƿ Ȝȝ Ȥȥ Ƶƶ Ʒʒ Ǯǯ Þþ |
| Digraphs | DZ dz DŽ dž GB gb IJ ij KP kp Lj lj LL ll ĿL ŀl MB mb MP mp ND nd NG ng NJ nj NK nk NS ns NT nt NZ nz |
| Trigraphs | |
| Stylistic variants | |
[don't forget the <noinclude> code]
- Nice. Would it be possible to put the characters just the tiniest bit closer? "Yy Zz" is displayed on a second line on my system (1024x768, but obviously the font matters as well). It would be preferable if the Latin characters only took a single line on the majority of systems. —Ruud 17:05, 18 May 2006 (UTC)
[Second copy of the grid above, with adjusted cell width.]
A width of 2.3em does the trick for me, but the one above has a base width of 2.25em. —Ruud 17:17, 18 May 2006 (UTC)
- This example works fine here (Safari 2.0.3, both with and without a user style sheet applying Lucida Grande font). Also brings the 26 letters down to one line when the window is at full width. —Michael Z. 2006-05-18 18:50 Z
Modified vs. Extensions
I really think that the modified and extensions part should be merged, or split in a more technical way. The current way has two main faults that come to mind:
- What's considered an accent in one language is an alphabet extension in another. E.g. ö is a separate letter sorted separately from 'o' in Swedish; an umlauted 'o' in German, where umlauts are vital not only for pronunciation but for meaning too; and an 'o' with a diaeresis in French and (rarely) English, where the mark indicates two separate syllables but does not usually affect the meaning. õ is a separate letter in Estonian but not in Portuguese. There are plenty more examples.
- Listing them by whether or not a particular language counts it as a separate letter is confusing: why should č be a modified letter when ǒ is an extension? In order for the current lists to make sense, you need to know which language we're using to classify it and whether they consider the character in question to be a separate letter of the alphabet. I understand that native speakers of, for example, Swedish would expect to find ö as a separate letter, but we can't possibly cater for every language in the world. This is en.wikipedia.org, so why not present information from an English point of view?
I personally prefer a more technical split (along with the proposed digraph/trigraph split above) because the lists are already long enough without making them longer. Avxxv 00:21, 20 May 2006 (UTC)
- Agreed. --Ptcamn 01:41, 20 May 2006 (UTC)
- Makes sense.
- What about splitting those two sections by the graphical features of the glyphs, rather than their status in a particular alphabet? Do the following classifications make any sense? Are all of the last one actually non-Latin in origin?
- Added diacritic
- Àà, Áá, Ââ, Ää, Ãã, Āā, Ăă, Ǎǎ, Ȁȁ, Ȃȃ, Åå, Ḅḅ, Ĉĉ, Čč, Ćć, Ċċ, Ďď, Èè, Éé, Êê, Ëë, Ēē, Ĕĕ, Ėė, Ěě, Ȅȅ, Ȇȇ, Ĝĝ, Ğğ, Ġġ, Ģģ, Ǧǧ, Ǧǧ, Ĥĥ, Ìì, Íí, Îî, Ïï, Ĩĩ, Īī, Ĭĭ, Ĵĵ, Ǐǐ, Ȉȉ, Ȋȋ, Ķķ, Ǩǩ, Ǩǩ, Ĺĺ, Ļļ, Ľľ, Ŀŀ, Ńń, Ņņ, Ňň, Ññ, Òò, Óó, Ôô, Öö, Õõ, Őő, Ōō, Ŏŏ, Ǒǒ, Ȍȍ, Ȏȏ, Ŕŕ, Ŗŗ, Řř, Ȑȑ, Ȓȓ, Śś, Ŝŝ, Șș, Šš, Ťť, Țț, Ùù, Úú, Ûû, Üü, Ũũ, Ūū, Ŭŭ, Ůů, Űű, Ǔǔ, Ȕȕ, Ȗȗ, Ŵŵ, Ýý, Ŷŷ, Ÿÿ, Źź, Žž, Żż,
- Added attached element
- Ąą, Ɓɓ, Çç, Đđ, Ɖɖ, Ęę, Ǥǥ, Ɠɠ, Ħħ, Ƕƕ, Ƙƙ, Įį, Łł, Ɲɲ, Ŋŋ, Ǫǫ, Ơơ, Øø, Ǿǿ, ɽ, R̵r̵, Şş, Ŧŧ, Ţţ, Ųų, Ưư, Ȥȥ, Ƶƶ
- Modified form or ligature
- Ææ, Ǽǽ, Ǣǣ, ɑ, Ðð, Ɗɗ, Ɛɛ, Əə, Ǝǝ, Ƒƒ, ı, İ, ĸ, Œœ, Ɔɔ, Ȣȣ, ß,
- Non-Latin origin
- Ƣƣ, Ɣɣ, Ʃʃ, Ƿƿ, Ȝȝ, Ʒʒ, Ǯǯ, Þþ
- It can be hard to actually identify which slot it should go in. In particular:
- Cedillas often alternate between an attached hook and a free comma/apostrophe-like thing, both between languages (Romanian prefers commas, Turkish prefers cedillas) and within languages (printed Latvian should use an apostrophe above ģ, but handwritten Latvian may use a cedilla below it).
- Ç is originally not c + diacritic at all, but a modified z (specifically a visigothic z).
- Should ð really count as modified, rather than just an attached element? It's really just an added stroke to a particular stylistic variant which was normal for d at the time (specifically insular d).
- Is ɛ a modified Latin e, or is it a Greek epsilon? This one is ambiguous.
- Is ĸ a modified Latin k, or is it a Cyrillic ka?
- Is ȣ a ligature of ou, or a modified 8, or is it the Greek omicron-upsilon ligature? (I suspect the first, personally.)
- ʃ has a half-and-half origin: its lowercase is an italic long s, its uppercase a Greek sigma.
- Ʒʒ similarly: its lowercase is a modified z, but its uppercase (one of them, anyway) is a reversed Greek sigma.
- Ȝȝ is originally a modified g (specifically an insular g). Ƣƣ is a modified q according to the article, but I don't know about that one.
- G is a modified C, U is a modified V, J is a modified I, W is a VV ligature, and Y and Z have a non-Latin origin.
- So as you can see, it's complicated. Personally I think it would be cool if we could organize them by date of creation: the initial Latin alphabet, then G Y Z, then U J W, then the earliest diacritics, ... until finally the modern inventions by linguists. But I don't think we yet have enough information about the origins of all the letters to do that, unfortunately. --Ptcamn 19:54, 20 May 2006 (UTC)
- All good points. I think many of them can be resolved (or rather, ignored), if we keep to a morphological definition, rather than an etymological one. Simply classify them by how they look, and don't get into each letter's usage or history, which, as you point out, can be difficult to determine or even contradictory. I would say go by the typeset appearance of letters rather than by their hand-written form. Where there is still a conflict or confusion, just pick a category which will make it easier for the non-expert reader to find the letter. —Michael Z. 2006-05-22 04:12 Z
I'd keep the split much simpler. One class for the 26 letters of the Latin alphabet, one for the 26 letters with any form of diacritic/attachment. One or two classes for the di- and trigraphs. One class for what is left. —Ruud 20:24, 21 May 2006 (UTC)
The weblog of Andrew West said that “Ƣƣ represent the letter "gha" used in the Kirghiz Latin alphabet between 1928 and 1940” --Hello World! 15:02, 11 August 2006 (UTC)
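As a point of comparison for the "added diacritic" grouping proposed above, Unicode's own canonical decompositions give one mechanical test: a character counts as base plus diacritic if NFD normalization splits it into a base letter followed only by combining marks. This Python sketch uses the standard `unicodedata` module; the function name `has_detachable_diacritic` is an illustrative assumption, and Unicode's answer is only one possible criterion, not the visual one suggested in this thread.

```python
import unicodedata

def has_detachable_diacritic(ch):
    """True if ch canonically decomposes into a base plus combining mark(s)."""
    decomposed = unicodedata.normalize("NFD", ch)
    return len(decomposed) > 1 and all(
        unicodedata.combining(mark) for mark in decomposed[1:]
    )

# é, ç, ą, ø, đ, ł written as escapes to keep the sample unambiguous.
for ch in ("\u00e9", "\u00e7", "\u0105", "\u00f8", "\u0111", "\u0142"):
    label = "added diacritic" if has_detachable_diacritic(ch) else "no decomposition"
    print(ch, label)
```

Note that Unicode treats the cedilla and ogonek as combining marks, so ç and ą pass this test even though they sit in the "added attached element" list above, while ø, đ and ł have no decomposition at all.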
Addition of Cyrillic letters
Cyrillic ze (Cyrillic) (Зз) and che (Cyrillic) (Чч) were recently added, but their articles don't mention any use of these letters in Latin alphabets. Do they belong here? —Michael Z. 2006-05-22 04:30 Z
- Zhuang's old orthography used them to mark tones. Its alphabet is otherwise all-Latin (okay, ƃ looks a bit like б, but ƌ doesn't have any Cyrillic equivalent), so I think it should be considered as a case of borrowing from Cyrillic to Latin. --Ptcamn 04:42, 22 May 2006 (UTC)
- On second thought, it might be justified to create separate articles (ze (Latin) and che (Latin), or tone three (letter) and tone four (letter)) for З and Ч's Latin usage. It's what's already been done with other letters (e.g. J vs. Je (Cyrillic)). --Ptcamn 04:49, 22 May 2006 (UTC)
- I would say add that information to the Cyrillic letter articles, as well as redirecting from the suggested alternate titles. These are obscure and may remain as mini-stubs, so they may as well get a little more attention by piggy-backing on the short Cyrillic letter articles. The Cyrillic Je is a different case, because the Latin letter article is quite long, but all the Cyrillic letters have their own articles too. —Michael Z. 2006-05-24 01:15 Z
digraphs/trigraphs
The digraph and trigraph sections of this template have a problem. The names "digraphs" and "trigraphs" are not correct: we want to include only letters which look like di-/trigraphs, not actual di-/trigraphs like sch (trigraph). Dutch alone has more than a dozen digraphs, of which only one (IJ) is sometimes considered a single letter. —Ruud 19:11, 21 July 2006 (UTC)
- Why? --Ptcamn 19:39, 21 July 2006 (UTC)
- 1) di-/trigraphs are not part of any alphabet. 2) there are probably too many di-/trigraphs to list them all in this template. —Ruud 19:42, 21 July 2006 (UTC)
- Hell, there's too many monographs to list them all in this template. --Ptcamn 19:45, 21 July 2006 (UTC)
- Maybe, but I don't see how that problem relates to this one? —Ruud 19:49, 21 July 2006 (UTC)
shorten or unlink!
This template is unbearably long and extremely ugly, and of very limited utility. It should either be shortened substantially, or it should only be inserted in a few selected places, not on every article linked from it. dab (ᛏ) 21:09, 29 July 2006 (UTC)
- Would making all but the top section hidden by default, with a link to reveal it, help, or is it the actual size in bytes that's the problem? --Ptcamn 22:55, 29 July 2006 (UTC)
If we are including all this stuff...
... how about kh (yes, that underline is part of the character) and w (yes, that superscripting is part of the character) in the standard transcription of several native languages of the Pacific Northwest? - Jmabel | Talk 18:59, 31 July 2006 (UTC)
- If they're part of the character, you should use Unicode for them rather than markup: k̲h̲ and ʷ.
- Is ʷ actually recognized as a letter, or is it a sort of diacritic (e.g. kʷ might be one letter, sorted after k)? --Ptcamn 10:30, 6 August 2006 (UTC)
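The difference between markup and encoded characters can be made concrete with a small Python sketch. It assumes the underline in k̲h̲ is the combining low line (U+0332) and uses the modifier letter small w (U+02B7) for ʷ; the helper name `underline` is made up for illustration. Unicode's general category also gives one formal answer to the letter-or-diacritic question, though an orthography is free to treat the character differently.

```python
import unicodedata

COMBINING_LOW_LINE = "\u0332"

def underline(text):
    """Append a combining low line to every character of text."""
    return "".join(ch + COMBINING_LOW_LINE for ch in text)

kh = underline("kh")
print(kh, [f"U+{ord(c):04X}" for c in kh])

# Print name and general category: "Mn" is a nonspacing mark, "Lm" a modifier letter.
for ch in (COMBINING_LOW_LINE, "\u02b7"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```

So in Unicode's own terms the underline is a combining mark while ʷ is classed as a letter, which does not by itself settle how any particular language sorts it.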
Ligatures
I think including purely typographic ligatures like ffi is going a bit far. Unicode doesn't even recommend that the characters be used. --Ptcamn 19:15, 12 August 2006 (UTC)
- my fault...--Hello World! 12:41, 13 August 2006 (UTC)
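One way to see why the ffi ligature is different in kind from a letter like æ is to look at how Unicode normalization treats the two. The Python sketch below assumes the character under discussion is U+FB03 (LATIN SMALL LIGATURE FFI); it shows that compatibility normalization (NFKC) folds the ligature back into plain letters, while æ is left alone.

```python
import unicodedata

LIGATURE_FFI = "\ufb03"  # assumed to be the ligature meant above
LETTER_AE = "\u00e6"     # æ, a letter in its own right in some alphabets

print(unicodedata.name(LIGATURE_FFI))
print(unicodedata.normalize("NFKC", LIGATURE_FFI))             # plain letters f, f, i
print(unicodedata.normalize("NFKC", LETTER_AE) == LETTER_AE)   # æ is unchanged
```

This is only Unicode's view of the matter, but it lines up with the point that typographic ligatures are presentation forms rather than letters.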
"A bit far?" This template needs to be cut down to a fraction of its present size, urgently! Heavens, this is just relevant to Latin Unicode characters; apart from the (let's face it, completely arbitrary) decisions of which precomposed characters to encode in Unicode, the set of characters listed here makes no sense whatsoever. What we need at {{Latin alphabet}} is something like we have at {{Arabic alphabet}}, listing the actual letters, not every conceivable combination of diacritics. If we must have something like this, move it to {{Unicode Latin}} or similar. dab (ᛏ) 19:19, 20 September 2006 (UTC)
- {{Arabic alphabet}} isn't actual letters. ج and خ are just ح with diacritics, ظ is ط with a diacritic, ش is س with a diacritic, etc. Of course, they're considered distinct letters in Arabic, but then so are diacritic combinations in the Latin alphabet. --Ptcamn 19:34, 20 September 2006 (UTC)
- I see you are not familiar with the history of the Arabic alphabet (the dots are, rather, comparable to the dot acquired by Latin i over time). Not passing judgement on "diacritic combinations count as separate letters", I'll just try to separate this into several templates for practical reasons. dab (ᛏ) 13:55, 4 October 2006 (UTC)
- see {{Unicode Latin}}, {{Digraphs}}, List of Latin letters. dab (ᛏ) 14:52, 4 October 2006 (UTC)
- I agree with dab: This template is far too big! Even the tiniest stub will be oversized if this template is included because this template is already oversized as is, see Wikipedia:Article size. The markup is far too long. The inclusion of letters that aren't found in any fonts yet is prone to produce errors (I hope I've fixed them now). I don't think there's any need for this oversized template at all. ― j. 'mach' wust | ⚖ 16:35, 4 October 2006 (UTC)
The recent changes
- Why is this called "Unicode Latin" when it includes things Unicode does not actually have characters for like P with tilde, N-diaeresis, and R rotunda? Pace Dbachmann, this wasn't just "the (let's face it, completely arbitrary) decisions of which precomposed characters to encode in unicode".
- Why does the "phonetic symbols" section include characters that are not actually phonetic symbols, including German's ß, Zhuang's Ƨƨ, Ƽƽ, and Ƅ ƅ, Norwegian's Ææ, Middle English's Ȝȝ, Icelandic's Þþ...
--Ptcamn 02:07, 5 October 2006 (UTC)
In the previous edition of the template we didn't differentiate between “phonetic symbols” and “letters with diacritics” because different languages have different definitions. --Hello World! 14:30, 6 October 2006 (UTC)