Talk:ISO/IEC 8859-1
From Wikipedia, the free encyclopedia
windows-1252
Why does windows-1252 redirect to this page? 1252 is NOT the same as iso-8859-1. Redirecting references to windows-1252 to this page (I think) reinforces the mistaken impression that 1252 and 8859-1 are the same (and they are most definitely not). Perhaps this deserves an entire page, but Microsoft's loose labeling of email with (close but not exact) MIME character sets is a BIG problem.
- You're free to make the relevant section a (linked) article of its own. The section, however, tries to make clear that although both CP1252 and Latin-1 are supersets of ISO/IEC 8859-1, they differ in the 8x and 9x range (bytes 0x80-0x9F). The German version takes a slightly different approach, putting more emphasis on the differences.
- The advantage of keeping CP-1252 and MacRoman on the ISO-8859-1 page is that the differences can be made clearer. Also, Windows (and IIS) misreports CP-1252 as ISO-8859-1 by default, so most people falsely assume CP-1252 *is* ISO-8859-1. Latin-1 of course is a valid implementation of ISO 8859-1, as it is nothing but an alias for ISO-8859-1. Jor 13:24, 12 Mar 2004 (UTC)
- "Latin-1 of course is a valid implementation of ISO 8859-1, as it is nothing but an alias for ISO-8859-1." Wrong. ISO/IEC 8859-1 (the standard) only specifies the characters for the 20-7E and A0-FF byte ranges; the characters for bytes 00-1F and 7F-9F are left undefined. The ISO-8859-1 character map registered with the IANA fills the missing spots with the C0 and C1 control sets (defined elsewhere), thus covering 00-FF. This map's approved aliases are: ISO_8859-1:1987, iso-ir-100, ISO_8859-1, ISO-8859-1 (preferred MIME name), latin1, l1, IBM819, CP819, and csISOLatin1. - mjb 22:07, 12 Mar 2004 (UTC)
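The alias point above can be checked against Python's codec registry. Python is used here only for illustration, since the thread names no language. This minimal sketch assumes that Python's `latin-1` codec implements the IANA ISO-8859-1 map, so every byte 00-FF decodes to the code point of the same value, C0 and C1 controls included. `canonical_names` and `covers_all_bytes` are hypothetical helper names.

```python
import codecs

# IANA-approved names for the ISO-8859-1 map, as listed in the comment above.
IANA_ALIASES = [
    "ISO_8859-1:1987", "iso-ir-100", "ISO_8859-1", "ISO-8859-1",
    "latin1", "l1", "IBM819", "CP819", "csISOLatin1",
]


def canonical_names(aliases):
    """Map each alias to the name of the Python codec it resolves to."""
    return {alias: codecs.lookup(alias).name for alias in aliases}


def covers_all_bytes():
    """True if every byte 0x00-0xFF decodes to the code point with the same value."""
    return all(bytes([b]).decode("latin-1") == chr(b) for b in range(256))


print(set(canonical_names(IANA_ALIASES).values()))
print(covers_all_bytes())
```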
- So remove the dash. Latin1 *is* a valid alias for ISO-8859-1, which is an encoding based on ISO 8859-1. Jor 22:17, 12 Mar 2004 (UTC)
- Like it or not, in the Internet world ISO-8859-1 is now always interpreted as windows-1252. Windows-1252 characters were routinely used by editors here, with no reported problems, before the switch to UTF-8. Sometimes you just have to accept that formal standards and reality aren't the same thing. Plugwash 6 July 2005 10:53 (UTC)
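The following Python sketch compares Python's `latin-1` and `cp1252` codecs byte by byte to show where the two maps diverge. The helper `differing_bytes` is hypothetical. The sketch relies on Python's `cp1252` table, which follows Microsoft's and leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined. On the browser behaviour described above: the WHATWG Encoding Standard does map the label "iso-8859-1" to windows-1252. However, its windows-1252 decodes those five bytes to the matching C1 controls, so only 27 bytes differ there.

```python
def differing_bytes():
    """Bytes that Python's latin-1 and cp1252 codecs decode differently."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        as_latin1 = raw.decode("latin-1")
        try:
            as_cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Python's cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined.
            as_cp1252 = None
        if as_cp1252 != as_latin1:
            diffs.append(b)
    return diffs


diffs = differing_bytes()
print(f"{len(diffs)} bytes differ, from 0x{min(diffs):02X} to 0x{max(diffs):02X}")

# The same bytes read under each label: 0x93/0x94 are curly double quotes and
# 0x80 is the euro sign in cp1252, but all three are C1 controls in ISO-8859-1.
sample = b"\x93quoted\x94 \x80 5"
print(repr(sample.decode("latin-1")))
print(repr(sample.decode("cp1252")))
```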
correct quotation marks
In this respect ISO-8859-1 wasn't worse than a typewriter, or am I mistaken? So only special characters for typesetting are missing from ISO-8859-1 (also ligatures, like ff, fi, ...). Pjacobi 23:02, 17 Sep 2004 (UTC)
- You're not making yourself very clear, please elaborate. -- Ævar Arnfjörð Bjarmason 23:42, 2004 Sep 17 (UTC)
As far as I know, mechanical or electric typewriters didn't provide symbols for typesetting either. They are or were lacking the different quotation marks, hyphens of different (typographic) lengths, ligated versions of "ff", etc. Not that it would make much sense on a monospacing machine. Only the specialised input machines for typesetting had all of those.
So, when defining coded character sets for use on computers, I don't think the lack of those signs can be taken to mean that the languages which use them in typesetting are unsupported. Typesetting was done using specialised markup, shortcuts or automatic conversion (as done by troff).
Even today, Unicode considers some typesetting-related issues to be out of scope for coded character sets, a step back from the Adobe approach, which even put some "ff" ligatures in the "Expert Character Set".
In summary, I consider the remark "missing correct quotation marks" to be slightly misleading.
Pjacobi 12:51, 18 Sep 2004 (UTC)
- In German (and I believe other languages as well) it is considered an orthographic error to use something like "speech" instead of „speech“, although »speech«, of which Latin1 is capable, can be acceptable, too. Width of dashes/hyphens and interword/intercharacter spaces is a different story completely. Crissov 18:40, 18 Sep 2004 (UTC)
- No, it is considered bad typography. In handwriting or typescript, it is perfectly acceptable. -- Pjacobi 19:35, 18 Sep 2004 (UTC)
- Especially in handwritten German it is not acceptable; ask any German teacher or the Duden. You could compare it to ")foo)" or "(foo(", which no one would claim is correct. It can only be acceptable in technically limited environments. In English, "foo" and “foo” are probably similar enough.
- The Duden is no authority on glyph shapes. Using the " glyph shape for both punctuation characters, 'German begin of direct speech' and 'German end of direct speech', is not an orthographic error, at least not in Hamburg, Germany. Likewise, the Duden doesn't regulate the glyph shapes for small "s" and "t", where there is much variation in German handwriting. Pjacobi 18:16, 19 Sep 2004 (UTC)
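The claim in this thread that Latin-1 can represent »speech« but not „speech“ is easy to check. Here is a small Python sketch that lists where each quotation mark sits, if anywhere, in Python's `latin-1` and `cp1252` codecs. The helpers `byte_in` and `fmt` are hypothetical.

```python
QUOTES = [
    ("\u201e", "German opening (low-9)"),
    ("\u201c", "left double"),
    ("\u201d", "right double"),
    ("\u2018", "left single"),
    ("\u2019", "right single"),
    ("\u00ab", "left guillemet"),
    ("\u00bb", "right guillemet"),
    ('"', "ASCII quotation mark"),
]


def byte_in(char: str, encoding: str):
    """Return the single byte that encodes `char`, or None if it is not representable."""
    try:
        return char.encode(encoding)[0]
    except UnicodeEncodeError:
        return None


def fmt(value) -> str:
    """Format a byte value as hex, or '-' when the character is missing."""
    return "-" if value is None else f"0x{value:02X}"


for char, name in QUOTES:
    print(f"{char}  {name:<24} latin-1: {fmt(byte_in(char, 'latin-1')):<6}"
          f" cp1252: {fmt(byte_in(char, 'cp1252'))}")
```

Only the guillemets and the ASCII quotation mark have ISO-8859-1 codes. The curved quotes exist only in cp1252's 0x80-0x9F range.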
Mac-Roman
Is it really best to remove the comparative chart between Mac-Roman and ISO-8859-1? Is it really true that Mac-Roman has no relation to ISO 8859-1 or ISO-8859-1? Also, there appear to be hyphenation inconsistencies throughout this page (e.g. MacRoman vs. Mac-Roman, CP1252 vs. CP-1252, etc.). GPHemsley→◊ 00:50, Mar 29, 2005 (UTC)
- The Macintosh Roman character sets, Mac-Roman and MacRoman, both inherit the ASCII characters, but have nothing else in common with ISO-8859-1. Mac-Roman was introduced with the first Mac in 1984, so I don't think it could possibly be a descendant of ISO Latin. MacRoman changed one character from Mac-Roman (added the Euro). I'll update the confusing text in this article. —Michael Z. 2005-03-29 01:30 Z
- IIRC they do, however, cover much the same characters, which should probably be mentioned and possibly detailed somewhere.
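This Python sketch illustrates "much the same characters" at different positions. It assumes Python's `mac_roman` codec as a stand-in for the Mac-Roman/MacRoman table discussed above, and `positions` is a hypothetical helper. ASCII letters share a byte. By contrast, é sits at 0xE9 in ISO-8859-1 but at 0x8E in Mac Roman, and ½ has no Mac Roman code at all.

```python
def positions(text, encodings=("latin-1", "mac_roman")):
    """For each character, give its byte in each encoding (None if absent)."""
    table = {}
    for ch in text:
        row = {}
        for enc in encodings:
            try:
                row[enc] = ch.encode(enc)[0]
            except UnicodeEncodeError:
                row[enc] = None
        table[ch] = row
    return table


for ch, row in positions("Aéü½").items():
    print(ch, {enc: None if b is None else hex(b) for enc, b in row.items()})
```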
lead section
The previous lead section was a one-liner, far shorter than Wikipedia:Guide_to_writing_better_articles#Lead_section recommends. Furthermore, it didn't even introduce two important variations (ISO-8859-1 and windows-1252) which redirect here. I tried to expand it and was reverted by mjb (whose removals I have now reverted back). mjb claimed it was redundant, which is true, but Wikipedia:Guide_to_writing_better_articles#Lead_section clearly states "If the article is long (more than one page), the remainder of the opening paragraph should summarize it." A summary is by definition redundant with the more detailed information in the rest of the article. Plugwash 6 July 2005 10:15 (UTC)
- But your "summary" was terrible. It introduced concepts and dove into technical details that are not required to achieve a basic understanding of the ISO/IEC 8859-1 standard. I'm not saying the article can't use another sentence in the intro, but if you re-read the article from the beginning, it sounds very sloppy when you immediately start talking about there being no control codes and certain code value ranges being reserved/unassigned — these topics were not even introduced yet and seem completely out of context at that point. I would also disagree with taking too strict an interpretation of the style guide; an intro paragraph does not need to summarize every topic that is mentioned in the article; if it can't introduce a topic without repeating or requiring one to read the whole article, then further simplification of the statements is advisable. — mjb 6 July 2005 11:12 (UTC)
ISO-8859-1 and windows-1252 redirect here and are not just misspellings, so they need to be introduced in the summary. If we don't do so, then we are misleading users into thinking they are the same thing as ISO 8859-1. If you can think of a way of doing so without mentioning technical details, then go for it. Plugwash 6 July 2005 11:14 (UTC)
- Ah, see, that's the real issue; there are these redirects, and there is discussion in the article about these oft-confused character maps that are based on the standard. We can offer the reader this information without getting into any details that would require them to have already read the article. I've put one in, but perhaps it could be further improved. — mjb 6 July 2005 11:29 (UTC)
On another topic, do you have an opinion about "maintained by ISO and IEC"? I think it sounds awkward to say "ISO and IEC" rather than "the ISO and the IEC," but it seems equally awkward to put two "the"s in there. Is there a policy or style guide for using definite articles with organizations known by their initials? (The main point I was trying to make in the first sentence was that for a while, the standard was just "ISO 8859-1" and this is what everyone knows it as, but at some point the IEC became involved and any formal citation, especially an encyclopedia entry that has a responsibility not to perpetuate common errors, must say "ISO/IEC 8859-1".) — mjb 6 July 2005 11:34 (UTC)
Character table format
I've been seeing more and more 8-bit character table formats popping up in various articles. There are currently three different styles in use on this page alone. ASCII has two more, and Code page 437 has yet another. I think the template-based approach in the Code page 437 article is a good idea, but I'm not sure it's flexible enough to accommodate the kind of ad-hoc linking we have going on. Also, the auto-scaled 100% table widths are not ideal for all media. Other issues to consider are where to link each character to, and how much info to try to cram into each cell. We are discussing character linking over on Talk:Unicode. Questions to consider are below. — mjb 6 July 2005 12:02 (UTC)
- Should we standardize the 8-bit character code chart formats?
- What info should the charts contain?
- What should the charts look like? Are column/row headings important?
- Where should character entries link to? (see Talk:Unicode#Nifty_resource.)
- What's the ideal representation of things like space, soft hyphen, and control codes?
- What about difference highlighting? Keep?
- Can we achieve these goals with a template?
- My preferred way to handle character linking is to just let the character link to a page titled with itself, and then redirect that page to the most appropriate place. This allows all references to a character to be updated to point to the same place at once, and also allows users to enter those characters through the search box and be taken straight to the appropriate place. Plugwash 6 July 2005 22:54 (UTC)
- Well, we could use templates, but I'm not sure the approach in Code page 437 is the best. There Template:chset-cell is used, which looks like this:
<span style="font-size: large; font-family: serif">&#x{{{1}}};</span><br /><small>{{{1}}}</small>
- Those hexadecimal character references are in fact converted to UTF-8 by the MediaWiki software.
- Furthermore, it uses Template:chset-tableformat, Template:chset-left and Template:chset-ctrl, and all are put into a table by hand. We could do the same and better with something like Template:8-bit charset, which would be used something like this:
{{8-bit charset|Name=ISO 8859-1|{{C0 control codes}}|{{ASCII character codes}}|7F|{{C1 control codes}}| A0|A1|A2|A3|A4|A5|A6|A7|A8|A9|AA|AB|AC|AD|AE|AF| B0|B1|B2|B3|B4|B5|B6|B7|B8|B9|BA|BB|BC|BD|BE|BF| C0|C1|C2|C3|C4|C5|C6|C7|C8|C9|CA|CB|CC|CD|CE|CF| D0|D1|D2|D3|D4|D5|D6|D7|D8|D9|DA|DB|DC|DD|DE|DF| E0|E1|E2|E3|E4|E5|E6|E7|E8|E9|EA|EB|EC|ED|EE|EF| F0|F1|F2|F3|F4|F5|F6|F7|F8|F9|FA|FB|FC|FD|FE|FF}}
- where the templates contain just the hexcodes, e.g. Template:C0 control codes (Control character#Tables):
00|01|02|03|04|05|06|07|08|09|0A|0B|0C|0D|0E|0F| 10|11|12|13|14|15|16|17|18|19|1A|1B|1C|1D|1E|1F
- Oh, I just realized that we would have to take special care of control codes (and a few others), because they do not work with links and display, maybe:
NUL|SOH|STX|ETX|EOT|ENQ|ACK|BEL|BS|HT|LF|VT|FF|CR|SO|SI| DLE|DC1|DC2|DC3|DC4|NAK|SYN|ETB|CAN|EM|SUB|ESC|FS|GS|RS|US
- I think giving alternatives with &#124; does not work (well) in templates. Anyhow, the 8-bit charset template would then build a 16×16 table out of the 256+1 arguments it received; {{{Name}}} would be put into the caption (|+). How that table should look (hex, dec, oct and/or bin headers, U+ codes [probably by reusing Template:chset-cell]) is open to discussion, but all those code page and charset tables would look the same. The then-unnecessary chset-* templates should be deleted. Christoph Päper 7 July 2005 15:46 (UTC)
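To make the proposal above more concrete, here is a rough Python sketch of the 16×16 grid such a template would build for ISO-8859-1. Everything in it is hypothetical: Python itself, the names `C0_NAMES`, `SPECIAL`, `cell_label` and `wikitable`, and label choices such as "C1 84" for C1 controls. The sketch emits standard wikitable markup with hex row and column headings. It shows C0 controls by their ASCII abbreviations, spells out space, no-break space and soft hyphen, and escapes | as &#124; so that it does not split a cell. It has not been tested against MediaWiki.

```python
# Hypothetical sketch; none of these names exist on Wikipedia.
C0_NAMES = (
    "NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI "
    "DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US"
).split()

# Cells that would otherwise be invisible or would break wikitable syntax.
SPECIAL = {
    0x20: "SP",
    0x7C: "&#124;",  # a bare | would be read as a cell separator
    0x7F: "DEL",
    0xA0: "NBSP",
    0xAD: "SHY",
}


def cell_label(byte: int) -> str:
    """Text shown in one chart cell for a byte under the IANA ISO-8859-1 map."""
    if byte < 0x20:
        return C0_NAMES[byte]
    if byte in SPECIAL:
        return SPECIAL[byte]
    if 0x80 <= byte <= 0x9F:
        return f"C1 {byte:02X}"  # C1 controls labelled by code only
    return bytes([byte]).decode("latin-1")


def wikitable(caption: str) -> str:
    """Build a 16x16 wikitable with hex row and column headings."""
    lines = ['{| class="wikitable"', f"|+ {caption}", "|-"]
    lines.append("! !! " + " !! ".join(f"_{col:X}" for col in range(16)))
    for row in range(16):
        cells = " || ".join(cell_label(row * 16 + col) for col in range(16))
        lines += ["|-", f"! {row:X}_", f"| {cells}"]
    lines.append("|}")
    return "\n".join(lines)


print(wikitable("ISO 8859-1"))
```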
History of CP1252
I'm having a hard time finding what year Microsoft introduced code page 1252. I'm particularly interested in MS's support for the curved apostrophe and quotation marks ‘ ’ “ ”. The best I could find so far was "around 1986". — Hippietrail 01:41, 22 July 2005 (UTC)
- My guess is it came in with the Windows concept of ANSI code pages. I don't know how far back that dates, though. (P.S. I notice that whatever font is used for standard Wikipedia text doesn't seem to differentiate between the opening and closing quotes, but the font I see in the edit box does.) Plugwash 02:09, 22 July 2005 (UTC)
- Minor point of interest is that the IANA did not accept Windows-1252 in its charset registry until early 2000, based on a proposal made in December 1999. The other Windows-125x code pages were accepted by the IANA in 1996 after being proposed by someone at Microsoft's Russian branch. — mjb 03:00, 23 December 2005 (UTC)
- And windows-874 still isn't in the IANA's list despite being actively used by at least Outlook 2000; I pointed this out to the iana-charsets list, but they didn't seem to care. Plugwash 08:46, 7 October 2006 (UTC)
Merge request
Someone tagged the article with a merge request. They apparently did not realize that this article forked off of the ISO/IEC 8859-1 article a while ago. Please present a case for the merge or the request will be removed. — mjb 03:00, 23 December 2005 (UTC)
- It was part of a mass split done a while back by a fairly new user that created a LOT of small, ugly stubs. I've linked all the merge tags to a proposal at the main ISO 8859 talk page. Please comment there if you don't want me to go ahead with the mass re-merging. Plugwash 17:16, 12 January 2006 (UTC)
- These should NOT be remerged. They are (were) separate for a reason: ISO/IEC 8859-n is in no case identical to ISO-8859-n (when both exist, which is not always the case). These entries should be split again, with appropriate cross-references. Keka (who happens to have been involved with character set standardisation for many years), 2006-04-23.
- True in a sense, but you could say the same about, say, JPEG and JFIF. One is the formal standard, left incomplete by standards-body politics; the other is the equivalent real standard in use. Also, in most cases the IANA defines ISO_8859-? the same as ISO-8859-?, and an underscore is the standard substitute for a space where a space can't be used. Plugwash 15:54, 26 April 2006 (UTC)
Related character maps: link to a disambiguation page
In the table in the section "Related character maps", at position (-4, 8-) in the table, the table links to the disambiguation page for index (it says "IND"). This shouldn't happen. However, I have absolutely no idea what kind of index it's referring to, so I didn't change it. Could somebody who knows more about this please change it to link to the specific type of index that it refers to? E946 04:59, 6 April 2006 (UTC)
The two pipe symbols
In the character chart, both the character | (value 7C) and the character ¦ (value A6) linked to the article about Pipe_(computing); but as far as I can see, that article only talks about the character | (7C).
I have changed both links to Vertical bar, which I believe gives more relevant information. --Oz1cz 14:57, 10 November 2006 (UTC)
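Python's `unicodedata` module confirms that the two characters are distinct: 7C is VERTICAL LINE and A6 is BROKEN BAR. That supports linking both to a general article rather than to one about pipes. This is a minimal sketch, and `describe` is a hypothetical helper.

```python
import unicodedata


def describe(ch: str) -> str:
    """Code point, ISO-8859-1 byte and Unicode character name of `ch`."""
    byte = ch.encode("latin-1")[0]
    return f"{ch}  U+{ord(ch):04X}  byte 0x{byte:02X}  {unicodedata.name(ch)}"


for ch in ("|", "\u00a6"):
    print(describe(ch))
```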
Line Feed / Newline
Why is there no encoding for "line feed / new line" in this standard? How does that work? 83.118.38.37 09:06, 9 February 2007 (UTC)
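As noted in the windows-1252 thread, ISO/IEC 8859-1 itself leaves 00-1F and 7F-9F undefined, so the standard has no line feed of its own. In practice, the IANA ISO-8859-1 map supplies the C0 and C1 control sets, so CR and LF keep their ASCII values (0x0D and 0x0A). Here is a minimal Python illustration, assuming Python's `latin-1` codec follows that IANA map.

```python
# Assumes Python's latin-1 codec, which follows the IANA ISO-8859-1 map
# (C0 controls at 00-1F, C1 controls at 80-9F).
text = "line one\r\nline two\n"
encoded = text.encode("latin-1")
print(encoded)  # CR is byte 0x0D and LF is 0x0A, the same values as in ASCII

# NEL (NEXT LINE), a C1 control, is byte 0x85 in the same map.
nel = b"\x85".decode("latin-1")
print(hex(ord(nel)))
```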
ISO 8859-1 vs. UTF-8
I think someone of knowledge in this field should write a section with that name.
OK, the background: I installed a server software distro (Apache2Triad) and everything was working just fine, so I copied over a folder with web pages that I had on another server (XAMPP). To my shock and dismay, it was displaying letters with diacritics like all those crappy sites from the 1990s, which I've noticed are not even capable of displaying apostrophes on pages in English. I've also noticed that pages unable to display apostrophes (or any diacritics) tend to have a charset like ISO 123456 or something (instead of UTF-8) in their HEAD section.
And yeah, when I changed the line specifying ISO 8859-1 to UTF-8 (in the httpd.conf file) I got all my diacritics (namely, Latvian) and nothing appears to have been broken.
So, basically, why would anyone need a charset like this when there's UTF-8, and what are the inclusion criteria for languages (lol)? It's not that I'd have an issue with a charset not displaying Latvian diacritics, but there are carons (š) in Czech, for instance, and macrons (ā) in Japanese rōmaji script, and those are legit languages (not to mention the apostrophes). So, basically, the article seriously lacks a rationale section explaining why anyone would use this encoding. 354d 22:05, 26 September 2007 (UTC)
I think the article ISO/IEC 8859 might answer your concerns! Theo 194.222.199.109 17:30, 30 September 2007 (UTC)
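One likely cause of the symptom described above is pages saved as UTF-8 while the server declared ISO-8859-1. Under that assumption, every non-ASCII letter arrives as two UTF-8 bytes, and each byte is shown as a separate Latin-1 character. Letters outside ISO-8859-1, such as Latvian ī, cannot be stored in that encoding at all. This Python sketch shows both effects; `encodings_of` is a hypothetical helper.

```python
def encodings_of(text: str):
    """Return (UTF-8 bytes, ISO-8859-1 bytes or None, UTF-8 misread as ISO-8859-1)."""
    utf8 = text.encode("utf-8")
    try:
        latin1 = text.encode("latin-1")
    except UnicodeEncodeError:
        latin1 = None  # at least one character is outside ISO-8859-1's repertoire
    misread = utf8.decode("latin-1")  # what a browser shows under the wrong label
    return utf8, latin1, misread


for word in ("café", "Rīga"):
    print(word, *encodings_of(word))
```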