Talk:Universal Character Set

From Wikipedia, the free encyclopedia

Character Set vs. Character Encoding

We must be very clear about the distinction between a character set and a character encoding. A character set defines a set of characters. For instance, a character set could be the set containing the first four letters of the English alphabet: {a, b, c, d}. An encoding is how the characters in a specific character set are actually stored as binary data. UTF-8, for example, stores each UCS character as a sequence of one to four 8-bit bytes.

Anyway, my point is: the first sentence of this article previously equated UCS and character encodings. This is a really (relatively) grave error since it could confuse the bejeesus out of people. GodzillaWax 15:37, 20 September 2007 (UTC)

On a deleted sentence

At the end of the section on the differences between Unicode and ISO 10646, I had written the following sentence:

The Firefox browser and the OpenOffice.org suite can handle such characters, on Linux too, supporting Unicode and not just ISO 10646.

That sentence was deleted with the notice "Deleted a non pertinent sentence wich looked like advertisement." I beg to differ with that verdict.

I can concede that the sentence was not worded in the best way. However, it is neither nonpertinent nor an advertisement. I wrote it to contrast applications which support Unicode (Mozilla and OpenOffice.org) with applications which support only ISO 10646 (Linux xterm). It is fully germane to this article and section, and any intention of advertising those applications was totally absent from my mind.

I will leave things as they currently are, but I wish it to be known that the charges are incorrect, and I hope some other user with the inclination for it would rewrite it in a wording that does not lend itself to those charges. --Shlomital 21:29, 2005 Feb 20 (UTC)

Article title change

Since Universal Character Set is a proper noun/proper name, as of today I have moved the article from Universal character set to Universal Character Set. — mjb 23:26, 20 Jun 2005 (UTC)

How about Chinese and Japanese?

Can you add a comment on Chinese and Japanese (and some other languages, such as those written in hieroglyphs), which can be written not only horizontally in either direction but also vertically, top to bottom?

Thanks

Unicode and ISO 10646 distinctions and discussion of the character repertoire

I added a paragraph about the differences between Unicode and ISO 10646. I think the article could use more elaboration on these distinctions to help drive home the particular innovations of Unicode.

I've also been working on a table that nicely summarizes the characters of the UCS (as of 5.0). My thinking is that this table could serve as a departure point to link to other articles (or sections of this article) discussing the various scripts and other character blocks in more detail. Wikipedia already has individual articles covering most of the scripts of the UCS (the article could use a small discussion of the UCS's use of the term "script" too). Also, the phonetic blocks could link to articles on the IPA and other relevant articles.

However, I've also been working on drafting portions to discuss the other character blocks: symbols; unified punctuation; unified diacritics; Unihan and CJK supporting characters; compatibility characters; control and formatting characters (such as glyph variant selectors, bidi characters, joiners, non-joiners and language tag characters); surrogates; and private use code points. Compatibility characters are an especially complicated topic that could use some elaboration. The various symbol blocks are also very specialized, and some discussion of how they're used would be helpful. To me this is the type of information that a general audience would expect from an encyclopedia article on the UCS (in addition to the topics already covered). It might help more technical readers as well. There are so many basic concepts surrounding UCS and Unicode that seem to escape implementors of text systems that support UCS and Unicode.

I'll likely post something here on this discussion page before posting it to the article. I'm still working on the formatting (I'm not that familiar with Wikimedia’s table markup, so it’s in plain old HTML table markup). Indexheavy 09:37, 19 April 2007 (UTC)

I now see that some of what I propose is handled in a separate article: Mapping of Unicode characters. Perhaps that article could be summarized in a section of this article. The summary table I'm preparing might fit better in that article. Indexheavy 15:10, 19 April 2007 (UTC)
I added the summary/categorized table of the UCS as I said I would. I added it to the mapping article. Anyone else is welcome to jump in on these tasks. --Indexheavy 01:20, 25 April 2007 (UTC)

this has nothing to do with the content of the text

Tried for a full five minutes to find an actual character map, to look up the Alt-code for the plus/minus sign. Couldn't link. Did get extensive, verbose, and redundant information on the history of, and subtle differences between, the various UTF and ISO standards. Fascinating... but should we make these pages a QuikFix InfoBooth, or a "Jolly good read, wot?!"? I'm not doing a project, I just needed a detail, and we should diversify into linked media to demonstrate the explanations and classifications given by the parent article.