Talk:List of HTML decimal character references

Votes for deletion
This article was nominated for deletion on 29 July, 2005. The result was to keep. An archived record of this vote can be found here.

The article has been kept a second time following this VFD debate. Sjakkalle (Check!) 08:19, 23 August 2005 (UTC)

This article has been kept a third time following this AFD debate.
Please do not re-nominate it for deletion; instead, discuss alternative ways of reducing the size of this page here. Hedley 18:21, 21 December 2005 (UTC)

Announcement on Splitting this Article

This article has been put up for deletion three times, and every time it was kept. In accordance with Hedley (21 December 2005), I have decided to create several sub-articles, each with either 2^9 or 2^10 character references (this is splitting by code point, not by the actual number of entries). Forbidden characters would thus reduce the number of characters per page. The article itself declares that it is limited to the first 2^14 references (out of about a million, that is, more than 2^20), which would make either 32 or 16 pages (not much different from a list split into 26 pages, one for each letter of the alphabet).
Since this page has created a heated discussion in the second half of 2005, I'll announce this here, so you can scream out and try and stop me (using new arguments). I think I'll do it by the end of next week. JM.Beaubourg 17:03, 30 March 2006 (UTC)
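
The page counts in the announcement can be checked with a short sketch. It assumes the sub-pages are aligned on multiples of the page size; the name `page_ranges` is made up for illustration and is not part of the proposal.

```python
def page_ranges(total, page_size):
    """Return inclusive (first, last) code point ranges, one per sub-page."""
    return [(start, min(start + page_size, total) - 1)
            for start in range(0, total, page_size)]


LIMIT = 2 ** 14  # the article's stated cutoff: the first 16384 references

for exponent in (9, 10):
    pages = page_ranges(LIMIT, 2 ** exponent)
    print(f"2^{exponent} per page: {len(pages)} pages, "
          f"first {pages[0]}, last {pages[-1]}")
```

Because the split is by code point range, a page whose range contains forbidden code points would list fewer entries than its nominal size.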

Separation because of technical problems

Yes, it seems that the wiki server chokes when it's supposed to render this article. The exact reason is unknown to me (the parser/renderer issues were supposed to be fixed with the version of the MediaWiki software...). Possible causes might include: bugs in the software and high server load. If server load is the cause, the problem may disappear "randomly".

In general, the page should remain separated until the problem has been rooted out. After that, the article should be rolled back to the original one-page form (and the extra page should be deleted, of course).

Sigh.

--Klaws 14:26, 26 January 2006 (UTC)

Separation into different pages?

I believe that this can be divided by number. I'm thinking we can do 500 or so per page; that way, loading would be much quicker for the Wikipedians who still have a slow Internet connection. Is that a good idea? MessedRocker 04:20, August 9, 2005 (UTC)

I'd prefer not. I'm going to be re-listing this on VfD in six months or so, and I'd rather not have to go hunting the pieces down. --Carnildo 06:24, 9 August 2005 (UTC)
I've already copied it to Wikisource, so do what you want. — BRIAN0918 • 2005-08-12 14:28
I say yes. I was looking through Longpages and found this article to be the biggest in size. I was able to fit 0032-1250 in a 31.7 KB plain-text file on my computer. Plus, it would be very easy to "hunt the pieces down", because they would be linked here on this page. -- RattleMan 07:44, 12 August 2005 (UTC)

Forbidden characters

Aside from my objections noted in the first VfD, there is yet another flaw with this list: HTML forbids the use of the following characters:

  • 0000 ~ 0008
  • 0011 ~ 0012
  • 0014 ~ 0031
  • 0127
  • 0128 ~ 0159

These characters are not even allowed by reference. That is, you are not even allowed to write them as numeric character references. The fact that references to the latter group are commonly interpreted as the Windows-1252 characters that they are is entirely due to browsers being lenient in order to accommodate web pages that were sloppily produced with the help of Microsoft Windows applications. You should not be encouraging people to use &#128; through &#159; by publishing them in this list. — mjb 21:50, 13 August 2005 (UTC)

I've taken care of this. — mjb 16:27, 10 September 2005 (UTC)
As pointed out by an anonymous editor, 0012 is not allowed, either, even though it is said to be 'white space' in section 9.1 of the HTML 4.0 spec. Good catch! — mjb 04:57, 24 November 2005 (UTC)
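
The ranges listed in this section can be turned into a small membership check. This is only a sketch: the names `FORBIDDEN_RANGES`, `is_forbidden` and `windows_1252_reading` are invented for illustration, the ranges are copied from the list above (including 0012) rather than re-derived from the HTML 4 SGML Declaration, and the second function merely shows the Windows-1252 reading that lenient browsers are said to apply.

```python
# Decimal code points this thread identifies as not allowed in HTML 4,
# even as numeric character references (inclusive ranges).
FORBIDDEN_RANGES = [(0, 8), (11, 12), (14, 31), (127, 127), (128, 159)]


def is_forbidden(code_point):
    """True if the decimal code point falls in one of the listed ranges."""
    return any(lo <= code_point <= hi for lo, hi in FORBIDDEN_RANGES)


def windows_1252_reading(code_point):
    """Character a lenient browser would likely show for 128..159.

    Returns None outside that range, or where Python's cp1252 codec
    leaves the byte unassigned.
    """
    if not 128 <= code_point <= 159:
        return None
    try:
        return bytes([code_point]).decode("cp1252")
    except UnicodeDecodeError:
        return None


for cp in (9, 10, 12, 13, 127, 128, 159, 160):
    status = "forbidden" if is_forbidden(cp) else "allowed"
    print(f"&#{cp}; is {status}")
```

Tab (9), line feed (10) and carriage return (13) fall outside the ranges, so they remain allowed; 128 and 159 are forbidden even though the Windows-1252 reading of them gives a euro sign and a capital Y with diaeresis.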

Wikisource version of this list

The result of the first VfD was to transwiki, so the list was copied to Wikisource, at wikisource:List of HTML decimal character references. Meanwhile, we've been updating this one here on Wikipedia. So now we have two articles that are way out of sync.

I don't really understand how a list like this, living on Wikisource, is supposed to work. Is it proper to merge all the changes that have been made here into that list? — mjb 16:27, 10 September 2005 (UTC)

Wouldn't it be easier to edit the Wikisource version to redirect to this version? --Tony SidawayTalk 19:05, 16 September 2005 (UTC)
The Wikisource version has disappeared! What's going on? — mjb 17:56, 27 September 2005 (UTC)

People adding control characters and prose

I need to explain my recent reversions.

Anonymous users need to stop adding control characters. This is not a list of all UCS/Unicode code points; it is a list of valid HTML decimal character references. If a character is not valid in HTML, it must not be added to this list.

Also, I used to include a list of invalid characters and a detailed explanation in the 'Rendering' section. But because this article is a list, people saw fit to keep such text to a minimum and instead move that content to the separate article HTML decimal character rendering. So stop trying to pull it back into this list. If you think it belongs back here, then propose a merge, using Template:mergefrom, and explain your reasons here on the talk page.

Lastly, you were correct about one thing: form feed (0012) is indeed one of the invalid characters, according to HTML 4's SGML Declaration[1]. I missed that! Thanks. — mjb 04:03, 24 November 2005 (UTC)

I feel the invisible power of Wikipedia

Actually, I am the anonymous user. I have done something wrong and something right, as mentioned. It is good for other users to look carefully at what I have done and to take appropriate action. Moreover, the quick response surprised me; virtuous users far outnumber vicious ones.

But I don't want to lose the special characters I typed, so I may create an article to store the typed data.

3rd AFD

I didn't even realize this was up for deletion again until the AFD was already closed :) I'm starting to enjoy the delete rationale: "It's the largest article." Brilliant! I didn't know it was going to be the largest when I created it, but its size did spur me to create such monsters as List of places in Pennsylvania, which people finally hacked into separate pages. Oh well, you can't win 'em all... — 0918BRIAN • 2005-12-21 20:45

Hexadecimal or Decimal Codepoint?

I've found that the left-facing and right-facing swastika signs 卍 (&#21325;) and 卐 (&#21328;) were erroneously added to the section 5120~5375, since their code points in hexadecimal are 0x534D and 0x5350 respectively. I fixed this problem, but should there be a note that the code point numbers are decimal, not hexadecimal, to prevent more problems like this? Or is a note of this kind already in the article? --LPH 09:56, 4 September 2006 (UTC)

P.S. Later I traced back through the edit history and found that the author who made this mistake also made the same mistake at 5341 (with the wrong character '十', which means 'ten' in Chinese). I fixed that too. --LPH 12:24, 4 September 2006 (UTC)
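
The mix-up described here is easy to reproduce. The sketch below, with helper names invented for illustration, prints the decimal and hexadecimal forms of a character's reference side by side, and shows how the digits 5341 name a different code point depending on the base in which they are read.

```python
def decimal_reference(char):
    """Decimal numeric character reference, e.g. '&#21325;'."""
    return f"&#{ord(char)};"


def hexadecimal_reference(char):
    """Hexadecimal numeric character reference, e.g. '&#x534D;'."""
    return f"&#x{ord(char):X};"


for char in "卍卐十":
    print(char, f"U+{ord(char):04X}",
          decimal_reference(char), hexadecimal_reference(char))

# The same digits read in two bases give two different code points.
digits = "5341"
print("read as hexadecimal:", int(digits, 16))
print("read as decimal:    ", int(digits, 10))
```

Only the value read as hexadecimal (21313) corresponds to '十'; an entry at decimal 5341 belongs to an unrelated code point.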

Why this title?

It seems a bit odd to create a document that lists UCS / Unicode code points and call it “List of HTML decimal character references”. I've tried to look for this article several times and I always have trouble finding it, even though I know it's here. I understand the article purposefully leaves out characters that are forbidden in HTML, but in the limit this is basically a list of UCS/Unicode characters (in their decimal representation). These are also the character code points for many implementations (with a few minor differences). With that in mind, would it not be simpler for Wikipedia to maintain a single list of UCS code points and then have separate articles that discuss different implementations' use of those code points (SGML, XML, HTML, etc.)?

One would think, yeah, and that argument, in various forms, sums up the position of those who voted to delete the article (myself included), but it remains in place because too many people find it "useful". And perhaps because the person who created it is an administrator. —mjb 23:56, 11 October 2006 (UTC)
I also agree the article is useful (especially until the Unicode site gets the official listing online). However, it would be more useful if readers knew where to find it. I don't think very many readers think of looking for “character” related articles under HTML: even if they’re looking for codepoints for use in HTML. --Cplot 05:55, 12 October 2006 (UTC)

THIS ARTICLE SHOULD BE DELETED, PERIOD!!!

This article is so, so wrong ... and still I'm astonished that it survived 3 deletion votes. For that reason I will make a list here of reasons to delete it (see the end of this section), so we can discuss it better, and see if some good sense falls on you, fellow Wikipedian. Really, it is one of the most awful articles I've seen here. Something should be done with it. SSPecter talk 02:00, 12 December 2006 (UTC)

Reasons to Delete or Move

  • This is a UNICODE subject, not HTML!!!
  • It is a reference article (one that belongs on Wikisource), not an encyclopedic one. Usefulness doesn't excuse it. (It doesn't explain what a term is; it lists codes accepted by a technology. It is the same as writing documentation for Microsoft Word, or a Windows API specification.)
  • THERE IS AN INFINITELY MUCH BETTER LIST OF UNICODE CHARACTERS IN WIKIBOOKS (which was previously in Wikisource).
  • This article is about a specific implementation of a pattern (UNICODE in HTML).
  • This article can influence wiki users to write other "specific implementation" articles (e.g. "List of UNICODE characters accepted in PHP", "List of characters in Java", "List of characters in Linux").
  • It is awfully broken up into several articles.
  • It is unsalvageably incomplete (as very few Wikipedians are crazy enough to finish it).

Reasons not to delete (counter arguments for the list above)