Wikipedia talk:Article size

From Wikipedia, the free encyclopedia

Archiving

I've re-archived this talk page because 1) the old page included a dead/inaccurate link to the first talk page archive, and 2) part of the talk page archive was moved to another article page, which is at MfD. Also, an unsigned quote had been added to the top of the page, which is now archived. Here is the talk page version before I archived. Sandy (Talk) 10:46, 21 December 2006 (UTC)

Current status?

It's unclear whether there is still any discussion on this topic at all, or where to contribute if there is. The archived pages suggest severe disagreement with suggested 'enforcement' of guidelines, with apparent consensus to "delete" the entire matter.

My interest has to do with the 'warning' that we receive when working on the Contract_bridge_glossary, that piece being about 153 KB in its socks, as of today. The glossary is a key feature of the Contract bridge Project, and a natural starting place for someone looking into the subject, or simply looking for a reference source. It also operates, to a degree, as a table of contents to the project's work. And it relies heavily on internal linking, by (if I may) definition.

Over the past week I've re-written the in-page links of the glossary, so as to link directly to the terms in question, rather than to alphabetical section heads (e.g., 'ruff' now links to 'Ruff' rather than to 'R'). I've also indicated which links (bold-faced) lead to new pages, whole articles devoted to specific terms and topics, as a notice to dialup viewers.

In practice, this makes the page 'smaller', easier for viewers to work with by far.

As for the subject here—size of pages, especially with regard to browser and internet connection limitations—breaking up the glossary would in practice make the page 'larger', since proportionally more of the links would require loading separate pages, and these when loaded would interact more slowly than by this method.

The question of size here, I think, should focus on the writing, its clarity and conciseness, with clear links to more expanded discussions. If well done, it can result in a form of 'browsing' that loses track of time, or makes the most use of it, as preferred. FutharkRed 05:08, 6 January 2007 (UTC)

It would be helpful if you would provide a link to the article you're discussing. SandyGeorgia (Talk) 05:21, 6 January 2007 (UTC)
Sorry, I do get carried away. Obviously, I should "focus on the writing, its clarity and conciseness", as per my own advice! And obviously, after a while, people lose track of where I started, which was with the Contract_bridge_glossary article way back in the second paragraph. Should I just mention it, as above, or actually link to it, as Contract bridge glossary?
As a comparable matter, The Bridge World online glossary, much more extensive (so far), is split into 24 separate alphabetical pages, with no links at all between definitions, which seriously limits its usability/usefulness/usage--all three. They are considering a change. FutharkRed 09:39, 12 January 2007 (UTC)
A glossary is not intended to be read from start to end. Thus there are no stylistic/readability concerns for the above example. My only concern would be download time for those with slow modems and cell phone browsers. --mav 23:24, 2 February 2007 (UTC)

32KB page size limitations

The article states that, as of June 2006, Firefox is the only commonly used browser that has trouble with pages containing more than 32KB of content. At the moment I don't seem to have that problem with Firefox (version 2.0), so I think the statement can be removed. Please confirm this before I edit the text. hujiTALK 14:26, 29 January 2007 (UTC)

Since the article itself already rules out the mid-2006 bug (Firefox with the Google Toolbar), I removed the statement in question. hujiTALK 18:28, 29 January 2007 (UTC)


"Attention Span"

Unless someone has good evidence that the "attention span" of the "average adult" has an "upper limit" of 20 minutes, that section needs to go. I added [citation needed] quite some time ago, but no further evidence has been forthcoming. In fact, the "attention span" of even the same individual (much less the entire population) varies widely, depending on that person's interest in the subject (as well as external factors, such as fatigue). If an article is interesting and well-presented, it could hold your attention for hours. If not, your attention span could be only a few seconds. Trying to come up with a single metric for "attention span", given the wide variance in Wikipedia users, topics, and article quality, is probably not a good way to proceed.

130.126.165.236 22:54, 19 February 2007 (UTC)

Good work. I thought the exact same thing when I read it; it sounded like amateur psychology or some kind of new theory someone was trying out here on Wikipedia. Quadzilla99 06:51, 27 February 2007 (UTC)
Okay, it's gone. If someone actually comes up with a credible cite, put it back -- but I don't think that's going to happen. —The preceding unsigned comment was added by 130.126.165.236 (talk) 19:21, 27 February 2007 (UTC).
Glad it's gone, never liked it. —Doug Bell talk 19:29, 27 February 2007 (UTC)
Doesn't matter if a cite comes up, honestly. This isn't an article, and attention span is not the best reason to keep articles down to a reasonable size. Mangojuicetalk 20:05, 27 February 2007 (UTC)

A cite is easy to find if you just search for it. Search google for "lecture attention span" and you will find articles like this that describe studies of attention patterns during lecture. Whether wiki articles are similar enough to lecture for the results to carry over is disputable, and whether attention span is relevant to articles is disputable, but there is a kernel of honest research available if you look for it. CMummert · talk 20:09, 27 February 2007 (UTC)

If you can find a study done with written material on subjects who were actively seeking out the material (rather than being passively subjected to didactic lecture material chosen by someone else) please put it in. I will note that the Lord of the Rings movies seemed to have no trouble maintaining audience engagement, despite being 3 hours+ in length. Besides, there's no law that says you have to read an entire article in one sitting. That's why we (in theory) have sections, right? Wikipedia is a reference source, not a novel. 130.126.165.236 21:59, 27 February 2007 (UTC)
I would think that that part in the guideline where it says
"Readers may tire of reading a page much longer than about 6,000 to 10,000 words, which roughly corresponds to 30 to 50 KB of readable prose."
should be removed for the same reason. The part that singles out science articles is also unjustified - wouldn't complicated philosophy, or arcane history, be equally difficult to read? CMummert · talk 23:18, 27 February 2007 (UTC)
I edited that section to remove the specific reference to science articles. I thought that the old wording was choppy and not in a logical order, so I reordered the section and divided the material into paragraphs more logically. I believe that there is no change in the content or intention of the new wording, just a reordering to make the guideline easier to read. CMummert · talk 15:43, 28 February 2007 (UTC)
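The guideline sentence quoted above pairs 6,000 words with 30 KB and 10,000 words with 50 KB, which works out to about 5 bytes of readable prose per word. A minimal Python sketch of that conversion, assuming a flat 5 bytes per word and 1 KB = 1,000 bytes (both inferred from the quoted figures, not stated in the guideline):

```python
# Ratio inferred from the quoted guideline: 6,000-10,000 words ~ 30-50 KB.
BYTES_PER_WORD = 5    # assumption: flat average, including spaces and punctuation
BYTES_PER_KB = 1000   # assumption: decimal kilobytes


def words_to_kb(words: int) -> float:
    """Estimate readable-prose size in KB from a word count."""
    return words * BYTES_PER_WORD / BYTES_PER_KB


def kb_to_words(kb: float) -> int:
    """Estimate the word count corresponding to a readable-prose size in KB."""
    return round(kb * BYTES_PER_KB / BYTES_PER_WORD)


for words in (6000, 10000):
    print(f"{words:,} words -> about {words_to_kb(words):.0f} KB")
```

Applied in reverse, the same rough ratio puts 100 KB of readable prose at about 20,000 words. Real prose varies, so this is only a back-of-the-envelope estimate.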

Biographies - exception to the rule?

I've noticed several biographies that are 100k

I don't really see how one could justify splitting up a biography into smaller portions, as long as the article is all biography. Some people have interesting and varied lives.

If one accepts the arguments above that "attention span" is not really applicable to the wikipedia (and I think the arguments are well made), and that dial-up, as a reason for short articles, is becoming less of a concern, might the wikipedia not at least make an exception to the rule for well-written biographies?

Andysoh 00:16, 4 March 2007 (UTC)

This isn't really a hard and fast rule. It's just a guideline which is probably a good idea, but often isn't. If a long article is good for a topic, then it's good. --Gwern (contribs) 07:11 4 March 2007 (GMT)
Thanks Gwern! When I read the guidelines I found them just a little worrying, hence my slightly defensive comment here... Perhaps someone should add "If a long article is good for a topic, then it's good" to the guidelines for a bit of positive support, or, perhaps as I suggest, expand the 'occasional exceptions' to the guidelines (e.g. add biographies) with any necessary conditions you think important, in some way to make the contributor less worried?

Not to worry, just a thought. Andysoh 21:47, 4 March 2007 (UTC)

How do you find the readable page size?

This page contains the advice: "To quickly estimate readable prose size, click on the printable version of the page, select all, copy, paste into an edit window, delete remaining items not counted in readable prose, and hit preview to see the page size warning."

But where? Where is the page size warning?

qp10qp 05:13, 8 March 2007 (UTC)

The page size warning will be above the edit window, but only if the raw text exceeds 30k. See User:Dr pda/prosesize.js for a javascript tool. Instructions are on the talk page. Gimmetrow 00:40, 26 March 2007 (UTC)
I followed the instructions found in reference 2, but no page length warning appears on the preview page. Perhaps some wiki code needs to be left in for it to function? I had to copy and paste it into an email draft to get the Kb size. The psychokinesis article gives an erroneous warning of 54Kb, but the body of the article (as I write this) is actually only 13 Kb long. So this warning system is not perfect. 5Q5 14:26, 19 May 2007 (UTC)

I took a look at the article psychokinesis, and copied the source code to a text file on my computer. The number of bytes was 56,262. Seems correct to me. --Mr. PIM 02:31, 20 May 2007 (UTC)

The source code? Try following the instructions given below on this Project page, which is what I flagged as not working. The source code is full of just that: code. Explain why the body of the article quantum physics (41Kb) is at least three times longer than psychokinesis (54Kb)! The quantum physics article has a small References section, while the one at PK is very lengthy. I think Wiki is miscalculating the extra material in the References on the PK page because it is just outside the {{ }} template tags but still inside the <ref> </ref> tags. The instructions below refer to pasting the material from the printable version into the wiki editor and then hitting Preview. When I do that, all I get is the Preview warning, but no length notice above it. Something in these instructions needs to be tweaked. Is anybody else experiencing this? 5Q5 19:05, 21 May 2007 (UTC)
The MediaWiki message is based on the size of the code, not on the size of the readable prose. —Centrxtalk • 00:12, 22 May 2007 (UTC)
Doesn't anyone see the contradiction? The page size warning that appears when you edit the full page refers to the entire source code. But the Project article here says that only the readable prose should be considered when deciding whether to split up pages. People can erroneously believe they should begin splitting up an article even when the readable prose, i.e., the main body of the article, is not that long, because they are looking at the source code total, most of which will not print out. This is what is happening at the psychokinesis article, of which I wrote over 95%. It gives a "consider breaking up" warning, even though, once all the references and other material are excluded, the readable prose is moderate in size. I just want to point this contradiction out. Being the primary author of the PK page, I will have to deal with people chopping it up when, considering the length of other articles, it is far from time to do so. Thanks. 5Q5 13:32, 22 May 2007 (UTC)
That's not a contradiction, that's an ambiguity. Many articles exceed the size at which the too-long message is shown, and it is generally understood that is okay. If you want, the size at which the message is shown can be increased. —Centrxtalk • 02:35, 26 May 2007 (UTC)
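To make the distinction concrete, here is a rough Python sketch comparing the raw wikitext byte count (what the edit-window message measures) with a crude approximation of readable prose. The stripping rules are assumptions for illustration only: they drop <ref>...</ref> footnotes and {{...}} templates and nothing else, so links, tables, headings and other markup still count. This is not how MediaWiki or User:Dr pda/prosesize.js compute sizes.

```python
import re

# Crude, hypothetical stripping rules for illustration only.
REF_PAIR = re.compile(r"<ref[^>/]*>.*?</ref>", re.DOTALL)  # <ref ...>...</ref>
REF_SELF = re.compile(r"<ref[^>]*/>")                      # <ref name="x" />
TEMPLATE = re.compile(r"\{\{[^{}]*\}\}")                   # innermost {{...}}


def strip_templates(text: str) -> str:
    """Remove {{...}} templates innermost-first until none are left."""
    while True:
        stripped = TEMPLATE.sub("", text)
        if stripped == text:
            return text
        text = stripped


def approx_prose(wikitext: str) -> str:
    """Drop footnotes first (they may contain templates), then templates."""
    text = REF_PAIR.sub("", wikitext)
    text = REF_SELF.sub("", text)
    return strip_templates(text)


def sizes(wikitext: str) -> tuple[int, int]:
    """Return (raw wikitext bytes, approximate prose bytes), both UTF-8."""
    raw = len(wikitext.encode("utf-8"))
    prose = len(approx_prose(wikitext).encode("utf-8"))
    return raw, prose


sample = "Text.<ref>{{cite book|title=Example}} p. 5</ref> More{{fact}} text."
print(sizes(sample))
```

Even in this tiny sample, most of the raw bytes are citation and template markup, which is the pattern described for the psychokinesis article.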

Who reads a 20-page article?

I've been looking at the approximate guidelines and the more I think about them, the more they seem excessively tolerant. In the light of many recent FACs with far more than 60k of prose, even about very narrow topics, it seems slightly absurd that 100k of prose in a single article should be tolerated, no matter the subject. When and why is it reasonable to provide the general readership with 20+ pages of text? I mean, most of all that material is pure nerd padding. It's intended for readers who are already interested in the topic. So what's the justification for forcing everyone to read this nitty-gritty? Why aren't we asking those we already know to be patient with the topic to click an extra link or two?

Mind you, these are just the reader issues. There are also very real editorial problems in managing 60k+ of prose: more edits in certain article histories, greater difficulty in coordinating different sections of an article, greater risk for illicit or false additions to pass undetected for long periods of time, etc.

Peter Isotalo 16:10, 2 April 2007 (UTC)

Personally, I don't think there's any "forcing" involved, and I don't really agree with Peter at all - I would suggest it's wrong to say that large articles are mostly "pure nerd padding".
If I'm searching on a subject, a wikipedia article is usually well indexed and I can go directly to the exact part I want. I don't want to be sent round lots of artificially split up sections on different linked pages.
I bet you find Wikipedia's prominence is at least in part because it is not a few lines of milch, but real substance.
Personally I think the restrictions should continue to be gradually relaxed, perhaps plus 20k or so on each suggested level.
Andysoh 19:47, 2 April 2007 (UTC)
"A few lines of milch" is the encyclopedic core of a topic, and in commercial encyclopedias years of experience and a wide range of general knowledge lie behind the selection of that small kernel of encyclopedic relevance. It also happens to be what most people read and in most cases all that is required. Now, the selection of what to include in the summary can be a risky affair, but it's also so much more satisfying for those who don't have the time to read a monster article that includes too much detail. Most of our most devoted editors are nerds of one kind or another, and I'm certainly no exception. I don't see any point in trying to deny this. And those who generally want to have 20-page articles are those with specific knowledge of topics, not the majority of occasional readers who are just looking for a quick insight into the subject.
So, again, why are we making guidelines that are almost guaranteed to encourage articles that cater more to the needs and demands of a minority already blessed either with previous knowledge or the patience to read a lot than to the majority that doesn't have the time for all those details? And I didn't see a single comment about some of our biggest and most difficult problems: lack of flow in texts and article cohesion. These problems will only get worse if we tolerate longer and longer articles. Not to mention that the really long articles are usually long only because the editors interested in them are extremely over-represented, not because the topics are actually more important than others. The American Civil War, for example, is not more important than the Middle Ages, even though the main article is almost three times as long.
Peter Isotalo 13:56, 3 April 2007 (UTC)
Hi Peter - OK, no disrespect to those who can summarise a complex topic in a few lines. I'm in favour of summaries of one, two or three paragraphs or more. But I think the wikipedia is beginning to stand out above the standard commercial encyclopedias because it is getting more detailed, i.e., more useful to more people.
I think the opening section of a larger article (that sits under the title alongside the contents listing) should be a very brief summary, covering the content of the article, as far as possible in the order it occurs in the article. This might cause some bigger articles to grow where this is not done, but I think it is good practice.
I think that wiki lore should say that if an article is big (say 30k plus), the opening section should be a brief summary. And it should say it here, for instance.
Perhaps if it's 100K plus, wiki lore should recommend that editors consider making the second section, below the contents list, a summary or overview of a size fit for purpose.
Can you assume that a short summary is what most people want? Do you have any surveys which show this?
Such a survey would have to distinguish between people who are just passing through, and people who really value what they get from the article. These people, who really rate the article highly, are likely to have got just that detail from a lengthy article they wanted, that was not previously easily obtained.
I don't think you can prejudge what the reader wants, and I'm sure it doesn't follow that only a minority want a more detailed appraisal of the subject, or that the only people who want or who read the longer stuff are people who already know the longer stuff.
I very much appreciate that the wiki articles I looked up today (Isaac Newton 48k, Karl Popper 34k) were sufficiently detailed to provide the information I wanted, and more accessible, better indexed, with subheads, etc., as well as linked subjects. There were none of the text flow problems or lack of cohesion that you cite. I didn't read the whole article; I was able to go easily to exactly the place I wanted, and found the information I wanted, very well summarised. I haven't seen them, but I think that rather than comparing the Middle Ages article and the American Civil War article as to which one might think is more important, one should ask whether the article is well written and well presented, and whether the material is relevant, well referenced, etc.
I personally would say to the Newton article editors: it's good, write more. Andysoh 19:09, 3 April 2007 (UTC)

When and why is it reasonable to provide the general readership with 20+ pages of text?

When the subject demands more than that? Are you seriously arguing that every single human endeavor can be covered adequately in 20 pages? And why on earth would you want to discourage editors and readers with "specific knowledge of topics" from sharing or expanding that knowledge? If you believe that some topics are "underrepresented" or "more important" than topics with longer articles, feel free to expand them.

This isn't a "commercial encyclopedia", and doesn't suffer from many of the limitations of such (e.g., printing costs and physical size).

P.S. The Britannica article on the American Civil War is 31 pages long, compared to 21 pages for the Wikipedia article. So, it appears that the article isn't excessively long even by the standards of "commercial encyclopedias".


130.126.165.236 20:59, 9 April 2007 (UTC)

Problems editing a long article

I believe the section Problems editing a long article needs some revision, as the problem is not solely limited to older browsers. I have the very latest, top-of-the-line BlackBerry 8700 and find that it truncates long articles over 32k when doing an edit. The suggestion to avoid the problem by doing section edits is fine, but that solution doesn't apply for an Infobox or lead paragraph edit, nor for vandalism reverts. JGHowes talk - 02:12, 20 April 2007 (UTC)

Add "non-standard browsers" or "browsers on embedded devices" or something. It's not a problem for PCs. —Centrxtalk • 03:07, 25 April 2007 (UTC)

Probably off-topic...

...but I'll ask it anyway: how can you determine the size of an article without using an external application? --Howard the Duck 03:02, 4 May 2007 (UTC)

If an article is greater than 32KB a message appears above the edit window if you edit the page. If the page is less than that size, you need to copy and paste to an external program. —Centrxtalk • 20:40, 10 May 2007 (UTC)

Page size messages

Someone has changed the messages we get when we edit so that it no longer gives the page size. It was very, very useful to know the size. Does anyone know who changed it and why, or where it was discussed? SlimVirgin (talk) 02:05, 29 June 2007 (UTC)

[1]. Rather silly. —Centrxtalk • 23:37, 5 July 2007 (UTC)

Manageability

It seems to me that there's a correlation between the degree of jumble and disorganization in WP articles and the length. So it seems that another reason to keep pages short is to facilitate effective communal editing. Ccrrccrr 02:38, 26 July 2007 (UTC)

I personally have not seen any evidence of this. Andysoh 18:29, 26 July 2007 (UTC)

Suggest less rigid byte counts

At present, when editing FA status article 'Belgium', one sees "This page is 108 kilobytes long. It may be appropriate to split ...". But let's have a look at the origins of these > 100 KB which according to the guideline "Almost certainly should be divided up":

  • country infobox: 4,440 chars.
  • section titles, 'main' & 'see also' links immediately underneath those titles, image links, and actual article text: 35,322 chars.
  • 'see also' section: 602 chars.
  • indexed footnote references (mainly for WP:V): 32,957 chars.
  • three roughly equally sized (sub)sections, 'general online references', 'bibliography', and 'external links' (together): 8,892 chars.
  • bottom-of-page v.d.e. boxes, categories, and FA status template: 892 chars.
  • links to the article on other-language WPs: 2,089 chars.

total: 85,194 chars.

This character count total of 85 KB does not match the reported 108 KB well (probably for a technical reason, such as spaces and/or carriage returns not being counted as characters by my editor program; I did not look into it), but I think the general picture is clear, in particular the exceptionally large footnotes section (more than 90 differently numbered indexes to well over a hundred linked proper references). Since many index numbers are repeated within the article, and each [??] index takes about 4 bytes, the main part of the article is in fact only about 34.5 KB and the footnotes take 33.8 KB. Whereas other references and a 'See also' section can be assumed to serve as "further reading" and thus a normal part of the article, the indexed footnotes for WP:Verifiability and the links to WPs in other languages, together about 36 KB of the 85 KB total, should not be counted toward the length of an article (unless they cause purely technical problems). I think the project page should say so explicitly (possibly also for other elements such as the collapsible v.d.e. boxes at the bottom of an article, which usually occur on articles that belong to a highly standardized category, like 'countries' here). At an earlier FA review, the size had not been considered a problem precisely because of that extremely lengthy footnotes section. — SomeHuman 31 Jul 2007 00:35 (UTC)
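As a check on the arithmetic in the comment above, a short Python tally of its figures (copied as given, with the component labels abbreviated) reproduces the 85,194-character total. The two categories proposed for exclusion come to 35,046 characters; the "about 36 KB" figure presumably also reassigns the inline index markers from the article text to the footnotes, as the comment describes.

```python
# Character counts as given in the comment above for the 'Belgium' wikitext.
components = {
    "country infobox": 4_440,
    "headings, main/see-also links, images, article text": 35_322,
    "'see also' section": 602,
    "indexed footnote references": 32_957,
    "online references, bibliography, external links": 8_892,
    "navboxes, categories, FA template": 892,
    "interlanguage links": 2_089,
}

# The two parts the comment argues should not count toward article length.
EXCLUDED = {"indexed footnote references", "interlanguage links"}

total = sum(components.values())
excluded = sum(n for name, n in components.items() if name in EXCLUDED)

print(f"total:     {total:,} chars")             # 85,194
print(f"excluded:  {excluded:,} chars")          # 35,046
print(f"remainder: {total - excluded:,} chars")  # 50,148
```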

The recommendations listed on the page here are supposed to be for "readable prose", that is excluding formatting and perhaps footnotes. The software message given when editing a page is only good enough to have a simple bytecount, and oftentimes pages that do have such a large bytecount do warrant splitting up. —Centrxtalk • 03:48, 11 August 2007 (UTC)
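Character counts and byte counts can also diverge, which bears on the 85 KB versus 108 KB mismatch noted above. A small Python sketch with a made-up sample string shows three ways the same text might be measured: all characters, characters excluding whitespace (one of the guesses about the editor program), and UTF-8 bytes. The assumption here is that the edit-window figure is a UTF-8 byte count, as MediaWiki page lengths generally are; multibyte characters, such as those in interlanguage links, can only explain a small part of a gap that large.

```python
def counts(text: str) -> dict[str, int]:
    """Three ways the 'size' of the same text might be reported."""
    return {
        "chars": len(text),
        "chars_no_whitespace": sum(1 for c in text if not c.isspace()),
        "utf8_bytes": len(text.encode("utf-8")),
    }


# Made-up sample: an accented place name plus two interlanguage-style links.
sample = "Liège is a city in Belgium.\n[[ru:Бельгия]] [[ja:ベルギー]]"
print(counts(sample))
```

Cyrillic letters take two bytes each in UTF-8 and Japanese kana three, so the byte count exceeds the character count, while an editor that skips whitespace reports less than either.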

Article size not indicative of total bytes downloaded

This user's page gives my computer a lot of trouble even though the page is reportedly only 3,365 bytes. In fact, nearly all of that is code for templates which themselves include more templates and images and scripts totaling untold amounts of data. In this particular case, there are hundreds of images associated with that page. --Mud4t 00:33, 15 September 2007 (UTC)
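The point can be made concrete with a toy model using hypothetical template names and sizes: a page's own byte count says nothing about the templates it transcludes, which may in turn transclude others. A Python sketch that adds a page's source to everything it pulls in, counting each template once per use. Expanded wikitext is only a rough proxy for what a browser downloads, and the images and scripts mentioned above are not modeled.

```python
from functools import lru_cache

# Hypothetical templates: name -> (own wikitext bytes, templates it transcludes).
# Assumes no template loops; a loop would recurse forever here.
TEMPLATES = {
    "Icon": (300, ()),
    "UserBox": (800, ("Icon", "Icon")),
    "Banner": (2_000, ("UserBox", "Icon")),
}


@lru_cache(maxsize=None)
def expanded_size(name: str) -> int:
    """Own size plus the expanded size of each transcluded template, per use."""
    own, uses = TEMPLATES[name]
    return own + sum(expanded_size(t) for t in uses)


def page_expanded_size(own_bytes: int, uses: list[str]) -> int:
    """Size of a page after expanding every template it transcludes."""
    return own_bytes + sum(expanded_size(t) for t in uses)


# A 3,365-byte page that transcludes the hypothetical "Banner" ten times.
print(page_expanded_size(3_365, ["Banner"] * 10))
```

Here a page reported at 3,365 bytes expands to more than ten times that, before any images are fetched.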

Calculation of article size

I followed the instructions in your footnotes and it doesn't seem to work. I couldn't see a thing about article size anywhere. Has this process changed? Fainites barley 18:12, 15 November 2007 (UTC)

Proposed Splitting

I'm gonna suggest that List of placental mammals be split into lists based on each order, with the main page retained as a pointer to all the sub-articles. Is there a template to formally propose this? Mbisanz (talk) 07:19, 21 November 2007 (UTC)

Shortcuts removed

The shortcuts WP:SIZE, WP:LENGTH, & WP:LIMIT show up as redlinks when they worked yesterday. Any idea what happened? Thanks. -Fnlayson (talk) 22:42, 12 December 2007 (UTC)

Upper limit

I know the Guideline proposes an upper limit of 100Kb and mentions that pages over 400Kb might have trouble being displayed and should have arbitrary breaks. It might be nice if this were codified more into a rule like "No pages over xxxKb". Some pages like Line of succession to the British Throne and List of Statutory Instruments of the United Kingdom, 1991 are almost certainly in need of an arbitrary break. Mbisanz (talk) 03:14, 13 December 2007 (UTC)

Warning against splitting

Should we warn users against splitting a long article if there's a chance the split article will be deleted? When an article is in one piece, the overall article is governed by WP:N, but the contents are governed by WP:NNC. That means things like Lists of Characters, Episodes, etc. are generally safe if they're part of a whole article that is sourced and notable, but are likely to be AfD'd if they're separated out. I've argued that split articles should be judged for WP:V and WP:N as a whole, but it doesn't look like that's going to happen anytime soon. Torc2 (talk) 08:05, 20 December 2007 (UTC)

Oh good, I pulled up this discussion page just to see if there was mention of this concern. Perhaps it is discussed elsewhere (if so, someone please be so kind as to link to it here), but there does seem to be a conflict between splitting due to size concerns and concerns of notability (particularly in cases of WP:FICT). Since there is no way to tightly link pages with each other, content can be removed simply because those watching only the main page aren't automatically informed of an AFD for a forked page, and so cannot argue for the page's existence. My point is, some mention of WP:N should be made in the section explaining how/when to split an article. I'm quite confused about WP policy/guideline/consensus on the subject. -Verdatum (talk) 07:11, 27 December 2007 (UTC)
Please continue this discussion at [2]. UnitedStatesian (talk) 20:28, 27 December 2007 (UTC)

Central point

The central point of this article needs some work. It currently states

Readers may tire of reading a page much longer than about 6,000 to 10,000 words, which roughly corresponds to 30 to 50 KB of readable prose. If an article is significantly longer than that, it may benefit the reader to move some sections to other articles and replace them with summaries (see Wikipedia:Summary style). One rule of thumb is to begin to split an article into smaller articles after the readable prose reaches 10 pages when printed.

The difference between 30 and 50 is large. And there is no guideline as to what significant means. If no one objects I'll try a rewrite of this section in a couple of days. Mccready (talk) 06:23, 25 January 2008 (UTC)

The 30 to 50 range works fine for all purposes I know of; what changes did you have in mind? I think the page is fine. SandyGeorgia (Talk) 06:34, 25 January 2008 (UTC)
OK, yes, 30 to 50 is OK, but why not limit it to 50 except for the type of cases mentioned? This would be clearer and avoid the need for the ill-defined "significantly". Mccready (talk) 12:03, 25 January 2008 (UTC)
Because WP:IAR always comes into play. There are scores of featured articles that have significantly more than 50KB readable prose. User:Dr pda/Featured article statistics I've opposed many of them, but they always pass. SandyGeorgia (Talk) 16:08, 25 January 2008 (UTC)
I agree, this is a case where setting an explicit guideline detracts from the intent of a good article. It can be decided on a case-by-case basis if a split is warranted or appropriate for a range of article sizes. -Verdatum (talk) 17:50, 25 January 2008 (UTC)
Agree entirely with SandyGeorgia and Verdatum. WP:NOTPAPER certainly applies. Additionally, I smell some WP:GAME in the air with Mccready (talk · contribs · logs · block log): he has a history of trying to use the article length guideline as an excuse for selective, biased deletions from acupuncture. Most recent example: 25 January 2008: "bold rewrite of head. article is 86kb long and this stuff is covered below.". From March '06: removes large chunks based on V RS's, argues OK because "article is too long". Some things never change. --Jim Butler(talk) 00:08, 27 January 2008 (UTC)

Slander from acupuncture Jim. And don't flatter yourself, I couldn't care less whether you resign from Wikipedia and then come back a couple of days later. Whether it's you or someone else grasping at straws to defend an emotional, irrational commitment to wacky altmed makes no difference to me. "Ignore all rules" is obviously tongue in cheek, not an excuse for an article in the order of 100kb. Mccready (talk) 08:18, 28 January 2008 (UTC)

NPOV dispute

What if there's a dispute about how to treat a topic neutrally?

  • (current) It may also violate the neutral point of view policy to create a new article specifically to contain information that consensus has rejected from the main article.

In some cases, "editorial consensus" has suppressed one side of a controversy. This, itself, is a violation of NPOV. It can happen when a preponderance of contributors to a controversial article are supporters of one of the sides in the controversy. By violating WP:OWN they can create a "tag team" which operates to eliminate any positive mention of the opposing POV.

I think the above sentence should read:

  • (proposed) It can help defuse an NPOV conflict by creating a new article specifically to contain information which is too controversial for the main article.

Note that the arbcom has specifically ruled that "removing well-referenced information" on the grounds that it "advances a POV" is against policy [3]

Some people disagree with that. They disagree with the idea that Wikipedia should include "all significant points of view regarding any subject on which there is division of opinion."

But it would be odd indeed to enshrine this objection to policy in a guideline! --Uncle Ed (talk) 13:23, 16 February 2008 (UTC)

I think this topic should be wordsmithed elsewhere, not in this relatively obscure guideline. SandyGeorgia (Talk) 16:18, 16 February 2008 (UTC)

Proposed change and factual correction: Large articles DO confuse modern browsers

I propose the following changes:

From:

The best permanent solution is to simply upgrade to a more modern web browser, if possible. No major modern web browsers have this problem on their recent versions...

to

The best permanent solution is to simply upgrade to a more modern web browser, if possible. Except for the very longest articles such as older versions of Wikipedia:List of missing journals, which once weighed in at over 400,000 bytes, no major modern web browsers have this problem on their recent versions...

and From:

If you find a section too long to edit correctly and safely, you can post a request for assistance on the Village Pump. Follow the "post" link for the assistance section, which will allow you to post a new comment without editing any existing text. Answering your request may take from an hour to a week, depending on the response of your fellow volunteer editors.

To:

If you find a section too long to edit correctly and safely, you can post a request for assistance on the Village Pump. Follow the "post" link for the assistance section, which will allow you to post a new comment without editing any existing text. Answering your request may take from an hour to a week, depending on the response of your fellow volunteer editors. For extremely long articles that cannot be edited or even viewed in all modern web browsers, you can try splitting the article, as was done with Wikipedia:List of missing journals. To edit and split the article, you may need to turn off certain features in your web browser, use a different modern web browser, use an older web browser, or use a computer with more memory. Before splitting an article, discuss it on the article's talk page.

Background: Last year I split Wikipedia:List of missing journals for purely technical reasons. I wound up using a 4.x version of Netscape to make the initial split, and then I was able to use a modern browser to clean it up. Earlier today I verified that my very modern web browser still chokes on the old long versions.

The above changes are meant to be consistent with Wikipedia:Article size#Technical issues.

Comments? Objections? davidwr/(talk)/(contribs)/(e-mail) 17:11, 3 March 2008 (UTC) updated davidwr/(talk)/(contribs)/(e-mail) 17:15, 3 March 2008 (UTC)

What about mobile browsers? —Torc. (Talk.) 19:04, 3 March 2008 (UTC)
Got numbers? If any major browser hangs on articles above a certain size, we should consider a "hard" cap at that size or below. If it's just inconvenient then it should be a style decision but not a make-or-break issue. By the way, large-number-of-pixels pictures are probably a bigger problem for users looking at tiny cellphone screens than an article that's 350KB long. I expect that the "List of Missing Journals" article would be rendered as well as a typical 30KB article in a mobile browser. davidwr/(talk)/(contribs)/(e-mail) 23:33, 3 March 2008 (UTC)
No, actually it was a Real Question™ - I have no idea how much data mobile browsers can handle on a single page. I only have an old Sidekick II, and I doubt anybody would use that for a yardstick on how much the average mobile browser can handle. Mainly I just thought I'd bring up the topic because I don't see it mentioned, and I know plenty of people who use their mobiles for looking up answers here. —Torc. (Talk.) 00:18, 4 March 2008 (UTC)
That's really several questions in one. How much data can they hold in a single web page, how much HTML can they handle on a single web page, and how big a table (or whatever other constructs Wikipedia uses) can they handle? I'm not sure why 400KB of WikiCode breaks modern browsers, but most modern browsers CAN handle simple web pages in the multi-megabyte range. —Preceding unsigned comment added by Davidwr (talk • contribs) 15:38, 4 March 2008 (UTC)

True length of article text?

The guideline states we can look to MediaWiki:longpagewarning for the size of a long article. From observation, it measures something other than the raw bytes of data the text constitutes (compare its warning to a version of the article's script saved as a text file). However, I have noticed it counts citation templates as raw text, so heavily referenced articles will be rated longer than they are: Ninja Gaiden (2004 video game) yields a 73k warning, but if we removed all references (i.e. <ref>[...]</ref>), the unreferenced article yields a 40kb warning. It is not the displayed reference list that contributes to the size (removing the {{reflist|3}} without removing all references still yields 73k). Since the guideline is supposed to be about "readable prose", is there a way to get the script performing the longpagewarning to exclude all references from its calculation? Jappalang (talk) 01:45, 10 March 2008 (UTC)

The script also considers commented out statements (<!-- ... -->) as part of the text. Jappalang (talk) 13:59, 15 March 2008 (UTC)
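To compare the two figures Jappalang describes, one could measure the wikitext with and without <ref>...</ref> citations and <!-- ... --> comments. The sketch below is a rough approximation under stated assumptions: it treats size as the UTF-8 byte length of the wikitext, and the helper names and regular expressions are hypothetical, not how MediaWiki itself computes the long-page warning.

```python
import re

# Hypothetical helpers: approximate wikitext size with and without citations
# and hidden comments. This is NOT MediaWiki's own longpagewarning logic.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
# Self-closing reuse of a named ref, e.g. <ref name="a" />
REF_SELF_RE = re.compile(r"<ref(?:\s[^>]*)?/>", re.IGNORECASE)
# Paired citation, e.g. <ref name="a">{{cite web|...}}</ref>
REF_PAIR_RE = re.compile(r"<ref(?:\s[^>]*)?>.*?</ref>", re.DOTALL | re.IGNORECASE)


def wikitext_bytes(text: str) -> int:
    """Size of the raw wikitext in bytes, assuming UTF-8 encoding."""
    return len(text.encode("utf-8"))


def strip_refs_and_comments(text: str) -> str:
    """Remove HTML comments, then self-closing refs, then paired refs.

    Self-closing refs are removed before paired ones so that the paired
    pattern cannot start at <ref name="a" /> and swallow the text after it.
    <references /> is left alone because its name is not exactly "ref".
    """
    text = COMMENT_RE.sub("", text)
    text = REF_SELF_RE.sub("", text)
    return REF_PAIR_RE.sub("", text)


if __name__ == "__main__":
    sample = ('Ninja Gaiden<ref name="a">{{cite web|url=http://example.com}}</ref>'
              ' is a game.<ref name="a" /><!-- note -->')
    print(wikitext_bytes(sample), wikitext_bytes(strip_refs_and_comments(sample)))
```

Regular expressions will miss unusual markup (for example a ">" inside a ref attribute), so treat the result as an estimate rather than an exact count.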
The guideline advising "use a modern browser" and only "readable prose" matters is flat out wrong. There are hundreds of millions of English speaking web users and many are still on dial-up and are using older computers. The most important count actually is just a byte count of the text in the edit window when you hit the "edit this page" tab. I couldn't figure out why the article Barack Obama was hanging up my web browser (and I am using the latest update to Firefox by the way), until I looked at the source (I can't edit it anyway because it is protected), and found that it was 141 kB - which you can see from the byte count in the history[4]. The article has recently been trimmed about 13 kB, and loads a little bit quicker, but still hangs the browser for a looong time. I would like to see a hard limit of 64 kB in place of the hard limit of 32 kB. Over half of the featured articles are less than 32 kB. There is no reason for just eliminating the limit completely, as if "well theoretically someone might be able to edit it". 199.125.109.28 (talk) 16:35, 20 March 2008 (UTC)
By the way, talk pages can be twice as long - up to 128 kB. For example the much longer Talk:Barack Obama does not hang up the browser. Probably because of the lack of images. The full byte count transferred to the browser for the article is something like 484 k[5] 199.125.109.28 (talk) 16:45, 20 March 2008 (UTC)
In the "Technical issues" section, is it possible that "over 400 kB" refers to total bytes transferred, not text bytes? 199.125.109.28 (talk) 17:07, 20 March 2008 (UTC)
I was the one who added (diff) the 400KB info last year. The 400KB refers to the size reported by Wikipedia when you edit an article. This came about because editing List of missing journals (hist) other than one section at a time was nearly impossible before I split it. davidwr/(talk)/(contribs)/(e-mail) 18:36, 20 March 2008 (UTC)
That's a text only article. Have you done any tests to see if it refers to total byte count or not? Barack Obama is 480kB with the photos, and locks up Firefox, which seems oddly suspicious considering that it could be only because it is over 400 k. Seriously, with the attention span people have on the internet, do you really expect anyone to actually read any article that is over 10k? Wikipedia really needs to go back to a more reasonable guideline. 199.125.109.28 (talk) 22:40, 20 March 2008 (UTC)
Wikipedia needs at least two guidelines: A guideline for people, a hard and fast rule for typical desktop computers with modern browsers, and possibly another guideline for cell phones, older browsers, and other special cases. I didn't have any problem editing and previewing changes to Barack Obama using a modern Firefox. Editing Barack Obama gets a warning This page is 126 kilobytes long. It may be appropriate to split this article into smaller, more specific articles. See Wikipedia:Article size. When I save the wikicode as plain-ascii, it shows up as 127KB. When I save it as Unicode, it's twice that. By the way, with the new edit-top-section setting you can set in user preferences, big articles are no longer as harmful to logged-in users as they once were. davidwr/(talk)/(contribs)/(e-mail) 02:07, 21 March 2008 (UTC)

I wasn't trying to edit the Barack Obama article, I was trying to read it, and it hung up my browser and I had to reboot the computer. I don't have any trouble editing big articles because I just click on any section and change the URL to section=0, which I think is the same thing that edit-top does. That doesn't always work, as sometimes you want to move something from the first section to a later section and want to do it in one edit, or just want to make one edit to the entire article, for example if you are doing a spell-check. So far the Barack Obama article is the biggest article I have ever tried to read. Of the "top 10 biggest featured articles"[6], only Bob Dylan (136k) gave me the same trouble, though it is true that none of the rest are as big as the Barack Obama article. By the way, neither the Barack Obama article (127k) nor the similarly sized Byzantine Empire (127k) now hangs up the browser (but they still take an awfully long time to load). But obviously much stronger discouragement of large articles (over 60 kB) has to be written into WP:SIZE, as the response I got at Talk:Barack Obama was half a dozen editors chiming in that the article was within the limits allowed by WP:SIZE. Less emphasis should be placed on "readable text" as well, as that is not the number that is most readily available. To get "readable text" for Barack Obama, for example, one is enjoined to simply cut and paste the text into a text window and then remove the links, see also, reference and footnote sections, and lists/tables. For me, for the Barack Obama article, this means cutting and pasting into WordPad (Notepad won't hold it), manually deleting all 226 references (now down to 187) ([1], [2], [3] etc.) plus who knows how many duplicate references ([8], [8] etc.), then saving the article as a text file and looking to see how big the file is. I didn't do that because it would probably have taken at least an hour and just wasn't of any interest. 
The byte count (127k) was close enough to tell me that it was way way way too big. WP:SIZE should mention "readable text" as one reason for having a size criterion, but the entire rest of the article should base its criteria and counts only on the byte count that one sees when clicking the "edit this page" tab, as it is the only size that is readily accessible, and it is also visible in the history, except that the count there is in bytes instead of kB (off by 1.024:1). A section should also be added about image size and number of images, because I really think that a page with a lot of photos is just as problematic as a page with a lot of text. Setting a limit that barely doesn't crash a browser is absurd. Set a limit to half that. Over half of the featured articles are less than 32 kB. Why allow any article to be over twice that big? 199.125.109.75 (talk) 03:18, 21 March 2008 (UTC)
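The manual procedure described above (paste the rendered text, delete the [1], [2], [8] footnote markers, then check the file size) and the bytes-versus-kB difference can be sketched in a few lines. This is a minimal sketch with hypothetical helper names; it assumes the pasted text has already had the See also, References and list/table sections removed by hand, and that the edit-window figure equals the history byte count divided by 1024, as the comment above suggests.

```python
import re

# Hypothetical helper: footnote markers as they appear in text copied from a
# rendered article, e.g. "[1]" or "[8]". Note it would also strip any genuine
# bracketed number in the prose.
FOOTNOTE_RE = re.compile(r"\[\d+\]")


def readable_prose_bytes(pasted_text: str) -> int:
    """Approximate readable prose size: drop footnote markers, count UTF-8 bytes."""
    return len(FOOTNOTE_RE.sub("", pasted_text).encode("utf-8"))


def history_bytes_to_kb(n_bytes: int) -> float:
    """Convert a page-history byte count to kB, assuming 1 kB = 1024 bytes."""
    return n_bytes / 1024


if __name__ == "__main__":
    print(readable_prose_bytes("Obama was born in Hawaii.[1][8]"))
    print(history_bytes_to_kb(130048))
```

This only automates the footnote deletion and the division; deciding which sections count as "readable prose" is still a manual judgment.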

What web browser do you use and how much spare RAM do you have? If you are already starting to use your memory swap-file or close to it before you open the page, it could crater some browsers. davidwr/(talk)/(contribs)/(e-mail) 07:59, 21 March 2008 (UTC)
It's not a very modern computer. It is running 98SE, has 480 MB Ram, and performance currently says "System resources: 15% free". I'm using the latest version of Firefox (2.0.0.12). I currently have three programs running, two Firefox windows, with a total of 17 tabs open. Closing everything has little impact on opening large articles. I'm on a slow dial-up that seems to sometimes slow to a few bytes per minute. 199.125.109.28 (talk) 01:05, 22 March 2008 (UTC)

Proposal

Mention "readable prose" only in the Readability section. Change "A rule of thumb"
From:
Some useful rules of thumb for splitting articles, and combining small pages:

Readable prose size | What to do
> 100 KB | Almost certainly should be divided up
> 60 KB | Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
> 40 KB | May eventually need to be divided (likelihood goes up with size)
< 30 KB | Length alone does not justify division
< 1 KB | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub. If it's an important article that's just too short, put it under Article Creation and Improvement Drive, a project to improve stubs or nonexistent articles.

To:
Some useful rules of thumb for splitting articles, and combining small pages:

Edit byte count | What to do
> 100 KB | Almost certainly should be divided up
> 60 KB | Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
> 40 KB | May eventually need to be divided (likelihood goes up with size)
< 30 KB | Length alone does not justify division
< 1 KB | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub. If it's an important article that's just too short, put it under Article Creation and Improvement Drive, a project to improve stubs or nonexistent articles.

Note that the only change is to refer to the more important "edit byte count" instead of the less important and very difficult to obtain "readable prose size". 199.125.109.100 (talk) 21:56, 27 March 2008 (UTC)
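The rule-of-thumb table is identical in both versions; only the measure in the first column changes. As a sketch, the thresholds can be encoded in one hypothetical function that takes a size in KB, leaving open which measure (readable prose or edit byte count) that number comes from. The table gives no advice for the band from 30 KB to 40 KB, so the function reports that explicitly rather than guessing.

```python
def rule_of_thumb(size_kb: float) -> str:
    """Map a size in KB to the table's advice (hypothetical helper).

    The caller decides whether size_kb is readable prose or edit byte count;
    that choice is exactly what the proposal is about.
    """
    if size_kb > 100:
        return "Almost certainly should be divided up"
    if size_kb > 60:
        return "Probably should be divided"
    if size_kb > 40:
        return "May eventually need to be divided"
    if size_kb < 1:  # must be checked before the < 30 KB row
        return "Consider combining with a related page or expanding it"
    if size_kb < 30:
        return "Length alone does not justify division"
    return "No explicit guidance (30-40 KB band)"


if __name__ == "__main__":
    for size in (127, 73, 35, 0.5):
        print(size, rule_of_thumb(size))
```

Under this sketch the same article can land in different rows depending on the measure: Ninja Gaiden's 73k figure falls in the "probably should be divided" row, while its 40kb reference-free figure gets no explicit advice.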

Survey

  • I'm going to have to oppose this proposed change for now. Using total byte size as the indicator for cut-off is counter-productive to having articles that are well referenced, particularly those that rely upon the bloat-inducing {{cite}} templates, like the Barack Obama, Hillary Rodham Clinton, and John McCain articles do. While the goal is a lofty one (improving access to larger articles for those with older PCs), the result of this proposal would be a decrease in quality on articles that are well referenced, and it would force hundreds of hours of work upon editors of articles where the only issue is that they chose to use the cite templates instead of manually formatting their citations. Now, if someone had a script that could do the conversion, then that's a different story, but as of right now, I haven't heard of one. The templates are there for a reason, and the hundreds of thousands of times that they are used in articles is a pretty good indicator of their popularity. --Bobblehead (rants) 22:16, 27 March 2008 (UTC)
Don't use bloat inducing templates in bloated articles. Why make an already difficult problem impossible? Use simpler <ref>[http://www Subject]</ref> style instead. Half of the United States is still on dial up. What does that tell you about the rest of the world? Why write an encyclopedia in a manner that makes it impossible to use? And as to hundreds of hours? There are only a few articles that are horrendous and they got that way because of a stupid decision to totally eliminate the 32 kb limit instead of to expand it, plus an equally stupid decision to focus on readable prose instead of byte count. Using summary style it is trivial to fix each of the offending articles without losing any content. Make the change suggested and let the minions get to work and you will create a usable encyclopedia instead of the horrendous nightmare that currently exists. With the attention span that people have on the internet, you know that no one is reading more than a few paragraphs anyway. 199.125.109.100 (talk) 02:34, 28 March 2008 (UTC)
Hey. If you can convince someone to change WP:CITE#FULL, we could go with a simple link to the source rather than adding a full cite, but until then, we use full citations. The bloaty template just makes it easier to get the formatting correct. As far as there only being a few articles over 100k in total size... there are actually almost 1,000 articles over 100k. Now I can't say that they are all because of sources, but "only a few" is a bit of an under-estimate. And using summary style to reduce the size of articles... Umm... What do you think is used in the John McCain, Barack Obama, and Hillary Clinton articles? Yup, summary style. It's not the text that's the problem, it's the images and the references that are the problem. You aren't going to fix the problem with long load times by trying to hack the article text down. If you want to use full HTML page size as an example of why article size should be modified, you're talking about trimming less than 10% of the problem in the case of the Obama article. --Bobblehead (rants) 02:47, 28 March 2008 (UTC)
There are 2,305,594 English articles, and only 932, or 0.04%, are over 100,000 bytes. That is what I would call infinitesimal, and most of those are lists. "You" may use {{cite}}, but I certainly don't, and as you can see there is a strong reason not to use it in large articles that have a lot of references. Last time I checked, CITE didn't say that you had to use a template. It even says that the ISBN number for a book is optional, but it really should emphasize that it should be included, instead of saying that it is optional. I could easily cut 75% out of the three articles you mentioned and not lose any content, and still use summary style, just by using one paragraph for each summary (instead of the up to 11 that are now included - and 15 and 20 for McCain - you honestly call 20 paragraphs a summary?), and adding subarticles where there currently are long sections that do not have subarticles. 199.125.109.100 (talk) 03:55, 28 March 2008 (UTC)
Can you point me to some articles that you've "easily" written, that use this style? Wasted Time R (talk) 04:03, 28 March 2008 (UTC)

Actually it was trivial. But bear in mind that you have to limit yourself to only one paragraph, not ramble on for 20 paragraphs, in a summary. Just simple cut and paste, making sure that you keep all the references. 199.125.109.100 (talk) 04:23, 28 March 2008 (UTC)

  • Oppose too. The less-than-32Kb featured articles are probably old ones, back from when citing requirements were much less stringent. For modern WP:BLPs of high-profile figures, especially political figures as are being mentioned here, sentence-by-sentence citing (sometimes even denser than that) is an absolute must. Any time you try to skimp on the citing, editors will challenge or try to remove material. That's just the way reality is right now. Wasted Time R (talk) 23:17, 27 March 2008 (UTC)
I don't mind if you cite every word. Just split it up into readable sections of less than 40-60 thousand bytes. Don't make me wait for three minutes while a page is loading so that I can find out, for example, how old someone is or where they went to college. 199.125.109.100 (talk) 04:03, 28 March 2008 (UTC)

199.125.109.100's suggestion at Talk:John McCain that the entire early life and military career set of material be reduced to one paragraph, and replaced by a discussion of whether he is constitutionally eligible to be president, makes me wonder about this IP's sincerity. And this IP's frequent readings and editings of Talk:Hillary Rodham Clinton and Talk:John McCain, which are 130K and 100K in size, make me doubt the dialup woes as well. Wasted Time R (talk) 12:57, 28 March 2008 (UTC)

I never suggested it be replaced. That is a separate discussion. Talk pages are text only and they load a whole lot easier than their associated article pages. This thread started when I went to Barack Obama to try to read it and it hung up my computer and I had to reboot. I did some research to try to find out why. At first I thought there must have been some hidden javascript or something, but I found out that the only reason it wasn't loading and was hanging up the computer was because at the time the page was about 140 kb and with images was about 480 kb. When I complained I got some lame excuses that wp:size says it's ok to make pages that are unreadable. Which is why I am here. To fix an easily fixed problem with Wikipedia that makes it unusable in its present form. Change three words and the problem will be fixed. How hard is that? Change "readable prose size" to "edit byte count" and the problem will be solved. 199.125.109.100 (talk) 14:59, 28 March 2008 (UTC)
Because the three-word change would result in certain articles being of very poor quality simply because they cover a contentious topic and have to have pretty close to one reference per sentence. The Barack Obama article is currently getting over 500 visits per minute, and you're literally the first person to complain about not being able to bring up the page. Now, there are complaints that it loads slowly, but I haven't noticed an appreciable difference in loading the Obama page compared with any other article of similar size. I would think that there would have been more complaints if it was a systemic issue. --Bobblehead (rants) 15:48, 28 March 2008 (UTC)
First, there would be no change in content, so how anyone can say that quality would be affected is beyond me. Most people who view Wikipedia don't even know there are talk pages, and there have been many, many who have complained about long articles. How fast a page loads is also affected by how many images it has, so a text-only page like a "List of" page tends to load a lot faster than an ordinary article. Most of the long pages are "List of" pages. 500 per minute is not an average over a 24-hour period, but I'm sure that the number is that high for some minutes out of the day. The average for February works out to 62.865 per minute. Most of those people have no interest in reading the entire article, but are only looking for one small piece of information, which they would find a lot more easily if the article was segmented properly. Split it and find out what happens. Be bold. 199.125.109.122 (talk) 02:37, 29 March 2008 (UTC)
  • I also oppose. Obtaining the amount of readable prose is a simple process that can be done in less than a minute. My concern is making it too easy to justify slicing up appropriately long articles (because of the nature of the material) by editors with little grasp or interest in the underlying subject. Tom (North Shoreman) (talk) 01:29, 2 April 2008 (UTC)
Long articles are also popular topics and are quickly fixed by someone who does have a grasp and interest in the subject. Just for the fun of it randomly chop any long article inappropriately and watch to see how fast it gets fixed. You won't have to wait very long. Don't throw anything away. Keep everything but put it into subarticles. Your concern is unfounded. Only long lists that need to be sortable and therefore have to be in one article are "appropriately long articles". All others can appropriately be split up. 199.125.109.102 (talk) 01:29, 6 April 2008 (UTC)
Take United States for example. It is the most viewed non-topical article on Wikipedia other than Sex. Currently it is 167,707 bytes and has a tag at the top saying that since it is over 100 kB some browsers may have difficulty rendering the article. Yet if you look at the article, every section except the first, Etymology, not only has subarticles, but all of the subsections have subarticles too, for a total of 62 subarticles. That's a lot of content. Assuming that each has 30 kB, that adds up to 1.8 MB of content about one subject. Why painfully shove 167 kB of that into the main article when you can trivially move all but a few hundred words from the main article into the subarticles and leave the main article readable? You haven't changed anything except make the information accessible. Now isn't that the point of writing an encyclopedia, to make it available? What other example of bloatware can I think of? Oh yes, I remember. Don't make Wikipedia become bloatware when you can make it become usable. 199.125.109.102 (talk) 03:43, 6 April 2008 (UTC)
You asked me on my talk page, “Do you realize what you have written?” My answer is yes. You then go on to write, “... I can just look in the edit window and see that it says "This page is 38 kilobytes long." for example which tells me everything I need to know about the page length - that it is too long.” It is this type of simplistic thinking that I am concerned about -- ignoring the content of the article and start slicing.
As far as your example of the United States article, an appropriate use of section titles and sub-titles allows a reader to browse through the article and fasten on the information they are looking for. Do we really need to so dumb down our processes that we refuse to acknowledge that the ability to browse through written materials is both a desired and readily attainable intellectual skill? BTW, let’s keep the discussion on this page. Tom (North Shoreman) (talk) 12:20, 6 April 2008 (UTC)
What you do need to do is limit the amount in one article to 30 kB so that the material can be found quickly. It is impossible to browse through the article to find the information one is looking for when the main article takes 2 minutes to load because it is 167,000 bytes long. Ditch everything about history, for example, and move it all to the subarticles other than one or two sentences. You have not lost anything; you have only gained the ability to find the information you are looking for. You may be on broadband and it may load quickly for you, but half of the US is on dial-up, and what about the folks in Australia or New Zealand who want to find out something about the US? Why make it almost impossible for them to find? You obviously just don't get it, and I'm failing to find the right size 2x4 to whack you upside the head with so that you do. The only time an article needs to be over 30 kB is if it has no subarticles to slough material off into. Then you can let it grow to 40 kB edit byte length before you start thinking about splitting it, just like the guideline says (remember to think edit byte count though), and can wait until it grows to a total of 60 kB edit byte size before it probably should be split. However, once the sub-articles have been created there is never ever any excuse to let the main article be over 30 kB, or 40 kB, or 60 kB if you really insist, although that would be really really stupid. As I said before, your concern is totally unfounded. 199.125.109.102 (talk) 23:21, 6 April 2008 (UTC)
Did you know that there is a template that lists 169 "major articles" about the United States? I didn't, because I didn't have time to wait two minutes for the article to load, plus it isn't included on that page, but if it was I probably would have found the factoid I was looking for quicker by looking through that list than I would have from looking through the article. Grrrr. 199.125.109.102 (talk) 00:34, 7 April 2008 (UTC)
Interesting - it is on the US page, but it is hidden half way up the page in the See also section so I didn't see it. 199.125.109.102 (talk) 00:40, 7 April 2008 (UTC)

Discussion

Please note that Barack Obama while meeting size guidelines if you only count readable prose size balloons to 484kb[7] and is very difficult to read for anyone on dial up. This would be remedied by simply referring to the more important and more easily obtained edit byte count in classifying whether the article should be split. 199.125.109.100 (talk) 21:56, 27 March 2008 (UTC)

It looks to me like this entire discussion has more to do with someone wanting to cut down Obama than with actual practice on Wikipedia articles. I Oppose the proposed change to this guideline, which does exactly what it's intended to do; hold article size to something within the average reader's attention span, which is 6 to 10,000 words, or 30 to 50KB of readable prose (while referring still to the older, technical limitations). Please take the Obama discussion over there; it's not even remotely close to being one of Wiki's longest featured articles, and has plenty of room to grow. SandyGeorgia (Talk) 16:04, 28 March 2008 (UTC)
Nope. I'm not suggesting removing anything. I'm only suggesting making it readable by moving it into separate manageable sections. And it has nothing to do with Barack Obama. Barack Obama just happens to be the first horrendously long article I tried to read. I could talk only about Manhattan, or United States, or Bob Dylan, or Billie Jean King, or Paul Wolfowitz, or General relativity, European Union, or Byzantine Empire, none of which I have ever tried to read, but all of which are in the top 250 biggest pages on Wikipedia.[8] And you must read faster than me, unless you meant that the average attention span was from 6 words to 10,000 words. Since the average attention span on the internet is from about 20 seconds to 3 minutes, you would have to read 3,000 words per minute to read 10,000 words. If you really want to know the average attention span you can look at the server logs and see how long each person stays at one page before they move on to another page. It isn't very long. There are four reasons for splitting an article, as outlined on this project page, reader issues, editor issues, contributor issues, and technical issues, and within them a total of 11 items, and attention span is only one of them. Someone has lost sight of the forest for the trees, and it isn't me. 199.125.109.122 (talk) 02:06, 29 March 2008 (UTC)
Wikipedia is written for people who still read. It doesn't have jazzy visual layout and its fair use doctrine means that it often doesn't have a lot of pictures. It doesn't look like a modern magazine with changing fonts and lots of colors. It's not meant to capture flighty-minded people hoppin' round the net. Consider Wikipedia a throwback, a digital version of the shelf-long encyclopedia sets people used to buy for their homes. It's for real readers, not the people you describe. Wasted Time R (talk) 02:24, 29 March 2008 (UTC)
Actually one source emphatically states that "the average attention span of most Internet users is 8 seconds". And yes, most people do not read at all. The average person does not own even one book, unless it is a bible. I think it's great that Wikipedia is providing a repository of information, and while there actually are people who read dictionaries cover to cover, and encyclopedias the same way, most people use them as a reference source, to look up some factoid they need for something. Instead of creating long articles that take three minutes to load you can allow them to find that factoid four times quicker by splitting up that article into pieces and letting them click on the one that has the information they are looking for. Clearly some editors seem to think that the goal is to create humongous articles, but it isn't practical to do that. You make the material much more accessible to take an article that has ABCDE sections and split it into A B C D E subarticles. 199.125.109.122 (talk) 03:22, 29 March 2008 (UTC)

Here is a history of changes:

  1. 7 March 2003 Copy from FAQ 10/10/20/30 [9]
  2. 7 March 2003 Add comment that Size means byte count, not readable text [10]
  3. 1 June 2003 Change 30 must be divided to 32 should be divided [11]
  4. 1 June 2003 Add size of lists [12]
  5. 6 April 2005 Change 10/20 to 20/20 [13]
  6. 17 January 2006 Change 32 to 50 [14]
  7. 21 February 2006 Change 20/20/30/50 to 20/20/30/60 [15]
  8. 24 February 2006 Change 20/20/30/60 to 20/20/30/50 [16]
  9. 6 April 2006 Change 20/20 to 20 [17]
  10. 5 October 2006 Add "of prose" [18]
  11. 5 October 2006 Change to "Prose size" [19]
  12. 22 February 2007 Add 100k [20]
  13. 22 February 2007 Change 100k from "Should be divided immediately" to "Almost certainly should be divided up" [21]
  14. 4 March 2007 Increase 20/30/50 to 30/40/70 [22]
  15. 4 March 2007 Reduce 70 to 60 [23]
  16. 7 January 2008 Change Prose size to Article size [24]
  17. 3 March 2008 Change Article size to Readable prose size [25]

I would have to say that the sizes have always meant byte count and not readable prose, and that it was an error to pretend otherwise. Otherwise at the time the switch was made from byte count to prose the numbers should have been reduced by three to four times, which was not done. It isn't prose count that is the most important, it is byte count that is the most important, anyway, and since prose count is not readily accessible, it should not be used. In the beginning the only count mentioned was edit byte count, and that is still the count that most editors think of when they refer to the size of an article, because that is the count that shows up in the browser for anything over 32 kb. 199.125.109.122 (talk) 06:56, 29 March 2008 (UTC)

  • It seems obvious from the above that the change to "readable prose" was made without prior consensus and should be removed. Ottava Rima (talk) 03:36, 23 April 2008 (UTC)
I couldn't agree more. 199.125.109.100 (talk) 21:46, 28 April 2008 (UTC)
Done. Oakwillow (talk) 17:14, 14 May 2008 (UTC)

Support readable prose as the measure; it is no longer so much a matter of technical limitations as reader attention span. Readable prose has stood here as the measure for several years; it should continue. SandyGeorgia (Talk) 17:41, 14 May 2008 (UTC)

The above facts suggest otherwise. Oakwillow (talk) 17:47, 14 May 2008 (UTC)
However, it is much better to discuss things instead of just changing them, as was done in March. I agree that the first edit adding "overall" was ok, but changing "article size" was not. Oakwillow (talk) 17:54, 14 May 2008 (UTC)
Oakwillow, you seem to be focusing on the March 2008 change.. What about the October 2006 change to "prose size" that remained the active guideline for well over a year until the January 2008 edit back to "Article size". Even if you acknowledge the 2 month period it existed as "Article size" on this article, the operating understanding of this guideline has been that it is readable prose that determines if/when an article is split up, not the article size. It's also pretty obvious based on the result of the above proposal that readable prose is the determining factor...--Bobblehead (rants) 18:05, 14 May 2008 (UTC)
You can read all about it in the above discussion, but to summarize, there are four main issues affecting size and twelve factors, some of which relate to readable prose and some of which relate to edit byte count plus other factors. Oakwillow (talk) 18:17, 14 May 2008 (UTC)
  • Also oppose change to edit byte count. Readable prose and size of the final HTML page are measures directly affecting readers. Edit byte count only affects editors, and only if they do not use the feature to edit sections. Also, edit byte count grows as the article becomes better referenced, which would mean that improving the verifiability of an article would require unrelated changes to the content to keep it within those limits. HermanHiddema (talk) 09:41, 15 May 2008 (UTC)
Further, I would like to note that I find the actions of Oakwillow somewhat inappropriate. We had an ongoing discussion at Talk:Go (board game) about the size of that article, and when pointed to the guidelines at WP:Article size, Oakwillow made the change from "prose size" to "article size" above, despite the fact that the majority of responses had opposed that change, and the fact that prose size had been the guideline for a long time. HermanHiddema (talk) 09:41, 15 May 2008 (UTC)
You are misreading the proposal. The proposal was to change from Readable prose to Edit byte count, which is still under discussion. It was later determined that Article size was inappropriately changed to Readable prose, and that revert was requested, unopposed, by Ottava Rima. I simply implemented it. Oakwillow (talk) 13:19, 15 May 2008 (UTC)
Sorry, but you're wrong. It was never determined that Article size was inappropriately changed to Readable prose; it was only suggested. The text was changed from "Article size" to "Readable prose size" on March 3. But before that, it was changed from "Prose size" to "Article size" on January 7, without discussion. It is the first change that is inappropriate, not the second one, which only reinstated the correct formulation of a policy that had been in effect for years. These limits have been on prose size since basically forever. E.g., see this version of exactly 3 years ago, where the lead section identifies two issues: a) technical issues, and b) considerations of readability and organization. These technical issues were with older browsers that could not edit text over 32 KB in length. No current browsers have this limitation, and the ability to edit sections invalidates most "dialup speed" issues with editing long text. So what matters is b) readability and organisation. There was discussion on that over 3 years ago over here. Clearly this policy is and always has been about readable prose. HermanHiddema (talk) 14:03, 15 May 2008 (UTC)
Actually, I believe I read every revision of the guideline. That particular version, in my opinion, was clearly referring to edit byte count, except for the one occasion where it called out a number for readable prose, saying that "readers may tire of reading a page in excess of 20-30 KB of readable prose". They may also tire of reading a page that is a lot shorter than that. Since most people use encyclopedias for reference (duh), making information more accessible is more important than writing 500-page articles that only a few have the attention span to read. Oakwillow (talk) 15:06, 16 May 2008 (UTC)

Please address the issue of why you claim that the March 3 edit was inappropriate, but the January 7 edit was not. HermanHiddema (talk) 17:59, 16 May 2008 (UTC)

Article size has always been edit byte count. It is a perversion to think otherwise. However, the wording allows anyone who wishes to think otherwise to do so, and it is the preferred wording for that reason. Oakwillow (talk) 18:23, 16 May 2008 (UTC)
So the March 3 edit is inappropriate because it conflicts with your opinion on the issue, while the one on January 7 is not inappropriate because it agrees with your opinion on the issue? HermanHiddema (talk) 18:36, 16 May 2008 (UTC)
It conflicts with the content. My opinion is neither here nor there. This is an encyclopedia, not a blog. Opinions are not very important. Oakwillow (talk) 18:54, 16 May 2008 (UTC)
On the contrary, it is supported by the content. HermanHiddema (talk) 22:09, 16 May 2008 (UTC)

[edit] New talk

[26] --Mojska 666Leave your message here 15:52, 5 May 2008 (UTC)

[edit] Readable prose numbers?

A user has requested comment on Wikipedia policy or guidelines for this section.

It seems section "A rule of thumb" has been edited to change "Article size" to "readable prose" in this edit, as has been pointed out in the talk above. Has this been done with the consensus of the editors? I humbly suggest that the preceding edit was OK (There is no need for haste, and the readable prose size should be considered separately from references and other overhead.), but the next edit was not. The rule of thumb (I thought) was to be based on the actual number seen when editing. Editors would then look further at the number of bytes in the readable prose (they would need to determine this themselves in an editor) and possibly also at the use of templates to help decide how and what should be reduced or split. So, could someone please re-address this issue? The talk above brought it up but left it unresolved. My opinion is that the byte count should appear in the rule of thumb and that still better wording should be used in the "No need for haste" section. -84.223.78.86 (talk) 17:21, 14 May 2008 (UTC)

Actually the discussion above focused only on the table and ignored the very first section, WP:Article size#Readability issues, which first raises the issue of "readable prose". This section has been there well before the change in the table, which appears to me to have been a logical step to make the article consistent. If it was intended all along, as some claim, to only use the concept of total bytes, then the concept of "readable prose" never would have been included at all.
There was, and still is, a proposal to change the table. This proposal has not secured a majority of support, let alone a consensus. If you have an alternative proposal, then spell it out and we can discuss it. Tom (North Shoreman) (talk) 17:40, 14 May 2008 (UTC)
Readable prose is only one factor to consider. Getting back to the "No need for haste" section, do you have a suggestion?

"Do not take precipitous action the very instant an article exceeds 32 KB overall. There is no need for haste, and the readable prose size should be considered separately from references and other overhead. Discuss the overall topic structure with other editors. Determine whether the topic should be treated as several shorter articles and, if so, how best to organize them. Sometimes an article simply needs to be big to give the subject adequate coverage. Certainly, size is no reason to remove valid and useful information."

Add, "without moving it to a subarticle" at the end? Oakwillow (talk) 17:45, 14 May 2008 (UTC)
  • I think it was pretty well resolved in the above proposal where the IP proposed that the article size be used to determine when an article should be split up and all of the responders to that survey responded as being opposed to that change. As has been noted several times, using the article size (per the message that appears when you edit the article) is a bit of a misnomer in that it includes references and other non-readable text. Heck, it doesn't even accurately reflect the actual total size of an article in that the size of the images on the article are not included and those are the things that eat up the most bandwidth when it comes to downloading the page. It may be unfortunate, but if Wikipedia wants to have well referenced articles and maintain the inline references, then the size that needs to be used is the readable prose, not the article size. --Bobblehead (rants) 17:53, 14 May 2008 (UTC)
  • Concur. SandyGeorgia (Talk) 17:56, 14 May 2008 (UTC)
  • That's not necessary. When it comes to some things edit byte count is most important, when it comes to other things, readable prose is most important. I can assure you that on dial-up, edit byte count is hugely dominant. There is no one size fits all way to choose which to use. However, as pointed out above, if you do wish to change "article size" to "readable prose size" you need to divide all the numbers by two or three to make them work. Forcing people to "have to" look up the readable prose metric is unnecessary. Editors should feel free to use whatever grain of salt they deem appropriate. Oakwillow (talk) 18:08, 14 May 2008 (UTC)
  • Edit byte size may be the easiest to come up with, but it is also the worst of all the options we have to determine when it may or may not be appropriate to split up an article. This is because, of the possible measures that could be used, it is the one that doesn't actually reflect anything important. Readable prose reflects the attention span of the average reader, and total html size reflects the size that is actually downloaded when viewing the page. The only thing edit byte size reflects is the amount of text in the edit field, which only impacts someone if they want to edit an article. Even then, the impact is minimal because of the advent of sectional editing. --Bobblehead (rants) 18:31, 14 May 2008 (UTC)

[edit] Table of bytes downloaded

From the responses above I do not have a suggestion, as the issue appears more complex than a simple rule of thumb can address. In fact Oakwillow makes the important point that both can be relevant. Meanwhile, to help (me, mostly) understand what causes the bloated size of some large Wikipedia articles (which causes slow loads for those with slower computers and dial-up network access), here is a table showing a breakdown of bytes used in two of the longest articles (Barack is Barack Obama and Hillary is Hillary Rodham Clinton presidential campaign, 2008). (Added Barack campaign = Barack Obama presidential campaign, 2008.) -84user (talk) 20:28, 14 May 2008 (UTC)

Web page part | Barack | Hillary campaign | Barack campaign | Russia
preamble (css, javascript, etc.) | 14,418 | 8,605 | 10,240 | 19,391
html for readable prose up to Notes | 87,474 | 279,455 | 191,194 | 180,398
Notes and references (for WP:V) | 267,997 | 395,645 | unknown | 101,714
Cited works and external links | 13,654 | 4,360 | unknown | about 6,000
wikitables | 82,895 | 29,089 | unknown | 100,099
postamble (cats, wikipedia stuff, etc.) | 15,252 | 9,117 | unknown | 25,122
total | 481,690 | 726,271 | 498,005 | 447,775
readable prose chars | 36,741 | 122,926 | 84,340 | 71,496
Edit byte count | 127,646 | 239,412 | 170,507 | 125,790
Download size | 475k | 710k | 488k | 440k
Plus images size | 78k | 420k | 520k | 775k
Printed pages* | 8 | 27 | - | -
seconds for 56kbps user to download** | 99 | 166 | 180 | 240

* Not including references.
** using http://www.websiteoptimization.com/services/analyze/index.html

The large amount in Hillary's article surprised me, but even without the Table of Contents there are 122,926 characters of readable text there (each newline counts as one). I also tried some load tests on different browsers, one under an emulator to exaggerate the slowness (I am on a fast PC with a fast network), and also used the www.octagate.com Site Timer (which reports 5.4 seconds for Barack and 11.5 seconds for Hillary's page; these times match my fastest times on a 2.6 GHz PC with over a gigabyte of RAM, so I can well believe that dial-up users find these pages "unloadable"). -84.223.78.86 (talk) 18:37, 14 May 2008 (UTC) (I added Russia. -84user (talk) 18:25, 18 May 2008 (UTC))
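The last row of the byte table came from an online analysis tool. As a rough cross-check, here is a minimal Python sketch of a naive line-rate estimate. It assumes (this is not stated in the discussion) that "k" means 1,024 bytes and that a 56 kbps modem delivers its full nominal 56,000 bits per second with no protocol overhead, latency, or compression, so it can only give a lower bound on real download time.

```python
# Naive lower-bound estimate of dial-up download time.
# Assumptions (not from the discussion): "k" = 1,024 bytes, and the modem
# delivers the full nominal 56,000 bits/s with no overhead or latency.

LINK_BITS_PER_SECOND = 56_000  # nominal 56 kbps modem rate


def naive_download_seconds(page_kb: float, images_kb: float,
                           bits_per_second: int = LINK_BITS_PER_SECOND) -> float:
    """Time to transfer the html page plus its images at the nominal line rate."""
    total_bits = (page_kb + images_kb) * 1024 * 8
    return total_bits / bits_per_second


# "Download size" and "Plus images size" (in k), copied from the table above.
pages = {
    "Barack": (475, 78),
    "Hillary campaign": (710, 420),
    "Barack campaign": (488, 520),
    "Russia": (440, 775),
}

for name, (page_kb, images_kb) in pages.items():
    print(f"{name}: about {naive_download_seconds(page_kb, images_kb):.0f} s")
```

This prints roughly 81, 165, 147 and 178 seconds, all at or below the tool's 99, 166, 180 and 240 seconds; the gap is presumably the per-request overhead that the tool models and this sketch ignores.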

You left out an important component (images), which the last time I looked, is what slows down Clinton, and has nothing to do with any of the measures we're looking at. SandyGeorgia (Talk) 18:43, 14 May 2008 (UTC)
Added. Please correct if they are wrong, or change from k to bytes. Another metric I would like to suggest is no article with a printed page count of more than 10 pages, not including the references. The references in Barack are another 10 pages, so 18 total and 12 for Hillary, 39 total. Oakwillow (talk) 19:18, 14 May 2008 (UTC)

Actually SandyGeorgia, you are right: the images are significant and rather large for some articles (both Democratic campaigns have over 420 kilobytes, with three large ones). The numbers in my table exclude image sizes; they are the raw byte count of the html page. I am adding approximations now. (I just created this username.) -84user (talk) 19:44, 14 May 2008 (UTC)

Thanks for telling me; considering the work I've done on looking at the load time of this article relative to other articles, its prose size, and its images, I never would have guessed I might be right ... I actually look for opportunities to spout off and be wrong :-))) No, we didn't need to add printed page size; the relevant measures (word count and prose size) deal with reader attention span. Layout is another matter, affected by tables and such. SandyGeorgia (Talk) 19:57, 14 May 2008 (UTC)
Sorry, I did not mean it in any sarcastic way. After reading your comment and checking the images, I really did realise, one, that I'd forgotten about the images, and two, that they have a big effect. -84user (talk) 20:28, 14 May 2008 (UTC)
I was just joshing :-) But I've spent a lot of time looking at these issues, as I'm often forced onto a dialup when I travel, and I believe the issues in many of the slow-loading articles will resolve to images, not prose, although I haven't spent enough time sorting out what factors affect load time with respect to images. I think our measure of readable prose is fine. SandyGeorgia (Talk) 20:47, 14 May 2008 (UTC)
However, if you do think of switching to readable prose, don't forget to divide all the numbers by two or three, or 2.5 or something. See the dramatic comparison above. Images don't bother me in most articles, because most articles use thumbs which resolve to about 26 kB each, not a factor. Once in a great while someone insists on putting in three or four 400-pixel images or a hundred flag symbols, both of which take like forever to load, but other than that the images are not a big factor. It is true that there is a fairly close relationship between readable prose and printed pages, but since I sometimes do print out articles to show people, I always cringe when they run into 10 or more pages, knowing that they really aren't ever going to be read. Oakwillow (talk) 21:47, 14 May 2008 (UTC)
I think the more interesting question is why editors insist on shooting themselves in the foot by writing articles so long that no one will read them anyway, or adding so many images and slowing down the loadtime so much that no one will even click on the article. SandyGeorgia (Talk) 21:56, 14 May 2008 (UTC)

Evidently nobody wanted to create any subarticles for the campaign. Oh well, to each their own. I would like it if someone could make a table of 50 articles of various sizes so that we could plot them and compare to see if there is a strong correlation between edit byte count and readable prose size. So far the three examples above range from a ratio of 1.95 to 3.47. So if it was determined that the ratio always was in that range, would you prefer dividing all the numbers by some mid value, or would you prefer just changing the guideline to say edit byte count? Or should I wait for the data before asking the question? Oakwillow (talk) 01:18, 15 May 2008 (UTC)
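The ratio range quoted above is edit byte count divided by readable prose characters for each column of the byte-breakdown table. A short Python sketch of that arithmetic, using the figures from the table; the Russia column was added on 18 May, after this comment, so it is kept separate from the quoted range.

```python
# Edit byte count and readable prose characters, copied from the
# byte-breakdown table above.
counts = {
    "Barack": (127_646, 36_741),
    "Hillary campaign": (239_412, 122_926),
    "Barack campaign": (170_507, 84_340),
    "Russia": (125_790, 71_496),  # column added after the comment above
}


def edit_to_prose_ratio(edit_bytes: int, prose_chars: int) -> float:
    """Bytes of wikitext per character of readable prose."""
    return edit_bytes / prose_chars


ratios = {name: edit_to_prose_ratio(e, p) for name, (e, p) in counts.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")

first_three = [ratios[n] for n in ("Barack", "Hillary campaign", "Barack campaign")]
print(f"first three: {min(first_three):.2f} to {max(first_three):.2f}")
```

The first three columns give 1.95 to 3.47, as stated; including Russia (1.76) widens the range to 1.76 to 3.47.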

[edit] Table

The following articles were chosen randomly to give a cross section of FA, GA, and unrated articles. The FA articles were chosen from a broad cross section of categories, the GA articles were chosen randomly through the alphabet, and the unrated ones were just what came up using random article. The political campaigns and their countries were added separately.

Article Edit byte count Readable prose Ratio (edit/readable) Word count Ratio (edit/word)
United States 165,484 67,584 2.45 10,506 15.75
Barack Obama 134,636 36,864 3.65 5,723 23.53
Barack campaign 181,190 79,872 2.26 12,907 14.04
Barack campaign 173,163 75,776 2.29 12,128 14.28
Hillary Clinton 157,625 60,416 2.61 9,487 16.61
Hillary campaign 252,818 119,808 2.11 19,633 12.87
John McCain 114,071 40,960 2.79 6,515 17.51
McCain campaign 118,343 56,320 2.10 9,075 13.04
Zimbabwe 84,138 46,080 1.83 7,200 11.69
Zimbabwe election 181,941 114,688 1.59 18,795 9.68
Pakistan 77,668 37,888 2.04 5,879 13.21
Pakistan election 27,421 11,785 2.33 1,969 13.93
Russia 125,639 61,486 2.04 9,539 13.17
Russia election 28,793 13,999 2.06 2,229 12.92
Sanssouci 37,086 26,624 1.39 4,289 8.65
Whitstable 46,437 23,552 1.97 3,906 11.89
History of saffron 35,301 19,456 1.81 2,967 11.90
Hurricane Irene (1999) 29,583 16,384 1.81 2,455 12.05
Nahuatl 97,404 38,912 2.50 6,102 15.96
Salvador Dalí 65,750 34,816 1.89 5,704 11.53
Hydrochloric acid 29,245 14,336 2.04 2,225 13.14
Frederick Russell Burnham 71,556 28,672 2.50 4,934 14.50
Krag-Jørgensen 42,699 20,480 2.08 3,368 12.68
Patriot act summary 49,598 27,648 1.79 4,412 11.24
Polar coordinate system 32,062 17,408 1.84 2,840 11.29
Pilot (House) 11,295 7,344 1.54 1,170 9.65
List of snakes** 10,628 1,035 10.27 197 53.94
A. Scott Berg 14,716 6,631 2.22 1,051 14.00
Joe Delaney 11,818 4,968 2.38 848 13.94
Jaws: The Revenge 28,681 17,408 1.65 2,949 9.73
Richmond, Virginia 90,617 57,919 1.56 9,166 9.89
Double Tenth Incident 12,318 8,552 1.44 1,413 8.72
Hayley Westenra 33,271 12,288 2.71 2,112 15.75
Liverpool F.C. 60,797 22,646 2.68 3,769 16.13
Oregon Supreme Court 33,692 19,852 1.70 3,291 10.24
Maedhros 15,994 10,204 1.57 1,712 9.34
Single Audit 33,364 18,750 1.78 2,896 11.52
Yamashita Yoshiaki 11,689 4,873 2.40 790 14.80
George Hoey 21,264 7,844 2.71 1,327 16.02
East 233rd Street (Bronx) 6,032 2,353 2.56 409 14.75
Germanium tetrachloride 5,548 3,133 1.77 483 11.49
Ouvéa Island* 1,560 1,194 1.30 214 7.29
Abeokuta 16,268 10,647 1.53 1,763 9.23
DJ Mix Sun Ra 843 272 3.11 47 17.94
The Dungeonmaster 4,613 2,421 1.91 392 11.77
Cane River 1,037 788 1.32 131 7.92
Fairfield Metro Center 7,668 3,669 2.09 576 13.31
Stor Island 1,879 400 4.70 70 26.84
Arcesilaus I of Cyrene 1,709 777 2.20 133 12.85
Nicholas Bartlett 2,288 438 5.22 75 30.51
Hamburg singles** 19,026 155 122.75 39 487.85

* Article does not cite any references or sources.
** List.

Ratio of readable prose to word count = 6.204 with a correlation of 0.9995

Ratio of edit byte count to readable prose text = 2.07 with a correlation of 0.965

Ratio of edit byte count to word count = 12.8, with FA articles tending to have a lower ratio than GA or unrated articles. For practical purposes, an easy rule of thumb to guesstimate word count is to just divide edit byte count by 12. This will not work for lists and articles with many illustrations or tables. A more accurate word count can be obtained using a text editor or by installing {{subst:js|User:Dr pda/prosesize.js}} in your monobook.js. Oakwillow (talk) 17:49, 8 June 2008 (UTC)
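The comment does not say how the ratios and correlations were computed. A minimal Python sketch, assuming a least-squares fit through the origin for each ratio and Pearson's r for each correlation, and using only a handful of rows from the table; with a different method or the full table, the results will not exactly match 6.204, 2.07 or 12.8. The helper names are hypothetical.

```python
import math

# (edit byte count, readable prose bytes, word count) for a few rows of the
# table above; the full table has many more rows.
rows = {
    "United States": (165_484, 67_584, 10_506),
    "Zimbabwe": (84_138, 46_080, 7_200),
    "Sanssouci": (37_086, 26_624, 4_289),
    "Hydrochloric acid": (29_245, 14_336, 2_225),
    "Maedhros": (15_994, 10_204, 1_712),
    "Abeokuta": (16_268, 10_647, 1_763),
}


def slope_through_origin(xs, ys):
    """Least-squares k for the model y = k * x (no intercept term)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def estimate_words(edit_bytes, divisor=12):
    """Rule of thumb from the comment above: words ~ edit byte count / 12."""
    return edit_bytes / divisor


edit, prose, words = zip(*rows.values())
print(f"prose bytes per word: {slope_through_origin(words, prose):.2f}, "
      f"r = {pearson_r(words, prose):.4f}")
print(f"edit bytes per prose byte: {slope_through_origin(prose, edit):.2f}, "
      f"r = {pearson_r(prose, edit):.3f}")
print(f"Hydrochloric acid: about {estimate_words(29_245):.0f} words (table: 2,225)")
```

A fit through the origin matches the "divide by a constant" rules discussed here; a fit with an intercept would be a different model.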

[edit] Poll

It is clear that there has been some confusion about the measurement of article size. Most people interpret "Article size" to be "edit byte count", although many experienced editors think of it as "Readable prose size". The current table, however, is a legacy from when it did mean edit byte count. Therefore, there are three options; please choose one or more:

[edit] A. Change "Article size" to "Readable prose size"

Change "Article size" to "Readable prose size" and adjust all quantities by a factor of approximately 2.5 correspondingly. This will have the effect of forcing editors to ignore the 32 kB warnings and artificially figure out how to count readable bytes to determine appropriate article length.

  1. Well, aside from this poll being one of the more biased polls I've seen in quite a while, readable text is the most important measurement as far as an encyclopedia goes. The average attention span of a reader is what we should be measuring the articles against, not some random measurement, like edit byte size, that is not applicable to any issue that causes problems for readers or editors. Edit byte size is impacted by the number of references an article has and whether or not the sources are formatted using cite templates. This becomes especially problematic for controversial topics, where it is not uncommon for editors to fight over the most minor of information if it is not cited properly. In articles related to politicians, it is not uncommon for the references to take up more KB than the actual text of the article, due to the tendency of the editors to fight over the smallest of nits. --Bobblehead (rants) 15:27, 16 May 2008 (UTC)
    The article already uses the concept of readable prose and has for years -- as at least four editors have already pointed out. Readable prose should stay as the operative term. There is already a proposal on the table covering this subject, and this appears to be nothing but an attempt to further confuse issues. Tom (North Shoreman) (talk) 15:50, 16 May 2008 (UTC)
    The proposal that was on the table is the same as option B, below. This poll supersedes that proposal, as it is more comprehensive. Oakwillow (talk) 16:34, 16 May 2008 (UTC)
    So YOU say. How do you, a single editor, have the authority to say that your proposal takes precedence and an existing proposal is no longer open for consideration or debate? People have registered their opinions above and are under no obligation to participate in your "biased poll" in order for their expressed opinions to remain valid in determining consensus or lack of consensus. Tom (North Shoreman) (talk) 16:43, 16 May 2008 (UTC)
    The original proposal was broken because it did not provide any valid choices for editors such as yourself who were opposed to option B. By the way, if you could help fill in the table with the readable prose numbers I can do the ratio calculations. It is extremely tedious for me to obtain readable prose size. I noticed above you indicated that it takes you less than a minute. Oakwillow (talk) 17:06, 16 May 2008 (UTC)
    If you had read the "How do you find the readable page size?" topic above, you'd have seen that the User:Dr pda/prosesize.js tool allows you to find readable prose sizes and readable prose word counts instantly. Wasted Time R (talk) 21:33, 16 May 2008 (UTC)
    I'm on dialup. Nothing is instant on dialup. WP is a collaborative project. If someone else could fill in the prose column, I can do the rest. My computer often locks up long before any of those long files load, but I don't have to load them to get the byte size, I just look at the history.[27] Oakwillow (talk) 00:35, 17 May 2008 (UTC)

[edit] B. Change "Article size" to "Edit byte count"

Change Article size to Edit byte count and make no changes to quantities. Readable prose is considered separately as a measure of article size, but since there is a strong correlation between the two measures, both are equivalent.

  1. This is the simplest option, this quantity is displayed for all to see every time an article or even a section that is over 32 kB is edited. Staying within guidelines guarantees readable prose is also within reasonable limits. Oakwillow (talk) 14:59, 16 May 2008 (UTC)
    It is very clear that "readable prose" and "article size" are not equivalent. As the discussion above (as well as the article itself) demonstrates, these are two very different concepts. Basic inaccuracies such as this invalidate any results that may come from this poll. The existing proposal provides two very clear alternatives -- this "biased poll" adds nothing to the ongoing debate. Tom (North Shoreman) (talk) 16:51, 16 May 2008 (UTC)
    Equivalent in the sense that there is a one to one, two to one, three to one, in other words a linear relationship between them. Knowing one you know the other. See the table above. Oakwillow (talk) 17:27, 16 May 2008 (UTC)

[edit] C. Make no changes. Leave it saying "Article size"

Make no changes. Leave it saying "Article size" instead of "Readable prose size". This will mean that to some editors this will mean "edit byte count", and to others, who will tend to encourage articles that are two to three times as long, it will mean "readable prose bytes". Since both metrics are important in determining article length, readable prose and byte count, everyone is happy.

[edit] Validity of this poll??

The poll makes a number of assumptions and generalizations that are either debatable, inaccurate, or unverifiable. While I responded, the main debate should remain focused on the original proposal made above -- as of this date there is a clear majority opposed to changing "readable prose" to "article size." Tom (North Shoreman) (talk) 15:59, 16 May 2008 (UTC)

I think you meant to say changing "Readable prose size" to "Edit byte count". Changing to article size is not one of the options. Oakwillow (talk) 16:28, 16 May 2008 (UTC)
NB I apologize for the bias that was originally in option C, and have removed it (by changing "continue to mean 'edit byte count', which it is" to "edit byte count"). Oakwillow (talk) 16:44, 16 May 2008 (UTC)
The main bias is your claim that the article currently says "article size" when it very clearly does not. You are mistaking your minority opinion on what the article should say for what it actually does say. Tom (North Shoreman) (talk) 16:58, 16 May 2008 (UTC)
Someone, not to mention any names, improperly recently changed it, but that is easily fixed. The article to all intents and purposes says "Article size", but I'm not going to get into an edit war about it. Oakwillow (talk) 17:30, 16 May 2008 (UTC)

Very biased poll. All the options make the implicit assumption that the current numbers reflect "edit byte count", when clearly they were meant to mean "readable prose size". I might as well make the following options for balance:

D. Change "Readable prose size" to "Edit byte count"

Change "Readable prose size" to "Edit byte count" and adjust all quantities by a factor of approximately 2.5 correspondingly.

E. Change "Readable prose size" to "Article size"

Change "Readable prose size" to "Article size" and make no changes to quantities. The description will be ambiguous and every user can choose for it to mean what they think it should mean.

F. Make no changes. Leave it saying "Readable prose size"

Make no changes. Leave it saying "Readable prose size" instead of "Article size". This will mean that to all editors this will mean the same thing and be absolutely clear.

HermanHiddema (talk) 22:42, 16 May 2008 (UTC)

Look at the history. The word prose does not even appear in the first three and a half years, other than to say that the numbers do not refer to prose. If they did mean prose, they would have been adjusted downward by a factor of 2 or 3 when the switch was made from byte count to prose, and that clearly never happened. Oakwillow (talk) 03:44, 17 May 2008 (UTC)
This version from March 6, 2004, one year after the page was created, already contains the phrase: "Readers may also tire of reading a page in excess of 20-30 KB of readable prose (tables, lists and markup excluded)." HermanHiddema (talk) 14:13, 17 May 2008 (UTC)
Which is compatible with a limit of 40-100 KB edit byte count. I'm still waiting for someone to add in the numbers for the readable prose column. Here is a compromise table. Oakwillow (talk) 14:57, 17 May 2008 (UTC)

[edit] Compromise table #1

Article size What to do
Edit byte count Readable prose size
> 100 KB > 40 KB Almost certainly should be divided up
> 60 KB > 25 KB Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
> 40 KB > 15 KB May eventually need to be divided (likelihood goes up with size)
< 30 KB < 10 KB Length alone does not justify division
< 1 KB If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.
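Since the proposed table carries two columns, a reader applying it has to pick one measure. A minimal Python sketch of that lookup, for illustration only: the function name `advice`, consulting one column at a time, and returning None for sizes the table leaves uncovered (for example 10 to 15 KB of readable prose) are assumptions. The advice strings are shortened, and the "< 1 KB" stub row is omitted.

```python
from typing import Optional

# Rows of "Compromise table #1" as (edit byte count KB, readable prose KB,
# advice), largest first. Advice text is shortened; the "< 1 KB" row is omitted.
LARGE_ROWS = [
    (100, 40, "Almost certainly should be divided up"),
    (60, 25, "Probably should be divided"),
    (40, 15, "May eventually need to be divided"),
]
SMALL_ROW = (30, 10, "Length alone does not justify division")

COLUMN = {"edit": 0, "prose": 1}  # which threshold column to read


def advice(size_kb: float, measure: str = "prose") -> Optional[str]:
    """Return the table's advice for size_kb measured as "edit" or "prose".

    Returns None for sizes between the "<" row and the smallest ">" row,
    which the table does not cover.
    """
    col = COLUMN[measure]
    for row in LARGE_ROWS:
        if size_kb > row[col]:
            return row[2]
    if size_kb < SMALL_ROW[col]:
        return SMALL_ROW[2]
    return None


print(advice(45, "prose"))  # Almost certainly should be divided up
print(advice(70, "edit"))   # Probably should be divided
print(advice(12, "prose"))  # None: between 10 KB and 15 KB
```

The same function works unchanged for the other tables in this thread if their thresholds are substituted into LARGE_ROWS and SMALL_ROW.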

You call that a compromise table? Please look up the word compromise in a dictionary. This table is one that completely reflects your view of the issue and makes absolutely no compromises. I might as well make this table and call it a compromise table:

Article size What to do
Edit byte count Readable prose size
> 250 KB > 100 KB Almost certainly should be divided up
> 150 KB > 60 KB Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
> 100 KB > 40 KB May eventually need to be divided (likelihood goes up with size)
< 75 KB < 30 KB Length alone does not justify division
< 1 KB If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.

See. It completely reflects my view of the issue, so it must be a great compromise...

Totally wrong. This totally ignores the fact that "Readers may also tire of reading a page in excess of 20-30 KB of readable prose". Oakwillow (talk) 15:24, 17 May 2008 (UTC)

Now an actual compromise table would be one that compromises between the ones above, eg:

Article size What to do
Edit byte count Readable prose size
> 150 KB > 60 KB Almost certainly should be divided up
> 100 KB > 40 KB Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
> 70 KB > 30 KB May eventually need to be divided (likelihood goes up with size)
< 50 KB < 20 KB Length alone does not justify division
< 1 KB If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.

See? That's what we call a compromise. HermanHiddema (talk) 15:16, 17 May 2008 (UTC)

You are compromising the numbers to compromise the point. Compromise in the sense that it includes both choices. It is by far the best thing to do, just include both columns, and make everyone happy. Oakwillow (talk) 15:24, 17 May 2008 (UTC)
If you feel that way, I am fine with using the second table, which most accurately reflects what the article has been saying for years. HermanHiddema (talk) 15:26, 17 May 2008 (UTC)
Ah, that is what you "think" it has been saying, but that is not the case, and that is why there have been so many complaints about long articles. Oakwillow (talk) 15:56, 17 May 2008 (UTC)
Yes, that is what I "think" it has been saying. You "think" it has been saying something else. Which means we both have an "opinion". To find a compromise between two opinions, you try to find some half-way point. Which is what I did. You, however, keep asserting that your own opinion is somehow "fact" while mine is "false". That has nothing to do with compromise. HermanHiddema (talk) 16:39, 19 May 2008 (UTC)

[edit] Prose size stats

I don't know what's going on here, but it seems to fit the WP:TLDR bill. Here are some stats on prose size on featured articles, measuring prose size exactly as this article recommends, and as has been done for several years; Barack Obama, Hillary Clinton and John McCain are all well within guidelines and aren't even close to being as long as many featured articles.

Oh, and polls are evil, and there is no consensus to change this guideline. Also, this may help:

  • Wikipedia:Miscellany for deletion/Wikipedia:WikiProject Extra-Long Article Committee. SandyGeorgia (Talk) 04:08, 17 May 2008 (UTC)
    • What is going on here is that I am fixing a serious problem with the guidelines, which crept in because some people shifted to thinking about readable prose but failed to change the dividing lines, effectively multiplying the acceptable article size by a factor of 2 to 3. This means that there are continual complaints about articles being too long and continual replies of "oh no, it's not too long, it's well within the guidelines", even though it is so long that it locks up your computer and is totally inaccessible. The fix is simple. There are three ways to fix it: one, choice A above, recognize that the numbers need to be adjusted if they are to mean readable prose; two, choice B above, simply use the numbers as edit byte count, because that is the easiest metric to obtain; or three, leave the article as is but ensure that it says "article size" for the table and not "readable prose size", and recognize that a percentage will treat it as byte count, which it really is, and a percentage as readable prose, artificially allowing articles to be 2 to 3 times as long. As this is a guideline, editors are free to do whatever they wish, and if they want a 100 kB or 450 kB article, that is their prerogative. And they will also know that it's a problem. Oakwillow (talk) 04:52, 17 May 2008 (UTC)
    • I notice that this is what you wrote in 2006 from that discussion: "There are 50KB articles that are too long (because they're all prose, no references), and there are well-cited 80KB articles that aren't too long (the KB is mostly in references)." Since references are not included in calculating readable prose, am I to conclude that you meant "edit byte count"? Oakwillow (talk) 16:24, 17 May 2008 (UTC)

These stats are very interesting. They show that most FA quality articles are in the range 10-30k of readable prose, with sizable minorities under 10k and in the range 30-50k. Articles of over 50k are rare, only about 3%. This is reasonably in line with the text of this article, which gives upper limits of 30-50k of readable prose. They would, in my opinion, form an excellent source on which to base the numbers in this article. HermanHiddema (talk) 16:48, 19 May 2008 (UTC)

I adjusted the numbers accordingly (see below). I notice that you recently installed the prose tool, can you fill in the numbers above? Also the ones I filled in, change them if you get very different results - you don't need to calculate the ratio, but erase the ratio if you change the prose size, ok? For the last article I only included the first sentence which is why I put in a question mark. I could use the November 2007 article list, but I think that more recent articles would be better to use. It is very interesting to me that out of a million articles less than 10 were over 65 kB readable prose. Would you agree with saying ">50 kB Almost certainly should be divided up"? (see above proposal) Oakwillow (talk) 07:52, 20 May 2008 (UTC)
Note that those stats are only about featured articles as of November 2007, not Wikipedia totals (See also: Special:Longpages). These statistics are therefore over 1721 articles, which means:
  • What constitutes a large article?
    • about 1% are > 60k
    • about 3% are > 50k
    • about 10% are > 40k
    • about 30% are > 30k
  • What constitutes a small article?
    • about 40% are <= 20k
    • about 20% are <= 15k
    • about 7% are <= 10k
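The percentages above are simple threshold counts over the list of featured-article prose sizes. A minimal Python sketch of that kind of computation follows; the `sizes_kb` list is made-up toy data for illustration only, not the actual featured-article figures, and the function names are hypothetical.

```python
def share_above(sizes_kb, threshold_kb):
    """Fraction of articles whose readable prose size exceeds threshold_kb."""
    return sum(1 for s in sizes_kb if s > threshold_kb) / len(sizes_kb)


def share_at_most(sizes_kb, threshold_kb):
    """Fraction of articles whose readable prose size is <= threshold_kb."""
    return sum(1 for s in sizes_kb if s <= threshold_kb) / len(sizes_kb)


# Hypothetical toy data: readable prose sizes in KB for ten articles
sizes_kb = [8, 12, 14, 18, 22, 25, 28, 33, 41, 55]

for t in (60, 50, 40, 30):
    print(f"> {t}k: {share_above(sizes_kb, t):.0%}")
for t in (20, 15, 10):
    print(f"<= {t}k: {share_at_most(sizes_kb, t):.0%}")
```

Run against the real size list, the same two functions would reproduce the "large" and "small" breakdowns; the thresholds are just the ones used in the list above.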

Your proposal was the following: (I undid that one to restore the context of the original discussion)

[edit] Compromise table #2

Edit byte count | Readable prose size | What to do
> 100 KB | > 50 KB | Almost certainly should be divided up
> 60 KB | > 30 KB | Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
> 40 KB | > 20 KB | May eventually need to be divided (likelihood goes up with size)
< 32 KB | < 15 KB | Length alone does not justify division
< 1 KB | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.
The first compromise table used 2.5 for the ratio between edit byte count and readable prose. This one uses 2.0, and anything from 2 to 3 could be used. Oakwillow (talk) 16:41, 20 May 2008 (UTC)

I do not think it is right to say that about 30% of FA class article "probably should be divided", I would reserve that for the 10% mark. I would then start the sliding scale of "May eventually need to be divided (likelihood goes up with size)" at that 30% point. Further, I have removed the "edit byte count" column in my proposal. As yet no clear reliable ratio between "edit byte count" and "readable prose" has been established, in the above results it varies from 1.27 to 5.15 in small article, and from 1.73 to 3.47 in larger ones. With the disappearance of technical limitation of browsers, and the availability of section editing, I think edit byte count is a far less important measure than readable prose. Readable prose is a factor in content quality, while edit byte count only had technical impacts. HermanHiddema (talk) 11:32, 20 May 2008 (UTC)

Wouldn't it be fair to assume that those 30% that are bigger are that way because the editors felt that "the scope of [the topic justifies] the added reading time"? Oakwillow (talk) 17:47, 20 May 2008 (UTC)

So, my proposal:

Readable prose size | What to do
> 50 KB | Almost certainly should be divided up
> 40 KB | Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
> 30 KB | May eventually need to be divided (likelihood goes up with size)
< 20 KB | Length alone does not justify division
< 1 KB | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.

I feel these numbers are in line with the stats. HermanHiddema (talk) 11:32, 20 May 2008 (UTC)

I do not agree that your analysis of the statistics is conclusive or even terribly relevant. The statistics do not show, for example, what the difference is regarding size between FA articles and non-FA articles. I would guess that the average FA article is longer than the average non-FA article. This would counter the "shorter is better" theory with a "longer is better" argument -- neither one of which tells anything close to the whole story. The purposes of this article are clearly listed at the start and "creating featured articles" is not one of them. Also Wikipedia:Featured article criteria lists article length as only one of ten highlighted factors and specifically says this:
Length. It stays focused on the main topic without going into unnecessary detail (see summary style)
"Length" of course must be balanced with:
comprehensive: it neglects no major facts or details
I fail to see the proof that cutting in half the existing rule of thumb regarding the readable prose level at which the article "Almost certainly should be divided up" would add quality to virtually all articles. The quality of any article as well as the appropriateness of subdividing it is best handled by a discussion of the content of the particular article. I suggest we stick to the stated purposes of the size guidelines in our discussion over changing this article. I do not feel that Wikipedia's articles should be restricted in size based primarily on the poorest connection speeds available. If anything, we should be increasing the rule of thumb numbers to recognize the advances that have occurred since the numbers were originally calculated. Tom (North Shoreman) (talk) 12:53, 20 May 2008 (UTC)
It isn't cutting in half the existing rule of thumb because the numbers are for edit byte count not for readable prose in the existing article. The fact that they say readable prose is simply an error, as shown above in the edit summary, which can be corrected in one of three ways: A, B, or C, above. It isn't going to kill anyone to see both numbers, edit byte count and readable prose, so the compromise solution is probably the best thing to do. However, instead of guessing about the difference between FA and other articles, how about filling in the stats, since you are the one that pointed out that you can "instantly" determine the readable prose size. I filled in the short ones, although some of them may need to be corrected. Oakwillow (talk) 16:04, 20 May 2008 (UTC)
Your claim that "readable prose [in the current article] is simply an error" is not true, and it has been adequately rebutted above. I see no purpose in filling in a table that I feel has little relevance to this discussion, but if you want to do so, go ahead. Tom (North Shoreman) (talk) 16:14, 20 May 2008 (UTC)

PS I went back to October 6, 2005 which had the following sentence:

"However, do note that readers may tire of reading a page in excess of 20-30 KB of readable prose (tables, lists and markup excluded)."

As you should be able to see, the concept of readable prose as the most relevant count has been with this article a long time. Tom (North Shoreman) (talk) 16:21, 20 May 2008 (UTC)

However, 20-30 kB means "no articles longer than 20 kB are going to be read by a lot of readers". I'm on dial-up. It would be nearly impossible to fill it in, and my numbers would not be the same because I don't do it the same way you do (I am not willing to use javascript). The allegation of it being an error has most certainly not been rebutted, it has been strengthened. Wouldn't "although the scope of a topic can sometimes justify the added reading time" apply to those 30% that are over 30 kB? Would you prefer to say "although 30% of the time the scope of a topic may justify the added reading time"? Bear in mind that it isn't a matter of us providing more reading time, because what it is doing is simply guaranteeing that the entire article is not going to be read most of the time - we can't change someone's attention span just by creating a longer article. It just doesn't get read. Oakwillow (talk) 16:27, 20 May 2008 (UTC)
I think that your October 6, 2005 version also said ">20KB - may need to be divided (make sure sections are <20K - preferably much smaller)", and was clearly referring to edit byte count, because while readable prose was discussed, it was clearly delineated from the rest of the article in the sentence you quoted (it didn't say, by the way all of the numbers in this article are for readable prose, it said, in effect, oh by the way these two numbers are for readable prose, not edit byte count, like all the rest). Oakwillow (talk) 16:48, 20 May 2008 (UTC)
I provided the old edit information simply to show that the phrase has been in use for a long time. The 20-30 KB reference however has been changed in the current revision to:
"Readers may tire of reading a page much longer than about 6,000 to 10,000 words, which roughly corresponds to 30 to 50 KB of readable prose. If an article is significantly longer than that, it may benefit the reader to move some sections to other articles and replace them with summaries"
So this suggests that the STARTING POINT for considering whether subdivision MAY be appropriate is 50 KB of readable prose -- considerably higher than is being proposed but consistent with the EXISTING table. As far as the likelihood of an article being read in full, this is a choice to be made by the reader based on their own needs and interests. I would guess they are more likely to find something interesting and useful that they were not expecting in a longer article that they have on their screen in front of them, rather than if they have to switch to another screen in order to get the full availability of the information.
I am all for WP:Summary style -- I just think it should be driven by content and context rather than conjectures about attention spans. Tom (North Shoreman) (talk) 16:56, 20 May 2008 (UTC)
Actually, I would contend that saying that readers may tire from pages longer than 30 to 50 KB sets 30 KB as the upper limit - you don't want to leave anyone out, do you? So 30 KB prose becomes the ending point, not the starting point. Summary style is essential on all in depth articles - we have about 2 Megabytes in the United States article, when you count all the subarticles that stuff has been split off into. Oakwillow (talk) 17:56, 20 May 2008 (UTC)
Actually, there are plenty of people that will tire from reading 100 words of prose, so the "not leaving anyone out" argument doesn't hold. HermanHiddema (talk) 07:55, 30 May 2008 (UTC)

Just use the 80/20 rule - for 20% of the effort you get 80% of the results. Anytime you are dealing with statistics you have a bell-shaped curve. By "anyone" I mean 80%. The other alternative is to use the Ivory soap rule - include 99 44/100%. Either one you choose you end up with a whole lot less than the previous guidelines. Oakwillow (talk) 06:15, 5 June 2008 (UTC)

Do you have any proof that the 80/20 rule actually supports an upper limit of 30kb? That is a rather bold assertion. HermanHiddema (talk) 09:08, 5 June 2008 (UTC)
I take it that you are not familiar with that rule? To quote from the 80/20 article, "The Pareto principle (also known as the 80-20 rule, the law of the vital few and the principle of factor sparsity) states that, for many events, 80% of the effects come from 20% of the causes." Who said that 30 was an upper limit? The article says that "30 KB [readable prose size] Probably should be divided (although the scope of a topic can sometimes justify the added reading time)" - that's not an upper limit, it's a guideline, which is a little too high for those who tire after reading 20 kB of prose - remember readers may tire of reading 20 to 30k?
Uhm, you did. Read your own comment from May 20 (two comments back), where you say "I would contend that saying that readers may tire from pages longer than 30 to 50 KB sets 30 KB as the upper limit" HermanHiddema (talk) 15:36, 5 June 2008 (UTC)
It is a limit, however this article is a guideline, and states simply that beyond 30kB "Probably should be divided", as a guideline, and not a limit, as is 50k, a guideline, not a limit. There is a difference. For some the limiting factor is their attention span of about 8 seconds. Everyone has a limit. Oakwillow (talk) 16:51, 5 June 2008 (UTC)

[edit] A rule of thumb (Proposal #3)

Some useful rules of thumb for splitting articles, and combining small pages:

Edit byte count | Readable prose size | What to do
> 100 KB | > 50 KB | Almost certainly should be divided up (there is no mandate, however; these are guidelines only)
> 60 KB | > 30 KB | Probably should be divided (depends on the scope of a topic and editor preferences)
> 40 KB | > 20 KB | May eventually need to be divided (likelihood goes up with size)
< 32 KB | < 10 KB | Length alone does not justify division
< 1 KB | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.
Please note:

These guidelines apply somewhat less to lists or disambiguation pages, and naturally do not apply to redirects.


In this change, and I see no reason for not implementing it, 15k was reduced to 10k - in other words the lower division used 3:1 while the upper divisions used 2:1. It would still be helpful if someone would fill in the above table with readable prose sizes, by the way. Oakwillow (talk) 15:32, 5 June 2008 (UTC)

The reason for not implementing it is that people have simply not bought in to your agenda for mandating smaller articles. The majority of people who have commented have favored the status quo and tinkering with the numbers does not change the views of the majority. Tom (North Shoreman) (talk) 15:37, 5 June 2008 (UTC)
Where on earth do you get the idea that I am mandating smaller articles? This is a guideline, and even says for the upper most division "there is no mandate, however; these are guidelines only". That is there because there had been discussion of creating a "size police" to run around and chop up large articles, which was soundly rejected. The purpose, however is to prevent editors from pointing to an invalid table and saying "see this article at 93 kB readable prose is well within the WP:SIZE guidelines", totally neglecting the fact that readers may tire of only 20 kB of readable prose. Oakwillow (talk) 16:39, 5 June 2008 (UTC)
The language is contradictory. As proposed it reads, "Almost certainly should be divided up (there is no mandate, however; these are guidelines only)". Some will emphasize the first part while others the second part -- it's an edit war waiting to happen anytime someone tries to apply the guideline in a particular article. What you are in effect saying is that your numbers are "almost certainly a mandate" -- way too close to an absolute mandate.
As far as folks getting "tired", let them learn to skim. Why should we favor folks with low attention spans as opposed to those who want more information at one site? My concern is editors who have little or no knowledge or interest in the article who, nevertheless, want to chop up the article simply because it is some arbitrary size. Tom (North Shoreman) (talk) 17:13, 5 June 2008 (UTC)

Ok, what would you suggest as an improvement? As I see it, it is up to individual editors to make their own choices - it has been pointed out above that one article has gone to 122kB. I see no danger of this guideline creating or stopping edit warring. As to "let them learn to skim", it is definitely not our job to try to change the way people read. We just shouldn't be showing people a table of edit byte count and calling it a table of readable prose. Oakwillow (talk) 18:17, 5 June 2008 (UTC)

My suggestion is to drop your insistence on including edit byte count as part of the rule of thumb. It means absolutely nothing as far as readability or download speed for users on dial-up are concerned. The only thing it means is that when you select to edit the article, there is a certain amount of text in the edit window in which to edit. As an example, the Barack Obama article has 120kb of editable text and 34kb of readable text, so above the edit byte size, but well below the readable text size. The 120kb of editable text is due primarily to the text of the 177 references (35k), the templates and links to other languages (9k), section headers and related stuff (1k), images (2k), and the remainder being the wiki markup and cite templates. --Bobblehead (rants) 18:41, 5 June 2008 (UTC)
So change the table to more realistic numbers. Edit byte count, though, is just as important a measure as readable prose (and the more important of the two if you are on dialup), and is the easier of the two to find - it stares you in the face every time you click edit. What's it there for, to amuse you? No, it's there to help you, and as can be seen, there is a strong correlation between the two numbers. The BA article at 34 kb is not below the readable text size. Readers may tire of 20 kB, and for them, it is way beyond what they can read. Here, I read pretty fast, let's see how long it takes me to read the BA article, although I have to quit if it takes longer than 20 min. Oakwillow (talk) 19:20, 5 June 2008 (UTC)
Edit byte count isn't a good measure for dial up users, it's merely a remnant of the days when 32kb was the maximum size that many browsers could load that has been repurposed to mean something it shouldn't. Edit byte count may be a larger number than readable prose, but it certainly doesn't tell you if the page is going to be problematic for dialup users. This discussion page is 140k in size and I guarantee you that dial up users don't have a problem loading this page, while Barack Obama is 120k and dial up users have problem loading that one. Long story short, there isn't a meaningful number for edit byte count that can be used to determine whether or not an article should be broken up or not. Every article is going to have a different threshold of edit byte count depending on how many images they use, what kind of templates and how many are included in the article, and how many references the article has. --Bobblehead (rants) 19:59, 5 June 2008 (UTC)
Short story long, it's a usable number, which is only thrown off if there are a lot of images or any big images. This version[28] took 12 minutes to read, not counting the 32 seconds staring at a blank screen while the text downloaded (absurd, useless, pointless and stupid), although I have to say that at 6 minutes I lost interest and at 11 minutes I was just passing my eyes over the words hoping that I would finally come to the end. I would estimate that I read about twice as fast as the average reader, and at least 3 times as fast as a slow reader. The article length is absurd. Fill out the table above and we will see just how close the correlation between edit byte count and readable text really is. Oakwillow (talk) 20:16, 5 June 2008 (UTC)
There's no reason to fill out the table above. You are the only one that is insisting that edit byte size is a usable number. You're beating a dead horse here. Drop it and move on. --Bobblehead (rants) 20:19, 5 June 2008 (UTC)
On the other hand it is pointless to make false suppositions that are easily refuted or supported. The purpose of filling out the table is to make an intelligent assessment of the proper readable byte count. 50 random articles from all three groups, FA, GA, and unrated should be sufficient. Oakwillow (talk) 21:57, 5 June 2008 (UTC)

Edit byte count is one of the least useful metrics when it comes to article size. It only affects users when they edit the article, and only if they do not use the section editing feature. As such, it affects an exceedingly small percentage. For example: The Barack Obama article was edited 763 times in May, and a substantial percentage of the edits used the section editing feature. In the same period, it was viewed 770849 times. Which means that about 1 in a thousand page views is an edit, and only a part of those is affected by the edit byte count. The remaining 770000 page views are not affected by edit byte count at all. They are affected by the download size of the HTML and the size of the readable prose. HermanHiddema (talk) 20:37, 5 June 2008 (UTC)
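The "about 1 in a thousand" figure follows directly from the two numbers quoted in the comment above. A short Python check of that arithmetic:

```python
edits = 763        # edits to Barack Obama in May 2008 (figure quoted above)
views = 770_849    # page views in the same period (figure quoted above)

views_per_edit = views / edits
print(f"one edit per {views_per_edit:.0f} views ({edits / views:.2%} of views)")
```

This counts all edits, so the share actually affected by the full edit byte count (edits made without section editing) would be smaller still.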

I would suggest doing some homework before making any brash statements like that. For example, were you to fill in the table above, now that you have installed the prosesize tool, we could actually find out if there is any correlation between the two numbers. By the way, I believe that hitting show preview is going to at least double the byte count that has to be downloaded, for all but the very smallest articles. However, I agree that our priority should be on our readers, not on our editors. If each of the three of you took a third of the table I would expect you could finish in less than 5 minutes, although I see that Tom doesn't have prosesize installed (and what on earth are you even complaining about then, do you just want to put 17 million as the proposed article split size so that no one can ever look to the guidelines for assistance?). Assuming that none of you are on dialup. Oakwillow (talk) 21:20, 5 June 2008 (UTC)
Excuse me, but I did my homework above, when I showed how irrelevant edit byte count is. Why should I now do your homework when all it does is show a possible correlation between prose size and a number I have already shown to be mostly meaningless anyway? HermanHiddema (talk) 21:35, 5 June 2008 (UTC)
What on earth are you talking about? Having a linear correlation means that there is a specific ratio between the two numbers. So far it seems highly likely that this is the case, as only the BA article above has a significantly different ratio from 2.06, and is only 68% away, and if that is the case you can just throw out readable prose from the table completely. However, to make everyone happy I would suggest keeping both numbers. Oakwillow (talk) 21:50, 5 June 2008 (UTC)
There is no correlation between edit byte size and readable text/total html size. Remove everything except the images on Barack Obama and the article is less than 2kb of readable prose, but it is still over 350kb in total HTML size and still takes over 70 seconds to load on a 56k modem. Edit byte size means absolutely nothing.--Bobblehead (rants) 21:54, 5 June 2008 (UTC)
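The load-time figure for a 56k modem can be sanity-checked with plain arithmetic. The sketch below assumes 1 KB = 1024 bytes and ignores latency, compression and protocol overhead; the 40 kbit/s rate is an assumed effective throughput (dial-up lines usually run below the nominal 56 kbit/s), not a measured one.

```python
def download_seconds(size_kb, line_kbit_s):
    """Transfer time for size_kb kilobytes (1 KB = 1024 bytes) at line_kbit_s kbit/s."""
    bits = size_kb * 1024 * 8
    return bits / (line_kbit_s * 1000)


print(f"{download_seconds(350, 56):.0f} s at the nominal 56 kbit/s")
print(f"{download_seconds(350, 40):.0f} s at an assumed 40 kbit/s")
```

At the nominal rate 350 KB takes about 51 seconds; at a more typical effective rate it comes out near the "over 70 seconds" mentioned above.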

Your statement is mathematically false. No correlation would mean that for any random group of articles the ratio would be all over the place from close to zero to close to infinity. I don't think that is what you had in mind. What I am seeing is a strong correlation. Certainly there are some articles which are only a gallery of images that are the exception, we have already seen a discussion of this, for example on an international page that lists hundreds of flag icons. However, what is it like for the vast majority of articles? Correlation is a mathematical term for the goodness of fit of a straight line, and can be calculated from the data. Oakwillow (talk) 22:10, 5 June 2008 (UTC)

I am quite aware of what a correlation is, thank you. What I am saying is that edit byte count is extremely unimportant when compared to either readable prose size or total html size. As such, I see no point in including it in the table. Perhaps a note under the table to the effect of "The byte count reported at the top of the edit window is usually about 1.5 to 3.5 times that of the readable prose size" may be useful to give editors a rough way to estimate readable prose size. I've given you some numbers in your table, you can do the ratios. HermanHiddema (talk) 22:05, 5 June 2008 (UTC)
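A note like the one suggested above amounts to dividing the edit byte count by a ratio range. A minimal sketch, assuming the 1.5 to 3.5 range proposed in the comment (individual articles can fall outside it, and the function name is hypothetical):

```python
def estimate_prose_kb(edit_bytes_kb, low_ratio=1.5, high_ratio=3.5):
    """Rough readable-prose range (KB) implied by an edit byte count (KB).

    Assumes edit byte count is between low_ratio and high_ratio times the
    readable prose size; this is a rule of thumb, not a measurement.
    """
    return edit_bytes_kb / high_ratio, edit_bytes_kb / low_ratio


low, high = estimate_prose_kb(120)
print(f"{low:.0f}-{high:.0f} KB of readable prose")
```

For a 120 KB edit window this gives roughly 34 to 80 KB; the Barack Obama figures quoted earlier (120 KB edit, 34 KB prose) sit at the high-ratio end, which shows how wide the band is.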
Thanks. Ratios I can calculate. The number is going to be closer to 2 to 3 though, and probably 2 will be close enough to use. The readable prose table still needs to be corrected with realistic numbers. Your numbers seem specious - they all end in an extremely unlikely three zeros. For the purpose of careful analysis it would help to indicate the version of the article used and the exact byte count. Oakwillow (talk) 22:19, 5 June 2008 (UTC)
The prose size tool reports the prose size in KB, so more precise numbers are not available with that tool. I do not think that the extra precision is very significant anyway. HermanHiddema (talk) 11:39, 6 June 2008 (UTC)
I don't understand this rationale for edit byte count as the easiest metric to use. Once the prosesize tool is installed, you get the readable prose size count with one click. You also get the readable prose word count, and word count is the size metric most familiar to those who have done writing in other fields. Wasted Time R (talk) 22:37, 5 June 2008 (UTC)
Not everyone has prosesize installed, but everyone sees the byte count when they either click edit (over 32kB) or click history. I have no objection to a greater emphasis on word count than prose byte count. It would certainly help reduce the confusion. I assume that is actual words, not typing words (characters divided by five)? So far I'm seeing a 97% correlation between edit byte count and readable prose characters, which is what I would call very high. Oakwillow (talk) 22:50, 5 June 2008 (UTC)
Let's keep in mind that correlation does not imply linearity (see also Correlation#Common_misconceptions_about_correlation). HermanHiddema (talk) 12:11, 6 June 2008 (UTC)
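The point about correlation and linearity can be illustrated with a toy example: a clearly non-linear relationship (y = x squared) still gives a Pearson correlation coefficient of about 0.97, while the ratio y/x is anything but constant. The data below is synthetic and only illustrates the statistics, not real article sizes.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


xs = list(range(1, 11))
ys = [x * x for x in xs]                 # non-linear on purpose: y = x^2
ratios = [y / x for x, y in zip(xs, ys)]

print(f"r = {pearson(xs, ys):.3f}")
print(f"ratio y/x ranges from {min(ratios):.0f} to {max(ratios):.0f}")
```

So a high correlation between edit byte count and readable prose would not, on its own, establish a single conversion ratio; the spread of the per-article ratios has to be checked separately.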

Hello... Speaking of correlations you guys might consider a checkuser on this page because this discussion has implications for all of wikipedia and I think there's a good chance it has been compromised. My favorite statement in the above discussion was, "My concern is making it too easy to justify slicing up appropriately long articles (because of the nature of the material) by editors with little grasp or interest in the underlying subject." Amen... On the mischievous side of things there are people who revel in disrupting good articles. Don't give them any more tools. Strengthen the protection of good content if anything. I'm an advocate of the readable prose guideline. Technical or biographical articles can easily have half or more of the edit byte count in refs, lists, tables, pictures etc. The current guidelines are SAT. Mrshaba (talk) 17:48, 6 June 2008 (UTC)

[edit] A rule of thumb (Best fit)

Some useful rules of thumb for splitting articles, and combining small pages:

Edit byte count | Readable prose size | What to do
> 100 KB | > 45 KB | Almost certainly should be divided up (there is no mandate, however; these are guidelines only)
> 60 KB | > 30 KB | Probably should be divided (depends on the scope of a topic and editor preferences)
> 40 KB | > 20 KB | May eventually need to be divided (likelihood goes up with size)
< 32 KB | < 15 KB | Length alone does not justify division
< 1 KB | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.
Please note:

These guidelines apply somewhat less to lists or disambiguation pages, and naturally do not apply to redirects.


The above is a best fit using a power function to convert from edit byte count to readable prose size. Oakwillow (talk) 22:55, 5 June 2008 (UTC)
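The comment does not give the fitted parameters, so the sketch below only illustrates one common way to fit a power function, y = a * x^b, by ordinary least squares on log-transformed data. The (edit byte count, readable prose) pairs are synthetic, generated from a = 0.6, b = 0.95 so the fit can be checked; they are not real article measurements. Fitting on the log scale weights relative errors rather than absolute ones, so a direct non-linear fit could give somewhat different parameters on real data.

```python
from math import exp, log


def fit_power(xs, ys):
    """Least-squares fit of y = a * x**b on the log-log scale; returns (a, b)."""
    lx = [log(x) for x in xs]
    ly = [log(y) for y in ys]
    n = len(xs)
    mx, my = sum(lx) / n, sum(ly) / n
    b = (sum((u - mx) * (v - my) for u, v in zip(lx, ly))
         / sum((u - mx) ** 2 for u in lx))
    a = exp(my - b * mx)
    return a, b


# Synthetic (hypothetical) data: prose_kb = 0.6 * edit_kb ** 0.95
edit_kb = [20, 40, 60, 80, 100, 120]
prose_kb = [0.6 * x ** 0.95 for x in edit_kb]

a, b = fit_power(edit_kb, prose_kb)
print(f"prose_kb ~ {a:.2f} * edit_kb^{b:.2f}")
```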

Sigh. You are making the assumption, again, that the current numbers in the table refer to edit byte count, but they do not. As such, the above table is utter nonsense and you should move the numbers from the edit byte count column to the readable prose size column, then use your power function to fill the edit byte count column with the appropriate larger numbers. HermanHiddema (talk) 07:30, 6 June 2008 (UTC)
Just to support HermanHiddema here. The readable text levels are completely ridiculous. The current rule of thumb in the guideline is perfectly acceptable and you are the only one arguing for its replacement, Oakwillow, which should pretty much tell you that consensus is against you. Drop it and move on. You're wasting your time and the time of every editor that comes by here and feels the need to respond to your incessant demands for change to this guideline. --Bobblehead (rants) 19:00, 6 June 2008 (UTC)
Wrong. I am pointing out an obvious error which needs to be corrected. A table that was established using edit byte counts has been mislabeled readable prose. What are the facts that we know? 1) that readers may tire from reading 20-30 kB (and by extension anything greater, such as 30-50, or 10-17 million), 2) that using the erroneous numbers leads to articles twice as long and frequent complaints about article size, and 3) that the edit byte count is more accessible and is commonly confused with readable character count, although there is typically a 2:1 ratio between the two. Therefore, the table needs to be fixed. It is simply wrong. The three ways it could be fixed are choices A, B, or C above. Choose one and move on. We can also add 4) that most people in other writing areas use word count and not character count, so that is what we should also use. By the way, it is a little amusing that the only editors participating in this discussion are P.E.'s (Primary Editors of the article they have edited the most) of horrendously long articles (over 80 kB edit byte count, although one is right at 80 kB). WTR of course sets the record, for being the P.E. of HRC (Hillary Clinton) at 158 kB edit byte count. Trust me I couldn't care less how long you make your articles, just when someone complains that it is too long don't point to a bogus number and say it is within guidelines (HRC in the FAQ appropriately says - Q: This article is long! A: Yes.). Oakwillow (talk) 22:32, 6 June 2008 (UTC)
Please refrain from personal attacks. HermanHiddema (talk) 21:25, 8 June 2008 (UTC)
Not intended. It was simply an observation. Most articles are much smaller. Oakwillow (talk) 00:55, 9 June 2008 (UTC)
The table was established using source size at a time when the difference was rarely of significance. The intent is to refer to the size of the readable text as reflected on the page. Christopher Parham (talk) 23:42, 8 June 2008 (UTC)
Not a problem, however, there is a 2 to 1 difference in the numbers, so they should be adjusted accordingly. It would probably make sense though to just switch to word count. Oakwillow (talk) 00:55, 9 June 2008 (UTC)
There is no need to adjust anything. The existing numbers have always referred to readable prose size, as they do today, and there is no need to change them. The only necessary adjustment has already been made long ago, by ending the use of source size as a good analogue to prose size, which it no longer is. Hence your proposals are unnecessary, which explains the lack of support for them. Christopher Parham (talk) 04:05, 9 June 2008 (UTC)

[edit] Proposal #4 (Use word count instead of readable character count)

Some useful rules of thumb for splitting articles, and combining small pages:

Edit byte count | Word count | What to do
> 100 KB | > 10,000 | Almost certainly should be divided up (there is no mandate, however; these are guidelines only)
> 60 KB | > 5,000 | Probably should be divided (depends on the scope of a topic and editor preferences)
> 40 KB | > 3,000 | May eventually need to be divided (likelihood goes up with size)
< 32 KB | < 2,500 | Length alone does not justify division
< 1 KB | If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.
Please note:

These guidelines apply somewhat less to lists or disambiguation pages, and naturally do not apply to redirects.


This proposal is to switch from readable prose, which inherently creates confusion between edit byte count and readable character count, to word count, which is more commonly used as a measure of how much to write about a subject. Much of the rest of the guideline would also need to be changed slightly to reflect this emphasis. Oakwillow (talk) 18:18, 8 June 2008 (UTC)

Sigh. This table makes the same mistake that all your other proposals have made. It falsely assumes that the current numbers in the table refer to edit byte count instead of readable prose. HermanHiddema (talk) 21:24, 8 June 2008 (UTC)
I see no indication that that is a false assumption. I would, however, like to hear what others have to add. Whatever numbers are used, if they are too large there will be more complaints about size; if they are too small, the articles will simply ignore them. Right now what I am seeing is complaints about size. As a percentage of articles, there are very few that are ignoring the above suggestion of staying at less than 100 kB edit byte count: zero of the 38 random articles above (the only ones larger are the added country and campaign articles). Of all 1721 FA articles at User:Dr pda/Featured article statistics, only 14, or less than 1%, are greater than 10,000 words, so I would posit that that is a good number to use. None are greater than 15,000 words, so that or 20,000 would be a useless number to use (it would be like setting a 500 kph speed limit on a highway; while every kid would love it, it would serve no purpose). Oakwillow (talk) 01:18, 9 June 2008 (UTC)

Other editors have, so far, agreed with my assertion that your assumption is false. See eg the comment by Tom (North Shoreman) at 16:14, 20 May 2008 (UTC) and the comment by Christopher Parham at 04:05, 9 June 2008 (UTC). This is not a popular vote, of course, but apparently other editors have found my arguments convincing or have reached the same conclusion separately. I will try to explain how I think the current numbers came about:

At one time, in 2003 and before, there was a hard limit of 32 KB on "edit byte count", because some browsers had issues with editing texts larger than that. This was, of course, purely a technical issue. The number 32KB was not based on considerations of readability or the like. By 2004, browsers that still had this issue had become quite rare. This version from March 2004 links to a page of browsers that have the issue and how to upgrade them. It explicitly mentions that section editing for logged-in users exists, which mostly invalidates the technical 32KB limitation. By the end of 2004 the section editing feature was available to all users, whether logged in or not, further invalidating the 32KB limit. As 32KB was no longer a technical limitation, but there was still a desire to say something about limiting article size, the concept of "readable prose" was introduced to the page. The version above includes the text "Readers may also tire of reading a page in excess of 20-30 KB of readable prose (tables, lists and markup excluded)". This 20-30KB seems, initially, to have been a convenient number because it resulted in limits similar to the outdated 32KB technical limit. As time moved on, however, the numbers for "readable prose" were adjusted upward, after discussion, to "30-50 KB", which was apparently felt to be more in line with actual limits on attention span. The text and the "rule of thumb" table were updated accordingly. And as "readable prose" has been the most important measure since 2004, the table has referred to it since that time. What has happened since is that Wikipedia became stricter in its referencing policy, causing "edit byte count" and "readable prose" to drift apart. Before that, the numbers could have referred to either "readable prose" or "edit byte count" without any significant difference, but this is no longer the case.

Now personally, I feel that the current numbers in the table are somewhat too high, and could benefit from being adjusted downward. But to do that, we should simply examine what reasonable limits on "readable prose" are. To assume that the current numbers refer to "edit byte count" and that they are set in stone and can only be adjusted downward by dividing them by some 2.06 correlation number is counter-productive. HermanHiddema (talk) 08:14, 9 June 2008 (UTC)

Edit byte count=crap. For the exact reasons described above by HermanHiddema and multiple times by myself. Personally, before I'll seriously consider any modification to the rule of thumb table, the use of edit byte count must be removed. I'm open to reducing the readable text size, but not by much. Certainly not to 50k being the "Almost certainly ..." level. Realistically, the only issue I see with the current readable prose limits is that the 100k is too high for the "Almost certainly..." level. Drop it down to 80k and make the divisions a little clearer and I think the table would be fine:
Readable prose size What to do
80+ KB Almost certainly should be divided
60-80 KB Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
40-60 KB May eventually need to be divided (likelihood goes up with size)
1-40 KB Length alone does not justify division
< 1 KB If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.

--Bobblehead (rants) 20:20, 9 June 2008 (UTC)

Good, now we're getting somewhere. This seems like a reasonable proposal. I do think that using word count might be a good idea as well, given that that measure is familiar to writers in many fields, as Wasted Time R mentions above. The table above shows a ratio of slightly over 6 bytes per word, so how about something like:

Word count (KB of Readable Prose)  What to do
more than 12,000 words (> 75 KB) Almost certainly should be divided
9,000 – 12,000 words (55-75 KB) Probably should be divided (although the scope of a topic can sometimes justify the added reading time)
6,000 – 9,000 words (40-55 KB) May eventually need to be divided (likelihood goes up with size)
less than 6,000 words (< 40 KB) Length alone does not justify division
less than 100 words (< 1 KB) If an article or list has remained this size for over a couple of months, consider combining it with a related page. Alternatively, why not fix it by adding more info? See Wikipedia:Stub.

-- HermanHiddema (talk) 14:17, 11 June 2008 (UTC)
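The proposed word-count bands can be applied mechanically. The sketch below uses hypothetical helpers that are not part of any Wikipedia tool: `count_words` counts whitespace-separated words, and `advice_for` maps the count onto the bands in the table above, treating each lower bound as inclusive except for the "more than 12,000" row. The `approx_prose_kb` conversion assumes 6.25 bytes per word (75 KB / 12,000 words) and 1 KB = 1,000 bytes; both are assumptions, since the table's own rows imply ratios from about 6.1 to 6.7 bytes per word.

```python
def count_words(text: str) -> int:
    """Count whitespace-separated words (a simplification of real word counts)."""
    return len(text.split())


def advice_for(word_count: int) -> str:
    """Return the proposed 'What to do' advice for a readable-prose word count."""
    if word_count > 12_000:
        return "Almost certainly should be divided"
    if word_count >= 9_000:
        return "Probably should be divided"
    if word_count >= 6_000:
        return "May eventually need to be divided"
    if word_count >= 100:
        return "Length alone does not justify division"
    return "Consider combining with a related page, or expanding it"


def approx_prose_kb(word_count: int, bytes_per_word: float = 6.25) -> float:
    """Rough readable-prose size in KB, assuming bytes_per_word and 1 KB = 1,000 bytes."""
    return word_count * bytes_per_word / 1000


for words in (13_000, 10_000, 7_500, 3_000, 80):
    print(f"{words:>6} words ~ {approx_prose_kb(words):5.1f} KB: {advice_for(words)}")
```

Keeping the thresholds in one function makes it easy to see how an article near a band edge would be classified, and to try the alternative cut-offs discussed above by editing a single number.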

[edit] Off topic

The following has been moved to User talk:Oakwillow. Please discuss there, and not here. This section can be deleted as it is a duplicate. Oakwillow (talk) 01:27, 7 June 2008 (UTC)

Oakwillow is probably 199.125.109.xxx... I'd put money on it. I'm here because this Anon is demanding major trimming on a reasonable article. I left a note on HH's page but if there are other editors that have been dealing with 199 or Oakwillow we'd do well to work collectively. I didn't realize this editor has been using socks but this seems to be the case. What a pain!!! Itsmejudith has suggested an Arbcom and given me a name. I really don't know how to approach this but I'm fed up with a year's worth of hassling over miscellany. A censure of some sort seems to be in order. Mrshaba (talk) 23:10, 6 June 2008 (UTC)

I left a message on Mrshaba's talk page, but just because Oakwillow and the IP address have the same user, it doesn't mean they are at odds with WP:SOCK. He'd only be at odds with the policy if he used the multiple accounts to make it appear that there is more support for something or to circumvent another policy. Unless some evidence is presented where Oakwillow was using the IP address disruptively in conjunction with the Oakwillow account, then it's not really something that can be check usered or at odds with WP:SOCK. --Bobblehead (rants) 23:52, 6 June 2008 (UTC)
Or unless Oakwillow and the IP are the banned Sadi Carnot (talk · contribs) of the infamous Extra-Long Article Committee, using socks to evade a ban. Where is this IP that has been mentioned? SandyGeorgia (Talk) 23:58, 6 June 2008 (UTC)
New Hampshire. Mrshaba (talk) 00:05, 7 June 2008 (UTC)
I meant where has the IP been posting; I don't see this IP anywhere. SandyGeorgia (Talk) 00:06, 7 June 2008 (UTC)
I looked at Sadi Carnot's history. There's definitely a similarity. I'm familiar with 199's writing style so I'll look into Sadi's talk posts more. 199 is a dynamic address but the user frequents energy stuff... Solar energy, Nuclear power, Hydrogen economy, Photovoltaics, Electric car etc. Less thematically, 199 also edits Cannabis (drug), E-mail spam, Federal Assault Weapons Ban, Contra Dance and articles involving page name disputes. Mrshaba (talk) 00:17, 7 June 2008 (UTC)
Yes, I just found it as well: [29] Checkuser time; this has been my suspicion all along here. SandyGeorgia (Talk) 00:20, 7 June 2008 (UTC)
Unless Sadi Carnot has had a recent sockpuppet discovered or had a prior RFCU run against them where the CU kept notes, there really isn't a way to show via RFCU that Sadi and Oakwillow/the IP address are the same. RFCU only works on edits within the last month or so. However, if there is suspicion of them being the same, WP:SSP may be the direction to go, as that is based on behavioral evidence rather than on them sharing the same IP address/range. --Bobblehead (rants) 00:38, 7 June 2008 (UTC)
Well, if there's a sockpuppet template at User:Sadi Carnot, doesn't that mean there was likely a checkuser? Where does one look for it? SandyGeorgia (Talk) 00:55, 7 June 2008 (UTC)
Well, Wikipedia:Requests for checkuser/Case/Sadi Carnot doesn't exist. So no official one was done. --Bobblehead (rants) 00:59, 7 June 2008 (UTC)

Could someone please explain WTF this has to do with this article? I'm moving it to my talk page. Please continue it there. Oakwillow (talk) 00:52, 7 June 2008 (UTC)

[edit] download timings for large articles

Here are my timings using a fast PC with 2464 kilobits per second ADSL broadband in Europe. I first used Web Page Analyzer but I suspect its time model does not reflect actual browsers (too pessimistic), and I also tried OctaGate SiteTimer whose model appears too optimistic and sometimes fails to complete. I have now installed Firebug for Firefox and I am using its Network monitoring tool.

I clear the cache before each timing, and nothing else is using the network (except a 256-byte packet every 5 seconds); in any case, network usage never approached even 40%. All figures are in seconds. All accesses were made as an anonymous IP, unless "as user" is shown (which was noticeably slower).

Web page                      Manual timing #  Firebug            OctaGate ## 
----------------------------------------------------------------------------
de:Barack_Obama               4 then 3         1.77 then 3.46     1.95 & 2.2   
en:Barack_Obama               6 then 4-9       5.45 then 3.62     1.4  & 5.0
en:Barack_Obama May 18        7 then 5         6.41 then 4.42
en:Barack_Obama as user       9 then 5-9       8.8  then 5.78   
fr:Barack_Obama               2 then 2         1.78 then 2.04     1.35 & 1.35

en:Hillary campaign           6 then 5         5.44 then 5.12     OctaGate fails
en:Hillary campaign repeat    13 then 6        8.54 then 5.42
en:Russia                     20 then 8        15.7 then 8.4
de:Lisa del Giocondo          3 then 2         1.49 then 2.19     0.95 & 1.2
en:Lisa del Giocondo          3 then 2         2.41 then 1.38     1.2  & 1.2
en:Lisa del Giocondo as user  3 then 3         3.71 then 3.14
fr:Lisa del Giocondo          8 then 5         7.3  then 3.18     1.4  & 1.7
citizendium Barack_Obama      3 then 5 !       2.14 - 7.21 then 4.97 - 4.55

info.britannica.co.uk         4 then 3         2.42 then 1.79     1.5 & 2.0
  • "4 then 3" means 4 seconds to load with empty cache, then 3 seconds to reload (with objects already in cache).
  • # Manual Timing - first figure is from empty cache (or freshly sandboxed Firefox), "then" the time to reload. n-m indicates I encountered a range of times, "!" means I was surprised by reload taking longer than fresh load
  • ## OctaGate - first figure is time to download the HTML, CSS and JavaScript, and the second is after all images are downloaded
  • Lisa del Giocondo is a small featured article in the English Wikipedia; it redirects to La Joconde in the French Wikipedia, and is a mere stub in the German Wikipedia.
  • Citizendium's Barack Obama article is 56504 bytes, with 16641 characters of readable prose.
  • Opera was considerably faster than Firefox: it took only 3 seconds to fully render Barack_Obama, and 1 second if images were disabled —Preceding unsigned comment added by 84.223.78.86 (talk) 00:20, 15 May 2008 (UTC)

Oops, I forgot to sign the above - I, User:84user, am the same as 84.223.78.86. -84user (talk) 00:35, 15 May 2008 (UTC)

  • Note: Firefox loads Russia in 20 seconds, while Opera takes 18 seconds.

(To the table I added Russia timings, and noted Barack Obama now takes 6.4 seconds)-84user (talk) 17:46, 18 May 2008 (UTC)

  • I'm on dial-up. My timings are probably 30 times slower. In minutes instead of seconds. Oakwillow (talk) 07:57, 20 May 2008 (UTC)
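For comparison with the dial-up case, a rough estimate of raw transfer time is payload bits divided by line rate. The sketch below assumes a nominal 56 kbit/s modem and treats the 56504-byte Citizendium figure quoted above as the payload purely for illustration; the function name `transfer_seconds` is hypothetical. It ignores latency, protocol overhead, HTTP compression, and the images, CSS and JavaScript a real page load also fetches, so it is not a prediction of the timings in the table.

```python
def transfer_seconds(size_bytes: int, link_kbit_per_s: float) -> float:
    """Rough raw transfer time: payload bits divided by the nominal line rate.

    Ignores latency, protocol overhead, HTTP compression and any extra
    resources (images, CSS, JavaScript) fetched during a real page load.
    """
    return size_bytes * 8 / (link_kbit_per_s * 1000)


page_bytes = 56_504  # Citizendium Barack Obama size quoted above, used as the payload
print(f"{transfer_seconds(page_bytes, 2464):.2f} s at 2464 kbit/s")  # 0.18 s at 2464 kbit/s
print(f"{transfer_seconds(page_bytes, 56):.2f} s at 56 kbit/s")  # 8.07 s at 56 kbit/s
print(2464 / 56)  # 44.0, the ratio of the two nominal line rates
```

On raw bandwidth alone the nominal line rates differ by a factor of 44; measured slowdowns can differ from that because fixed costs such as latency and rendering do not scale with line speed.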

[edit] GFDL compliance

I have added the following to the spinout section:

To conform with §4(I) of the GFDL, the new page should be created with an edit summary noting "split content from [[article name]]". (Do not omit this step or omit the page name.) A note should also be made in the edit summary of the source article, "split content to [[article name]]", to protect against the article subsequently being deleted and the history of the new page eradicated.

I believe this is necessary to secure the GFDL rights of content creators, since Wikipedians do not release their material into the public domain but retain authorship credit. This is based on the language at Help:Merging and moving pages, where similar issues of separating text from contribution history exist. The link back to the source is essential at the new article, but it is also important to note the separation at the old article to help guard against history deletion. --Moonriddengirl (talk) 12:38, 2 June 2008 (UTC)