User talk:Dr pda/prosesize.js

From Wikipedia, the free encyclopedia

This script adds a Page size link to the toolbox, i.e. the box in the left-hand column (by default) that also contains What links here, among other links. Clicking this link displays some statistics about the page and prose size (see below) and highlights the 'readable prose'. Clicking the link again turns these off.

How to get it working

The script needs the addLink function to create this link.

If you already have this in your monobook.js (i.e. User:your_user_name/monobook.js), just add {{subst:js|User:Dr_pda/prosesize.js}} to your monobook.js and save it.

If you are not already using addLink, then add

{{subst:js|User:Omegatron/monobook.js/addlink.js}}
{{subst:js|User:Dr_pda/prosesize.js}}

to your monobook.js, and save it.

After saving, you have to bypass your browser's cache to see the changes. Mozilla/Safari: hold down Shift while clicking Reload (or press Ctrl-Shift-R); Internet Explorer: press Ctrl-F5; Opera/Konqueror: press F5.

Sample output

Document statistics:

  • File size: 89 kB
  • Prose size (HTML): 28 kB
  • References (HTML): 10 kB
  • Wiki text: 31.8 kB (4799 words)
  • Prose size (text only): 18 kB (3310 words)
  • References (text only): 4 kB
  • Images: 443 kB

Quick summary

  • File size: size of HTML document
  • Prose size (HTML): size of HTML within <p></p> tags
  • References (HTML): size of HTML for cite.php references
  • Wiki text: size of text+markup within the edit box
  • Prose size (text only): size of text within <p></p> tags
  • References (text only): size of text for cite.php references
  • Images: size of image thumbnails (Internet Explorer only)

File size

This is the total size of the HTML document. If you went to View->Page Source (or the equivalent) in your browser, and saved the resulting output to your computer, the file size would be the size of this file. This number does not include any images. The file size (plus the image size) is what you need to look at when considering how long a page will take to load.

For Internet Explorer this number is obtained from the document.fileSize property. For other browsers it is obtained by loading the page again with an XMLHttpRequest, so this number may take a few seconds to appear.
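
The non-IE measurement can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the helper names htmlByteSize, formatKB and fetchPageSize are hypothetical, and counting UTF-8 bytes and treating 1 kB as 1024 bytes are assumptions about how "size" and "kB" are defined.

```javascript
// Hypothetical helper: size of an HTML source string, counted as UTF-8 bytes.
// (Assumption: the real script may count characters rather than bytes.)
function htmlByteSize(source) {
  return new TextEncoder().encode(source).length;
}

// Hypothetical formatter. Assumes 1 kB = 1024 bytes, rounded to a whole number.
function formatKB(bytes) {
  return Math.round(bytes / 1024) + " kB";
}

// Browser-only sketch: request the page again and report the size of the
// returned source. XMLHttpRequest exists in browsers but not in plain Node.js,
// so this function is only defined here, never called.
function fetchPageSize(url, callback) {
  const xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.onload = function () {
    callback(htmlByteSize(xhr.responseText));
  };
  xhr.send();
}
```

In a browser this would be called as fetchPageSize(location.href, function (bytes) { alert(formatKB(bytes)); }). Because the request is asynchronous, the result arrives only after the page has been downloaded a second time, which is why the number can take a few seconds to appear.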

Prose size

Wikipedia:Article size says

there [are] stylistic reasons why the main body of an article should not be unreasonably long, including readability issues ... For stylistic purposes, only the main body of prose (excluding links, see also, reference and footnote sections, and lists/tables) should be counted toward an article's total size, since the point is to limit the size of the main body of prose.

One of the main motivations for this script was to provide a convenient way of calculating the prose size. The technique used is simply to count the text within <p></p> tags in the HTML source of the document, which corresponds almost exactly to the definition of 'readable prose'. This method is not perfect, however: it may include text which isn't prose (e.g. in navboxes), or exclude text which is (e.g. in {{cquote}}, or prose written in bullet-point form, e.g. Anarchism#Recent developments within Anarchism). The text counted as prose is highlighted in yellow, so it is easy to see whether the prose size is over- or underestimated.
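
The "count what is inside <p></p>" rule can be illustrated on a plain HTML string. This sketch assumes a regular expression is an acceptable stand-in for walking the page's DOM (the actual script runs in the browser and may well use the DOM instead); proseHtmlSize is a hypothetical name.

```javascript
// Hypothetical helper: total length of the HTML inside <p>...</p> elements.
// <p\b matches <p> and <p class="...">, but not <pre> or <param>.
// Paragraphs cannot nest in valid HTML, so a lazy match per paragraph is enough.
function proseHtmlSize(html) {
  const paragraph = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  let total = 0;
  for (const match of html.matchAll(paragraph)) {
    total += match[1].length; // inner HTML only, excluding the <p> tags themselves
  }
  return total;
}
```

Lists, tables and preformatted blocks contribute nothing, which mirrors why bullet-point prose is not counted.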

Two numbers are given for the prose size: HTML and text only. The HTML size is the size of the HTML code contained within <p></p> tags; comparing it with the file size shows how much of the document consists of readable prose. The text-only size is the size of just the words, without any formatting. (This is what you would get if you copied the prose from the article and pasted it into a plain-text editor such as Notepad, which strips out all the formatting.) The word count is calculated from the number of spaces in the text-only prose. Note that Internet Explorer highlights the section headings but does not count them as prose. (This is because there is an 'invisible' <p></p> before each heading, containing a link so that you jump to the right place when you click the corresponding section in the table of contents.)
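
The text-only size and the space-based word count can be sketched the same way. This is a simplified illustration under stated assumptions: tags are stripped with a regular expression, HTML entities such as &amp; are not decoded, and each non-empty paragraph is taken to have one more word than it has spaces. proseTextStats and stripTags are hypothetical names.

```javascript
// Hypothetical helper: remove all tags, keeping only the text between them.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

// Hypothetical helper: text-only prose size and a word count derived from spaces.
function proseTextStats(html) {
  const paragraph = /<p\b[^>]*>([\s\S]*?)<\/p>/gi;
  let chars = 0;
  let words = 0;
  for (const match of html.matchAll(paragraph)) {
    const text = stripTags(match[1]).trim();
    if (text === "") continue; // skip empty paragraphs
    chars += text.length;
    words += text.split(" ").length; // number of spaces + 1
  }
  return { chars, words };
}
```

Counting spaces is cheap but crude: two consecutive spaces would add an extra "word", which is one reason the count should be treated as approximate.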

References size

Now that cite.php inline citations are very common, it is often useful to know how much of the article size comes from these references. The HTML references size is the size of what is produced by the <references/> tag, plus the size of the HTML that produces the markers (e.g. [1]). The text-only size is again just the text of the references, plus the text of the markers. Note that the contribution of the markers is explicitly subtracted from both prose size numbers. The markers also should not affect the word count, since there should be no spaces between them and the preceding word or punctuation.
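
Subtracting the markers from the prose numbers is equivalent to removing them before measuring. A sketch, assuming the markers are rendered as <sup ... class="reference">...</sup> elements (worth checking against an actual page source); splitMarkers is a hypothetical name.

```javascript
// Hypothetical helper: separate citation markers from the rest of a paragraph.
// Assumption: each marker is a <sup> element with class="reference".
function splitMarkers(paragraphHtml) {
  const marker = /<sup\b[^>]*\bclass="reference"[^>]*>[\s\S]*?<\/sup>/gi;
  const markers = paragraphHtml.match(marker) || [];
  const markerHtml = markers.join("");
  const proseHtml = paragraphHtml.replace(marker, ""); // prose with markers removed
  const tag = /<[^>]*>/g;
  return {
    proseHtmlSize: proseHtml.length,
    proseTextSize: proseHtml.replace(tag, "").length,
    markerHtmlSize: markerHtml.length, // counted towards references (HTML)
    markerTextSize: markerHtml.replace(tag, "").length, // e.g. "[1]" is 3 characters
  };
}
```

For "Fact.[1] More" the marker adds 3 characters to the references text size and nothing to the prose, and because there is no space before it, the word count is unchanged.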

Wiki text size

In addition to the above numbers, which are calculated from the HTML source of the page, there is also the size returned by the Wikipedia search. If you type an article title in the search box and click Search (instead of pressing Enter or clicking Go), you get a list of articles matching the search criteria, each of which has a line like

Relevance: 96.2% - 31.8 kB (4799 words) - 09:27, 24 December 2006

This size/word count refers to the text plus wiki markup which appears in the edit box when you edit a page. This is also the same number which appears in warnings about page length (e.g. Note: This page is 37 kilobytes long.). The prose size script performs the search automatically to get these numbers for the current article. This involves another XMLHttpRequest, so it may take a few seconds for these numbers to appear; if there is a problem with the search, the script will not be able to show this value and will pop up a message box.

Caveats:

  • Because it often takes weeks for the search index to be updated, these numbers may not refer to the current version of the article.
  • If you reach a page via a redirect, the wiki text numbers will be those of the redirect page (typically 0 kB and a few words).
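
Pulling the size and word count out of a search result line like the sample above is a simple pattern match. A minimal sketch: parseSearchSize is a hypothetical name, the pattern is written against the sample line only, and returning null stands in for the case where the script pops up a message box.

```javascript
// Hypothetical parser for a search-result line such as
// "Relevance: 96.2% - 31.8 kB (4799 words) - 09:27, 24 December 2006".
// Returns null when no "N kB (M words)" part is found.
function parseSearchSize(line) {
  const match = /([\d.]+)\s*kB\s*\(([\d,]+)\s+words?\)/.exec(line);
  if (match === null) {
    return null; // the script described above shows a message box in this case
  }
  return {
    kilobytes: parseFloat(match[1]),
    words: parseInt(match[2].replace(/,/g, ""), 10), // tolerate "1,234 words"
  };
}
```

The relevance percentage is skipped because it is followed by "%" rather than "kB".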

Images size

N.B. This only works in Internet Explorer (or other browsers supporting the element.fileSize property).

This number is the total size of the image thumbnails, i.e. the size of the images which actually appear on the page, not the full-size versions they link to. The total number and size of images affects how long the page takes to load, although the text of the page is loaded first and is readable while the images are still loading. It is also possible to turn off images to speed up loading of the page. Note that the script only counts images within the article (i.e. not the Wikipedia logo, skin background, etc.). It also currently counts every occurrence of a repeated image, whereas the browser only needs to download it once (this matters, for example, on pages with many flag icons denoting nationality).
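
The gap between counting every occurrence and counting each image once can be shown with a list of { src, fileSize } records. A sketch only: the records are assumed to have been collected beforehand from the article's <img> elements (in Internet Explorer, via the fileSize property mentioned above), and imageSizes is a hypothetical name.

```javascript
// Hypothetical helper. Each record is { src: image URL, fileSize: size in bytes }.
// everyOccurrence: sum over all images, as the script currently counts them.
// eachOnce: sum over distinct src values, closer to what the browser downloads.
function imageSizes(images) {
  let everyOccurrence = 0;
  const bySrc = new Map();
  for (const image of images) {
    const size = Number(image.fileSize); // coerce in case the size is given as a string
    everyOccurrence += size;
    bySrc.set(image.src, size);
  }
  let eachOnce = 0;
  for (const size of bySrc.values()) {
    eachOnce += size;
  }
  return { everyOccurrence, eachOnce };
}
```

On a page where the same small flag icon appears dozens of times, everyOccurrence can be much larger than eachOnce.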