User:Opabinia regalis/Article statistics
Random article survey
I was bored waiting for my very slow program to run, so I clicked "random article" 250 times and kept track of what kinds of articles popped up. 48 articles (19.2%) were stubs or had at least one cleanup tag. (I tried to count "citation needed" as a cleanup tag but may have missed a few.) The results as of 11 Nov 2006:
Type of article | Number | Percent of sample |
---|---|---|
Biography | 60 | 24% |
Places/geographical locations | 34 | 13.6% |
TV shows/movies | 17 | 6.8% |
Disambiguation | 15 | 6% |
Music/bands/albums | 14 | 5.6% |
Company/product/service | 13 | 5.2% |
History/war | 12 | 4.8% |
Politics/government | 9 | 3.6% |
Sports | 8 | 3.2% |
Organisms | 8 | 3.2% |
Definitions/common phrases/common objects | 7 | 2.8% |
Architecture/buildings | 7 | 2.8% |
Mythology/religion | 5 | 2% |
Astronomy/physics/space science | 5 | 2% |
Software/computing | 5 | 2% |
Games (including video) | 4 | 1.6% |
Literature/publications | 4 | 1.6% |
Biology/medicine | 3 | 1.2% |
Food/drink | 3 | 1.2% |
Schools | 3 | 1.2% |
Math | 2 | 0.8% |
Nonsense/unclassifiable | 2 | 0.8% |
Visual arts | 2 | 0.8% |
Philosophy/ethics | 2 | 0.8% |
Linguistics/languages | 2 | 0.8% |
Charities/nonprofit organizations | 2 | 0.8% |
Economics/finance | 1 | 0.4% |
Deleted and protected | 1 | 0.4% |
"Biography" is probably a bit overinflated because I classified everything about an individual real person as a biography, including historical figures. Articles about fictional characters went in the category of the corresponding fiction (TV, myth, etc.)
Obviously this is a lousy way to determine Wikipedia coverage - 250 articles is a tiny sample. But the advantage over, say, counting category populations is that this avoids duplicate-counting of articles in multiple categories and can find articles that are un- or miscategorized. Special:Random also (as far as I know) excludes recently created articles that haven't yet been indexed, which filters out lots of nonsense speedy candidates. I don't think Special:Random would exclude deletion candidates, but none of these had prod or AfD templates.
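To put a rough number on "tiny sample", here is a minimal sketch of a 95% Wilson score interval for a few of the counts in the table. It assumes each click on Special:Random is an independent, uniform draw from the article pool, which is an assumption and not something checked here. The function name `wilson_interval` and the choice of example categories are purely illustrative.

```python
import math
from typing import Tuple


def wilson_interval(hits: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate Wilson score interval for a sample proportion.

    z = 1.96 corresponds to roughly 95% confidence.
    Assumes n independent draws (an assumption about Special:Random).
    """
    p_hat = hits / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


SAMPLE_SIZE = 250
# Counts copied from the table above.
example_counts = {"Biography": 60, "Schools": 3, "Math": 2}

for category, hits in example_counts.items():
    low, high = wilson_interval(hits, SAMPLE_SIZE)
    print(f"{category}: {hits / SAMPLE_SIZE:.1%} observed, plausibly {low:.1%} to {high:.1%}")
```

The intervals for the small categories come out wide relative to the observed percentages, so differences between categories with only a handful of articles each should not be read as meaningful.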
First-glance observations:
- I didn't find a single chemistry article. Biology and medicine had one clinical feature, one cell biology article, and one disease, so not even any biochemistry showed up. Physics as such was also missing; the articles in that category were almost entirely about NASA missions and observations.
- Similarly, nothing I'd classify as sociology or psychology.
- The literature and publications category contains a comic book, a newspaper, and two contemporary novels. No classic/canon literature.
- I admit I'm a bit surprised at the low volume of school articles, which judging from AfD are infesting the place like weeds.
- Somehow I don't think that 14% of the sum of all human knowledge is TV, movies, games, and bands. I admit I was surprised at the low percentage of video game cruft. The music articles were almost exclusively contemporary popular bands or their albums (with reasonably diverse geographical coverage) - nothing about musical theory and nothing about classical music.
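For the categories that never showed up at all (chemistry, sociology, psychology), a sample of 250 cannot show that they are absent, only that they are probably rare. A minimal sketch, again assuming independent uniform draws (an assumption), of the largest true share that is still compatible with zero hits at 95% confidence; the function name is hypothetical.

```python
def max_share_given_zero_hits(n: int, confidence: float = 0.95) -> float:
    """Largest true share p for which 0 hits in n independent draws is still
    not surprising at the given confidence: solves (1 - p)**n = 1 - confidence.
    """
    return 1 - (1 - confidence) ** (1 / n)


n = 250  # sample size of the survey above
bound = max_share_given_zero_hits(n)
print(f"0 of {n} observed: true share is plausibly below {bound:.2%}")
print(f"'rule of three' shortcut, 3/n: {3 / n:.2%}")
```

In other words, under these assumptions a topic could make up around one percent of all articles and still plausibly fail to appear in a sample this size.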
Recent mainspace changes survey
Inspired by Wikipedia:Wikipedia is failing and User:Worldtraveller/Wikipedia is failing (NB: leaving the redlink, in case further moves occur), I looked at a sample of 250 mainspace edits covering a time span of 04:43 to 04:46 UTC on 18 Feb 2007. (It would be interesting to gather these statistics again at a time when US schools are in session.) In this sample there were 159 edits by registered users, 89 edits by anonymous users, and 2 edits to a subsequently deleted image description page. Thus the percentages below take 248 edits as the total sample.
Change type | Percent of total sample (n = 248) | Registered edits, as percent of total sample (n = 248) | Anonymous edits, as percent of total sample (n = 248) | Percent of all registered edits (n = 159) | Percent of all anonymous edits (n = 89) |
---|---|---|---|---|---|
Substantial content changes | 5.2% | 4.0% | 1.2% | 6.3% | 3.4% |
Minor content changes | 28.6% | 17.3% | 11.3% | 27.0% | 31.5% |
Copyediting/formatting/wikilinking | 40.7% | 27.4% | 13.3% | 42.8% | 37.1% |
Tagging/maintenance | 8.5% | 6.5% | 2.0% | 10.1% | 5.6% |
Vandalism reversion | 8.9% | 7.3% | 1.6% | 11.3% | 4.5% |
Vandalism | 8.1% | 1.6% | 6.5% | 2.5% | 18.0% |
Other than determining whether an edit was vandalism, I did not make any value judgments. Thus, 'minor content changes' contains considerable amounts of unsourced material and original research that will certainly be reverted.
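The five percentage columns in the table all come from the same six pairs of counts, just divided by different totals. A minimal sketch of that arithmetic follows. The per-category counts are not given in the text (apart from the ten substantial changes and four vandalism edits by registered users mentioned in the notes); they are back-calculated here from the published percentages, so treat them as a reconstruction.

```python
# (registered, anonymous) edit counts per change type.
# ASSUMPTION: back-calculated from the percentages in the table, not stated in the source.
counts = {
    "Substantial content changes": (10, 3),
    "Minor content changes": (43, 28),
    "Copyediting/formatting/wikilinking": (68, 33),
    "Tagging/maintenance": (16, 5),
    "Vandalism reversion": (18, 4),
    "Vandalism": (4, 16),
}

total_registered = sum(reg for reg, _ in counts.values())
total_anonymous = sum(anon for _, anon in counts.values())
total = total_registered + total_anonymous


def pct(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, as in the table."""
    return round(100 * part / whole, 1)


rows = {}
for change_type, (reg, anon) in counts.items():
    rows[change_type] = (
        pct(reg + anon, total),       # percent of total sample
        pct(reg, total),              # registered edits, percent of total sample
        pct(anon, total),             # anonymous edits, percent of total sample
        pct(reg, total_registered),   # percent of all registered edits
        pct(anon, total_anonymous),   # percent of all anonymous edits
    )

for change_type, row in rows.items():
    print(f"{change_type} | " + " | ".join(f"{value:.1f}%" for value in row))
```

Columns two and three always add up to column one (up to rounding), while the last two columns are each normalized within their own group, which is why vandalism looks small as a share of the total sample but large as a share of anonymous edits.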
Other observations:
- I saw two ongoing edit wars and one addition of an inappropriate unfree image.
- Of the ten examples of substantial content changes by registered users, five were new-page creations. The single largest content change was on a Digimon article.
- One of the four examples of vandalism by registered users involved the creation of a nonsense page.
- I excluded bot-flagged edits (the default). The registered-editor set contains two edits by a known-bot account without a bot flag.
- The percentage of copyediting and formatting done by registered editors is probably inflated by AWB users.
General thoughts:
- I suppose it's a good sign that the rate of vandalism and the rate of vandalism reversion are about the same. However, that could be a function of the time of day.
- Substantial content addition occurs at quite a low rate. This may be an artifact of editing patterns: if an editor uses many 'progressive saves', no single change will register as substantial in this sort of survey, and if an editor makes a large change in a single save, that editor's edit rate will be low and the change will be unlikely to appear in such a small sample. I didn't see much evidence of the first pattern, in that no series of edits to the same article by the same person occurred except to manipulate formatting; however, a series of content-creating edits will likely be separated by more than the three minutes this sample covers.