Wikipedia talk:Wikipediology/library/essays/R.fiend-1


Great essay. I'd been thinking about doing this kind of large-sample random-article test, but I'm also happy that someone else did the work and I can just enjoy the result :)

I agree with your carefully considered conclusions. Sure, we have tremendous coverage of popular culture, but we've got everything else down pretty well too. The same can be seen from Wikipedia:WikiProject_Missing_encyclopedia_articles: we're well on our way towards covering every subject that traditional encyclopedias cover.

The quality issue is, of course, a much more difficult question, but the coverage issue is hardly even debatable anymore.

Cheers! - Haukur Þorgeirsson 17:23, 11 November 2005 (UTC)

Other studies

The Wikipedia Signpost had an article this week on such evaluations.

There was also a bot at one time that used a machine-learning algorithm to detect articles labelled as stubs that were probably not actually stubs. I don't know everything it considered, but its inputs included things like article length and content. Unfortunately, I can't remember who ran it. It may not even have been a bot; it may just have been a program run on a database dump to identify articles for a human to look at and fix. Anyway, I thought you might find it interesting. If you happen to know or figure out who did this, please let me know so I can remember. Jdavidb (talk • contribs) 17:27, 15 November 2005 (UTC)

Lo and behold, I found it! Jdavidb (talk • contribs) 00:29, 16 November 2005 (UTC)

Much smaller, but I did a 10-page test, which you all might find interesting. JesseW, the juggling janitor 06:26, 31 January 2006 (UTC)

And User:Tony Sidaway did a 10-page check on wikien-l in late January 2006. JesseW, the juggling janitor 08:34, 1 February 2006 (UTC)

And User:Ambi did The 10 Random Pages Test in September 2004, and again in January 2006. JesseW, the juggling janitor 07:39, 6 February 2006 (UTC)

Also, there is a category, Category:Random_Pages_Tests, which has 20 pages in it... JesseW, the juggling janitor 07:39, 6 February 2006 (UTC)

Statistical analysis

Here's the table from the article, with the half-widths of 95% confidence intervals added. In plain English, that means we can be about 95% confident that the true number of a given type lies within "+/- #" of "Estimated # in WP". For example, we can be 95% confident that the true number of full articles is within 33,600 of 342,768. Put another way, the number of full articles was between about 309,200 and 376,400. Similarly, the percentage of full articles is somewhere between 40.0% and 48.8%. This method (the normal approximation to the binomial) only works if the sample percentage is greater than about 1%, so I didn't do the calculations for two of the article types, since the numbers wouldn't be particularly valuable.

Type of article                Estimated % of WP   Estimated # in WP   +/- %   +/- #
Full articles                  44.4%               342,768             4.35%   33,600
Stubs                          17.8%               137,416             3.35%   25,900
Substubs                       10.6%               81,832              2.70%   20,800
Disambiguation                 4.6%                35,512              1.8%    13,900
Articles from public sources   1.4%                10,808              1.03%   7,950
Rambot articles                7.0%                54,040              2.24%   17,300
Charts                         6.8%                52,496              2.21%   17,100
Lists                          2.2%                16,984              1.29%   10,000
Requiring substantial cleanup  2.0%                15,440              1.23%   9,500
Should be redirected           0.6%                4,632               n/a     n/a
Dubious articles               1.0%                7,720               n/a     n/a
Deletables                     1.6%                12,352              1.10%   8,500
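The half-widths in the table are what the normal-approximation (Wald) formula z * sqrt(p(1 - p)/n) gives with z = 1.96. The sample size and the total article count are not stated here; the sketch below assumes n = 500 sampled pages and about 772,000 articles, inferred from the table itself (for example, 342,768 / 0.444 = 772,000, and n = 500 reproduces the +/- % column). It also includes a Wilson score interval as a swapped-in alternative, which is not what the table used but stays sensible for small counts like the "Should be redirected" row (0.6% of an assumed 500 pages is 3 pages).

```python
import math

# Assumptions inferred from the table, not stated in the essay:
# a simple random sample of N_SAMPLE pages out of TOTAL articles.
N_SAMPLE = 500
TOTAL = 772_000
Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def wald_half_width(p: float, n: int = N_SAMPLE, z: float = Z95) -> float:
    """Half-width of the normal-approximation (Wald) interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / n)


def wilson_interval(k: int, n: int = N_SAMPLE, z: float = Z95) -> tuple:
    """Wilson score interval for k hits out of n; better behaved than Wald when k is small."""
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1.0 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread


if __name__ == "__main__":
    for name, p in [("Full articles", 0.444), ("Stubs", 0.178), ("Deletables", 0.016)]:
        hw = wald_half_width(p)
        print(f"{name}: {p:.1%} +/- {hw:.2%} (about {hw * TOTAL:,.0f} articles)")
    # 3 of 500 sampled pages (0.6%) is too few hits for the Wald interval.
    lo, hi = wilson_interval(3)
    print(f"Should be redirected: {lo:.2%} to {hi:.2%}")
```

Under these assumptions the Wald figures match the table to within rounding, and the Wilson interval gives a usable range for a row where the author chose not to report one.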

--Spangineeres (háblame) 02:12, 8 December 2005 (UTC)


Randomness

Incidentally, you guys know that randompage isn't really random, right? There are some biases built in, or so Brion told me once. --maru (talk) contribs 04:06, 15 February 2006 (UTC)
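The comment doesn't say what the biases are. One well-known mechanism, assuming MediaWiki's usual design in which each page stores a random key (the page_random field) when it is created and the random-page request returns the first page whose key is at or above a freshly drawn number, is that a page's chance of being picked equals the gap below its key. Pages sitting just above large gaps therefore turn up more often. The sketch below simulates that assumed scheme with a handful of hypothetical pages; the functions pick and simulate are illustrative names, and this is not a description of what Brion said.

```python
import random
from bisect import bisect_left
from collections import Counter

# Hypothetical model of a key-based random-page picker (an assumption, see above):
# each page gets a random key in [0, 1) once; a request draws a fresh number r
# and returns the first page whose key is >= r, wrapping to the lowest key if none is.


def pick(keys_sorted: list, rng: random.Random) -> int:
    """Return the index of the page chosen for one random-page request."""
    r = rng.random()
    i = bisect_left(keys_sorted, r)
    return i if i < len(keys_sorted) else 0


def simulate(num_pages: int = 10, draws: int = 200_000, seed: int = 1):
    """Compare how often each page is picked with the gap below its key."""
    rng = random.Random(seed)
    keys = sorted(rng.random() for _ in range(num_pages))
    counts = Counter(pick(keys, rng) for _ in range(draws))
    # Page i wins whenever r falls in (keys[i-1], keys[i]]; page 0 also
    # collects everything above the largest key because of the wrap-around.
    gaps = [keys[0] + (1.0 - keys[-1])]
    gaps += [keys[i] - keys[i - 1] for i in range(1, num_pages)]
    shares = [counts[i] / draws for i in range(num_pages)]
    return keys, gaps, shares


if __name__ == "__main__":
    keys, gaps, shares = simulate()
    for key, gap, share in zip(keys, gaps, shares):
        print(f"key {key:.3f}: expected {gap:.3f}, observed {share:.3f}")
```

Under this model the observed shares track the gaps rather than a uniform 1/10 each, which is the kind of built-in bias a random-pages test would inherit. A uniform picker would instead choose an index directly, for example with rng.randrange(num_pages).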