User:Dantheox/Stub percentages
With Wikipedia set to cross one million articles in early 2006, I asked a simple question: what proportion of those articles are stubs? I couldn't find any page on Wikipedia that included this information, so I tallied some statistics myself. The following information is based on the December 13, 2005 database dump. I'll try to update it when Wikipedia hits a million articles.
Note that for these statistics, I only considered an article a stub if it contained the text 'stub}}' in a context other than 'section-stub}}'. Many extremely complete articles (e.g. George W. Bush) have section stubs, but I wouldn't call the articles stubs.
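The matching rule above can be sketched in Python. This is only an illustration of the rule as described, not the script actually used for these statistics; the names `is_stub` and `stub_percentage` are hypothetical, and the match is assumed to be case-sensitive and applied to raw wikitext.

```python
import re

# Matches the literal text "stub}}" unless it is immediately preceded by
# "section-", so {{section-stub}} alone does not mark an article as a stub.
# Assumption: the match is case-sensitive, as in the rule stated above.
STUB_RE = re.compile(r"(?<!section-)stub\}\}")


def is_stub(wikitext: str) -> bool:
    """Return True if the article text carries a stub tag other than section-stub."""
    return STUB_RE.search(wikitext) is not None


def stub_percentage(articles: list[str]) -> float:
    """Percentage of the given article texts that count as stubs."""
    if not articles:
        return 0.0
    stubs = sum(1 for text in articles if is_stub(text))
    return 100.0 * stubs / len(articles)
```

An article that has a section stub and also a regular stub tag still counts as a stub under this sketch, since the regular tag matches on its own.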
Stubs were introduced en masse around the start of 2004. Since then, they have formed an increasingly large proportion of Wikipedia articles.
Though the rate of increase has slowed since 2004, stubs still make up a growing percentage of the articles on Wikipedia. That over 35% of articles are stubs shows how rapidly Wikipedia is growing, and underscores the need to focus on stub expansion, not just article creation.
Retrostubs
The following charts are based on the assumption that, if an article is eventually labelled a stub, then it was probably a stub before that, too. I call an article that is not labelled a stub at a given point in time, but is labelled one later, a retrostub. Retrostubs are a relatively simple way to put lower bounds on the number of stub articles in the past, before there was a stub labelling system.
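As a rough illustration of the retrostub idea, the sketch below classifies an article at a chosen date and computes the resulting lower bound on the stub fraction. The data model is an assumption made for the example: `ArticleHistory`, its fields, and the function names are hypothetical, and each article is assumed to have at most one labelling date and one label-removal date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ArticleHistory:
    # Hypothetical per-article summary extracted from revision history.
    created: date
    first_labelled: Optional[date] = None  # when a stub tag first appeared
    label_removed: Optional[date] = None   # when the stub tag was removed


def classify(article: ArticleHistory, when: date) -> str:
    """Return 'absent', 'stub', 'retrostub', or 'nonstub' for the given date."""
    if when < article.created:
        return "absent"
    if article.first_labelled is None:
        return "nonstub"  # never labelled, so nothing can be inferred
    if when < article.first_labelled:
        return "retrostub"  # not labelled yet, but will be later
    if article.label_removed is not None and when >= article.label_removed:
        return "nonstub"
    return "stub"


def stub_lower_bound(articles: list[ArticleHistory], when: date) -> float:
    """Fraction of existing articles that are stubs or retrostubs on that date."""
    states = [classify(a, when) for a in articles]
    existing = sum(1 for s in states if s != "absent")
    counted = sum(1 for s in states if s in ("stub", "retrostub"))
    return counted / existing if existing else 0.0
```

Articles that were expanded beyond stub length before ever being labelled show up as "nonstub" here, which is why the result is only a lower bound.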
The dip in late 2002 is due to User:Rambot adding lots of U.S. county and city articles. Retrostubs are at best a lower bound on the number of stubs at any given time, since an article could have gone from stub to nonstub before the stub system was ever developed. Hence only articles which were still stubs in 2003/2004 are likely to be retrostubs in 2001/2002. I can't think of any way to reconstruct the stub data, other than making an arbitrary cutoff like "500 bytes or less is a stub."
The retrostub percentage chart, however, does show that the percentage of stub articles on Wikipedia is unlikely to have been increasing anywhere near as quickly as the original chart made it look, if it has been increasing at all. The stub percentage has probably been at least where it is now for the past two years.
Feel free to contact me for the raw data or scripts used to compute these statistics.