Wikipedia:Does Wikipedia traffic obey Zipf's law?
From Wikipedia, the free encyclopedia
If accesses to Wikipedia's article pages obey Zipf's law, we can expect a roughly linear relationship between log(hits) and log(hit rank) for Wikipedia pages. (Note: the hit data in the graph has been scaled in such a way that 10000 hits are equivalent to 1% of the total access rate.)
This appears to be the case in practice for pages with rank between 5 and 1000, based on data from WikiCharts, as of September 2006.
The five most popular pages deviate significantly from the straight line, but the approximation is reasonably accurate from then on. The slope of this part of the log-log graph is approximately -1/2, suggesting that the hit rate is inversely proportional to the square root of the page rank,
- thus $\text{hit rate} \propto \text{rank}^{-1/2}$
- or, $\text{hit rate} \propto \dfrac{1}{\sqrt{\text{rank}}}$
Note: These scaled hit rates are derived from actual hit counts over a particular period. They therefore reflect the observed counts for a statistical sample of user hits over that period, rather than statistical estimates of a theoretical underlying constant hit rate. The error bars in the WikiCharts data apply to the hit rates as an estimator of an underlying hit rate, and do not apply here.
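As a minimal sketch of how such a slope can be estimated, the following fits a straight line to log(hits) against log(rank) by ordinary least squares. The data here is synthetic, generated from an exact inverse-square-root law purely for illustration; it is not the WikiCharts data, and the function name `fit_loglog_slope` is hypothetical.

```python
import math

def fit_loglog_slope(ranks, hits):
    """Least-squares slope and intercept of log10(hits) against log10(rank)."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(h) for h in hits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic stand-in data (an assumption, NOT real traffic figures):
# hits fall off as 1/sqrt(rank) for ranks 6 to 1000.
ranks = list(range(6, 1001))
hits = [40000 * math.sqrt(6 / r) for r in ranks]

slope, intercept = fit_loglog_slope(ranks, hits)
print(round(slope, 3))  # -0.5
```

With real data the points scatter around the line, so the fitted slope would only be approximately -1/2, and the top five pages would be excluded from the fit as described above.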
How much traffic does the least popular page get?
Although this data does not directly tell us anything about the traffic of pages other than the most popular 1000, if we assume that Zipf's law continues to hold for the remaining 1.5 million (as of 2006) Wikipedia article pages, we can extrapolate the traffic expected for less popular pages, and in particular for the least popular page, at rank 1.5 million.
Compared to the page with rank 6, which is probably the first point that fits the trend, this suggests that the least popular Wikipedia article might get $\sqrt{6/1{,}500{,}000} = 0.002$ times as much traffic.
Given that the actual unscaled hit rate of the page with rank 6 is about 40,000 hits per day, this suggests that the least popular page will get about $40{,}000 \times 0.002 = 80$ hits per day.
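The extrapolation can be reproduced with a few lines of arithmetic. This is a sketch under the assumptions stated above: the inverse-square-root law holds all the way down, the reference point is rank 6 at about 40,000 hits per day, and the least popular page sits at rank 1.5 million. The function name `extrapolate_hits` is hypothetical.

```python
def extrapolate_hits(ref_rank, ref_hits, target_rank, exponent=0.5):
    """Scale a reference hit rate to another rank, assuming hits ~ rank**(-exponent)."""
    ratio = (ref_rank / target_rank) ** exponent
    return ref_hits * ratio

# Traffic of the page at rank 1.5 million relative to the page at rank 6.
ratio = (6 / 1_500_000) ** 0.5
least_popular = extrapolate_hits(6, 40_000, 1_500_000)

print(f"{ratio:.4f}")          # 0.0020
print(f"{least_popular:.0f}")  # 80
```

The result is not very sensitive to the exact article count: under the same assumptions, a rank of 1.3 million gives about 86 hits per day.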