User:ChaTo/Temporal Evolution of the Wikigraph

This paper was accepted for publication at the Web Intelligence Conference 2006 in Hong Kong:

Title: Temporal Analysis of the Wikigraph (530 KB .pdf) [This is the author's version, posted on my personal website in accordance with standard IEEE rules; it is not to be re-posted]

Authors: Luciana S. Buriol, Carlos Castillo, Debora Donato, Stefano Leonardi, Stefano Millozzi

Wikipedia (www.wikipedia.org) is an online encyclopedia, available in more than 100 languages and comprising over 1 million articles in its English version. If we consider each Wikipedia article as a node and each hyperlink between articles as an arc, we obtain a Wikigraph, a graph that represents the link structure of Wikipedia.
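
A minimal sketch of this definition, assuming the link structure is already available as a Python dictionary mapping each article title to the titles it links to (the `links` input format, the function names and the sample titles are hypothetical, not the paper's data pipeline). Links to titles that are not articles are dropped, self-links are also dropped (an assumption), and the in/out degrees whose distributions are discussed in the conclusions are counted per node.

```python
from collections import Counter


def build_wikigraph(links: dict[str, list[str]]) -> dict[str, set[str]]:
    """Return an adjacency map: article -> set of articles it links to.

    Each article is a node and each hyperlink between two articles is an arc.
    Duplicate links collapse into one arc; links to unknown titles and
    self-links are ignored (assumptions of this sketch).
    """
    articles = set(links)
    return {
        src: {dst for dst in targets if dst in articles and dst != src}
        for src, targets in links.items()
    }


def degrees(graph: dict[str, set[str]]) -> tuple[dict[str, int], dict[str, int]]:
    """Return (indegree, outdegree) for every node of the graph."""
    outdeg = {node: len(targets) for node, targets in graph.items()}
    indeg = {node: 0 for node in graph}
    for targets in graph.values():
        for dst in targets:
            indeg[dst] += 1
    return indeg, outdeg


def degree_distribution(deg: dict[str, int]) -> dict[int, int]:
    """Map each degree k to the number of nodes with degree k.

    On log-log axes, a power-law distribution shows up as a roughly
    straight line.
    """
    return dict(sorted(Counter(deg.values()).items()))


if __name__ == "__main__":
    sample = {  # hypothetical toy data
        "Graph theory": ["Vertex", "Edge", "Hong Kong"],  # "Hong Kong" is not an article here
        "Vertex": ["Graph theory", "Edge"],
        "Edge": ["Graph theory"],
    }
    indeg, outdeg = degrees(build_wikigraph(sample))
    print(indeg)                       # {'Graph theory': 2, 'Vertex': 1, 'Edge': 2}
    print(degree_distribution(indeg))  # {1: 1, 2: 2}
```

A dictionary of sets keeps the example short; a graph with millions of nodes would call for a compressed or disk-based representation.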

The Wikigraph differs from other Web graphs studied in the literature in that timestamps are associated with each node. These timestamps record the creation and update dates of each page, which allows a detailed analysis of how Wikipedia evolves over time.
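
Because every node carries a creation timestamp, one simple way to look at the graph "as it was" on a past date is to keep only the articles created up to that date. The sketch below assumes a hypothetical `created` mapping from article title to creation time and, as a simplification, treats the current links as fixed; recovering the links that actually existed on that date would also require each page's revision history.

```python
from datetime import datetime


def snapshot(graph: dict[str, set[str]], created: dict[str, datetime],
             when: datetime) -> dict[str, set[str]]:
    """Approximate the Wikigraph as of `when`.

    Keeps the articles created at or before `when` and only the arcs between
    two such articles. Assumes links never change, which is a simplification.
    """
    alive = {a for a, t in created.items() if t <= when and a in graph}
    return {a: {b for b in graph[a] if b in alive} for a in alive}


def growth_series(graph: dict[str, set[str]], created: dict[str, datetime],
                  dates: list[datetime]) -> list[tuple[datetime, int, int]]:
    """Return (date, number of nodes, number of arcs) for each snapshot date."""
    series = []
    for d in dates:
        snap = snapshot(graph, created, d)
        series.append((d, len(snap), sum(len(t) for t in snap.values())))
    return series
```

For example, with `graph = {"A": {"B"}, "B": {"A", "C"}, "C": set()}` and articles created in 2002, 2003 and 2004 respectively, mid-year snapshots for those three years have (nodes, arcs) equal to (1, 0), (2, 2) and (3, 3).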

In the first part of this study we characterize this evolution in terms of users, edits and articles; in the second part, we depict the temporal evolution of several topological properties of the Wikigraph. The insights obtained from the Wikigraphs can be applied to large Web graphs for which temporal data is usually not available.

Summary of conclusions

The conclusions of the paper point to both signs of growth and maturity in the current Wikipedia:

Signs of a transient regime (growth):

  1. The number of articles, updates, visitors and editors is still growing exponentially.
  2. The size of articles is still growing linearly.
  3. The number of links per article is also growing linearly, but slower than the amount of text.
  4. The number of reverts is growing slowly, which may signal more vandalism, but the number of double reverts (revert wars) has stabilized.
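
One rough way to tell "growing exponentially" from "growing linearly" is to fit a straight line by least squares both to the raw counts and to their logarithms, since exponential growth is linear on a log scale. The sketch below uses only the standard library; it is not necessarily the procedure used in the paper, comparing the two R² values is a heuristic rather than a formal model-selection test, and the function names are hypothetical.

```python
import math


def linear_fit(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Ordinary least-squares fit y = a + b*x; returns (a, b, r_squared).

    Requires at least two distinct x values.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    b = sxy / sxx
    a = my - b * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy else 1.0  # constant series: perfect fit
    return a, b, r2


def growth_model(counts: list[float]) -> tuple[str, float]:
    """Classify a series of positive counts taken at equal time steps.

    Returns ("exponential", growth factor per step) if log(counts) is better
    fitted by a line than the raw counts, otherwise ("linear", slope per step).
    """
    t = list(range(len(counts)))
    _, b_lin, r2_lin = linear_fit(t, counts)
    _, b_log, r2_log = linear_fit(t, [math.log(c) for c in counts])
    if r2_log > r2_lin:
        return "exponential", math.exp(b_log)
    return "linear", b_lin
```

On synthetic data such as `[100 * 1.1 ** k for k in range(24)]` this reports exponential growth with a factor of about 1.1 per step, while `[50 + 3 * k for k in range(24)]` is reported as linear with a slope of about 3.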

Signs of a permanent regime (maturity):

  1. There is a clear power-law distribution of the indegree and outdegree.
  2. The average number of edits per user has been mostly constant over the last two years.
  3. There is a high correlation between PageRank and indegree, indicating that the microscopic connectivity of the encyclopedia resembles its mesoscopic properties.
  4. The clustering coefficient and edge reciprocity of links have remained basically constant during the last two years.
  5. Over 2/3 of the articles now belong to the largest strongly connected component.
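
Several of these properties are straightforward to compute when the Wikigraph is stored as an adjacency map. The sketch below assumes a dictionary mapping every node to the set of nodes it links to, with every link target also present as a key (a hypothetical in-memory representation). Edge reciprocity is taken as the fraction of arcs u→v whose reverse arc v→u also exists; the largest strongly connected component is found with Kosaraju's algorithm; PageRank uses power iteration with a damping factor of 0.85, a common default rather than a value stated here. None of this is claimed to be the paper's implementation.

```python
def reciprocity(graph: dict[str, set[str]]) -> float:
    """Fraction of arcs u->v for which the reverse arc v->u also exists."""
    arcs = [(u, v) for u, targets in graph.items() for v in targets]
    if not arcs:
        return 0.0
    return sum(1 for u, v in arcs if u in graph[v]) / len(arcs)


def largest_scc_fraction(graph: dict[str, set[str]]) -> float:
    """Share of nodes in the largest strongly connected component (Kosaraju)."""
    # Pass 1: iterative DFS, recording each node when it finishes.
    order: list[str] = []
    seen: set[str] = set()
    for root in graph:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph[root]))]
        while stack:
            node, children = stack[-1]
            child = next((v for v in children if v not in seen), None)
            if child is None:
                stack.pop()
                order.append(node)  # finishes after all its descendants
            else:
                seen.add(child)
                stack.append((child, iter(graph[child])))
    # Pass 2: DFS on the reversed graph in decreasing finishing order;
    # each tree found this way is one strongly connected component.
    reverse: dict[str, set[str]] = {v: set() for v in graph}
    for u, targets in graph.items():
        for v in targets:
            reverse[v].add(u)
    largest, assigned = 0, set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        stack, size = [root], 0
        while stack:
            node = stack.pop()
            size += 1
            for u in reverse[node]:
                if u not in assigned:
                    assigned.add(u)
                    stack.append(u)
        largest = max(largest, size)
    return largest / len(graph) if graph else 0.0


def pagerank(graph: dict[str, set[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """PageRank by power iteration; rank of dangling nodes is spread uniformly."""
    n = len(graph)
    if n == 0:
        return {}
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        dangling = sum(rank[v] for v, targets in graph.items() if not targets)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in graph}
        for u, targets in graph.items():
            if targets:
                share = damping * rank[u] / len(targets)
                for v in targets:
                    new[v] += share
        rank = new
    return rank
```

The PageRank values returned here can then be compared with each node's indegree to check how strongly the two are correlated.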

If you read the paper, please leave your comments/suggestions here:

  1. Hey, this is an interesting paper, I really enjoyed reading it. Have you considered comparing the graphs of different language editions of Wikipedia (French, German, Polish, Japanese, Italian)? -- 84.176.247.19 15:39, 19 July 2006 (UTC)
    • There is a comparison here[1] but it's a bit complex -- ChaTo 16:16, 29 July 2006 (UTC)
  2. I'd be happy to make an HTMLized version of the paper (with pdf2html), for ease of online reading; may I have your permission to do so? JesseW, the juggling janitor 19:45, 19 July 2006 (UTC)
    1. Unfortunately it is not possible currently due to the publication by IEEE CS Press :-( ChaTo 09:51, 2 October 2006 (UTC)
      1. Luckily, Google has already done it. I presume the IEEE won't object to this, as it's a standard feature, applied to nearly all web pages. In any case, it's available. JesseW, the juggling janitor 22:58, 2 October 2006 (UTC)
  3. There's an error (although it's not that important in relation to the subject of the paper): "Reverts can be done by any registered user, and with this one-click operation an article can be taken back to a previous version. This is done mostly to fight vandalism. We detected when an update is a revert by searching for the string revert or rv in the comment of the edits (this is inserted automatically)." This is incorrect in a number of ways. Reverts can actually be done by any user, logged in or not, although the ability to do them with one click is available only to administrators and/or users who use third-party JavaScript tools. Further, the string "revert" or "rv" is not added automatically (except by the one-click tools); "rv", particularly, is only used when people are manually reverting (i.e., by clicking on the history, clicking on a specific past revision, clicking on edit, and clicking save), which, as I said, anyone can do. While I doubt it can be corrected now, I thought I should mention it. JesseW, the juggling janitor 23:41, 2 October 2006 (UTC)
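
The detection rule quoted in this comment (flag an edit as a revert when its comment contains "revert" or "rv") can be sketched as follows. As the comment points out, these strings appear only when a tool or the editor writes them, so the heuristic misses reverts with other or empty comments. The whole-word matching, which keeps "rv" from firing inside words such as "observe" or "server", is an assumption of this sketch rather than something the paper states, and the function names are hypothetical.

```python
import re
from typing import Iterable, Optional

# "revert", "reverted", "reverting", ... or the abbreviation "rv", matched as
# whole words and case-insensitively. The word boundaries are an assumption;
# a plain substring test for "rv" would also match "observe" or "server".
REVERT_PATTERN = re.compile(r"\b(?:revert\w*|rv)\b", re.IGNORECASE)


def is_revert(comment: Optional[str]) -> bool:
    """Heuristically flag an edit as a revert from its edit comment."""
    return bool(comment) and REVERT_PATTERN.search(comment) is not None


def revert_share(comments: Iterable[Optional[str]]) -> float:
    """Fraction of edits whose comment looks like a revert."""
    comments = list(comments)
    if not comments:
        return 0.0
    return sum(is_revert(c) for c in comments) / len(comments)
```

For instance, "rv vandalism" and "Reverted edits by Example" are flagged, while "Updated server list" and an empty comment are not.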