I have been active on Wikipedia for quite some time, having attended both Wikimania conferences, the first in Frankfurt and the second at MIT. While I am the primary author of MBTI and List of people considered to be deities, most of my contributions have been offline (see WikiPulse below).
Current efforts - Quality
I am presently investigating several Natural Language Processing algorithms (such as Maximum Entropy...) and their efficacy in determining the quality of Wikipedia articles. Stay tuned for this at Wikimania 2007.
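To give a flavor of the maximum-entropy approach: in the binary case it reduces to logistic regression over a handful of features. The sketch below is purely illustrative - the features (log word count, references per 1000 words, images per section) and all training values are invented, not drawn from any real model of article quality.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=1000, lr=0.1):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - p  # gradient of the log-likelihood w.r.t. z
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Invented features: (log word count, refs per 1000 words, images per section)
training = [
    ([8.5, 3.0, 1.2], 1),   # featured-quality article
    ([8.1, 2.5, 0.9], 1),
    ([4.0, 0.0, 0.0], 0),   # stub
    ([4.5, 0.2, 0.1], 0),
]
w, b = train([x for x, _ in training], [y for _, y in training])

# Score an unseen article; values near 1 suggest higher quality.
score = sigmoid(sum(wi * xi for wi, xi in zip(w, [8.3, 2.8, 1.0])) + b)
print(round(score, 2))
```

A real system would of course learn from thousands of labeled articles (e.g. featured vs. stub) and far richer features, but the training loop is the same shape.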
Qwikly
Qwikly was a website I started with User:Erik Zachte in February 2005; it is still viewable on Archive.org. Erik had written a Perl script that converted the MySQL dump into the TomeRaider format, which was viewable on PDAs with a large CF/SD card. We got together to commercialize the idea, and, running from a server in my bedroom, Qwikly.com was born ("wiki" is Hawaiian for "quick", so Qwikly is just an amalgamation). We created versions of Wikipedia, Wikiquote and Wiktionary in six different languages. We also offered niche encyclopedias on topics such as Star Wars (found on Wiki Cities). While our product was incredibly useful to those who actually downloaded it and loaded it onto their PDA, we were ultimately ahead of our time and increasingly overwhelmed by Wikipedia's ever-growing size. I believe the largest single encyclopedia we produced for a PDA was based on the English Wikipedia: over 600,000 articles and 90,000 images, weighing in at just under 2 GB - the largest CF card available at the time at a reasonable cost.
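The core trick of any dump-to-handheld conversion is packing articles into one file with an index, so a PDA can seek straight to an article without ever holding the whole encyclopedia in memory. The sketch below is not TomeRaider's actual format - the layout, and the `pack`/`lookup` helpers, are invented to illustrate the idea.

```python
import struct

def pack(articles):
    """Pack {title: text} into bytes: a count, an index of
    (offset, length, title-length, title) records, then the
    concatenated UTF-8 article payloads."""
    entries = sorted(articles.items())
    payload = b""
    index = b""
    for title, text in entries:
        data = text.encode("utf-8")
        t = title.encode("utf-8")
        # Offset is relative to the start of the payload region.
        index += struct.pack("<III", len(payload), len(data), len(t)) + t
        payload += data
    return struct.pack("<II", len(entries), len(index)) + index + payload

def lookup(blob, title):
    """Scan the index for one title and slice out just that article."""
    count, index_len = struct.unpack_from("<II", blob, 0)
    pos, payload_start = 8, 8 + index_len
    for _ in range(count):
        off, dlen, tlen = struct.unpack_from("<III", blob, pos)
        pos += 12
        name = blob[pos:pos + tlen].decode("utf-8")
        pos += tlen
        if name == title:
            start = payload_start + off
            return blob[start:start + dlen].decode("utf-8")
    return None

blob = pack({"Wiki": "A wiki is a collaborative site.",
             "PDA": "A personal digital assistant."})
print(lookup(blob, "PDA"))
```

On a real device the index (sorted by title here) would be binary-searched rather than scanned, and the payloads compressed - which is roughly why 600,000 articles plus images could squeeze under 2 GB.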
WikiPulse
As a subproject of the Qwikly site, I was the first person to create a realtime statistical picture of the goings-on here at Wikipedia. The project was called WikiPulse. It was a dramatic exercise in screen scraping, as the data was aggregated from some fifteen sources. Here is a sample snapshot of the data it produced (sans the many realtime graphics):
- The Wikimedia grid is currently FAST and AVAILABLE
- In the last 27 minutes there have been 4 new articles, 49 new pages, 502 new edits, 1 new messages on the mailing lists, 0 new topics on the village pump, and we have used 512.00 MB of outgoing bandwidth. On the English Wikipedia there are currently 506347 total articles, 552 featured articles, and 1421562 total pages, with 12810635 total edits and 9.01 average edits per page. There are 218868 registered users and 424, or 0.02% are administrators. 172 people are currently chatting in #Wikipedia, and over the last 211 days there have been 5052 nicks. On the Village Pump there are 263 ongoing conversations. There have been 3917 messages across all wikimedia mailing lists this month. 37 total CPUs are operational right now and there have been 2760000 pages indexed by Google, comprising 0.03% of their index. The MediaWiki project is currently ranked 813 on SourceForge and the software has been downloaded 96789 total times. Ping response time from Colorado is 87 milliseconds. So far this month we have used 11501.50 GB of outgoing, and 1171.79 GB of incoming bandwidth. Finally, there are 751 open bug reports.
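The aggregation loop behind a snapshot like that is simple in principle: one scraper per source, merged into a single dictionary, with any one dead source tolerated. A runnable sketch - the scraper functions here are offline stand-ins that just return figures from the sample above, not real fetchers:

```python
def scrape_article_stats():
    # In reality: fetch Special:Statistics and regex out the counts.
    return {"total_articles": 506347, "featured_articles": 552}

def scrape_irc():
    # In reality: ask the IRC server how many people are in #Wikipedia.
    return {"irc_users": 172}

def scrape_mailing_lists():
    # In reality: scrape the pipermail archives for this month's volume.
    return {"messages_this_month": 3917}

SOURCES = [scrape_article_stats, scrape_irc, scrape_mailing_lists]

def snapshot():
    """Run every scraper, tolerating individual failures."""
    merged = {}
    for source in SOURCES:
        try:
            merged.update(source())
        except Exception:
            pass  # a dead source shouldn't kill the whole pulse
    return merged

pulse = snapshot()
print(f"{pulse['total_articles']} total articles, "
      f"{pulse['irc_users']} people in #Wikipedia")
```

Scale the source list up to fifteen scrapers, run it on a timer, and render the merged dictionary into prose and graphs, and you have the shape of WikiPulse.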
- This user contributes with Fedora.
- This user hacks happily with Emacs.
- This user can typeset using LaTeX.
- This user uses Google as a primary search engine.
- This user maintains a facebook profile.
Wikipedia Publications
This is a list of publications that have used the Wikipedia dataset. I am particularly interested in those with a linguistics and Natural Language Processing aspect. Feel free to add any that you know of, as I have not yet attempted an exhaustive search. The place to start is, of course, Google Scholar :)
- Weale, Timothy (2006), Utilizing Wikipedia Categories for Document Classification, <http://www.cse.ohio-state.edu/~weale/docs/presentations/2006.12.07.Wikipedia.pdf>
- Gabrilovich, Evgeniy & Markovitch, Shaul (2006), Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge, Technion—Israel Institute of Technology, 32000 Haifa, Israel, <http://www.cs.technion.ac.il/~shaulm/papers/pdf/Gabrilovich-Markovitch-aaai2006.pdf>
- Ruiz-Casado, Maria; Alfonseca, Enrique & Castells, Pablo (2005), Automatic Extraction of Semantic Relationships for WordNet by Means of Pattern Learning from Wikipedia, vol. 3513/2005, Springer Berlin / Heidelberg, doi:10.1007/11428817_7, <http://www.springerlink.com/content/m2ajcway6b8hrf99/>
- Strube, Michael & Ponzetto, Paolo (2006), WikiRelate! Computing Semantic Relatedness Using Wikipedia, <http://www.eml-research.de/english/homes/strube/papers/aaai06.pdf>
- Fissaha Adafre, Sisay & de Rijke, Maarten (2005), Discovering missing links in Wikipedia, University of Amsterdam, SJ Amsterdam, The Netherlands, <http://portal.acm.org/citation.cfm?id=1134284>
- Toral, Antonio & Muñoz, Rafael, A proposal to automatically build and maintain gazetteers for Named Entity Recognition by using Wikipedia, University of Alicante, Carretera San Vicente S/N, Alicante 03690, Spain, <http://acl.ldc.upenn.edu/W/W06/W06-2809.pdf>
- Gabrilovich, Evgeniy & Markovitch, Shaul (2007), Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis, Department of Computer Science, Technion—Israel Institute of Technology, 32000 Haifa, Israel, <http://www.ijcai.org/papers07/Papers/IJCAI07-259.pdf>