I have been active on Wikipedia for quite some time, having attended both Wikimania conferences, the first in Frankfurt and the second at MIT. While I am the primary author of MBTI and List of people considered to be deities, most of my contributions have been offline (see WikiPulse below).
Current efforts - Quality
I am presently investigating several Natural Language Processing algorithms (such as Maximum Entropy...) and their efficacy in determining the quality of Wikipedia articles. Stay tuned for this at Wikimania 2007.
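To give a flavor of the maximum-entropy approach: in the binary case it reduces to logistic regression over a handful of features. The sketch below is purely illustrative - the features (log word count, references per 1000 words, images per section) and all training values are invented, not drawn from any real model of article quality.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=1000, lr=0.1):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y - p  # gradient of the log-likelihood w.r.t. z
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Invented features: (log word count, refs per 1000 words, images per section)
training = [
    ([8.5, 3.0, 1.2], 1),   # featured-quality article
    ([8.1, 2.5, 0.9], 1),
    ([4.0, 0.0, 0.0], 0),   # stub
    ([4.5, 0.2, 0.1], 0),
]
w, b = train([x for x, _ in training], [y for _, y in training])

# Score an unseen article; values near 1 suggest higher quality.
score = sigmoid(sum(wi * xi for wi, xi in zip(w, [8.3, 2.8, 1.0])) + b)
print(round(score, 2))
```

A real system would of course learn from thousands of labeled articles (e.g. featured vs. stub) and far richer features, but the training loop is the same shape.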
Qwikly
Qwikly was a website I started with User:Erik Zachte in February 2005; it is still viewable on Archive.org. Erik had written a Perl script that converted the MySQL dump into the TomeRaider format, which was viewable on PDAs with a large CF/SD card. We got together to commercialize the idea, and, running from a server in my bedroom, Qwikly.com was born ("wiki" is Hawaiian for "quick", so Qwikly is just an amalgamation). We created versions of Wikipedia, Wikiquote and Wiktionary in six different languages. We also offered niche encyclopedias on topics such as Star Wars (found on Wiki Cities). While our product was incredibly useful to those who actually downloaded it and loaded it onto their PDA, we were ultimately ahead of our time and increasingly overwhelmed by Wikipedia's ever-growing size. I believe the largest single encyclopedia we produced for a PDA was based on the English Wikipedia: over 600,000 articles and 90,000 images, weighing in at just under 2 GB - the largest CF card available at the time at a reasonable cost.
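The core trick of any dump-to-handheld conversion is packing articles into one file with an index, so a PDA can seek straight to an article without ever holding the whole encyclopedia in memory. The sketch below is not TomeRaider's actual format - the layout, and the `pack`/`lookup` helpers, are invented to illustrate the idea.

```python
import struct

def pack(articles):
    """Pack {title: text} into bytes: a count, an index of
    (offset, length, title-length, title) records, then the
    concatenated UTF-8 article payloads."""
    entries = sorted(articles.items())
    payload = b""
    index = b""
    for title, text in entries:
        data = text.encode("utf-8")
        t = title.encode("utf-8")
        # Offset is relative to the start of the payload region.
        index += struct.pack("<III", len(payload), len(data), len(t)) + t
        payload += data
    return struct.pack("<II", len(entries), len(index)) + index + payload

def lookup(blob, title):
    """Scan the index for one title and slice out just that article."""
    count, index_len = struct.unpack_from("<II", blob, 0)
    pos, payload_start = 8, 8 + index_len
    for _ in range(count):
        off, dlen, tlen = struct.unpack_from("<III", blob, pos)
        pos += 12
        name = blob[pos:pos + tlen].decode("utf-8")
        pos += tlen
        if name == title:
            start = payload_start + off
            return blob[start:start + dlen].decode("utf-8")
    return None

blob = pack({"Wiki": "A wiki is a collaborative site.",
             "PDA": "A personal digital assistant."})
print(lookup(blob, "PDA"))
```

On a real device the index (sorted by title here) would be binary-searched rather than scanned, and the payloads compressed - which is roughly why 600,000 articles plus images could squeeze under 2 GB.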
WikiPulse
As a subproject of the Qwikly site, I was the first person to create a realtime statistical picture of the goings-on here at Wikipedia. The project was called WikiPulse. It was a dramatic exercise in screen scraping, as the data was aggregated from some fifteen sources. Here is a sample snapshot of the data it produced (sans the many realtime graphics):
- The Wikimedia grid is currently FAST and AVAILABLE
- In the last 27 minutes there have been 4 new articles, 49 new pages, 502 new edits, 1 new messages on the mailing lists, 0 new topics on the village pump, and we have used 512.00 MB of outgoing bandwidth. On the English Wikipedia there are currently 506347 total articles, 552 featured articles, and 1421562 total pages, with 12810635 total edits and 9.01 average edits per page. There are 218868 registered users and 424, or 0.02% are administrators. 172 people are currently chatting in #Wikipedia, and over the last 211 days there have been 5052 nicks. On the Village Pump there are 263 ongoing conversations. There have been 3917 messages across all wikimedia mailing lists this month. 37 total CPUs are operational right now and there have been 2760000 pages indexed by Google, comprising 0.03% of their index. The MediaWiki project is currently ranked 813 on SourceForge and the software has been downloaded 96789 total times. Ping response time from Colorado is 87 milliseconds. So far this month we have used 11501.50 GB of outgoing, and 1171.79 GB of incoming bandwidth. Finally, there are 751 open bug reports.
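The aggregation loop behind a snapshot like that is simple in principle: one scraper per source, merged into a single dictionary, with any one dead source tolerated. A runnable sketch - the scraper functions here are offline stand-ins that just return figures from the sample above, not real fetchers:

```python
def scrape_article_stats():
    # In reality: fetch Special:Statistics and regex out the counts.
    return {"total_articles": 506347, "featured_articles": 552}

def scrape_irc():
    # In reality: ask the IRC server how many people are in #Wikipedia.
    return {"irc_users": 172}

def scrape_mailing_lists():
    # In reality: scrape the pipermail archives for this month's volume.
    return {"messages_this_month": 3917}

SOURCES = [scrape_article_stats, scrape_irc, scrape_mailing_lists]

def snapshot():
    """Run every scraper, tolerating individual failures."""
    merged = {}
    for source in SOURCES:
        try:
            merged.update(source())
        except Exception:
            pass  # a dead source shouldn't kill the whole pulse
    return merged

pulse = snapshot()
print(f"{pulse['total_articles']} total articles, "
      f"{pulse['irc_users']} people in #Wikipedia")
```

Scale the source list up to fifteen scrapers, run it on a timer, and render the merged dictionary into prose and graphs, and you have the shape of WikiPulse.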
- This user contributes with Fedora.
- This user hacks happily with Emacs.
- This user can typeset using LaTeX.
- This user uses Google as a primary search engine.
- This user maintains a facebook profile.
Wikipedia Publications
This is a list of publications that have used the Wikipedia dataset. I am particularly interested in those with a linguistics and Natural Language Processing aspect. Feel free to add any that you know of, as I have not yet attempted an exhaustive search. The place to start is, of course, Google Scholar :)
- Weale, Timothy (2006), Utilizing Wikipedia Categories for Document Classification, <http://www.cse.ohio-state.edu/~weale/docs/presentations/2006.12.07.Wikipedia.pdf>
- Gabrilovich, Evgeniy & Markovitch, Shaul (2006), Overcoming the Brittleness Bottleneck using Wikipedia: Enhancing Text Categorization with Encyclopedic Knowledge, Technion—Israel Institute of Technology, 32000 Haifa, Israel, <http://www.cs.technion.ac.il/~shaulm/papers/pdf/Gabrilovich-Markovitch-aaai2006.pdf>
- Ruiz-Casado, Maria; Alfonseca, Enrique & Castells, Pablo (2005), Automatic Extraction of Semantic Relationships for WordNet by Means of Pattern Learning from Wikipedia, vol. 3513/2005, Springer Berlin / Heidelberg, doi:10.1007/11428817_7, <http://www.springerlink.com/content/m2ajcway6b8hrf99/>
- Strube, Michael & Ponzetto, Paolo (2006), WikiRelate! Computing Semantic Relatedness Using Wikipedia, <http://www.eml-research.de/english/homes/strube/papers/aaai06.pdf>
- Fissaha Adafre, Sisay & de Rijke, Maarten (2005), Discovering missing links in Wikipedia, University of Amsterdam, SJ Amsterdam, The Netherlands, <http://portal.acm.org/citation.cfm?id=1134284>
- Toral, Antonio & Muñoz, Rafael, A proposal to automatically build and maintain gazetteers for Named Entity Recognition by using Wikipedia, University of Alicante, Carretera San Vicente S/N, Alicante 03690, Spain, <http://acl.ldc.upenn.edu/W/W06/W06-2809.pdf>
- Gabrilovich, Evgeniy & Markovitch, Shaul (2007), Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis, Department of Computer Science, Technion—Israel Institute of Technology, 32000 Haifa, Israel, <http://www.ijcai.org/papers07/Papers/IJCAI07-259.pdf>