Wikipedia:Types of bots

Below is a categorization of possibly useful Wikipedia bots. None of these possibilities is necessarily recommended or implemented.

Examples of actual bots used on Wikipedia can be found at history of Wikipedia bots.

Editing bots

  1. Automatic importer-by-request: This bot imports entries from a public-domain or GFDL-licensed database, one manual request at a time. It does not implement a user interface for browsing possible entries.
  2. Automatic importer: This bot imports batches of entries from a public-domain or GFDL-licensed database. If it is used, the Wikipedia entries it produces are expected to be as well-formed as possible. See the history of Wikipedia bots for examples.
  3. Other automatic tools and scripts: This category includes spell checkers, wikifiers (tools that add wiki links and markup to plain text), and similar tools. Many other tasks are possible.
  4. Anti-vandalism: Finds pages that have been blanked or nearly blanked. If a weighted dictionary of the words in the new version is far less significant than that of the old version, the bot notes the change on a maintenance page and reverts it. Newer anti-vandalism bots are broader in scope and have been able to automatically revert a large portion of the vandalism that occurs.
  5. Ban enforcement: Finds and reverts changes made by suspicious new users, shared IP addresses, or hosting IPs (open ports) to pages targeted by sockpuppets, treating them as a possible return of a banned user under alternate IP addresses. Established users can restore any such edits that do not appear to come from the banned user in question.
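The blanking check in the anti-vandalism entry can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any actual bot: the "weighted set dictionary" is assumed to be a word-frequency count that ignores a few very common words, and the stopword list, the 10% threshold, and all function names are hypothetical. Posting to a maintenance page and reverting are not shown.

```python
import re
from collections import Counter

# Hypothetical settings: very common words carry no weight, and a version is treated
# as (nearly) blanked if it keeps under 10% of the old version's weight.
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "or", "the", "to"}
BLANKING_RATIO = 0.1


def weighted_words(text: str) -> Counter:
    """Weighted word set: each non-stopword mapped to how often it occurs."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(word for word in words if word not in STOPWORDS)


def significance(text: str) -> int:
    """Total weight of a version (here, simply the sum of the word counts)."""
    return sum(weighted_words(text).values())


def looks_blanked(old_text: str, new_text: str, ratio: float = BLANKING_RATIO) -> bool:
    """True if the new version is far less significant than the old one."""
    old_weight = significance(old_text)
    if old_weight == 0:
        return False  # an empty page cannot be blanked further
    return significance(new_text) < ratio * old_weight


old = "Paris is the capital and largest city of France. " * 5
print(looks_blanked(old, "lol"))  # True
print(looks_blanked(old, old + " It hosts many museums."))  # False
```

A plain size ratio (characters before and after) would be simpler; weighting by content words is meant to make the check less sensitive to an edit that replaces an article with a long run of filler.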

Non-editing bots

Data miner 
A tool that uses information extraction techniques to pull structured information out of Wikipedia. For this task, it is preferable to download a database dump and run the bot on your own server: you will get much better performance, you will not interfere with other Wikipedia users, and you will not cause unneeded network traffic. The only disadvantage is that your copy of Wikipedia will not include the most recent changes, which should not be a major issue for most information extraction applications. (You can always download a fresh dump and rerun the application later.)
Vandalism identifier 
Uses heuristics to search for possible uncorrected vandalism, such as surviving changes by known vandals, words from a list of profanities, or edits similar to those made by other vandals.
Copyright violation identifier 
Similar to the vandalism identifier, it compares chunks of text on new pages with text that already exists on the internet, and reports possible infringements to a page where human editors can review them.
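For the data miner, working offline from a dump can look roughly like the sketch below. It assumes the MediaWiki XML export format (`page` elements containing `title` and `revision`/`text`) and streams it with the standard library's `iterparse`, so the whole dump never has to fit in memory. The infobox extraction is only a hypothetical example task, and the function names are illustrative. A compressed dump could be passed in as `bz2.open(path, "rb")`.

```python
import io
import re
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterator, Optional, Tuple

# Matches the template name after "{{Infobox", e.g. "settlement" in "{{Infobox settlement".
INFOBOX = re.compile(r"\{\{\s*Infobox\s+([^|}\n]+)", re.IGNORECASE)


def local_name(tag: str) -> str:
    """Strip the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source: IO[bytes]) -> Iterator[Tuple[str, str]]:
    """Stream (title, wikitext) pairs from a MediaWiki XML dump, one page at a time."""
    title: Optional[str] = None
    text = ""
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text or ""
        elif name == "text":
            text = elem.text or ""  # with several revisions, the last one wins
        elif name == "page":
            yield title or "", text
            title, text = None, ""
            elem.clear()  # drop the finished page's children to bound memory use


def infobox_types(source: IO[bytes]) -> Dict[str, str]:
    """Hypothetical extraction task: map each page title to its first infobox type."""
    found: Dict[str, str] = {}
    for title, text in iter_pages(source):
        match = INFOBOX.search(text)
        if match:
            found[title] = match.group(1).strip()
    return found


SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Paris</title>
    <ns>0</ns>
    <revision><text>{{Infobox settlement
| name = Paris
}}
'''Paris''' is the capital of France.</text></revision>
  </page>
  <page>
    <title>Sandbox</title>
    <ns>0</ns>
    <revision><text>No infobox here.</text></revision>
  </page>
</mediawiki>"""

print(infobox_types(io.BytesIO(SAMPLE)))  # {'Paris': 'settlement'}
```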
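The vandalism identifier's heuristics can be combined into a simple score, as in this minimal sketch. It assumes the bot already has a list of edits whose text is still on the page; the word lists, the weights, the threshold, and the `Edit` record are all hypothetical placeholders, and a real tool would report flagged edits for human review rather than act on them.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Placeholder lists (hypothetical); a real bot would load maintained lists instead.
PROFANITY = {"idiot", "stupid", "sucks"}
KNOWN_VANDALS = {"ExampleVandal", "192.0.2.7"}
KNOWN_VANDAL_SNIPPETS = {"was here"}


@dataclass
class Edit:
    page: str
    user: str
    added_text: str  # text inserted by the edit (hypothetical field)


def suspicion_score(edit: Edit) -> int:
    """Add up simple heuristic signals; a higher score means more likely vandalism."""
    score = 0
    if edit.user in KNOWN_VANDALS:
        score += 3  # edit made by a known vandal
    lowered = edit.added_text.lower()
    words = re.findall(r"[a-z']+", lowered)
    score += sum(1 for word in words if word in PROFANITY)  # one point per listed word
    if any(snippet in lowered for snippet in KNOWN_VANDAL_SNIPPETS):
        score += 2  # resembles text other vandals have added
    return score


def flag_for_review(edits: Iterable[Edit], threshold: int = 2) -> List[Edit]:
    """Return edits whose score reaches the (hypothetical) threshold."""
    return [edit for edit in edits if suspicion_score(edit) >= threshold]


edits = [
    Edit("Cat", "ExampleVandal", "Cats are cool."),
    Edit("Dog", "NewUser", "This page sucks, you idiot."),
    Edit("Fish", "Editor", "Added a citation."),
]
print([edit.page for edit in flag_for_review(edits)])  # ['Cat', 'Dog']
```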
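One common way to compare chunks of text for the copyright violation identifier is word shingling: split each text into overlapping runs of n words and measure how many of the new page's runs also appear in a candidate source. The sketch below assumes the candidate source texts have already been fetched (finding them, for example through a web search, is out of scope here); the 5-word shingle size, the 0.5 threshold, and the example URLs are hypothetical.

```python
import re
from typing import Dict, List, Set, Tuple

Shingle = Tuple[str, ...]


def shingles(text: str, n: int = 5) -> Set[Shingle]:
    """All runs of n consecutive lower-cased words; empty if text has fewer than n words."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(new_text: str, source_text: str, n: int = 5) -> float:
    """Fraction of the new page's shingles that also occur in the source text."""
    new = shingles(new_text, n)
    if not new:
        return 0.0
    return len(new & shingles(source_text, n)) / len(new)


def possible_infringements(
    new_text: str, sources: Dict[str, str], threshold: float = 0.5, n: int = 5
) -> List[Tuple[str, float]]:
    """Sources whose overlap meets the threshold, highest first, for human review."""
    hits = [(url, overlap(new_text, text, n)) for url, text in sources.items()]
    return sorted((hit for hit in hits if hit[1] >= threshold), key=lambda hit: hit[1], reverse=True)


new_page = "The quick brown fox jumps over the lazy dog near the river bank."
sources = {
    "https://example.com/copied": "Intro. The quick brown fox jumps over the lazy dog near the river bank. More.",
    "https://example.com/unrelated": "Telescopes let astronomers study distant stars in the night sky.",
}
print(possible_infringements(new_page, sources))  # [('https://example.com/copied', 1.0)]
```

Measuring overlap relative to the new page (rather than symmetric Jaccard similarity) keeps a short copied passage from being hidden inside a much longer source document.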

See also