User:Gdr/Nomialbot

Nomialbot is a Wikipedia bot that performs miscellaneous operations on taxoboxes. (See Wikipedia:WikiProject Tree of Life.)

Nomialbot uses the Python Wikipedia Robot Framework. It is currently in development. It is operated from time to time by User:Gdrbot.

Making redirects

Nomialbot makes redirects from the Latin names of genera, species and subspecies to articles concerning those taxa. For example, the Lesser Sand Plover has the Latin species name Charadrius mongolus so the latter should redirect to the former.

It does this as follows:

  1. Discover all articles with taxoboxes that go down to the genus or species level, by finding articles that link to Template:Taxobox species entry and similar templates (see Wikipedia:WikiProject Tree of Life/taxobox usage).
  2. Fetch these articles (in batches using Special:Export to reduce server load).
  3. Analyze these articles to build up a table mapping taxon name (e.g. genus Charadrius) to a list of articles referring to that taxon in their taxoboxes.
  4. If exactly one article refers to a taxon, and Wikipedia has no article at that taxon's name, create a redirect from the taxon name to that article.
  5. Exception: for genera, the redirect is created only if the genus appears in bold italics in the taxobox, which marks the genus as monotypic and so makes the redirect appropriate. Other genera are left unlinked even if Wikipedia has only one article on a species in that genus, because it is better to leave the genus name as a red link to invite editors to write an article.
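
Steps 3 to 5 can be sketched in Python as follows. This is only an illustration under stated assumptions, not Nomialbot's actual code: the `TaxoboxEntry` record, the `plan_redirects` function and the `existing_titles` set are hypothetical names, and the sketch assumes the taxobox entries have already been parsed out of the fetched articles.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxoboxEntry:
    """One taxon named in an article's taxobox (hypothetical record)."""
    article: str             # title of the article containing the taxobox
    rank: str                # "genus", "species" or "subspecies"
    name: str                # Latin name, e.g. "Charadrius mongolus"
    monotypic: bool = False  # genus shown in bold italics in the taxobox


def plan_redirects(entries, existing_titles):
    """Return {taxon name: target article} for the redirects worth creating."""
    entries = list(entries)

    # Step 3: map each taxon name to the set of articles that mention it.
    articles_by_taxon = defaultdict(set)
    for e in entries:
        articles_by_taxon[e.name].add(e.article)
    genera = {e.name for e in entries if e.rank == "genus"}
    monotypic_genera = {e.name for e in entries
                        if e.rank == "genus" and e.monotypic}

    redirects = {}
    for name, articles in articles_by_taxon.items():
        # Step 4: exactly one referring article, and no page at that title yet.
        if len(articles) != 1 or name in existing_titles:
            continue
        # Step 5: a genus is redirected only when marked as monotypic.
        if name in genera and name not in monotypic_genera:
            continue
        redirects[name] = next(iter(articles))
    return redirects


# Illustrative data only.
entries = [
    TaxoboxEntry("Lesser Sand Plover", "genus", "Charadrius"),
    TaxoboxEntry("Lesser Sand Plover", "species", "Charadrius mongolus"),
    TaxoboxEntry("Hoatzin", "genus", "Opisthocomus", monotypic=True),
    TaxoboxEntry("Hoatzin", "species", "Opisthocomus hoazin"),
]
planned = plan_redirects(entries, existing_titles={"Lesser Sand Plover", "Hoatzin"})
```

In this example the genus name Charadrius gets no redirect, because it is not marked as monotypic, while both names from the Hoatzin taxobox do.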

First run, 2005-05-17/2005-05-20

  • There were 4613 articles with taxoboxes with genus, species, binomial or trinomial entries.
  • Nomialbot created:
    • 1223 redirects from binomials;
    • 22 redirects from trinomials;
    • 264 redirects from genera marked as monotypic in the taxobox.
  • A number of mistakes in taxoboxes were found; see /Report 2005-05-19 for typos, missing taxobox entries and genera incorrectly marked as monotypic.

Second run, 2005-06-05

  • Nomialbot created:
    • 999 redirects from binomials;
    • 15 redirects from trinomials;
    • 134 redirects from genera marked as monotypic in the taxobox.

Adding missing binomials

Nomialbot adds a binomial section to a taxobox if the taxobox has entries for genus and species but no binomial. The binomial is deduced from the genus and species entries, provided the species entry takes one of these forms:

  1. Genus species
  2. G. species
  3. species (technically incorrect but many taxoboxes still use this form)
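
The deduction can be sketched as a small Python function. This is a hedged illustration, not the bot's own code: `deduce_binomial` is a hypothetical name, and the sketch assumes that wiki markup (italics, links) has already been stripped from both entries.

```python
def deduce_binomial(genus, species):
    """Return "Genus species", or None if the species entry has another form."""
    genus = genus.strip()
    words = species.strip().split()
    if not genus or not words:
        return None

    if len(words) == 1 and words[0].islower():
        # Form 3: bare epithet, e.g. "mongolus".
        epithet = words[0]
    elif len(words) == 2 and words[1].islower():
        first, epithet = words
        # Form 1: full genus name; form 2: genus abbreviated to "G.".
        if first != genus and first != genus[0] + ".":
            return None
    else:
        return None
    return f"{genus} {epithet}"
```

For example, with genus `Charadrius`, the species entries `Charadrius mongolus`, `C. mongolus` and `mongolus` all yield `Charadrius mongolus`, while an entry naming a different genus yields `None` so that the taxobox can be checked by hand.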

Converting old style taxoboxes

Between 2005-05-24 and 2005-05-30, Nomialbot updated taxoboxes so that they all used the fully templatized form documented at Wikipedia:WikiProject Tree of Life/taxobox usage.

Taxoboxes were originally written using HTML tables, then after the release of MediaWiki 1.3 were converted through several intermediate stages to use templates and MediaWiki table syntax. Before Nomialbot converted them, there were articles with taxoboxes at several stages of this development:

  1. Taxoboxes in pure HTML (example: Pompeii worm).
  2. Taxoboxes in HTML with {{taxonomy}} (example: Poison dart frog).
  3. Taxoboxes in pure wiki table markup (example: Argilophilus).
  4. Taxoboxes in HTML with templatized rank names (example: Southern Elephant Seal).
  5. Taxoboxes in wiki table markup with templatized rank names (example: White-eye).
  6. Taxoboxes in strange hybrid format, with a mixture of wiki table markup and taxobox templates (examples: White Rhinoceros, Mole Salamanders).
  7. Taxoboxes fully templatized.

Nomialbot found taxoboxes at stages 1–6 by looking for articles that linked to Scientific classification, Binomial name, Binomial nomenclature or {{Regnum}} but which didn't use {{Taxobox begin}}. It then did its best to convert the taxobox to be fully templatized. In cases where it couldn't complete the conversion, it added the partial conversion to the top of the article in a comment. These articles were then listed and converted by hand.
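
The selection rule amounts to a simple set test, sketched below in Python. The names `OLD_STYLE_LINKS`, `FULLY_TEMPLATIZED` and `needs_conversion` are hypothetical, and the sketch assumes that each article's outgoing links and transcluded templates are already available as a collection of titles.

```python
# Pages and templates whose presence suggests an old-style taxobox.
OLD_STYLE_LINKS = frozenset({
    "Scientific classification",
    "Binomial name",
    "Binomial nomenclature",
    "Template:Regnum",
})
# Template that marks a fully templatized taxobox.
FULLY_TEMPLATIZED = "Template:Taxobox begin"


def needs_conversion(links):
    """True if the article looks like it has a taxobox at stages 1-6.

    `links` is the collection of page and template titles the article
    links to or transcludes (assumed to be gathered elsewhere).
    """
    links = set(links)
    return bool(links & OLD_STYLE_LINKS) and FULLY_TEMPLATIZED not in links
```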

Results

  • Of taxoboxes at stages 1 and 2:
    • 341 articles were found;
    • 239 were automatically converted by Nomialbot;
    • 102 were too awkward to convert automatically; a partial conversion was added to the article;
    • See /Report 2005-05-26 for a list of the latter.
  • Of taxoboxes at stage 4:
    • There were 2,374 articles with taxoboxes at stage 4;
    • See /Report 2005-05-24 for a list;
    • 2,216 were converted (mostly by Nomialbot, a few by User:DanielCD);
    • 158 were too awkward to convert automatically; a partial conversion was added to the article;
    • See /Report 2005-05-26 for a list of the latter.
  • Of taxoboxes at stages 3, 5 and 6:
    • 201 were found;
    • 123 were converted automatically;
    • 78 were too awkward to convert automatically; a partial conversion was added to the article;
    • See /Report 2005-05-30 for a list of the latter.

Bugs needing fixing in a subsequent pass over the taxoboxes

  1. A number of instances of {{Taxobox section binomial}} incorrectly have an extra comma after the author.
  2. 200px is too narrow for a picture in a taxobox. Change this to 250px.
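
The width change in item 2 could be made with a regular expression along these lines. This is a rough sketch only: `widen_taxobox_image` is a hypothetical name, the caller is assumed to pass just the taxobox wikitext rather than the whole article, and the pattern assumes the width is the first parameter after the file name in an `[[Image:...]]` link.

```python
import re

# Match "[[Image:<file>|200px" where the width ends at "|" or "]]".
_IMAGE_200PX = re.compile(r"(\[\[Image:[^\[\]|]+\|)200px(?=[|\]])")


def widen_taxobox_image(taxobox_text):
    """Replace a 200px image width with 250px in the given taxobox wikitext."""
    return _IMAGE_200PX.sub(r"\g<1>250px", taxobox_text)
```

Other widths, such as 1200px or 220px, are left untouched.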

Fixed bugs

  1. Some picture captions were mangled; see for example Apis (genus). I believe these are all fixed now.
  2. Some subgenera in the placement section were not converted. These were all in articles on pines and were found and fixed by User:MPF.
  3. Parentheses in the binomial authority were removed. Fixed.
  4. Some of the binomial_name parameters incorrectly include text following the binomial. There's a list of these at /Report 2005-06-04. All fixed.

Adding authorities

Nomialbot can assist with the addition of authorities to taxoboxes. See User:Gdr/authority.py.

Conversion to single template

From 2006-01-30 to 2006-02-01, Nomialbot converted about 16,000 multi-template taxoboxes to use {{Taxobox}}.

There were about 70 articles that couldn't be converted, plus another 180 or so articles with virus taxoboxes. See /Report 2006-02-01 for the list.

The source code is at User:Gdr/taxoconvert.py.