Wikipedia:Cleanup sorting

From Wikipedia, the free encyclopedia

Shortcut:
WP:CSORT

The backlog of articles piling up on Category:Cleanup by month is threatening the quality of Wikipedia as a whole. To help with this problem, the process described here attempts to automatically list articles tagged for cleanup on relevant topical subpages of Wikipedia:Pages needing attention. Those topical pages can then be consulted by editors interested in particular subject areas, such as members of specific WikiProjects.


Summary

The approach developed here uses a robot (Pearle) to periodically find pages flagged for cleanup, wikification, etc., sort them by topic, and list them on topical subpages of Pages needing attention ("PNA"). The sorting is based on the categories the articles belong to; the categories Pearle uses are first placed manually on the topical subpages (or are found from the Portal pages listed there).

Status

Many, but not all, PNA subpages have been converted to the format needed for bot processing and have since been processed by the bot. The bot is run irregularly, perhaps every couple of months, depending partly on when successful database dumps are obtained. The topical cleanup lists under PNA are therefore generally somewhat out of date, but still seem quite serviceable.

As of 30 July 2006, the PNA subpages still to be converted to the new format (and perhaps further reorganized) are those under the Culture and Arts heading on the PNA page.

Some of the already-converted subpages still carry old listings alongside the bot-generated ones. These old (often ancient) listings eventually need to be checked against the articles. Each should then either be deleted outright or have its comments moved to the article's talk page, perhaps also tagging the article with {{cleanup}} or a similar tag, after which the listing can be removed from the PNA subpage.

Robot-generated PNA lists approach

This mechanism uses minimal human setup to allow a computer program (bot) to sort articles needing some form of cleanup into topical groups. Once the sorting criteria are set up, the bot can be run to find articles tagged for cleanup and place links to them onto a Wikipedia:Pages needing attention (PNA) subpage. The bot will need to be re-run from time to time to produce more up-to-date lists of links.

This sorting mechanism produces lists that give editors an alternative way to find articles they may want to work on, since editors tend to be interested in particular subject areas. It is not intended to replace other mechanisms, nor to require any change in the procedures most editors follow.

Using categories for sorting

The central idea is for the bot to use existing information to determine which subject area(s) an article belongs to. An article has typically been placed in one or more categories, either normal or stub categories. These categories indicate the article's topical content and serve the sorting purpose fairly well, without requiring any special tagging of individual articles. Articles without categories will not be listed by this mechanism.
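A minimal sketch of category-based sorting, in Python, assuming the bot already has each tagged article's categories and each leaf subpage's category list as plain sets. The function name and dictionaries below are hypothetical, not Pearle's actual data structures. An article is listed on every subpage with which it shares at least one category, so an uncategorized article matches nothing.

```python
from collections import defaultdict


def sort_by_topic(article_categories, topic_categories):
    """Map each topic subpage to the tagged articles sharing a category with it.

    Hypothetical inputs:
      article_categories: dict of article title -> set of category names
      topic_categories:   dict of PNA subpage title -> set of category names
    Articles with no categories never match, so they are never listed.
    """
    listings = defaultdict(set)
    for article, cats in article_categories.items():
        for topic, wanted in topic_categories.items():
            if cats & wanted:  # at least one shared category
                listings[topic].add(article)
    # Sorted lists keep the output stable between bot runs.
    return {topic: sorted(arts) for topic, arts in listings.items()}


if __name__ == "__main__":
    articles = {
        "Foo": {"Chemistry", "Chemistry stubs"},
        "Bar": {"Birds"},
        "Baz": set(),  # uncategorized: will not appear anywhere
    }
    topics = {
        "Wikipedia:Pages needing attention/Chemistry": {"Chemistry", "Chemistry stubs"},
        "Wikipedia:Pages needing attention/Biology": {"Birds", "Mammals"},
    }
    print(sort_by_topic(articles, topics))
```

An article in several matching categories can appear on more than one subpage, which suits the aim of serving editors with different interests.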

Identifying articles for listing

The bot picks articles for listing by finding pages marked with the standard tags used to flag problem pages. These include {{cleanup}}, {{wikify}}, {{expand}}, {{POV}}, and {{merge}}. Editors already use these tags for maintenance, so the bot can find these pages with no change to editing procedures.
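The tag search could be sketched as a scan of each article's wikitext for template calls. This sketch assumes the tag names exactly as listed above and does not resolve template redirects (so, for example, {{pov}} is not matched); the regular expression and helper names are illustrative only. It normalizes names the way MediaWiki treats page titles: first letter case-insensitive, underscores equivalent to spaces.

```python
import re

# Assumed set of tracked tags, taken from the list above; real tag names
# and their redirects vary, so this is illustrative only.
TRACKED_TAGS = {"Cleanup", "Wikify", "Expand", "POV", "Merge"}

# Captures the template name in "{{Name}}" or "{{Name|params...".
TEMPLATE_RE = re.compile(r"\{\{\s*([^|{}]+?)\s*(?:\||\}\})")


def normalize(name):
    """Normalize a template name: underscores become spaces, first letter upper."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def find_tags(wikitext):
    """Return the set of tracked maintenance tags used in an article's wikitext."""
    found = set()
    for match in TEMPLATE_RE.finditer(wikitext):
        name = normalize(match.group(1))
        if name in TRACKED_TAGS:
            found.add(name)
    return found
```

For instance, `find_tags("{{cleanup|date=May 2006}} Text. {{POV}} {{Infobox foo}}")` returns the set of "Cleanup" and "POV"; the infobox is ignored.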

Bot operates on PNA leaf subpages

The main PNA page links to many subpages. The bot operates only on leaf subpages, that is, subpages that do not include other subpages. Some PNA subpages (for instance Wikipedia:Pages needing attention/Applied Arts and Sciences) are non-leaf or "composite": they transclude several leaf subpages, which contain the main lists of problem pages. Composite pages are simply a convenience for those who want to view several related subjects at once.
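One way to tell leaf subpages from composite ones, assuming composite pages transclude their leaves with plain calls such as `{{Wikipedia:Pages needing attention/Engineering}}`, is to look for such transclusions: a page with none is treated as a leaf. These are hypothetical helpers for illustration, not the bot's documented logic.

```python
import re

PNA_PREFIX = "Wikipedia:Pages needing attention/"

# Assumed transclusion syntax: {{Wikipedia:Pages needing attention/Subject}}
TRANSCLUDE_RE = re.compile(r"\{\{\s*(" + re.escape(PNA_PREFIX) + r"[^|{}]+?)\s*\}\}")


def transcluded_subpages(wikitext):
    """List the PNA subpages transcluded by a page."""
    return [m.group(1) for m in TRANSCLUDE_RE.finditer(wikitext)]


def is_leaf(wikitext):
    """A leaf subpage transcludes no other PNA subpage."""
    return not transcluded_subpages(wikitext)


def leaf_pages(pages):
    """Select the titles the bot should process from a {title: wikitext} dict."""
    return sorted(title for title, text in pages.items() if is_leaf(text))
```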

The bot uses a leaf PNA subpage in two ways: as input, the page specifies the categories to sort by; as output, the page receives the generated lists of problem pages.

A leaf subpage must have a particular format for the bot to operate on it correctly. The format is defined in the template section below. The inputs are at the top of the page: lists of portals, WikiProjects, normal categories, and stub categories. This area may also hold leftover, manually entered comments about pages; if present, the bot ignores them.
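Reading the inputs might look like the following sketch. It assumes that the wikitext links each input as `[[Portal:Name]]`, `[[Wikipedia:WikiProject Name]]`, or `[[:Category:Name]]`, that stub category names end in " stubs", and that the literal line "Do not add text below this point" separates inputs from bot output. These are assumptions based on the conversion template shown further down, not a documented format.

```python
import re

# Assumed separator between the input area and the bot-generated area.
MARKER = "Do not add text below this point"

# Captures the target of [[Target]], [[:Target]] or [[Target|label]].
LINK_RE = re.compile(r"\[\[\s*:?\s*([^|\]]+?)\s*(?:\|[^\]]*)?\]\]")


def parse_inputs(wikitext):
    """Collect the bot's inputs from the part of a leaf subpage above MARKER."""
    top = wikitext.split(MARKER, 1)[0]
    inputs = {"portals": [], "wikiprojects": [], "categories": [], "stubs": []}
    for match in LINK_RE.finditer(top):
        target = match.group(1)
        if target.startswith("Portal:"):
            inputs["portals"].append(target)
        elif target.startswith("Wikipedia:WikiProject "):
            inputs["wikiprojects"].append(target)
        elif target.startswith("Category:"):
            key = "stubs" if target.endswith(" stubs") else "categories"
            inputs[key].append(target)
        # Links to ordinary articles, such as those in leftover manual
        # comments, match none of the prefixes and are skipped.
    return inputs
```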

The bottom of a PNA leaf subpage is erased by the bot each time it runs, and replaced by the lists of sorted article links which it generates. Since it is erased, it is important not to insert material into that area.
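A sketch of this erase-and-replace step, again assuming the literal marker line "Do not add text below this point". The mapping from tags to headings is hypothetical: it pairs the headings of the conversion template with the tags named earlier, plus an assumed "Expert" tag for the "Expert attention needed" heading.

```python
# Assumed separator between the input area and the bot-generated area.
MARKER = "Do not add text below this point"

# Hypothetical (tag, heading) pairs; the real bot's mapping may differ.
HEADINGS = [
    ("Cleanup", "Cleanup needed"),
    ("Expand", "Expansion needed"),
    ("Expert", "Expert attention needed"),
    ("Wikify", "Wikification attention needed"),
    ("POV", "Neutrality questioned"),
]


def render_listings(tagged):
    """Build the bottom section from a {tag: [article titles]} dict."""
    lines = []
    for tag, heading in HEADINGS:
        lines.append("* " + heading)
        for title in sorted(tagged.get(tag, [])):
            lines.append("** [[" + title + "]]")
    return "\n".join(lines)


def update_page(wikitext, tagged):
    """Keep everything above MARKER and regenerate everything below it."""
    top, found, _old_bottom = wikitext.partition(MARKER)
    if not found:
        raise ValueError("page is not in the bot format: marker missing")
    return top + MARKER + "\n\n" + render_listings(tagged) + "\n"
```

Because everything after the marker is discarded on each run, anything an editor adds there is lost, which is why the page warns against adding text below that point.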

How to convert a PNA subpage

If an existing PNA subpage is to be converted directly into a single new-format topical subpage, the procedure is relatively straightforward. Paste the wikitext of the template into the subpage, as described below under "Using a template for converting PNA subpages". Then look up the appropriate Portals, WikiProjects, normal Categories, and stub Categories, and fill in their names at the indicated places in the wikitext.

The Category links guide the bot: pages carrying {{cleanup}} or similar tags that are in those categories (or in one or two levels of their subcategories) will be listed on the subpage. Any Portals you enter are used to pull a whole set of such categories from the portal page. Be careful when adding a portal, because a portal may list too broad a set of categories, which will fill the PNA subpage with far too many irrelevant cleanup listings.
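The "one or two levels of subcategories" rule amounts to a breadth-first search with a depth limit. The sketch assumes the category tree is available as a dictionary from each category to its direct subcategories (for example, built from a database dump); `expand_categories` and its `max_depth` parameter are hypothetical names standing in for that limit. Tracking visited categories matters because Wikipedia's category graph is not a strict tree and can contain cycles.

```python
from collections import deque


def expand_categories(roots, subcats, max_depth=2):
    """Return the root categories plus subcategories up to max_depth levels down.

    subcats maps a category name to a list of its direct subcategories.
    Each category is visited at most once, so cycles cannot loop forever.
    """
    seen = set(roots)
    queue = deque((cat, 0) for cat in roots)
    while queue:
        cat, depth = queue.popleft()
        if depth == max_depth:
            continue  # deep enough: do not descend further
        for child in subcats.get(cat, []):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return seen
```

Raising `max_depth` quickly pulls in loosely related categories, the same over-broad effect the text warns about for portals.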

Since the sorting is driven by categories, the old organization of a PNA subpage may no longer be suitable. In such cases it is advisable to reorganize the subpages, remembering that the aim is to get subpages that are reasonably sized and reasonably selective.

Using a template for converting PNA subpages

The wikitext of the following subsection (Current template for conversion) should be used as the pattern for a page to be processed by the bot. Use the section's "edit" link to obtain its wikitext, then paste it into the page being converted. If there are old (pre-robot) PNA listings on the page, move them to the place indicated in the wikitext's comments, and remove the comment markers just above them so that the "To be manually checked" text will be displayed.

Current template for conversion

  • Portals
    • Portal:(name)
  • WikiProjects
    • Wikipedia:WikiProject (name)
  • Categories covered
    • Category:(name)
  • Stubs
    • Category:(name) stubs

Do not add text below this point

  • Cleanup needed
    • (to be updated by bot)
  • Expansion needed
    • (to be updated by bot)
  • Expert attention needed
    • (to be updated by bot)
  • Wikification attention needed
    • (to be updated by bot)
  • Neutrality questioned
    • (to be updated by bot)