Wikipedia:Bots/Requests for approval/Polbot 3

Polbot

Operator: Quadell

Automatic or Manually Assisted: Automatic, supervised

Programming Language(s): Perl (with Perlwikipedia)

Function Summary: Add {{WPBiography}} where there's persondata; then add {{DEFAULTSORT}} where it's obvious, and collect summary info where it's not.

Edit period(s): one time run, or in small batches

Edit rate requested: 6 edits per minute

Already has a bot flag (Y/N): Yes

Function Details: The bot will look through every article that transcludes the WPBiography template (nearly 400,000 articles so far) looking for a sortname (such as "Smith, John") and living/dead info. This might be found within the WPBiography template itself, in persondata, in categories, in a DEFAULTSORT tag -- or in multiple locations which may or may not agree. The bot will log that info (the various listed sortnames and living/dead info for each article) in a text file on my server. While it's there, in certain limited circumstances, it will standardize this info in the article itself. See User:Polbot/ideas/defaultsort#Detailed specification for more details than you could shake a stick at.

The logfile that is created by this process will (it is hoped) be perused by volunteer humans to pick a proper sortname from among the choices, and a later bot (not requested for authorization at this time) will fix or add DEFAULTSORT info in the articles.
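The collection step described above can be pictured with a rough sketch. This is not Polbot's actual code (which is in Perl and is not shown on this page); the function name `collect_sortnames`, the regular expressions, and the dictionary keys are all illustrative assumptions, and real wikitext has many edge cases (nested templates, odd spacing, redirected template names) that this ignores.

```python
import re

def collect_sortnames(article_text, talk_text=""):
    """Hypothetical helper: gather every sort key found, keyed by where it came from."""
    found = {}

    # DEFAULTSORT magic word, e.g. {{DEFAULTSORT:Smith, John}}
    m = re.search(r"\{\{\s*DEFAULTSORT\s*:\s*([^}|]+?)\s*\}\}", article_text)
    if m:
        found["defaultsort"] = [m.group(1)]

    # Category pipes, e.g. [[Category:Poets|Smith, John]]
    pipes = re.findall(
        r"\[\[\s*Category\s*:[^\]|]+\|\s*([^\]]+?)\s*\]\]", article_text
    )
    if pipes:
        found["category_pipes"] = pipes

    # NAME field of the Persondata template in the article
    m = re.search(
        r"\{\{\s*Persondata\b.*?\|\s*NAME\s*=\s*([^|\n}]+)", article_text, re.S
    )
    if m:
        found["persondata"] = [m.group(1).strip()]

    # listas parameter of WPBiography (assumed here to live on the talk page)
    m = re.search(
        r"\{\{\s*WPBiography\b.*?\|\s*listas\s*=\s*([^|\n}]+)", talk_text, re.S
    )
    if m:
        found["listas"] = [m.group(1).strip()]

    return found


article = (
    "{{Persondata\n|NAME = Smith, John\n|DATE OF BIRTH = 1900\n}}\n"
    "{{DEFAULTSORT:Smith, John}}\n"
    "[[Category:Living people|Smith, John]]\n"
    "[[Category:Poets]]"
)
talk = "{{WPBiography|living=yes|class=Stub|listas=Smith, John}}"
print(collect_sortnames(article, talk))
```

Keeping the sources separate, rather than merging them into one value, is what would let a human reviewer see which locations disagree.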

Discussion

  • Thanks for submitting this. One quick comment: please don't confuse {{DEFAULTSORT}} (the template) and {{DEFAULTSORT:sortkey}} (the magic word). Both are used. We should be using the magic word, not the template. Carcharoth 00:36, 15 June 2007 (UTC)

Looks good. I'm supporting it. ~ Wikihermit 23:16, 15 June 2007 (UTC)

Same here, looks like a resourceful task. Good luck. E talk 23:17, 15 June 2007 (UTC)
  • We need to step back and reevaluate before moving forward on this. Discussion is ongoing. Hold on a bit. – Quadell (talk) (random) 23:19, 15 June 2007 (UTC)
    • Okay, I've decided to go forward with this. Full steam ahead. – Quadell (talk) (random) 22:10, 18 June 2007 (UTC)
Can you provide a link to the discussion? —METS501 (talk) 19:30, 16 June 2007 (UTC)
    • Yes, sorry, here and here, mostly. The main thing is, right now metadata is handled haphazardly in several ways. If there were a standardized and simplified way of handling metadata, it would definitely be an improvement; but then this bot's work would be overwritten, basically. So I'm waiting to see if it would be better to use the proposed {{UsePersondata}} tag or something similar. – Quadell (talk) (random) 19:56, 16 June 2007 (UTC)
      • Personally, I feel that we shouldn't wait for the metadata discussions to be concluded. Those discussions will probably turn out to be very long and involved, if only because it will change the way many editors are familiar with handling such metadata, and they will probably object. In any case, it should be possible to carry out edits with this bot and then migrate the data later with no loss of information. Consider it an interim improvement. By the way, where is the discussion about this proposed "UsePersondata" tag? Carcharoth 16:42, 17 June 2007 (UTC)

A couple edits of what this bot would do would be nice. Consider it a 2 edit trial -- Tawker 19:48, 16 June 2007 (UTC)

Another point, one of the pressing questions I was hoping this bot would answer, is one of scale. Currently, around 10,000 articles use Persondata, and nearly 400,000 articles use the WPBiography template. What I don't know, and would like to see answered, is how many of these articles use the DEFAULTSORT magic word, and how many use the WPBiography 'listas' parameter (or rather, how many don't use these sort keys), and how many still use individual category pipe-sorting instead? It is entirely possible that the vast majority of the nearly 400,000 biographical articles (say 300,000) lack any sort key whatsoever. If so, only the 100,000 using them would need to be standardised. The other 300,000 would be turfed over to a human project to decide what the appropriate sort key is. Finding out how many articles use the 'listas' parameter would require an edit to the template along the lines of that done in the past at Template talk:Infobox Writer#Magnum Opus, but detecting which biographical articles have category pipes and/or DEFAULTSORT entries is more difficult. I'm going to ask a developer if there is a quick way to do that. Carcharoth 16:41, 17 June 2007 (UTC)

Update. The discussions are either inconclusive or ongoing. See mw:User talk:Robchurch#DEFAULTSORT and sort keys in general and mw:Talk:API#DEFAULTSORT key, in case anyone with more computing know-how than me (not difficult) gets a bright idea from them. Carcharoth 21:54, 18 June 2007 (UTC)
  • It appears that it would not be prudent for me to wait on metadata standardization before running this bot. Feel free to approve it for a trial run, if you deem it acceptable. – Quadell (talk) (random) 22:10, 18 June 2007 (UTC)
  • Looks good. I have a couple of questions. 1) Is the code written and waiting to go? (If it is, I see no reason why we shouldn't have a small trial run; it's always easier to evaluate a trial run than it is to evaluate mere words). 2) Is there not a danger that this will result in mis-sorting? My thinking is that in some categories, the DEFAULTSORT won't apply. Imaginary example: George W. Bush in Category:Presidents would sort as "Bush, George W." In Category:Bush family he would sort as "George W." --kingboyk 21:11, 20 June 2007 (UTC)
    • Answer #1: The code for reading the pages and making a logfile is written and waiting. I'm still. . . polishing. . . the function that actually changes the pages' wikicode. If I get the green light for a test, I'll have it done within 12 hours. Yes, the code is done and ready to run. Answer #2: This is a very cautious bot. The only situation where it would write a DEFAULTSORT is when there is a "listas" parameter in the WPBiography template, and also when every other sortname (category pipes and Persondata name) gives the exact same sortname as the listas. So if one category pipe gives a different sort, then the DEFAULTSORT will not be written. (It will be logged, though, for an editor to look at.) – Quadell (talk) (random) 22:16, 20 June 2007 (UTC)
    • Another answer to #2: Look at Jeb Bush. That already has DEFAULTSORT and is incorrectly sorted in Category:Bush family. But if you add a pipe-sort to give [[Category:Bush family|Jeb]], that will over-ride the DEFAULTSORT. This is why templates that pipe-sort their categories using [[Category:Random|{{PAGENAME}}]] (so as to avoid talk and user pages being grouped separately) over-ride the DEFAULTSORT magic sort. Incidentally, many of the articles in Category:Bush family are incorrectly pipe-sorted, but the category is unwieldy at the moment anyway, and fails to help people navigate to the area of the Bush family they might be interested in. As Quadell says, the idea is to standardise pages where the sort keys are the same. Where they are different (or absent), humans need to do some checking. Ultimately, only having one sort key in one location would also be a great boon (updating the same information in two locations is silly and inefficient), but that is now part of the bigger metadata debate (which will take a long time to resolve, hence Quadell deciding to go ahead with this for now). Carcharoth 11:43, 21 June 2007 (UTC)
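The cautious rule Quadell gives in Answer #2 might be sketched roughly as below. The function name and signature are hypothetical, and the handling of an article that already has a DEFAULTSORT is an assumption for illustration only; the bot's real Perl code is not shown in this discussion.

```python
def should_write_defaultsort(listas, category_pipes, persondata_name=None,
                             existing_defaultsort=None):
    """Hypothetical check: True only when it is 'obvious' what to write."""
    # No listas parameter means nothing to standardise on.
    if not listas:
        return False
    # Assumption: an article that already has DEFAULTSORT is left alone.
    if existing_defaultsort:
        return False
    # Every other sort key found must match listas exactly.
    others = list(category_pipes)
    if persondata_name:
        others.append(persondata_name)
    return all(key == listas for key in others)


# All sources agree, so a DEFAULTSORT would be written.
print(should_write_defaultsort("Bush, George W.",
                               ["Bush, George W."], "Bush, George W."))
# One category pipe differs, so the article is only logged for a human.
print(should_write_defaultsort("Bush, George W.",
                               ["Bush, George W.", "George W."]))
```

Under this rule a single dissenting category pipe, as in kingboyk's Category:Bush family example, is enough to stop the write.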

Also, people wanted to see 2 or 3 examples of what the code would do. Here it is: [1], [2], and [3]. – Quadell (talk) (random) 02:26, 21 June 2007 (UTC)

And if you dig through the history of the articles and the talk pages, you can see where people added some parameters but didn't add them all, e.g. Category:Living people but not "living=yes", or "listas" or category pipes but not DEFAULTSORT. In the Donovan Swailes case, the category piping was present from when the article was created (26 May 2005), the WPBiography template was added on 20 September 2006, the listas parameter added on 18 May 2007, the living=no on 7 June 2007, and the DEFAULTSORT on 21 June 2007. This means that future categories can be added without the need for pipe-sorting. Carcharoth 11:53, 21 June 2007 (UTC)
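The precedence discussed in this thread (an explicit category pipe over-rides DEFAULTSORT, and DEFAULTSORT replaces the page title as the fallback) can be illustrated with a small sketch. The function name is made up for illustration, and this models only the behaviour described above, not MediaWiki's actual implementation.

```python
def effective_sort_key(page_title, defaultsort=None, pipe=None):
    """Illustrative model of which sort key a category listing ends up using."""
    if pipe is not None:
        # [[Category:Bush family|Jeb]] wins over any DEFAULTSORT.
        return pipe
    if defaultsort is not None:
        # {{DEFAULTSORT:Bush, Jeb}} applies to every unpiped category.
        return defaultsort
    # With neither, the page sorts under its own title.
    return page_title


print(effective_sort_key("Jeb Bush", defaultsort="Bush, Jeb"))
print(effective_sort_key("Jeb Bush", defaultsort="Bush, Jeb", pipe="Jeb"))
print(effective_sort_key("Jeb Bush"))
```

This is why a category added later without a pipe still sorts correctly once DEFAULTSORT is present.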

It seems there's unanimous support for this. Can I run it? – Quadell (talk) (random) 15:47, 26 June 2007 (UTC)

Also note the ballpark figure over at User:Polbot/ideas/defaultsort#Further discussion, where it is estimated that around 20,000 articles already contain DEFAULTSORT. So the majority of the bot's work may be just gathering other sortkey data from the other 300,000+ articles. Plus the category stuff as well - not sure what the scale of that part of the bot's work is. Carcharoth 16:54, 26 June 2007 (UTC)

Approved for trial. Yeah, go ahead with 50 edits or so. Sorry for the delay. --ST47Talk 14:26, 9 July 2007 (UTC)

Trial run

OK, if Quadell doesn't mind, I'll come up with a list of 50 articles that include all the plausible combinations of different actions (selecting 50 at random from 350,000 might not really work here), and what I, as a human would do with those articles. If the bot does the right actions, I think we can safely say it passes the Turing test, let alone knows how to do this task. :-) Carcharoth 14:39, 9 July 2007 (UTC)

That sounds great! Thanks. – Quadell (talk) (random) 14:50, 9 July 2007 (UTC)
I've made a start at User:Carcharoth/Polbot3 trial run, which were really picked at random from the first 5000 on what links here for WPBiography. I've ensured a scattering of special characters, pages with parentheses, commas, ordinal numbers, funny names, and so on. I even included some that are false positives (they aren't really biographies). Not a comprehensive trial, and I'm now slowly picking my way through them and realising that a random pick tends to find more dead people than living ones... I may need to pop over to Category:Living people for a more representative sample. Carcharoth 15:59, 9 July 2007 (UTC)
  • I've finished making notes (offline) on what I think should happen when the bot runs over the list of 31 at User:Carcharoth/Polbot3 trial run. Quadell, do you want to see those notes before or after you run the bot over the list? :-) I've included Alan Turing in the list, along with a little surprise (don't look in the edit history, otherwise it will spoil the surprise!). When it is done, could you post a link so the edits and edit summaries can be looked at? Do you know how to set up the URL to link to a set of edits? E.g. this is a set of the 5 edits I did leading up to 01:39:18 on 2007/07/09: see here. Depending how many edits the bot makes, you'd set the limit and offset accordingly for Polbot's contributions. Carcharoth 19:59, 9 July 2007 (UTC)
I have run the trial. See all the gooey details at User talk:Carcharoth/Polbot3 trial run. – Quadell (talk) (random) 15:53, 10 July 2007 (UTC)
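For the "limit and offset" link Carcharoth mentions, a contributions URL could be assembled along these lines. This is a sketch under assumptions: the helper name is invented, the timestamp in the example is a placeholder rather than the real time of the trial, and the exact way Special:Contributions interprets `offset` (a YYYYMMDDHHMMSS timestamp marking where the listing starts) should be checked against the MediaWiki documentation.

```python
from urllib.parse import urlencode

def contributions_url(user, limit, offset_timestamp,
                      base="https://en.wikipedia.org/w/index.php"):
    """Hypothetical helper: link to `limit` edits by `user` up to a timestamp."""
    query = urlencode({
        "title": "Special:Contributions",
        "target": user,
        "limit": limit,              # how many edits to show
        "offset": offset_timestamp,  # assumed format: YYYYMMDDHHMMSS
    })
    return f"{base}?{query}"


# Placeholder timestamp, not taken from the actual trial run.
print(contributions_url("Polbot", 31, "20070710155300"))
```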