Wikipedia:Bots/Requests for approval/John Bot IV

John Bot IV

Operator: CWii(Talk|Contribs)

Automatic or Manually Assisted: Auto

Programming Language(s): Python

Function Summary: NFCC#9, NFCC#7, and possibly NFCC#10c

Edit period(s) (e.g. Continuous, daily, one time run): daily

Already has a bot flag (Y/N): N

Function Details: Goes through Category:All non-free media and checks whether:

  • The image is used outside the mainspace
    • If it is, the bot removes it
  • The image is used in at least one article
  • The image description contains the article name or some sort of rationale somewhere.
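
The three checks above can be sketched in Python roughly as follows. This is only an illustration, not the bot's actual source: `ImageRecord`, `check_image`, and the finding labels are hypothetical names, and the data is assumed to have been fetched already (no wiki library is called). The third check is the naive substring version, which is exactly the kind of test that produces false positives for 10c.

```python
from dataclasses import dataclass, field

MAIN_NAMESPACE = 0  # MediaWiki's namespace number for articles


@dataclass
class ImageRecord:
    """Hypothetical stand-in for what a bot would fetch about one image."""
    title: str
    description: str
    # (namespace number, page title) for every page that displays the image
    usages: list = field(default_factory=list)


def check_image(image):
    """Return a list of (label, pages) findings for one non-free image."""
    findings = []
    articles = [t for ns, t in image.usages if ns == MAIN_NAMESPACE]
    outside = [t for ns, t in image.usages if ns != MAIN_NAMESPACE]

    # NFCC#9: non-free images may only appear in articles.
    if outside:
        findings.append(("NFCC#9", outside))

    # NFCC#7: the image must be used in at least one article.
    if not articles:
        findings.append(("NFCC#7", []))

    # NFCC#10c (naive): each using article should be named in the description.
    text = image.description.lower()
    missing = [t for t in articles if t.lower() not in text]
    if missing:
        findings.append(("NFCC#10c?", missing))

    return findings


if __name__ == "__main__":
    img = ImageRecord(
        "Image:Logo.png",
        "Fair use rationale for Example Corp",
        [(0, "Example Corp"), (2, "User:Foo/sandbox")],
    )
    print(check_image(img))
```

Doing all three checks in one pass over each image means each image's usage list only has to be fetched once.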

The code is still being, er, coded. So I'll post it when it's done.

Discussion

I assume it will also notify the uploader? Mr.Z-man 03:20, 19 May 2008 (UTC)

I've got two concerns here:

  1. Are you prepared to deal with hundreds of irate users bitching you out on your talkpage?
  2. When checking rationales, is the bot going to be at least as good as FairuseBot at following redirects and disambiguation pages to find the article where the image is used?

--Carnildo (talk) 07:09, 19 May 2008 (UTC)

I would emphasise Carnildo's first point. CWii, we have spoken about this before (your dealing with frustrated newbies)... thoughts? dihydrogen monoxide (H2O) 09:47, 19 May 2008 (UTC)

  • I strongly oppose any bot that goes around confronting newbies with those inscrutable jargon-filled templates. This bot would leave messages that make no attempt to explain the highly technical sense of "rationale" that 10c uses, why there is a problem, or what the user should do next to resolve it. I also suspect this bot would have a high rate of false positives, unless CWii explains more about how it will check for redirects, disambiguations, or near matches to the article's name. We already have bots that do a better job. rspeer / ɹəədsɹ 16:43, 19 May 2008 (UTC)
Whoa! Okay, yes, it will notify users. Yes, I am prepared for extreme amounts of bitching. For the 10c task I'm still not sure what I'll do, so we'll have to see. I might just code it later and have it approved. I have been improving in my handling of upset users; for example, this incident was handled well. If there are still concerns about this, please let me know. Again, the bot is still a work in progress; I would just like your input. Thanks, 18:03, 19 May 2008 (UTC)
My suggestion is to leave out 10c. 10c is very difficult to check for, and even when you get it right you're going to get lots of anger and confused people, because 10c is a very little-understood rule. FairuseBot does a good job of checking 10c and leaving messages that try to explain what's going on. I think your bot would do fine at enforcing criteria 7 and 9. If you want to put in the effort to make a good 10c bot, you could do it as a separate request. rspeer / ɹəədsɹ 19:42, 19 May 2008 (UTC)
Okay then. If I do code something for it I'll just make it log it on a subpage. CWii(Talk|Contribs) 19:50, 19 May 2008 (UTC)
FYI, here is ImageBacklogBot's source code. Soxred93 (u t) 01:52, 20 May 2008 (UTC)
eww, perl ;-) CWii(Talk|Contribs) 19:46, 20 May 2008 (UTC)
For notifying users, CWii is going to get a lot of complaints. I would rather create a category called Category:Users willing to explain messages left by John Bot IV, and put a link to it at the end of the message to new users. LegoKontribsTalkM 02:25, 23 May 2008 (UTC)
10c is easy to check for w/ pywikipediabot. Use a regex like |article=(.*?)| and match that group to the usingPages or whatever the method is. Monobi (talk) 03:14, 24 May 2008 (UTC)
And that will improperly tag images where the article was renamed, or moved and replaced with a disambiguation page, or merged into a list, or where someone was lazy about diacritics. If I understand the regex you're proposing, it also misses cases where the page is linked to but not in a template, or where the page is mentioned in the text. FairuseBot has 125 lines of code just to try matching the pages the image is used on with the contents of the image description page.
Regex matching of template parameters is non-trivial, too. FairuseBot avoids this by using the API, but OrphanBot needs the following code to find images in template parameters: ([-A-Za-z0-9_]+[\\p{IsSpace}\x{200E}\x{200F}\x{202A}\x{202B}\x{202C}\x{202D}\x{202E}]*=)[\\p{IsSpace}\x{200E}\x{200F}\x{202A}\x{202B}\x{202C}\x{202D}\x{202E}]*" . MakeWikiRegex($raw_image) . "[ _]*"; There are still a few cases that it misses, mostly dealing with alternate encodings of characters. --Carnildo (talk) 04:42, 24 May 2008 (UTC)
Well, I epically fail for forgetting to escape | , among other things. Monobi (talk) 16:08, 24 May 2008 (UTC)
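
For illustration, the exchange above can be sketched in Python. Left unescaped, `|` is the regex alternation operator, so the pattern as written does not look for literal pipes. The sketch below escapes the pipe and folds underscores, case, and diacritics before comparing, but it still cannot follow renames, redirects, or disambiguation pages, which is the failure mode Carnildo describes. All names here (`ARTICLE_PARAM`, `named_articles`, `unmatched_usages`) are hypothetical, the `article=` parameter is taken from Monobi's suggestion, and nothing here is FairuseBot's or pywikipediabot's actual code.

```python
import re
import unicodedata

# Literal pipe, optional whitespace, then the value up to the next
# pipe or closing brace. Parameter name "article" is an assumption.
ARTICLE_PARAM = re.compile(r"\|\s*article\s*=\s*([^|}]*)", re.IGNORECASE)


def named_articles(description):
    """Return the non-empty values of every |article= parameter found."""
    values = (m.strip() for m in ARTICLE_PARAM.findall(description))
    return [v for v in values if v]


def normalize(title):
    """Fold underscores, case, and diacritics so near-matches compare equal."""
    decomposed = unicodedata.normalize("NFKD", title.replace("_", " "))
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold().strip()


def unmatched_usages(description, using_pages):
    """Pages that use the image but are not named in an |article= parameter."""
    named = {normalize(t) for t in named_articles(description)}
    return [p for p in using_pages if normalize(p) not in named]


if __name__ == "__main__":
    desc = "{{Non-free use rationale|Article=Motorhead|Purpose=Identification}}"
    # Diacritics are folded, so this usage counts as matched.
    print(unmatched_usages(desc, ["Motörhead"]))
    # A renamed article is still reported, even though a human would accept it.
    print(unmatched_usages(desc, ["Motörhead (band)"]))
```

Even with normalization, anything reported by `unmatched_usages` would need a redirect lookup or human review before an image is tagged.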
We have a media copyright help desk, it may be better to refer them there. Mr.Z-man 18:59, 23 May 2008 (UTC)
That's the plan. Also I'm waiting on BJweeks for some code. CWii(Talk|Contribs) 03:06, 24 May 2008 (UTC)
Again, like I said I'm not doing 10c. CWii(Talk|Contribs) 14:16, 24 May 2008 (UTC)
Also, I've been approved to run a clone of User:ImageBacklogBot, so NFCC#9 is taken care of. Soxred93 (u t) 04:37, 25 May 2008 (UTC)
So is this bot just doing NFCC#7? dihydrogen monoxide (H2O) 09:48, 26 May 2008 (UTC)

Will the bot messages provide links to pages that describe how to write fair use rationales for standard cases such as logos, album covers and screenshots, preferably with easy-to-use templates? --Apoc2400 (talk) 22:17, 25 May 2008 (UTC)

Yes of course! CWii(Talk|Contribs) 02:33, 26 May 2008 (UTC)
I don't think the bot is doing that task anymore, is it? dihydrogen monoxide (H2O) 09:48, 26 May 2008 (UTC)
No, it will be doing 9 and 7. I find it more efficient to do a single sweep and check for several things at once, to keep server resource consumption to a minimum. CWii(Talk|Contribs) 13:45, 26 May 2008 (UTC)
Before approving a trial (I'm comfortable with it doing 9 and 7 as long as you remember that civility stuff we've spoken about ;), can we see the templates that will be used on the image pages and on the user talk pages? dihydrogen monoxide (H2O) 08:53, 27 May 2008 (UTC)
Whoops, I forgot those :). I'll make those in a bit, but for now the code I asked User:Bjweeks for should be here by tomorrow so I can finish coding this damn thing. Thank you for your concerns. CWii(Talk|Contribs) 11:50, 27 May 2008 (UTC)
The message and the Image tag. For NFCC 9 I won't give any warnings, since it could be anyone. CWii(Talk|Contribs) 12:12, 27 May 2008 (UTC)
Can you add a link to the image help desk and to a page with basic information about adding images to articles? dihydrogen monoxide (H2O) 02:04, 29 May 2008 (UTC)

Update: This is what I have so far. Yay? :-) CWii(Talk|Contribs) 21:08, 3 June 2008 (UTC)

Please subst: the talk page notice LegoKontribsTalkM 00:21, 4 June 2008 (UTC)
Good catch! CWii(Talk|Contribs) 00:22, 4 June 2008 (UTC)
READY FOR TRIAL Final code is here: User:John Bot IV/Source CWii(Talk|Contribs) 00:29, 4 June 2008 (UTC)

Approved for trial (50 edits). giggy (:O) 08:50, 5 June 2008 (UTC)

It's having trouble finding policy-violating images :( So it might take a few days to a week to complete the trial. Sorry, CWii(Talk|Contribs) 01:47, 6 June 2008 (UTC)

On hold - School is getting in the way right now due to projects and finals. June 25 is when (IF I don't fail anything) this could be started at the latest. Sorry, CWii(Talk|Contribs) 00:14, 7 June 2008 (UTC)