Wikipedia:Bots/Requests for approval/RobinBot
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section. The result of the discussion was Request Expired.
RobinBot
Operator: koder
Automatic or Manually Assisted: Automatic, supervised during the trial and unsupervised afterwards. It can also be told manually which pages to analyze.
Programming Language(s): PHP, C.
Function Summary: Monitors the recent changes (RC) XML feed for users adding affiliate-marketing links, whether intentionally or not, and removes the affiliate portion of the link (a sketch of the rewriting appears after this request form). For now, it will only check for Amazon Associates links. Optionally, if the Wikimedia Foundation would like, I can have it replace the affiliate portion with Wikimedia's own affiliate tags in order to create pseudo-donations. The bot will also traverse old database dumps to check for affiliate tags, add hits to its queue, and perform the removals during off-peak hours.
Edit period(s) (e.g. Continuous, daily, one time run): Continuous.
Edit rate requested: Most likely less than 1 edit per minute for RC patrol (i.e., whenever someone adds affiliate links to a page), and at most 5 per minute during off-peak hours while it catches up with the old database dumps (or whatever rate you prefer, obviously). The latter will only be enabled after approval, to avoid flooding RC with entries before the bot has its flag.
Already has a bot flag (Y/N): N
Function Details: n/a
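For illustration, here is a minimal PHP sketch of the link rewriting described in the function summary, assuming the Amazon Associates ID is carried in the "tag" query parameter; the function name is hypothetical, not the bot's actual code.

```php
<?php
// Hypothetical sketch: remove the Amazon Associates affiliate ID from a URL.
// Assumes the Associates ID travels in the "tag" query parameter.
function stripAffiliateTag(string $url): string {
    $parts = parse_url($url);
    if (empty($parts['host']) || stripos($parts['host'], 'amazon.') === false) {
        return $url; // only Amazon links are checked for now
    }
    if (empty($parts['query'])) {
        return $url; // no query string, nothing to strip
    }
    parse_str($parts['query'], $params);
    unset($params['tag']); // drop the affiliate ID
    $rebuilt = $parts['scheme'] . '://' . $parts['host'] . ($parts['path'] ?? '');
    if ($params) {
        $rebuilt .= '?' . http_build_query($params);
    }
    return $rebuilt;
}

// The link still resolves to the same page; only the affiliate credit is gone.
echo stripAffiliateTag('http://www.amazon.com/dp/B000EXAMPLE?tag=spammer-20');
// -> http://www.amazon.com/dp/B000EXAMPLE
```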
Discussion
- Just FYI: I specifically chopped down the summary due to the "SHORT" part. A slightly more verbose summary is on its talk page. Cheers. =) --koder 07:44, 5 June 2007 (UTC)
- Very interesting idea. Just a couple of points you might want to address as well:
- Will the bot be issuing warnings to users when working old database dumps?
- Not for the time being, unless it's explicitly desired by the committee. First, there's the whole issue of anonymous users on dynamic addresses going, "huh?" and the ensuing deluge of complaints. Second, there's an exceptionally good chance that if the link is still there from the last DB dump, and it is subsequently edited by the bot, the spammers will simply assume everything is going exactly as planned, so long as they aren't hovering over the histories of the pages they spammed ages ago. Moreover, removing the affiliate GET variable from the URL does not affect the link in any noticeable way (i.e., it still goes where it's supposed to, but the spammer doesn't get any money for it). The reason for the edit would, of course, be given in the edit summary, so editors wondering why a bot is modifying links will have a clear explanation. Therefore, unless anyone objects, I was planning on simply squelching warnings for modifications done via DB dump. If you'd like, I could make a list of the pages modified and stick it in a subsection of the bot's user page. Or, on the other hand, if you actually would like warnings for every page modification, that's also possible. Again, it's up to you. --koder 16:52, 5 June 2007 (UTC)
- Thank you for the very detailed reply. Personally, I don't think warnings are necessary when the bot is working off of old dumps either. I just thought I would seek clarification on that point. Having the bot create a log in its userspace would be nice, if it's not too much trouble. That way, one could easily go through the log after a couple of days and check if the affiliate code has been readded. In that case, it's probably safe to assume that human intervention is required (in order to deal with a persistent spammer/explain WP:EL in detail to a new editor/etc). I suppose looking through the bot's contribution list would work just as well, but having a separate log should help if the bot takes on any additional tasks in the future and would make it easier to spot repeat offenders. -- Seed 2.0 20:34, 6 June 2007 (UTC)
- Yes. However, I do have questions in response: I was originally planning on letting the bot queue updates to its user page and execute them all at once (i.e., instead of making 10 edits for JoeSpammer's links on 10 different pages in one minute, make a single edit that reports all 10 at once). Is this acceptable, or would you instead like the updates to the page to be instantaneous? Or do you want a longer timer on the updates (instead of every minute, every 5 or 60 minutes, or whatever)? I should add that the bot would only edit its log page when there's actually something or someone to report; it wouldn't be continuously adding edits for no good reason. Again, it's all up to you guys. --koder 08:14, 7 June 2007 (UTC)
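For illustration, a minimal sketch of the batching described above, assuming a sixty-second flush interval; appendToLogPage() is a placeholder for whatever wiki-editing routine the bot actually uses.

```php
<?php
// Hypothetical sketch: queue log entries in memory and flush them to the
// bot's log page at most once per interval, and only when non-empty.
class LogQueue {
    private array $pending = [];
    private int $lastFlush = 0;
    private int $interval;

    public function __construct(int $intervalSeconds = 60) {
        $this->interval = $intervalSeconds;
    }

    public function add(string $user, string $page): void {
        $this->pending[] = "* [[User:$user]] spammed [[$page]] at " . gmdate('H:i, j F Y');
    }

    public function maybeFlush(): void {
        // Nothing to report, or flushed too recently: skip this round.
        if (empty($this->pending) || time() - $this->lastFlush < $this->interval) {
            return;
        }
        appendToLogPage(implode("\n", $this->pending)); // assumed wiki-edit helper
        $this->pending = [];
        $this->lastFlush = time();
    }
}
```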
- Will the bot be issuing sets of warnings (possibly problematic, in light of the fact that bug 9213 hasn't been fixed yet) or will it only issue one warning per user per $amount_of_time (e.g., per day)? -- Seed 2.0 09:37, 5 June 2007 (UTC)
- As it stands, any given user or IP would only be warned once for any given article on any given day. If the spammer then article-hops after an ample period of time has passed for them to see the message (or not see it, as the bug would imply), a second message will be generated, similar to a level-3 warning. After that, I would simply let the bot revert edits to any other pages they hop to without generating more messages on the user's talk page. The bot would then add the user/IP to a section for recent repeat offenders that it maintains. However, this can obviously be changed if it needs to be; just let me know. =) --koder 16:52, 5 June 2007 (UTC)
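For illustration, a minimal sketch of that warning policy: at most one warning per user per article per UTC day, a single escalation to a level-3-style message, and silent reverting plus logging after that. All names are illustrative rather than the bot's actual code.

```php
<?php
// Hypothetical sketch of the warning throttle described above.
class WarningThrottle {
    private array $warned = []; // "user|article|date" => true
    private array $level = [];  // user => warnings issued so far

    // Returns the warning level to issue (1 or 3), or null for silence.
    public function shouldWarn(string $user, string $article): ?int {
        $key = $user . '|' . $article . '|' . gmdate('Y-m-d');
        if (isset($this->warned[$key])) {
            return null; // already warned for this article today
        }
        $this->warned[$key] = true;
        $issued = $this->level[$user] ?? 0;
        if ($issued >= 2) {
            return null; // past two warnings: revert and log, but stay quiet
        }
        $this->level[$user] = $issued + 1;
        return $issued === 0 ? 1 : 3; // first warning, then a level-3-style one
    }
}
```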
- Excellent. I assume the bot does observe WP:3RR in RC mode. Well, I'd love to see the bot in action. --Seed 2.0 20:34, 6 June 2007 (UTC)
- Yes. There are multiple reasons for this: first, in the interest of preventing people from simply messing with the bot to get a rise, it's better to limit the bot's net edits. Second, preventing denial-of-service attacks against both the server and the bot dictates this approach. Finally, and most obviously, the links themselves might be pertinent to the page they're inserted on, and are thus not de facto vandalism. More importantly, a user might be completely oblivious to the fact that they're inserting affiliate links if, for example, they're infected with some nasty trojan. --koder 08:14, 7 June 2007 (UTC)
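For illustration, a minimal sketch of such a WP:3RR safeguard: the bot declines to revert a page it has already reverted three times in the past 24 hours. Names are hypothetical.

```php
<?php
// Hypothetical sketch: per-page revert counter honoring the three-revert rule.
class RevertLimiter {
    private array $reverts = []; // page => list of revert timestamps

    public function mayRevert(string $page): bool {
        $cutoff = time() - 24 * 3600;
        // Keep only reverts from the last 24 hours.
        $this->reverts[$page] = array_values(array_filter(
            $this->reverts[$page] ?? [],
            fn(int $t): bool => $t > $cutoff
        ));
        return count($this->reverts[$page]) < 3;
    }

    public function recordRevert(string $page): void {
        $this->reverts[$page][] = time();
    }
}
```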
- At the risk of being pedantic: you said you would let the bot revert edits to any other pages they hop on. Do you intend to let the bot revert edits that do not contain an affiliate code (as in, the bot reverts all edits by that user, who effectively gets blacklisted)? I'm afraid I would have to object to that functionality. Reverting all edits by a repeat offender that do contain the affiliate code (i.e., the bot follows the user's contributions to account for holes in the feed and reports the user), on the other hand, sounds fine. Sorry if I'm being a pain. -- Seed 2.0 20:57, 6 June 2007 (UTC)
- Lol, not pedantic: thorough and cautious; there are a lot of clueless developers out there who write seriously botched code. Anyway, to answer your question: no, blacklists in any form are not part of the plan, especially considering that would not be assuming good faith. However, that does give me a good idea for a possible future addition: should the bot go down for a period of time for whatever reason, it could cycle through the recent-offenders list to check for anything it missed while it was down. That is not planned as an immediate feature, though, so I'll clear it later if it is ever implemented. So, the bot's sole functionality would be removing obvious affiliate code and tracking who adds it, where, and when, letting admins and other editors deal with it on their own. In a nutshell, the bot's purpose is to let humans worry about the merit/suitability/relevance of links on their own without inadvertently giving the spammer money as they do so. If spammers realize that targeting certain pages with affiliate links isn't going to make them money, we should be able to reduce the overall amount of spam links that need reverting in the first place. --koder 08:14, 7 June 2007 (UTC)
A user has requested the attention of the operator. Once the user has seen this message and replied, please remove this tag. (user notified)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.