Wikipedia:Bots/Requests for approval/DragonBot
DragonBot
Operator: Wizardry Dragon
Automatic or Manually Assisted: Automatic and Manually Assisted. See explanation below.
Programming Language(s): Managed C++ (.NET)
Function Summary: Edit mining bot/clerical bot.
Edit period(s) (e.g. Continuous, daily, one time run): Variable. See below.
Edit rate requested: Variable; infrequent minor edits. See below.
Already has a bot flag (Y/N): N/A
Function Details: dragonBot's purpose is simple. Currently, when a user is brought before the administration on Wikipedia, the administrators involved have to go through the user's history themselves, by hand. This is tedious when long-standing editors are brought forward for action, and dragonBot is built to automate and expedite this time-consuming task.
At its base, dragonBot is a search engine. It searches through a user's history on Wikipedia and returns results based on the search criteria: number of edits, location of edits, and so forth. However, dragonBot is more complex than a simple search engine. Its heart is a powerful heuristic that analyzes the edits a given user makes to assess their value. This heuristic is useful in rooting out vandals and other destructive editors on Wikipedia, as well as vindicating users wrongly accused of such activities.
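To make the "search criteria" idea concrete, here is a minimal sketch of filtering and counting a user's edits. The `Edit` record, its field names, and both helper functions are hypothetical illustrations and are not taken from dragonBot's source; the value heuristic itself is not specified in this request and is not modeled here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    # Hypothetical record; field names are illustrative, not dragonBot's schema.
    rev_id: int
    page: str
    namespace: str
    minor: bool


def search_edits(edits, namespace=None, include_minor=True):
    """Return the edits that match the given search criteria."""
    return [
        e for e in edits
        if (namespace is None or e.namespace == namespace)
        and (include_minor or not e.minor)
    ]


def edits_per_namespace(edits):
    """Count edits by location (namespace)."""
    return Counter(e.namespace for e in edits)


history = [
    Edit(1, "Dragon", "Article", False),
    Edit(2, "Talk:Dragon", "Talk", True),
    Edit(3, "Wyvern", "Article", True),
]
article_edits = search_edits(history, namespace="Article")
```

Keeping the criteria as plain keyword arguments means an IRC command could be mapped onto them directly, though how the real bot parses requests is not described here.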
dragonBot has both automatic and on-demand capabilities. As part of the anti-spam effort, it reports statistics on users who trip anti-spam alerts on the Linkwatcher bot, the same feed that Shadowbot uses. In this automatic capacity, it edits Wikipedia only infrequently. On demand, the bot can be made to look over a page via the IRC interface.
Whenever dragonBot is analyzing a page, whether submitted by a user via the IRC interface or triggered automatically by a Linkwatcher alert, it may also perform clerical tasks ("housekeeping") on that page. Mostly, this is MOS formatting tidying: proper sections and so forth. It will also tag pages that are overly long, have a disproportionately long intro, consist mainly of lists, or show other such easily identifiable problems. It will also leave messages on an administrator's page requesting SPROTECT or FPROTECT for pages seeing edit wars, as well as alerting in IRC.
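The "easily identifiable problems" mentioned above could be detected with simple checks on the page text. The following is only a sketch: the function name, the problem labels, and all three thresholds are assumptions chosen for illustration, not values stated in this request.

```python
def page_problems(wikitext, max_bytes=100_000, max_intro_share=0.4, min_list_share=0.5):
    """Return labels for simple structural problems found in a page.

    All thresholds are hypothetical defaults, not dragonBot's settings.
    """
    problems = []
    lines = [ln for ln in wikitext.splitlines() if ln.strip()]
    if not lines:
        return problems

    # Overly long page: measured in encoded bytes.
    if len(wikitext.encode("utf-8")) > max_bytes:
        problems.append("too-long")

    intro_len = 0       # characters before the first section heading
    total_len = 0       # characters in all non-blank lines
    list_lines = 0      # lines that are list items
    seen_heading = False
    for ln in lines:
        if ln.startswith("=="):
            seen_heading = True
        total_len += len(ln)
        if not seen_heading:
            intro_len += len(ln)
        if ln.startswith(("*", "#")):
            list_lines += 1

    # Disproportionately long intro: only meaningful if the page has sections.
    if seen_heading and intro_len / total_len > max_intro_share:
        problems.append("long-intro")

    # Page consisting mainly of lists.
    if list_lines / len(lines) >= min_list_share:
        problems.append("mostly-lists")
    return problems


listy = "Intro.\n== Items ==\n* a\n* b\n* c\n"
top_heavy = "A very long introduction paragraph here.\n== S ==\nShort.\n"
```

Returning labels rather than editing the page directly keeps the detection step separate from the tagging edit, which would make each rule easier to review on its own.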
Discussion
- It should be noted that this bot is still in pre-alpha development and is not at all feature complete. Betacommand advised me to post the BRFA here now and said it wouldn't matter if I was still programming, so here she is. Feedback and suggestions for functions are most appreciated. ✎ Wizardry Dragon (Talk to Me) (My Contributions) (Page Moves) (Support Neutrality on Wikipedia) 21:46, 16 November 2006 (UTC)
- This is a compelling idea, but one which definitely needs to be hashed out more. What you'll need for approval is a full list of the heuristic rules to verify their accuracy. However, since this bot will be generating a page on demand and that page will presumably be placed in a new article in a standard location and not modifying existing articles, then the risk to Wikipedia is minimal. As for the final paragraph in your description, those "additional" tasks will require separate testing since they will be affecting actual articles. As for the details of this search bot, how will it handle accounts such as mine with over 50,000 edits? I would expect it to stop after a reasonable limit so there isn't too much load on the server. So in summary: Provide more details and when you have an idea mostly completed perform an example or two (manually is preferred) and post the diffs here so we can see how it works. -- RM 13:29, 29 November 2006 (UTC)
- If you have some questions that would help hash it out for myself and yourself, feel free to ask them, and I'll more than happily respond. As to the analysis and editing, dragonBot is designed with the idea of distributed computing in mind - specifically, there will be two or three bots running at a given time - each with a different task. One bot would be solely dedicated to Wikipedia editing, one to the heavy number crunching, and another acting as the "server", delegating the task requests and their component tasks to the subordinate bots (this helps distribute the workload for intensive processes such as analyzing editors with large edit counts). So, only one would actually be "logged in" and editing as "dragonBot". I think that's fairly clear but if you have questions on that, feel free to ask. ✎ Wizardry Dragon (Talk to Me) (My Contributions) (Support Neutrality on Wikipedia) 18:37, 29 November 2006 (UTC)
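The division of labor described above, with a coordinator splitting a large request into component tasks for number-crunching workers, can be sketched as follows. This is an illustration only: threads stand in for the separate bot processes, and `chunked`, `crunch`, `analyze`, and the chunk size are hypothetical names and values. The per-chunk work here is a trivial count, standing in for whatever analysis the real workers would perform.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a sequence into consecutive chunks of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def crunch(chunk):
    # Stand-in for the heavy per-chunk analysis: count flagged edits.
    return sum(1 for flagged in chunk if flagged)


def analyze(edit_flags, chunk_size=1000, workers=2):
    """Coordinator: delegate chunks to workers and combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(crunch, chunked(edit_flags, chunk_size))
        return sum(partials)


# An account with 50,000 edits, every fifth one flagged.
flags = [i % 5 == 0 for i in range(50_000)]
total_flagged = analyze(flags)
```

Because only the coordinator combines results, a single component can remain responsible for any on-wiki edit, which matches the stated intent that only one bot is logged in as "dragonBot".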
- See this[1]; I'd welcome sharing (both ways) any heuristic ideas. If you are XML scanning each diff, then that will not only drag the servers (each diff requires 2 revs and parsing), but it will take forever. At any rate, how will admins interact with this? Will the bot have its own website (kind of like Essjay bot can refresh the sandbox from commands to another web site) or will it be on wiki, looking for someone to edit some page to trigger it? Voice-of-All 20:07, 29 November 2006 (UTC)
- I'm open to interface ideas on that point; it's something I've given a lot of thought to without reaching what I deem to be a satisfactory conclusion. For now it has an IRC interface, but for a bot working on wiki, it would be nice to come up with some sort of "on wiki" interface. I'm just not sure how to implement it. Since it has a module for the RC feed, I could watch the RC feed for requests on its talk page; however, it would most likely require human intervention to respond to the requests.
- Regarding server strain, this has been a primary concern of mine. It is easy to program the bot to go through each and every diff and then perform a heuristic calculation; however, I would not consider the server strain of such a method acceptable. I have considered sampling methods, but I'm not sure how to go about that; I do not want the results to be inconsistent across requests, so sampling is something that needs a lot of thought and care.
- As to sharing heuristics, this isn't a problem for me. dragonBot will be released under the GPL, so all the source will be open and freely available.
- ✎ Wizardry Dragon (Talk to Me) (My Contributions) (Support Neutrality on Wikipedia) 20:56, 29 November 2006 (UTC)
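One way to address the consistency concern about sampling is to make the sample deterministic: decide whether a revision is sampled from a hash of its revision ID, so the same revisions are chosen on every request. The sketch below is a suggestion under that assumption and is not a method proposed by anyone in this discussion; `in_sample`, `sample_revisions`, and the 10% default rate are hypothetical.

```python
import hashlib


def in_sample(rev_id, rate_percent=10):
    """Deterministically decide whether a revision belongs to the sample.

    The decision depends only on the revision ID, so repeated requests
    over the same history always select the same revisions.
    """
    digest = hashlib.sha256(str(rev_id).encode("ascii")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < rate_percent


def sample_revisions(rev_ids, rate_percent=10):
    """Keep only the revisions that fall inside the deterministic sample."""
    return [r for r in rev_ids if in_sample(r, rate_percent)]


first_run = sample_revisions(range(1, 10_001))
second_run = sample_revisions(range(1, 10_001))
```

A side effect of bucketing this way is that a lower-rate sample is always a subset of a higher-rate one, so results can be refined later without discarding earlier work.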
- OK, the IRC interface thing looks OK, as long as it does not request every diff or page for each user edit on wiki (this could be done on a database dump though). What channel would this go on? Also, the tagging still seems unclear. How does it follow pages or know which ones to follow? Voice-of-All 23:14, 1 December 2006 (UTC)
- It would do so on request, I would suppose. I'm considering just using a database dump, and then simply updating it as necessary when the bot needs newer data. The one thing I'm having some difficulty with is making sure the bot knows when it needs new data. ✎ Wizardry Dragon (Talk to Me) (My Contributions) (Support Neutrality on Wikipedia) 01:24, 2 December 2006 (UTC)
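The open question of when the bot needs new data could be approached with a simple staleness check: refresh if the user has edited since the dump was taken, or if the dump has exceeded a maximum age. This is a sketch under assumptions; `needs_refresh`, its parameters, and the seven-day limit are hypothetical, and it presumes the time of the user's latest edit can be obtained cheaply.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(dump_time, latest_edit_time, now, max_age=timedelta(days=7)):
    """Return True if the local dump is too stale to answer a request.

    dump_time:        when the local copy was taken
    latest_edit_time: time of the user's most recent edit (assumed cheap to fetch)
    now:              current time, passed in so the check is easy to test
    """
    if latest_edit_time > dump_time:
        return True  # the user has edited since the dump was taken
    return now - dump_time > max_age  # the dump is simply too old


dump = datetime(2006, 11, 25, tzinfo=timezone.utc)
now = datetime(2006, 12, 2, 1, 24, tzinfo=timezone.utc)
edited_after_dump = needs_refresh(dump, datetime(2006, 12, 1, tzinfo=timezone.utc), now)
```

Passing `now` as an argument rather than reading the clock inside the function keeps the rule deterministic and straightforward to verify.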