Wikipedia:Bots/Requests for approval/GurchBot
From Wikipedia, the free encyclopedia
- The following discussion is an archived debate. Please do not modify it. Subsequent comments should be made in a new section.
User:GurchBot
Between February and April this year, I made a large number of typo-fixing edits (approximately 12,000 in total). All of these were done manually – every edit was checked before saving – although I have written software similar to AutoWikiBrowser to assist with the process. This software is designed specifically for spellchecking and so, while not as flexible as AWB, has a number of advantages. It reports the changes made in the edit summary, can check articles very quickly (in less than a second), and can easily switch between different corrections (for example, "ther" could be "there", "the" or "other") in a way that AWB cannot. Central to this is a list of over 5000 common errors that I have compiled from various sources, including our own list of common misspellings, the AutoCorrect function of Microsoft Office, other users' AWB settings, and various additions of my own. As I mentioned, I have done an extensive amount of editing with the aid of this software, using my main account. I have recently made further improvements to the software; over the last couple of days I have made a few edits to test these improvements, and I am now satisfied that everything works.
While I believe Wikipedia is now so heavily used that (a) no one person could hog the servers even if they wanted to, and (b) the Recent Changes page is more or less unusable anyway, a couple of users have expressed concerns about the speed of these edits (which reached 10 per minute during quiet periods). Most notably, Simetrical raised the issue during my RfA. As I stated in my response to his question, I was not making any spellchecking edits at that time, but I explained that I would request bot approval should I decide to make high-speed edits in the future. That time has now come; I have created User:GurchBot, and I request permission to resume exactly what I was doing in April, but under a separate account. I will leave the question of whether a bot flag is necessary to you; I am not concerned one way or the other.
Thanks – Gurch 19:45, 15 July 2006 (UTC)
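The request describes a correction list in which one misspelling can map to several candidates ("ther" could be "there", "the" or "other"), with the changes reported in the edit summary. Below is a minimal Python sketch of that idea. It is not Gurch's actual software: the `CORRECTIONS` table, the `propose_fixes` and `edit_summary` helpers, and the "Typo fixing:" summary wording are all hypothetical. Ambiguous entries are handed back for a human to choose, consistent with the manual checking described above, rather than being fixed automatically.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical correction list: each misspelling maps to one or more candidates.
CORRECTIONS: Dict[str, List[str]] = {
    "teh": ["the"],
    "recieve": ["receive"],
    "ther": ["there", "the", "other"],  # ambiguous: needs a human decision
}

WORD = re.compile(r"[A-Za-z]+")


def propose_fixes(text: str) -> Tuple[str, List[Tuple[str, str]], List[str]]:
    """Apply unambiguous corrections and collect ambiguous ones for review.

    Returns (new_text, applied, ambiguous), where `applied` holds
    (old, new) pairs and `ambiguous` lists words with several candidates.
    """
    applied: List[Tuple[str, str]] = []
    ambiguous: List[str] = []

    def fix(match: re.Match) -> str:
        word = match.group(0)
        candidates = CORRECTIONS.get(word.lower())
        if not candidates:
            return word  # not in the list: leave untouched
        if len(candidates) > 1:
            ambiguous.append(word)
            return word  # the editor picks a candidate manually
        new = candidates[0]
        if word[0].isupper():
            new = new[0].upper() + new[1:]  # keep sentence-initial capitals
        applied.append((word, new))
        return new

    return WORD.sub(fix, text), applied, ambiguous


def edit_summary(applied: List[Tuple[str, str]]) -> str:
    """List every change made, for use as the edit summary."""
    return "Typo fixing: " + ", ".join(f"{old} → {new}" for old, new in applied)
```

For example, `propose_fixes("Teh cat went ther to recieve it.")` fixes "Teh" and "recieve" but leaves "ther" in place and reports it as ambiguous, so the person reviewing the edit can choose between the three candidates.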
- As long as you are checking it yourself and ignoring the "sic"s, it seems good to me. Alphachimp talk 23:54, 15 July 2006 (UTC)
- Yes, I check every edit before I save it, and I ignore [sic] when I see it. I have incorrectly fixed a couple of [sic]s in the past because I (the fallible human) failed to spot them; one of my improvements has been to add [sic] detection to the software so it can alert me to them, and hopefully make an error less likely in future – Gurch 10:03, 16 July 2006 (UTC)
- I don't have any issue with this, provided you aren't doing any of the spelling corrections that tend to cause problems, such as changes from Commonwealth English to American English and vice versa. As long as it's only correcting spelling errors and doesn't touch spelling variations, it should be fine. I'd like to see a week's trial (which is standard) to get a good idea of exactly what will be taking place, and also for users to add their comments. A week's trial is approved, please report back this time next week. Essjay (Talk) 14:47, 16 July 2006 (UTC)
- I have never corrected spelling variations, regional or otherwise – being from the UK, I have long since given up and accepted all variants as equally permissible anyway. If you wish, I can upload the entire list and replace the (now out-of-date) User:Gurch/Reports/Spelling; I will probably do this at some point anyway. I won't be around in a week's time, so you can expect to hear from me in a month or so. For now, you can take this to be representative of what I will be doing – Gurch 16:11, 16 July 2006 (UTC)
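Gurch mentions adding [sic] detection (and, later in the discussion, detection of direct quotations) so that the software can warn him before saving. One simple way to do this is to flag any candidate correction that is immediately followed by "[sic]" or that falls inside double quotation marks. The Python sketch below assumes that approach; it does not describe his software, and the `protected_spans` and `should_warn` names are hypothetical.

```python
import re
from typing import List, Tuple

# A word followed by optional whitespace and a literal "[sic]". Other markup
# variants (e.g. italicised sic) are not handled by this simple sketch.
SIC_AFTER_WORD = re.compile(r"([A-Za-z]+)\s*\[sic\]", re.IGNORECASE)
# Text between straight double quotes, treated as a direct quotation.
QUOTED = re.compile(r'"[^"]*"')


def protected_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets that should not be corrected silently."""
    spans = [m.span(1) for m in SIC_AFTER_WORD.finditer(text)]
    spans += [m.span() for m in QUOTED.finditer(text)]
    return sorted(spans)


def should_warn(text: str, start: int, end: int) -> bool:
    """True if the word at text[start:end] overlaps a [sic] or quoted span."""
    return any(s < end and start < e for s, e in protected_spans(text))
```

The check only raises a warning. The decision still rests with the person approving the edit, which matches the workflow described in this discussion.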
If these are manually approved edits, I wouldn't think approval as a bot would be strictly necessary, though I could imagine the speed might be a concern, especially if errors are (or were) slipping through. Given that this is more of a "semi-bot", I suggest it not be bot-flagged, so as to reduce the likelihood of errors going undetected subsequently as well. Alai 04:24, 18 July 2006 (UTC)
- In fact approval as a bot wasn't necessary – as I mentioned above, I used to do this using my main account, and would have continued to do so, except that a number of users expressed their concern and suggested I request approval for a bot. So I have done that. I freely admit that errors will inevitably slip through at some point; in fact, I've just had to apologize for correcting a British spelling which was, fortunately, spotted and reverted very quickly. Of course I never intended to do any such thing – it turns out that this (actually correct) spelling has been listed on Wikipedia:Lists of common misspellings/For machines (one of the sources for my correction list) since November 2002; somehow it was never spotted in nearly four years. My fault, of course, for assuming the list was correct; I'm now scrutinizing my list thoroughly to avoid repeating this mishap. This is the first time I've made such a miscorrection, the reason being that my old list was constructed by hand, whereas I've now tried to expand it (and so catch more errors with each edit) by including lists from other sources. In the past I have occasionally mis-corrected "sic"s and errors in direct quotations; the chance of this should be much lower now that my software can detect these itself, even if I miss them. Based on what I have done to date, though, I reckon my error rate is about 1 in every 1000 edits, which I can live with – Gurch 11:38, 18 July 2006 (UTC)
- As I said above, you're cleared for a month-long (instead of a week, at your request) trial; check back with us then and we'll set the bot flag. Essjay (Talk) 00:47, 19 July 2006 (UTC)
Concern: The list of common misspellings is utter shit, please do NOT use it. It replaces many words that are actually words. --mboverload@ 20:34, 28 July 2006 (UTC)
- I'm not (any more). I've found some more reliable lists from other sources. I would use your list, but it would take forever to strip the regexes out. (I did think of implementing regex support in my software, but I think it would slow it down too much) – Gurch 18:47, 14 August 2006 (UTC)
- The overhead of regexing is modest compared with d/ling the page, uploading the changed text, d/l the diffs, uploading the final changes and d/l the final page. Rich Farmbrough 21:05 15 August 2006 (GMT).
- I agree. That's why I generate diffs locally, and (with the help of the IRC RC feed) download the next page the moment the changes have uploaded, before the final page starts loading, thereby reducing those five operations to two – Gurch 13:28, 24 August 2006 (UTC)
- The 1500 regexes AWB uses take only a fraction of a second to process, even on fairly large pages, though the first run is a bit slower; I assume that is when they are compiled. Martin 13:42, 24 August 2006 (UTC)
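The exchange about regex overhead involves two costs: applying many regex rules to each page, and producing a diff for review. The Python sketch below compiles every pattern once at start-up and reuses it for each page, which is consistent with Martin's observation that only the first run is slower. It then uses the standard `difflib` module to build the review diff locally instead of requesting one from the server. The `RULES` list, `apply_rules` and `local_diff` are hypothetical and are not AWB's actual rule format or Gurch's code.

```python
import difflib
import re
from typing import List, Tuple

# Hypothetical regex typo rules as (pattern, replacement); not AWB's real list.
RULES: List[Tuple[str, str]] = [
    (r"\b(a|A)cheive(d|s|ment)?\b", r"\1chieve\2"),
    (r"\bseperat(e|ed|ely|ion)\b", r"separat\1"),
]

# Compile once: this is the one-off start-up cost. Every later page reuses
# the compiled patterns, so per-page matching stays cheap.
COMPILED = [(re.compile(pattern), repl) for pattern, repl in RULES]


def apply_rules(text: str) -> str:
    """Run every compiled rule over the page text in order."""
    for pattern, repl in COMPILED:
        text = pattern.sub(repl, text)
    return text


def local_diff(old: str, new: str) -> str:
    """Build a unified diff on the client instead of fetching one from the wiki."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="current",
            tofile="proposed",
        )
    )
```

Compiling the patterns up front keeps the per-page work to the matching itself. Generating the diff locally removes one of the download round trips Rich Farmbrough lists.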
OK, one month (and a bit) trial period is up. I have made just over 1700 edits with the account. Barring a few small errors, which were the result of me not concentrating, and were corrected soon after, everything seems to be fine. I await your final decision – Gurch 13:34, 24 August 2006 (UTC)
- Yes, approved. Thanks Martin 13:42, 24 August 2006 (UTC)
- The above discussion is preserved as an archive of the debate. Please do not modify it. Subsequent comments should be made in a new section.