User talk:Grammarbot

From Wikipedia, the free encyclopedia

Questions and comments about my bots should go to User talk:Humanbot or User talk:Grammarbot.

If a bot is going crazy, please leave a note at User talk:R3m0t as well, so that I notice sooner.
Anything not to do with my bots should go to User talk:R3m0t.

Please add new discussion to 'Run on " ," #1' or 'Random stuff'.

Prerelease

Discussion from Wikipedia talk:Bots

I've made a list of articles that have a space before a comma (6518 articles). I think a bot could do this:

  1. Every 10 seconds (or whatever the load limit is meant to be for bots), correct one article and move it from "todo" to "tocheck".
  2. 5/10/20/60 (?) minutes later, check back on the correction.
    • If it was reverted, take a note of it and move the article from "tocheck" to "tosee".
      • The bot author looks at "tosee" and attempts to code in any special cases.
    • If it wasn't reverted, delete the article from the "tocheck" list (or move it to a "done" list).
  3. Continue.
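A minimal Python sketch of the todo → tocheck → tosee/done cycle proposed above. The list names come from the proposal. The `Queues` class, `fix_next`, `check_back` and the `is_reverted` callback are hypothetical. The actual edit, and the revert check itself (for example, looking for the "space comma" sequence coming back), are left to the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Queues:
    """Hypothetical in-memory version of the bot's article lists."""
    todo: List[str] = field(default_factory=list)
    tocheck: List[Tuple[str, float]] = field(default_factory=list)  # (title, time fixed)
    tosee: List[str] = field(default_factory=list)  # reverted: for the bot author
    done: List[str] = field(default_factory=list)


def fix_next(q: Queues, now: float) -> Optional[str]:
    """Step 1: correct one article and move it from todo to tocheck."""
    if not q.todo:
        return None
    title = q.todo.pop(0)
    # ... the real bot would fetch, fix and save the article here ...
    q.tocheck.append((title, now))
    return title


def check_back(q: Queues, now: float, delay: float,
               is_reverted: Callable[[str], bool]) -> None:
    """Step 2: once `delay` seconds have passed, see whether each fix stuck."""
    waiting = []
    for title, fixed_at in q.tocheck:
        if now - fixed_at < delay:
            waiting.append((title, fixed_at))  # too early to check
        elif is_reverted(title):
            q.tosee.append(title)  # bot author codes in a special case
        else:
            q.done.append(title)
    q.tocheck = waiting
```

A driving loop would call `fix_next` once per edit interval and `check_back` periodically, with `delay` set to whichever of the 5/10/20/60-minute options is chosen.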

I've also made other lists:

  • "Space before colon" (7740 articles, main namespace only)
  • "Space before exclamark" (1586) (many false positives involving table syntax)
  • "Space before fullstop" (11489) (many false positives involving ellipsis and TLDs)
    • "Space before fullstop which is before space" (2988) (fewer false positives)
  • "Space before qmark" (4523)
  • "Space before semicolon" (7740) (many false positives involving assembly language code and definition lists)
  • "Numbered en dash entity instead of &ndash;" (1094)
  • "Numbered em dash entity instead of &mdash;" (1496)

What do you think? Alternatively, I could post some reports (about 45 pages with 100 articles per page which provide context around the error, similar to the repeated words reports I made [1]) r3m0t 11:26, Feb 21, 2005 (UTC)
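A rough Python sketch of how such reports might be generated. It uses one regular expression per report category, plus a helper that pulls out surrounding context in the style of the repeated-words reports. The patterns are illustrative approximations, not the queries behind the counts above. They will hit the same false positives noted in the list, such as table syntax for "!" and ellipses for ".".

```python
import re

# Hypothetical patterns approximating the report categories listed above.
REPORTS = {
    "space before comma": re.compile(r" ,"),
    "space before colon": re.compile(r" :"),
    "space before exclamark": re.compile(r" !"),
    "space before fullstop": re.compile(r" \."),
    "space before fullstop which is before space": re.compile(r" \. "),
    "space before qmark": re.compile(r" \?"),
    "space before semicolon": re.compile(r" ;"),
    "numbered en dash entity": re.compile(r"&#(?:8211|x2013);", re.IGNORECASE),
    "numbered em dash entity": re.compile(r"&#(?:8212|x2014);", re.IGNORECASE),
}


def classify(text):
    """Names of every report whose pattern occurs in `text`, sorted."""
    return sorted(name for name, pat in REPORTS.items() if pat.search(text))


def context(text, pattern, width=20):
    """Snippets of `width` characters either side of each match."""
    return [text[max(0, m.start() - width): m.end() + width]
            for m in pattern.finditer(text)]
```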

What about articles which have commas in the title? What if the title was intentionally that way? -- AllyUnion (talk) 03:56, 24 Feb 2005 (UTC)
Probably it would ignore the title. r3m0t 07:16, Feb 24, 2005 (UTC)
As I said, it would also check back and if the change was reverted (i.e. a "space comma" sequence is back in the article again) it would put that (article and context) on a list for the bot author to look at. If the space comma was some sort of vandalism revert, the bot author would revert back to the valid version manually. Otherwise, the bot author would investigate the special case so that this false positive doesn't come up again. Also, they might want to put the article on a permanent "exclude" list or, in extreme cases, stop the bot. r3m0t 10:15, Feb 24, 2005 (UTC)
Stop the bot: that is, if there are any weird uses of the comma I don't know about which are a major thing and can't be worked around. Of course, if coding around the false positive takes a long time, the bot will be stopped until the coding is done. r3m0t 17:29, Feb 24, 2005 (UTC)
There is still an argument over the whole ndash and mdash issue. I would avoid that. Also, where is this list going to be? -- AllyUnion (talk) 10:42, 24 Feb 2005 (UTC)
Are you sure the argument is about which entity to use, and not about where to use ndash and where to use mdash? See Wikipedia:Manual of Style (dashes). It doesn't expressly say anything about whether to use the numerical entities or the named ones, but the named ones are "obviously" simpler (from the source anybody can tell it's some sort of dash, although 'n' and 'm' mean little). The list would of course be on the computer running the bot, but if you want, it would also be on the wiki, updated either whenever a change occurs or (e.g.) every few hours. IMO it would be a waste of Wikipedia's space. May I begin to develop and run the bot under User:R3b0t from Sunday? Would you be willing to give it the bot flag? Is it fine to run it (slowly) without a bot flag? Is a vote about allowing the bot necessary? r3m0t 17:11, Feb 24, 2005 (UTC)
Can you make the list public on a website that isn't the Wikipedia? -- AllyUnion (talk) 16:33, 25 Feb 2005 (UTC)
Yes. The list will be in a MySQL database, so I can write some scripts to show the list. r3m0t 16:44, Feb 25, 2005 (UTC)
Am I getting permission to run this? I will try to run it to do one change a minute tomorrow. I will provide a facility to stop the bot (off-site) and will provide a link from here. Of course, you can also ban User:R3b0t. Is that name too similar to mine? r3m0t 00:15, Feb 27, 2005 (UTC)
Yes, it kind of is. -- AllyUnion (talk) 08:16, 27 Feb 2005 (UTC)
*twirls* What do you think? Grammarbot 10:42, 27 Feb 2005 (UTC) (Yes this is mine) r3m0t 12:57, Feb 27, 2005 (UTC)
That's coming from User:AllyUnion who used to have a bot at User:Allyunion! Only joking. r3m0t 15:29, Feb 27, 2005 (UTC)
Well, User:Allyunion is blocked now. I suggest you make your edits, and produce some kind of log every day. -- AllyUnion (talk) 16:01, 27 Feb 2005 (UTC)
Thank you. I'm programming it. r3m0t 17:16, Feb 27, 2005 (UTC)

The bot is now running, making one change a minute without the bot flag. This is the log for today and tomorrow (with some earlier entries removed, as I have since changed the log format), and various things are at stuff.php, including a list of upcoming articles, articles that were fixed and articles which it thinks were already fixed. There is also a counter showing the time until the next run. Note that currently it does not go back to check its edits. However, the backlog of changes it has made remains (in the database). If it were to check 2 a minute, it would catch up in - ooh, a day? Of course, I might implement that feature sooner, in which case, all the better. r3m0t 22:11, Feb 27, 2005 (UTC)

Um, do you think you could at least add your bot to the list of bots running without a flag? I only spotted it because I was scanning RC. --Tony Sidaway|Talk 22:36, 27 Feb 2005 (UTC)

There were problems with some characters which screwed up many tags and tables. Unfortunately, grammarbot did about 100 edits before I noticed and stopped the bot. Not all of these edits were problematic, but for simplicity all were reverted in an hour or so. I'm waiting to receive (conditional?) permission to run it again, after (some of?) the things detailed on User talk:Grammarbot have been done. r3m0t 01:16, Feb 28, 2005 (UTC)

It's up again without any problems (not even at Anglesey ;)) and I hope to get the bot flag in a few hours or so. I'll apply. I'll also list it on Wikipedia:Bots. r3m0t 22:35, Mar 3, 2005 (UTC)


Which of these should I use next? What about after that? Which of these should I definitely not do?

  1. "Space before colon" (8066 articles, main namespace only)
  2. "Space before exclamark" (1710) (many false positives involving table syntax) (can exclude tables, however need to put a bit more work in for recognition of nested tables)
  3. "Space before fullstop" (12372) (many false positives involving ellipsis and TLDs)
  4. "Space before fullstop which is before space" (3160) (fewer false positives)
  5. "Space before qmark" (3424)
  6. "Space before semicolon" (912) (many false positives involving assembly language code and definition lists)
  7. "Numbered en dash entity instead of &ndash;" (1617)
  8. "Numbered em dash entity instead of &mdash;" (1177)
  9. "“ instead of a straight double quotation mark" (1450) (total of named entity, number and hex)
  10. "” instead of a straight double quotation mark" (1464)
  11. "‘ instead of a straight single quotation mark" (799)
  12. "’ instead of a straight single quotation mark" (3562)
  13. "euro character instead of HTML entity" (53298) (assuming HTML entity is preferred as that is what the insert box puts in - compatibility problems?)
  14. "double space which is not after a full stop" (100617) (old data)
  15. "ampersand before space" (i.e. invalid XHTML) (29329) (old data)
  16. "0x91 (145) (MS left single quotation mark) instead of a straight single quotation mark" (53298) (see [2] for info on these)
  17. "0x92 (146) (MS right single quotation mark) instead of a straight single quotation mark" (53298)
  18. "0x93 (147) (MS left double quotation mark) instead of a straight double quotation mark" (53298)
  19. "0x94 (148) (MS right double quotation mark) instead of a straight double quotation mark" (53298)
    Clearly something went wrong here. Maybe I'll try to get it right some other time. r3m0t talk 22:54, Mar 15, 2005 (UTC)
  20. "Space before comma" (1062)

The comma thing is due to finish in a few days. Note that if you think a human is needed to do any of these, I can create some very nice reports easily. r3m0t 17:12, Mar 4, 2005 (UTC)

Well, HTML Tidy converts naked ampersands to &amp;, so there's no need to fix those (15). I would avoid 2, 3, and 6 until you can figure out a way to weed out some of the false positives. 14 seems superfluous: people put double spaces in for clarity, and they don't show up in the output anyway. 1, 4, and 5 will get some programming false positives, but I don't think very many. My suggestion is to go with the ones with very few false positives first (7–13), then move on to the ones with fewer false positives. Maybe you can develop some methods of weeding out the false positives in these other items.
My other suggestion is to find more numbered entities along the lines of ndash and mdash and convert those to their named equivalents.
– flamurai (t) 01:33, Mar 5, 2005 (UTC)
Weeding out the table syntax for 2 is very easy, but I have to do it when I'm running the bot, not in the original SQL query that gives us that count. I think 7 and 8 will have fewer problems than 9–12 (as proper quotation marks may be used in the articles about quotation marks and about typesetting), so those will be first. I will do 13 after that.
What bot is running HTML Tidy? r3m0t 09:24, Mar 5, 2005 (UTC)
MediaWiki runs HTML Tidy. View source and edit the page on this ampersand: & – flamurai (t) 09:50, Mar 5, 2005 (UTC)
If that is so, then are all those & characters I found inside math tags? Oh, you mean that it keeps the source the same, but sends it properly! Good. I will look around about converting other entities. r3m0t 10:25, Mar 5, 2005 (UTC)
Numbers 9–12 have little or no obvious benefit to outweigh the risk of false positives; indeed past bot-based changes to Windows-1252 quotes (Guanabot) have stuck to entities as a bot safety issue. Susvolans (pigs can fly) 17:26, 7 Mar 2005 (UTC)
Well, they make the source text more readable. I don't understand what this "bot safety issue" is, but I will of course avoid these until your explanation. r3m0t 18:34, Mar 7, 2005 (UTC)
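Here is a sketch of the table weeding mentioned for item 2. It shows one way it could be done, not the bot's code. Wiki table markup sits on lines beginning with `{|`, `|}`, `|` or `!`, and on those lines `!` and `!!` are header-cell delimiters. Skipping those lines should remove most "space before exclamark" false positives. The sketch only looks at the start of each line; it does not try to parse the table structure.

```python
import re

# Lines that are wiki table markup (table start/end, cells, header cells).
TABLE_LINE = re.compile(r"^\s*(?:\{\||\|\}|\||!)")
# A run of spaces between a word character and "!".
SPACE_BANG = re.compile(r"(?<=\w) +!")


def space_before_exclamation(wikitext):
    """1-based numbers of non-table lines containing a space before '!'."""
    hits = []
    for lineno, line in enumerate(wikitext.splitlines(), 1):
        if TABLE_LINE.match(line):
            continue  # "!" and "!!" are cell delimiters here
        if SPACE_BANG.search(line):
            hits.append(lineno)
    return hits
```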

The full list of entities is here. We are using XHTML 1.0. r3m0t 10:33, Mar 5, 2005 (UTC)
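A minimal sketch of converting numbered entities such as &#8211; or &#x2014; to their named equivalents. It uses the name table in Python's standard `html.entities.codepoint2name`, which covers the HTML 4 entity set; XHTML 1.0 shares that set and adds only &apos;. Entities without a name are left alone. Skipping <math>, <pre> and similar regions is assumed to happen elsewhere.

```python
import re
from html.entities import codepoint2name

# Decimal (&#8211;) or hexadecimal (&#x2013;) character references.
NUMERIC_ENTITY = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")


def to_named(text: str) -> str:
    """Replace numbered entities with named ones where a name exists."""
    def repl(m):
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        name = codepoint2name.get(code)
        return f"&{name};" if name else m.group(0)  # no name: leave as is
    return NUMERIC_ENTITY.sub(repl, text)
```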

Received bot flag

Grammarbot is now marked as a bot on this Wikipedia. If you ever need this flag removed, just ask at m:requests for permissions. Angela. 12:03, Mar 5, 2005 (UTC)

First run

The panic

I'm assuming that this is not a deliberate vandalbot, but in its effects at Adrian Nastase, it might as well have been. Among other things, it is systematically screwing up HTML entities. -- Jmabel | Talk 22:36, Feb 27, 2005 (UTC)

It also screwed up the formatting on Anglesey requiring a revert.
Velela 23:04, 27 Feb 2005 (UTC)

My sincerest apologies. The bot is now stopped. r3m0t 23:13, Feb 27, 2005 (UTC)

Err... incredible. Is there any way to revert all these automatically? Is this the death knell of my bot? r3m0t 23:14, Feb 27, 2005 (UTC)

Fuck. Fuck. Fuck. Fuck. Didn't Angela have a bot against this? (Yes, I really did stop the bot by now.) r3m0t 23:17, Feb 27, 2005 (UTC)

No, she didn't. r3m0t 23:22, Feb 27, 2005 (UTC)

It's only about a hundred pages. I'm pretty sure they're being manually reverted as we speak (I did one. :-) And after all, some edits were probably completely uncontroversial. Hey, worse things happen. It's not the Willy on Wheels. :-) 82.92.119.11 23:23, 27 Feb 2005 (UTC)

I've done (from the most recent) up to American Association for the Advancement of Science. Phew. r3m0t 23:32, Feb 27, 2005 (UTC)

I wish I could help out, but I desperately need to catch some Z's. Try soliciting some brute force on the IRC channels. An admin (there are always some on the channels) may even have a bot handy for such things. 82.92.119.11 23:40, 27 Feb 2005 (UTC)

Problems

Problems, then:

  1. It removes the ampersand in entities such as &amp; or &sup2;
  2. It removed double quotes and maybe 'single' quotes too
    Err... in only some cases?
  3. It removes &lt; and &gt; and possibly other things which are escaped in the edit box, such as &quot; and &amp;

Sorry again. I can't imagine why this happened. There is a test page at User:R3m0t/Sandbox, and the bot will be tested properly before it is re-enabled. r3m0t 23:13, Feb 27, 2005 (UTC)
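One general pitfall produces these symptoms, offered here as a hypothesis rather than a diagnosis of this bot. The wikitext inside the edit form's <textarea> is HTML-escaped, so it has to be unescaped exactly once. Unescaping it twice turns &amp; and &sup2; in the wikitext into bare characters; not unescaping it at all saves &lt; and &quot; literally. A Python sketch; the function name and the regex-based extraction are assumptions.

```python
import html
import re

# Contents of the first <textarea> on the page (hypothetical extraction).
TEXTAREA = re.compile(r"<textarea[^>]*>(.*?)</textarea>", re.S)


def wikitext_from_edit_form(page_html: str) -> str:
    m = TEXTAREA.search(page_html)
    if m is None:
        raise ValueError("no textarea found")
    return html.unescape(m.group(1))  # undo the form's escaping exactly once
```

The fix adopted below fetches the text through Special:Export instead, which avoids scraping the form at all.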

If this ever runs again, please consider having it ignore everything between <math> tags, since that is often formatted for ease of reading while editing. Ben Cairns 00:05, 28 Feb 2005 (UTC).

Don't worry, I'm not dead yet. I'll try to get that in. Grammarbot 00:07, Feb 28, 2005 (UTC)

For release #1

This appears all fixed. HOWEVER:

  1. There is now an extra page load, bumping the number of page loads up from 2 (edit and submit) to 3 (get text with Special:Export, edit and submit), and I really want to move this back down to 2 (the minimum).
    If I can't, I'll at least move the number of page loads for something which is already fixed back down to 1. That's easy.
  2. I want to ignore things in math tags. I will use what I already have and possibly load more pages than I need to.
  3. I want to use a setting to enable/disable the UTF8 conversion which I now use, instead of hardcoding it in. (Possibility of outreach to other Wikipedias) no longer needed
  4. I would like to test it on everything which it had already done and check the diffs "by human", to make sure everything is fine.
    Nah.
  5. I need to re-receive permission.
    Nah.
  6. I need to make publicly available something to close the bot down.
  7. I need to move this down to every 5 minutes instead of every minute.
    Nah.
  8. Perhaps it will be able to run on other reports making changes such as ' . ' -> '. ' and '.A' -> '. A' (for A-Z and only uppercase to avoid TLD problems)
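A sketch of the two fullstop fixes in item 8, using Python's `re` module. Besides the uppercase-only rule from the item, the second pattern adds a hypothetical safeguard: it requires a lowercase letter before the full stop, so initials like "U.S.A" are left alone. File names such as "photo.JPG" would still be changed and would need their own exclusion.

```python
import re

SPACED_FULLSTOP = re.compile(r"(?<=\w) \. ")          # "word . Next"
MISSING_SPACE = re.compile(r"(?<=[a-z])\.(?=[A-Z])")  # "word.Next"


def fix_fullstops(text: str) -> str:
    text = SPACED_FULLSTOP.sub(". ", text)  # ' . ' -> '. '
    text = MISSING_SPACE.sub(". ", text)    # '.A' -> '. A', uppercase only
    return text
```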

Yours, r3m0t 01:09, Feb 28, 2005 (UTC)

Thanks Grammarbot this time it seems fine and Anglesey has survived the experience. Velela 22:26, 3 Mar 2005 (UTC)

*smiles* My pleasure. Grammarbot 22:31, Mar 3, 2005 (UTC)

Run on " ," #1

Full list of reversions

I will check these one by one and add explanations. From this we shall see what exceptions may need to be coded in. Feel free to update this list by pasting in new items as they show up here, but please leave the analysis to me. r3m0t 00:09, Mar 6, 2005 (UTC)

Now it should remove up to three spaces before a comma. I just call the same fixing function three times. r3m0t 13:31, Mar 6, 2005 (UTC)
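Rather than calling the fixing function three times, a regular expression with a `+` quantifier can remove any run of spaces before a comma in one pass. Below is a minimal Python sketch. The `(?<=\S)` lookbehind is an added assumption: a run of spaces at the very start of a line (a preformatted line, say) is not touched.

```python
import re

# One or more spaces before a comma, preceded by a non-space character.
SPACES_BEFORE_COMMA = re.compile(r"(?<=\S) +,")


def remove_space_before_comma(text: str) -> str:
    return SPACES_BEFORE_COMMA.sub(",", text)
```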

Wikilinks

What's the policy on wikilinks? Communes of the Nièvre département removed a space from Asnois ([[Asnois , Nièvre|Asnois]] --> [[Asnois, Nièvre|Asnois]]), which is ok (good, even) as it was a red link; but if it hadn't been…yikes! Joestynes 06:11, 4 Mar 2005 (UTC)

I can't imagine why there would be a space before a comma in a link (or, indeed, almost anywhere). Anyway, I guess it would have made the change, been reverted, and I would go to check it. r3m0t 07:24, Mar 4, 2005 (UTC)

Ellipsis

What's proper in a finite list after an ellipsis: x1, x2, x3, ... , xN or, after Grammarbot, without a space before the comma x1, x2, x3 ..., xN? Not sure there's any difference displayed after cdot in math markup. Sorry if my ignorance wastes any time. --Eddie | Talk 13:53, 4 Mar 2005 (UTC)

How am I meant to know? I think that the second looks better. Anyway, if you put it in math tags Grammarbot won't correct it. r3m0t 14:34, Mar 4, 2005 (UTC)
I think with space is proper (for values of proper involving TeX anyway). Gruepig

Need to fix something

I noticed in exponentiation the bot changed something of the format "a ,b" to the format "a,b" when it should have changed it to "a, b". Cheers. CryptoDerk 14:42, Mar 4, 2005 (UTC)

Hmm... not so sure about that. What about numbers? r3m0t 15:05, Mar 4, 2005 (UTC)
Yeah, I don't think it's necessary for the bot to add spaces where it thinks there should be. The exponentiation notation "a,b" is not necessarily wrong, anyway. --DropDeadGorgias (talk) 20:05, Mar 4, 2005 (UTC)

Yea, Grammarbot!

I figured you'd get a lot of dings about what didn't work. I thought I'd add at least one "attabot" for the many more that worked fine. I noticed about a dozen. Thanks. --A D Monroe III 21:45, 4 Mar 2005 (UTC)

space before commas

FYI: Grammarbot found and fixed the space before the comma in Japanese New Year, but only found, but did not fix, the space before the comma in Japanese poetry. BlankVerse 08:42, 5 Mar 2005 (UTC)

That's... odd. I wonder why. Anyway, if in an hour it still hadn't been fixed, the article would go on the -2 list here and I would have looked at it. r3m0t 10:26, Mar 5, 2005 (UTC)

Also: I am wondering if it might be worth creating a page listing all the articles where you've found problems that needed correcting. The reason I am suggesting that is that I've noticed that when someone has gone through specific common errors on an article page that is in my watchlist, that is a good indication that there are probably other errors on that page, and a quick spell-check (I use the SpellBound extension in the Firefox browser) usually finds 3-5 more spelling errors on those pages. On the other hand, someone who was interested in following in the wake of the grammarbot looking for spelling errors could also just use the "User contributions" link. BlankVerse 08:42, 5 Mar 2005 (UTC)

Well, I have a database of all these mistakes, so I can run a spellchecker on those articles if I like. Unfortunately, there are difficulties in running a spellchecker on Wikipedia text, including acceptance of regional spelling variations, masses of technical and foreign terms, Latin phrases etc., and the inclusion of many rare proper nouns of names and places (Weebl and Bob anyone?) I could make it dump all the article texts to files, but it would take a long time. (One article per minute, and I would need to catch up on the backlog of about 1400 articles and growing) r3m0t 10:26, Mar 5, 2005 (UTC)

Coughing on one article

Medical_analysis_of_circumcision has been failing for ages (see end of today's log). I ought to make it give up eventually and go to the next article, I suppose. I'll investigate. r3m0t 22:42, Mar 5, 2005 (UTC)

Maybe I'm banned! Grammarbot 22:46, 5 Mar 2005 (UTC)
Well, the page is protected. No wonder. I'll try to be a bit clever about that. Grammarbot 22:49, 5 Mar 2005 (UTC)
Thank you. (: Protected pages are now status -4 in the database. r3m0t 23:19, Mar 5, 2005 (UTC)
Pages which were actually not possible to receive were marked as protected! No worries, it's fixed now. I think. r3m0t 11:52, Mar 6, 2005 (UTC)
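A sketch of "give up eventually and go to the next article" that also keeps the two cases above apart. A protected page is recorded at once, since retrying cannot help; a page that merely could not be fetched is retried a few times before the bot gives up. The `Outcome` names, the `attempt_edit` callback, its result strings and the retry limit are all hypothetical; only the -4 status for protected pages comes from the discussion.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    FIXED = "fixed"
    PROTECTED = "protected"  # stored as status -4 in the bot's database
    GAVE_UP = "gave up"      # hypothetical: too many failed attempts


def process(title: str, attempt_edit: Callable[[str], str],
            max_attempts: int = 3) -> Outcome:
    """attempt_edit returns "ok", "protected", or anything else on failure."""
    for _ in range(max_attempts):
        result = attempt_edit(title)
        if result == "ok":
            return Outcome.FIXED
        if result == "protected":
            return Outcome.PROTECTED  # retrying cannot help
        # any other result (e.g. the page could not be fetched): try again
    return Outcome.GAVE_UP
```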

ASCII art

Careful with ASCII art there! Today the Nerd Boy article got vandalized by this bot. There should probably be a check whether a line has a leading space, to prevent further incidents like that. Grue 07:02, 6 Mar 2005 (UTC)

Looking at the way my bot is programmed at the moment, that's somewhat difficult. Also, if you look at the list above, there are plenty of instances where this bot messed up preformatted text. I'll try to make it able to pass over not just math tags but also pre tags and pre lines. Note that there may be many instances in which there was just one space before the comma and the bot was not reverted and it therefore doesn't come up in the list. I'll leave it to your discretion whether to turn the bot off or not. (Try dividing the number of articles above in which it messed up preformatted articles by the -3 count at the bottom of this page, and multiplying that by the NULL count on the same page to get an idea of how many more articles this problem will affect.) r3m0t 11:30, Mar 6, 2005 (UTC)
Don't worry, it's definitely fixed now. There is still a small backlog of articles it has fixed incorrectly. r3m0t 13:06, Mar 6, 2005 (UTC)
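A sketch of the exclusions asked for in this thread and the math thread. The fix is applied only outside <math>, <pre> and <code> elements, and outside lines that start with a space, which MediaWiki renders as preformatted text. It is written in Python as an illustration, not the bot's actual code. `fix` can be any string-to-string function. <nowiki>, unclosed tags and nested tags are not handled.

```python
import re
from typing import Callable, List

PROTECTED_TAG = re.compile(r"<(math|pre|code)\b[^>]*>.*?</\1\s*>", re.S | re.I)
PRE_LINE = re.compile(r"^ .*$", re.M)  # wiki lines starting with a space


def protected_spans(text: str) -> List[List[int]]:
    """Sorted, merged [start, end) spans that must not be changed."""
    spans = sorted([m.span() for m in PROTECTED_TAG.finditer(text)] +
                   [m.span() for m in PRE_LINE.finditer(text)])
    merged: List[List[int]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # overlapping: join them
        else:
            merged.append([start, end])
    return merged


def apply_outside_protected(text: str, fix: Callable[[str], str]) -> str:
    out, pos = [], 0
    for start, end in protected_spans(text):
        out.append(fix(text[pos:start]))
        out.append(text[start:end])  # copied through untouched
        pos = end
    out.append(fix(text[pos:]))
    return "".join(out)
```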

Double Edit

You appear to have left two conflicting messages on my talk page within ten minutes, I'm not sure which of the two you intended to leave. Would you mind awfully removing one because I'm not sure which you mean to tell me, thanks. Although I do admit that there is one message I didn't remove, I had to leave in a hurry, sorry. :). Rje 04:52, Feb 10, 2005 (UTC)

P.S. If you intended to leave me the second message: I don't need a gmail invite, but thanks very much for offering. Rje 05:00, Feb 10, 2005 (UTC)
I've reread the instructions, I now realise what I was doing wrong and have sorted it out. Sorry about the mix up. Rje 13:09, Feb 10, 2005 (UTC)

Summary of change

Can grammarbot please say what it is changing in the edit summary?

"[[insect]]s , including" --> "[[insect]]s, including". Removed space before comma. I am a bot. Please revert my change if it was incorrect. I will notice automatically.

You can't even see what it has changed in the diff without careful scrutiny, since there is nothing to turn red. This will prevent us from having to view the diff at all... - Omegatron 16:09, Mar 6, 2005 (UTC)
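A sketch of building the requested summary, in the `"before" --> "after"` form of the example above, from the first match of the fixing pattern plus a few characters of context on each side. The function name and the `width` default are hypothetical. The caller would append the fixed text ("Removed space before comma. I am a bot...").

```python
import re
from typing import Optional


def summarise_change(old: str, pattern: str, replacement: str,
                     width: int = 15) -> Optional[str]:
    """Quote the context around the first match before and after the fix."""
    m = re.search(pattern, old)
    if m is None:
        return None
    start = max(0, m.start() - width)
    end = min(len(old), m.end() + width)
    before = old[start:end]
    after = old[start:m.start()] + m.expand(replacement) + old[m.end():end]
    return f'"{before}" --> "{after}"'
```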

I'm sorry I didn't get it done this run. I will get it done before the next run if it's relevant. (For cases like changing numbered entities to named I don't think this is necessary.) r3m0t 18:05, Mar 7, 2005 (UTC)
You mean like changing "multiply by 2" into "multiply by two"? I hope you have summaries turned on before you do that... - Omegatron 19:08, Mar 7, 2005 (UTC)

should avoid pre and code

The grammarbot was blocked earlier today by User:CSTAR, presumably because of Poincaré-Birkhoff-Witt theorem. However, this was not an error: the bot changed "x ,x" to "x,x", which is not worse than the original, although the full manual fix would be "x, x".

I have unblocked it. However, the grammarbot should avoid anything within <pre> ... </pre> and <code> ... </code>, because within these the spacing is significant (ASCII art etc).

-- Curps 19:26, 6 Mar 2005 (UTC)

It does. The part in Poincaré-Birkhoff-Witt theorem was not in any such tags. On one false positive (which wasn't even false) CSTAR decided to block it. That's damn annoying. Apparently my bot went berserk.
Of course, if there really are problems, CSTAR can please use the page I provided to turn it off, which prevents it from going screwy. Please provide examples.
I'm sorry for the bile but I was hoping to finish this run a little earlier and have a wikiparty (is that a new word?). r3m0t 20:43, Mar 6, 2005 (UTC)

Well, it was just that I noticed some of the earlier edits at ASCII art did edit within pre and code, so I mentioned that, but presumably you fixed it along the way, after the early runs. Yes, the part in the PBW theorem page was not in any such section (sorry if my phrasing was not clear) and Grammarbot's edit to it was not an error.

Anyways, I did unblock it.

-- Curps 20:58, 6 Mar 2005 (UTC)

Yes, thanks. Incidentally, next time remember to check for IP blocks. No worries, Raul654 was on IRC so he was able to help. :) I feel a bit stupid that I hadn't thought of such exceptions when I conceived (as in idea, not baby) the bot, and again when somebody requested it to exclude math tags. Actually, I think by the time I'd fixed that it was about half-way through. r3m0t 21:07, Mar 6, 2005 (UTC)

Perfect copy editing

Hi, I've noticed grammarbot has edited a few articles on my watchlist. This is just to let you know that he (or she?) did very well: no mistakes. SlimVirgin 05:06, Mar 7, 2005 (UTC)

It. Definitely it.
Funny, I was expecting people from maths, physics, chemistry and linguistics to suddenly burst into outrage (although I didn't know exactly what the problem would be ;)). Looks like only maths has been the problem. r3m0t 07:28, Mar 7, 2005 (UTC)

One thing grammarbot gets wrong

In mathematics articles, if one writes about an inner product < , >, obviously it would be wrong to change it to <, >. Michael Hardy 00:49, 7 Mar 2005 (UTC)

As a 15-year-old, although very interested in mathematics, this is not obvious to me in any way whatsoever. I wish I could put <...> on the exclusions list like code tags, but I need to make more changes otherwise the matching on math tags will stop working. I have 15 minutes to program this before school. Please wait. If anybody thinks this is a serious problem, please go to my page to stop the bot. r3m0t 07:18, Mar 7, 2005 (UTC)

Excellent, my testcase works. Into the main code my code goes! That was fast. :) r3m0t 07:25, Mar 7, 2005 (UTC)

In TeX, one would write about an inner product \langle\ ,\ \rangle or perhaps \langle\bullet,\bullet\rangle. The point is that there are two blank spaces in which arguments to the function may appear. I think this occurs sufficiently rarely in non-TeX mathematical notation that it's not a major problem, but one can be a bit touchy about such things after seeing some attempts to "fix" punctuation in mathematical notation that did not need fixing. For example, changing [ab) to [ab] or to (ab). Michael Hardy 22:12, 7 Mar 2005 (UTC)

Random stuff

I

"I will notice": this thing is not a person. Lose the I and stop anthropomorphizing your software.--Jirate 16:00, 2005 Mar 5 (UTC)

I'm not the first to do this; see User:AngBot. There is also a page somewhere about never having personal attacks against Angela which I will find another time. r3m0t 16:49, Mar 5, 2005 (UTC)
It's a slippery slope.--Jirate 23:31, 2005 Mar 5 (UTC)
What comes next, then? r3m0t 23:51, Mar 5, 2005 (UTC)
You stop giving the machine instructions and move on to vague hints; soon you're a web designer.--Jirate 23:58, 2005 Mar 5 (UTC)
Web designers give precise instructions; it's just that some browsers don't follow them. r3m0t 00:03, Mar 6, 2005 (UTC)

The name

Why is it called Grammarbot when it checks punctuation, not grammar? --Angr 22:28, 5 Mar 2005 (UTC)

Punctuation is grammar. On the other hand, this will be fixing HTML entities next... r3m0t 22:29, Mar 5, 2005 (UTC)

Neither punctuation nor HTML entities have anything to do with grammar. Now if you had made a bot that could fix dangling participles, sentence fragments, or subjacency violations (things like That's the man who I don't know whether took Martha to the dance last month), that would be a Grammarbot! --Angr 14:59, 6 Mar 2005 (UTC)

I is for Initiative

Thanks for correcting a comma, but if you don't mind, could you change the bot's comment to the third person, please? I seriously doubt the program is doing all this, and giving that comment, on its own initiative. Rather, I expect you wrote the program and comment, and commenting from your own viewpoint would better express that you're doing this, with the bot as your tool. That would connect a person to the changes, making them less of an irritation. (I do hope the bot is limited to space-comma after alphanumeric characters, as space-comma after interpunction is usually intentional. Likewise, I hope it will only drop the space when there's already a space after the comma.) Aliter 16:49, 7 Mar 2005 (UTC)

It doesn't have initiative, but it does check back one hour later to check if the change was reverted. The reverted list is -2 on that page. I will place a section on the bot's User page for the edit summary to be edited. It will make different corrections next run. r3m0t 17:27, Mar 7, 2005 (UTC)
Grammarbot's next project should be correcting placement of apostrophes ;) Alexforcefive 19:31, 18 December 2005 (UTC)
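Aliter's two conditions can be written as a single regular expression. Spaces before a comma are removed only when the comma follows a letter or digit and is itself followed by a space. Under this assumed rule, the inner product "< , >" and the ellipsis "... , xN" from the math threads are left alone, as is "a ,b". A Python sketch, not the bot's code.

```python
import re

# [^\W_] is a letter or digit; the comma must be followed by a space.
CONSERVATIVE = re.compile(r"(?<=[^\W_]) +,(?= )")


def conservative_fix(text: str) -> str:
    return CONSERVATIVE.sub(",", text)
```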

Fan Mail

Yeah!!! Grammarbot rules!! What a pal. You're just like Bender. You mess things up, but you get 'em right sometimes too, and having bots around adds needed diversity to this human colony of typists. Uris 05:09, 8 Mar 2005 (UTC)

Ideas

Reports

Hey, I have no idea how you do these thingies, but I think there's a lot of work to be done on Wikipedia in catching and fixing a very specific type of punctuation mistake -- not including punctuation inside quote marks. For instance, the proper way to write punctuation in quote marks is: "The general is quite naked," he said ... not like this: "The general is quite naked", he said. Is there some way to set up something like your other reports to check for that? I see it a LOT just trolling random pages for copyedits. Katefan0 22:16, Feb 9, 2005 (UTC)

  • It also happens a lot with periods, obviously. Any sort of punctuation really -- comma, period, exclamation point. Katefan0 22:18, Feb 9, 2005 (UTC)
I think you might be wrong there. I'm going to check the Wikipedia:Manual_of_Style#Quotation_marks page. Basically, I downloaded the whole text of current versions from [3] and put them into a MySQL database (the type used on the MediaWiki software). I then wrote a script in the PHP programming language to ask the MySQL database where these words existed twice (it searched for " the the ") and displayed the results nicely. Finally, I wrote another script to take that list and check it against the current version using Special:Export. (The database dump was a month old at the time. I will have to download the new dump today. New dumps were provided for every project except en.) r3m0t 22:23, 9 Feb 2005 (UTC)
  • Mmm, looks like you are right in part, I guess here usage is conditional -- I'm a reporter and AP style is that punctuation always goes inside the ". Anyway, just a thought! Thanks. Katefan0 22:41, Feb 9, 2005 (UTC)
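The repeated-words search described above looked for a fixed string such as " the the ". A regular expression with a backreference generalises it to any doubled word. This Python sketch is an illustration, not the PHP/MySQL scripts actually used. Legitimate doubles such as "that that" or "had had" would still need a human eye.

```python
import re

# A word, whitespace, then the same word again (case-insensitive).
REPEATED = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)


def repeated_words(text, width=20):
    """(word, context) pairs for each doubled word in `text`."""
    report = []
    for m in REPEATED.finditer(text):
        start = max(0, m.start() - width)
        report.append((m.group(1).lower(), text[start:m.end() + width]))
    return report
```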

Suggestion of item to use bot to correct

As much as my typographer side hates me for suggesting this, for uniformity, it would be good to change all the correct quote marks to bastardized computer quote marks. i.e. seek out and destroy &ldquo; &rdquo; &lsquo; &rsquo; and their numerical equivalents.

You could also generalize the mdash/ndash thing to replace numerical entities with the equivalent (widely supported) named entities.

– flamurai (t) 15:24, Mar 4, 2005 (UTC)

That isn't all very easy, is it? To recognise an “apostrophe” against a ‘single’ quotation mark? Maybe I should just create a report on it. r3m0t 15:30, Mar 4, 2005 (UTC)
297        &ldquo;
297     &rdquo;
288     &lsquo;
512     &rsquo;

That really isn't very many articles. r3m0t 16:05, Mar 4, 2005 (UTC)

Actually, there are far more with the numbered entities. See Wikipedia talk:Bots#Grammarbot and please continue discussion there. r3m0t 17:22, Mar 4, 2005 (UTC)

"e.g.", "i.e."

I've fixed these two a lot. The Chicago Manual of Style (14th edition), item 5.62, says "A comma is usually used after such expressions as that is, namely, i.e., and e.g." The reason is that these are almost exclusively used parenthetically; the comma is needed to indicate that.

If my memory of regular (Perl) expressions serves me, I think something like this is in order:

s/(?<!\w)i\.?e\.?(?!\w)/i.e./g;  # Normalise "ie", "i.e" and "ie." to "i.e." (stand-alone words only)
s/(?<!\w)i\.e\.(?!,)/i.e.,/g;     # Add the trailing comma where it is missing
Where possible, I would recommend that Latinese be replaced with more accessible English phrases like "for example" and "in other words". But maybe that's just me. Deco 03:12, 29 November 2005 (UTC)

Idea for Grammarbot

Here's an idea... Change any external link that links to an internal link (article). Some people may know how to make an external link, but they might make all their links as external links... especially to some that are articles. -- AllyUnion (talk) 20:50, 4 Jun 2005 (UTC)

Maybe. I'll look into it. r3m0t talk 10:11, Jun 5, 2005 (UTC)
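A Python sketch of that idea. It assumes external links of the form [http://en.wikipedia.org/wiki/Title label] and turns them into [[Title|label]], or [[Title]] when the label is just the title. Links with a #section or query string are left alone. Other language editions and other URL forms (such as index.php?title=) are not handled.

```python
import re
from urllib.parse import unquote

EXTERNAL_WIKI_LINK = re.compile(
    r"\[https?://en\.wikipedia\.org/wiki/([^\s\]#?]+)(?: ([^\]]+))?\]")


def internalise_links(text: str) -> str:
    def repl(m):
        target = unquote(m.group(1)).replace("_", " ")  # Caf%C3%A9 -> Café
        label = m.group(2)
        if label is None or label == target:
            return f"[[{target}]]"
        return f"[[{target}|{label}]]"
    return EXTERNAL_WIKI_LINK.sub(repl, text)
```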

don't delete, move

Sometimes you shouldn't delete the space, but move it after the comma, such as here.
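A Python sketch of "move, don't delete". When the comma is stuck to the next word, the space moves to after the comma; when a space already follows the comma, the stray one is simply dropped. Following the concern about numbers raised in the exponentiation thread, the moving rule is assumed to require letters on both sides, so something like "1 ,000" is left untouched.

```python
import re

# "word ,next" -> "word, next": letters on both sides of " ,".
MOVE = re.compile(r"(?<=[^\W\d_]) +,(?=[^\W\d_])")
# "word , next" -> "word, next": a space already follows the comma.
DROP = re.compile(r"(?<=\w) +,(?= )")


def move_space_after_comma(text: str) -> str:
    text = MOVE.sub(", ", text)
    return DROP.sub(",", text)
```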

I agree

The example given above is a good suggestion. hydnjo talk 8 July 2005 21:04 (UTC)


Pune

Thank you for your contribution at Pune.
Please keep it up!!! - P R A D E E P Somani (talk)
Feel free to send me e-mail.

This is an automated message to all bot operators

Please take a few moments and fill in the data for your bot on Wikipedia:Bots/Status. Thank you. Betacommand (talk • contribs • Bot) 19:06, 12 February 2007 (UTC)

Automated message to bot owners

As a result of discussion on the village pump and mailing list, bots are now allowed to edit up to 15 times per minute. The following is the new text regarding bot edit rates from Wikipedia:Bot Policy:

Until new bots are accepted they should wait 30-60 seconds between edits, so as to not clog the recent changes list and user watchlists. After being accepted and a bureaucrat has marked them as a bot, they can edit at a much faster pace. Bots doing non-urgent tasks should edit approximately once every ten seconds, while bots who would benefit from faster editing may edit approximately once every four seconds.

Also, to eliminate the need to spam the bot talk pages, please add Wikipedia:Bot owners' noticeboard to your watchlist. Future messages which affect bot owners will be posted there. Thank you. --Mets501 02:55, 22 February 2007 (UTC)
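The new limit works out to 60 / 15 = 4 seconds between edits at the fastest permitted rate; bots not yet approved should wait 30-60 seconds. Below is a minimal Python throttle that enforces a minimum interval. `Throttle` is a hypothetical helper, not part of any bot framework. The clock and sleep functions are injectable, so it can be tested without actually waiting.

```python
import time
from typing import Callable, Optional

EDITS_PER_MINUTE = 15
MIN_INTERVAL = 60 / EDITS_PER_MINUTE  # 4.0 seconds between edits


class Throttle:
    """Hypothetical helper: enforce a minimum gap between successive edits."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None

    def wait(self) -> None:
        """Call before each edit; sleeps only if the last edit was too recent."""
        now = self.clock()
        if self.last is not None:
            remaining = self.interval - (now - self.last)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self.last = now
```

A non-urgent bot would use `Throttle(10)`, and an unapproved one something like `Throttle(30)`.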