Wikipedia talk:AutoWikiBrowser/Typos/Archive 1
[edit] Womens'
I'm trying to correct many instances of "womens'" or "womens" to "women's", but I'm having trouble grabbing that trailing apostrophe in the regex. Can someone help me with the syntax? I'm wondering if this is an AWB bug or whether you have to do something special for apostrophes. --Thiseye 00:19, 13 January 2007 (UTC)
- It seems there is some sort of problem related to identifying the end of a word. However, using a whitespace instead of a wordbreak seems to work.
-
- "\b(W|w)omens'(\s)" -->> "$1omen's$2"
- Gaius Cornelius 13:08, 13 January 2007 (UTC)
-
- Thanks, that did the trick. :) --Thiseye 01:59, 14 January 2007 (UTC)
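For anyone who hits the same thing later, here is a rough Python sketch of why the wordbreak fails after an apostrophe. The apply_rule helper is hypothetical (it is not part of AWB) and only translates the $1-style references into Python's syntax; Python's regex engine is assumed to treat \b the same way as AWB's for this case.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

sample = "the womens' team"

# "'" and " " are both non-word characters, so there is no \b between them
with_wordbreak = apply_rule(r"\b(W|w)omens'\b", "$1omen's", sample)

# The suggested workaround captures the whitespace and puts it back
with_whitespace = apply_rule(r"\b(W|w)omens'(\s)", "$1omen's$2", sample)

print(with_wordbreak)
print(with_whitespace)
```

Note that the whitespace version will not catch "womens'" directly before punctuation or at the very end of the text, since it requires a whitespace character after the apostrophe.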
[edit] Greece
There's an error in the "Greece" entry. It should have $1, not $2. --Thiseye 01:43, 3 January 2007 (UTC)
[edit] Gandhi
There are two entries for Gandhi. I believe the newest one was added to avoid some false positives, but the old one wasn't removed. --Thiseye 01:26, 2 January 2007 (UTC)
[edit] Poss reconsider
Bizarre as in Some Bizarre Records. Rich Farmbrough 22:53 11 August 2006 (GMT).
[edit] Attempt
If a fix for attemp is desired, "\b(A|a)tt?em(p|t)(|ed|ing|s)\b" --> "$1ttempt$3" seems to work for all cases. I don't think it matches any real words.—Mrkwcz 17:23, 12 August 2006 (UTC)
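A quick way to check the claim is to run the pattern over a few misspellings and the correct forms. This is only a sketch in Python; apply_rule is a hypothetical helper that converts the $1-style references, and the sample words are my own.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

find = r"\b(A|a)tt?em(p|t)(|ed|ing|s)\b"
replace = "$1ttempt$3"

# Misspellings first, then two correct forms that should be left alone
samples = ["attemp", "atemped", "Attemting", "attemps", "attempt", "attempted"]
results = {word: apply_rule(find, replace, word) for word in samples}

for word, fixed in results.items():
    print(word, "->", fixed)
```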
[edit] Opposites
What about alternative beginnings to words, as in opposites like accessible and inaccessible? Instead of having two separate entries to check and maintain, we could easily just have one:
- <Typo word="Accessible/Inaccessible" find="\b(A|a|Ina|ina)ccessab(le|ility)\b" replace="$1ccessib$2" />
This would simply require a rule that opposites (starting with in-, un-, etc) should not be placed alphabetically, but placed with their root word, and in many cases in the same regex.
Another strategy would be a rule that any word covered like this outside its normal alphabetical order should have a comment line placed in the alphabetical list where it would have gone.
Euchiasmus 12:55, 20 August 2006 (UTC)
- Sounds like a great idea, reducing duplication is always good. thanks Martin 18:58, 20 August 2006 (UTC)
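As a rough check of the combined entry, the sketch below runs it in Python on both the root word and its opposite. The apply_rule helper is hypothetical and only converts the $1-style references; the sample words are illustrative.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

find = r"\b(A|a|Ina|ina)ccessab(le|ility)\b"
replace = "$1ccessib$2"

# One rule covers the root word and the in- prefixed opposite
fixed_root = apply_rule(find, replace, "Accessability")
fixed_opposite = apply_rule(find, replace, "inaccessable")
untouched = apply_rule(find, replace, "inaccessible")

print(fixed_root, fixed_opposite, untouched)
```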
[edit] Victuals and eke
I removed these new additions:
<Typo word="Victuals" find="\b(V|v)ittles\b" replace="$1ictuals" />
<Typo word="Eke" find="\b(E|e)e(ke|ked|kes|king)\b" replace="$1$2" />
<Typo word="Eke" find="\b(E|e)e(k)\b" replace="$1ke" />
<Typo word="Ekes" find="\b(E|e)eks\b" replace="$1kes" />
"Vittles" is so old a misspelling that it's kind of its own word now, not to mention the cat food Tender Vittles, etc. (see the Google search).
"Eek" is a really common onomatopoeia for screaming, among other things. There are a lot of false positives on this Google search, and the words on the list should have 0 false positives. "Eeks" seems the same (lots of legit uses), there seem to be two legit uses of "eeke", and there are only 4 mainspace results for "eeked", 2 for "eeking", and none for "eekes". --Galaxiaad 18:34, 25 August 2006 (UTC)
- Ah, sorry. I even happen to know that Eek is a town in Alaska (you can't get there from here, or even from there—Google Maps fails!). But has "eek"-the-onomatopoeia been verbed? "The scream queen eeked out a living"? BTW, I've just made probably a few hundred changes to the list—I gather that you're genuinely interested, so you might want to take a gander.--BillFlis 23:36, 25 August 2006 (UTC)
-
- Yeah, all the changes are impressive and a bit overwhelming. I definitely want to look though. I didn't mean to sound harsh in my previous comment; sometimes it's hard for me to sound human instead of just stating facts, heh. Hm, doesn't look like it's been verbed, but there is the plural in "Eeks and Squeaks". (The instances of "eeked" actually were typos for "eked" but there were only 4, which isn't enough to merit inclusion.) Hey, I'm just wondering and you'd probably know: what does the word="whatever" bit actually do? --Galaxiaad 13:58, 26 August 2006 (UTC)
- Actually, I thought your points were well-taken. I figure the word="whatever" is just informational. My understanding of the AWB is that it's not a bot, it just helps someone make the same kind of edit over and over very quickly. I have a question about how it uses this typo list: I've noticed that some of the rules here have sort of the opposite of a false positive; that is, the correct spelling will trigger a change, back to the correct spelling. There's no harm done, but isn't this inefficient? Should I be stamping these cases out?--BillFlis 14:27, 26 August 2006 (UTC)
- The word property means they can be sorted in proper alphabetic order (sorting by order of the typo was very difficult to deal with, as duplicates were not adjacent to each other), also it allows easy location of a specific word, which will hopefully avoid future duplication, and probably explains the enormous amount of duplication that previously existed. Matching the correct spelling is much more efficient than having 2 separate regexes (which is how it used to be) but not as efficient as having a single regex that manages to avoid the correct spelling, so yes, avoid them when possible, but if it is becoming complicated then it doesn't really matter. And thanks for all the work you have done on this! Martin 14:48, 26 August 2006 (UTC)
[edit] Airbourne
I got a false positive running AWB when Airbourne was changed to Airborne. Should this be removed? — Loudsox 16:46, 27 August 2006 (UTC)
- I think it should be removed, or maybe changed. I think a more likely misspelling is airborn. What would be really nice is some way to tag a word within the encyclopedia as a deliberate misspelling, like adding "[sic]".--BillFlis 17:34, 27 August 2006 (UTC)
[edit] Regexes that match the correct spelling
Sometimes a regex, in providing matches for a variety of possible misspellings, matches the correct spelling. As best I can tell, AWB stops on an article when the regex matches the correct spelling and therefore makes no change.
Example: for "Apparel", the regex
(A|a)pp?arr?e(l|ls|ling|lling|led|lled)
corrects "Aparrel", "Aparel", and "Apparrel". Unfortunately, those alternatives allow "Apparel" to match, so AWB stops on "Apparel" but shows no diff. Example article: Jones Apparel Group.
So, 1) is what I'm saying true; 2) is there a preference against such regexes; 3) is there a way to fix the regex (while keeping only one regex) to avoid this? (And/or, can AWB be programmed to realize that a null edit has occurred?) Thanks, –Outriggr § 01:23, 16 September 2006 (UTC)
- Well, I just played with the "Skip article when no change made" setting (which I could swear was on by default, or that I have always had it on), and I see that AWB no longer stops in the above case. Not such an issue then? –Outriggr § 01:31, 16 September 2006 (UTC)
-
- I've been told that the regex does make the "change" (to the same correct spelling, thus useless work) and is thus wasteful of resources. Have I been given some bad info? I've been trying to stamp out such cases, but maybe the program is smart enough to recognize (i.e., it checks) whether any real change is made, and I'm the one doing the useless work!--BillFlis 20:15, 16 September 2006 (UTC)
- The program is smart enough to know if a change was actually made, but it is slightly preferable not to match the correct spelling, though not critical. I suppose it might be more critical in the future if some other software wanted to make use of this list though. Martin 09:31, 17 September 2006 (UTC)
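On question 3, one possible approach is a negative lookahead that rules out the correct letter sequence before the rest of the pattern runs. The sketch below is in Python and rests on assumptions: the replacement string "$1ppare$2" is a guess (the thread does not quote it), the "stricter" pattern is hypothetical rather than what the list uses, and apply_rule is a made-up helper that converts $1-style references.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

original = r"(A|a)pp?arr?e(l|ls|ling|lling|led|lled)"
# Hypothetical variant: (?!ppare) rejects the correct "pp" + single "r" spelling
stricter = r"(A|a)(?!ppare)pp?arr?e(l|ls|ling|lling|led|lled)"
assumed_replace = "$1ppare$2"  # assumption: not quoted in the discussion

title = "Jones Apparel Group"
original_hits_correct = re.search(original, title) is not None
stricter_hits_correct = re.search(stricter, title) is not None

misspellings = ["Aparrel", "Aparel", "Apparrel"]
fixed = [apply_rule(stricter, assumed_replace, w) for w in misspellings]

print(original_hits_correct, stricter_hits_correct, fixed)
```

Whether the extra lookahead is worth the added complexity is a judgment call, given that AWB can skip articles when no change is made.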
-
[edit] Suggestion of a change
How about "alot" to "a lot". But I am not sure how to program it.--Esprit15d 17:50, 27 September 2006 (UTC)
- But it might be "allot".--BillFlis 19:46, 27 September 2006 (UTC)
I suppose:
<Typo word="Alot" find="\b(A|a)lot\b" replace="$1 lot" />
<Typo word="Allot" find="\b(A|a)llot\b" replace="$1 lot" />
Reedy Boy 17:10, 16 October 2006 (UTC)
Upon doing it manually with AWB find and replace the words allotment and ballots came up causing a problem with the search on Allot.
Would running those like that, ensure that only that word is used? Or would it include words that include alot/allot?
Reedy Boy 17:11, 16 October 2006 (UTC)
Seems some people use allot instead of allocate...?
Reedy Boy 17:14, 16 October 2006 (UTC)
reject Allot comes from the sense of "assigning by lot" and therefore implies random allocation. Allotment has a specific political meaning of "to select by random selection" - aka "jury" selection and "sortition". Allocation does not have any sense of chance and e.g. to allocate a person to a jury rather than allot them would imply they were chosen rather than selected at random (which would dramatically change their nature). The two words are very different and in my view to replace "allot" with "a lot" was just vandalism. --Mike 16:10, 18 October 2006 (UTC)
I think what you intended was:
<Typo word="Alot" find="\b(A|a)lot\b" replace="$1 lot" />
<Typo word="Allot" find="\b(A|a)lot\b" replace="$1llot" />
Then you'd have to run AWB manually (isn't this always how it's run?), and decide which rule to accept: alot --> a lot or alot --> allot. Yes, allot means allocate, as "within the allotted time". This would be safe to add, I think:
<Typo word="Allot_" find="\b(A|a)lot(ted|ting|ments?|tees?)\b" replace="$1llot$2" />
where we add the low-line character (_) to signal that only certain endings are being treated.--BillFlis 17:22, 16 October 2006 (UTC)
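On the earlier question about allotment and ballots: the \b anchors should keep the "alot" rule from firing inside longer words. Here is a small Python sketch of that, plus the endings-only rule; apply_rule is a hypothetical helper that converts the $1-style references, and the sample phrases are made up.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

alot_find, alot_replace = r"\b(A|a)lot\b", "$1 lot"
endings_find = r"\b(A|a)lot(ted|ting|ments?|tees?)\b"
endings_replace = "$1llot$2"

# Whole-word matching: words that merely contain "alot"/"allot" are untouched
standalone = apply_rule(alot_find, alot_replace, "alot of ballots")
longer_word = apply_rule(alot_find, alot_replace, "an allotment")

# The endings-only rule fixes the single-l forms
time_phrase = apply_rule(endings_find, endings_replace, "within the alotted time")

print(standalone, "|", longer_word, "|", time_phrase)
```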
[edit] Reconsider
- "Amoung" was "amount" not "among"
- Klaus Flouride a musician (Note caps)
- "Mayonaise (song)" the music track. (Note caps)
- Place called "Casette" (Note caps)
Rich Farmbrough, 19:33 3 October 2006 (GMT).
-
- I'm a bit concerned that people—both those who use AWB, and those who see bad edits—forget that this system is semi-automated. In conjunction with the fact that the AWB user is reviewing his edits, I don't see why it is necessary to get rid of a spelling correction rule even if there are very rare exceptions to that rule. I managed not to "correct" Garry Tallent (in another article) once. I'm not pressing for the removal of the spelling error "tallent". –Outriggr § 00:27, 4 October 2006 (UTC)
- Simply because the stated aim is to have no false positives. "The lofty goal of RETF is to be completely automatic." It is a courtesy to the creator to report problems here. Rich Farmbrough, 21:58 7 October 2006 (GMT).
-
[edit] Two questions
- Is "first-hand" really bad? dictionary.com
- Comunal->Communal breaks Estadio Comunal de Aixovall, do we care?
Rich Farmbrough, 21:58 7 October 2006 (GMT).
- Also, "first hand" can occur together. "I won the first hand."--BillFlis 12:01, 8 October 2006 (UTC)
-
- Actually, "first hand" occurs in Canasta.
- Each player is dealt a hand of 11 and a second hand of 13, sometimes referred to as the "hand" and the "foot", respectively. The hand with the lowest bottom card is played first. Once a player plays all cards from his first hand he picks up the second and continues normal play.
- It has caused a false positive.Punainen Nörtti 18:15, 25 October 2006 (UTC)
[edit] Countries
I've added entries to convert names of countries to Title Case. My process was:
- copy list of countries from List of countries
- process to remove text in () or []
- process "See * for *" lines
- change lines with "1, 2" into "2 1" (eg "Congo, Republic of")
- manually inspect and make special changes (eg Taiwan)
- add to AutoWikiBrowser/Typos and test
- remove duplicates that had already been put onto the list
- remove country names that are also words that can be in lowercase (chad, guinea, jersey)
I guess that many of the lines could be manually tweaked to give greater coverage of variants - but this is a start, anyway...
Hope this doesn't generate too many erroneous matches that I haven't thought of...
Euchiasmus 07:40, 8 October 2006 (UTC)
- "wale(s)" and "coco(s)" have uncapitalized meanings in http://www.m-w.com. "chile" is a valid spelling of "chili" (capsicum). "india" (occasionally before "ink" and "rubber") isn't always capitalized.--BillFlis 11:54, 8 October 2006 (UTC)
Thanks, Bill - I've removed those. I also realised about turkey and took that out too. Euchiasmus 19:51, 9 October 2006 (UTC)
- Because this is an issue of capitalisation rather than spelling, I suggest that these entries are placed in a separate section rather than being distributed into the A, B, C, sections. Gaius Cornelius 13:21, 6 November 2006 (UTC)
[edit] Predominately?
Suggested addition - replacing "predominately" (not a word) with "predominantly." | Mr. Darcy talk 20:22, 6 November 2006 (UTC)
- Sorry, but "predominately" is indeed a word, meaning--guess what?--"predominantly". See here.--BillFlis 19:58, 10 November 2006 (UTC)
[edit] 'Logical' punctuation in quotations
I'm changing punctuation at the end of quotations to 'logical' style, per Wikipedia:Manual of Style#Quotations by replacing <," > (comma-quote-space) with <", > (quote-comma-space) throughout (e.g. <"Yes," he said.> to <"Yes", he said.>). I haven't come across any false positives yet. A similar replacement might be possible for embedded full stops at the end of quotations, but that's more controversial and would produce too many false positives, I think, unless someone could suggest a clever method to exclude the case where an entire sentence, including its final punctuation, is being quoted. Colonies Chris 22:59, 6 November 2006 (UTC)
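For illustration, the comma replacement described above is a plain string substitution with no regex needed. The Python sketch below assumes straight double quotes only; the function name is made up for the example.

```python
def to_logical_quotes(text):
    """Swap comma-quote-space for quote-comma-space (straight quotes only)."""
    return text.replace('," ', '", ')

before = '"Yes," he said.'
after = to_logical_quotes(before)

print(after)
```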
[edit] Orignal --> Original
There is a town in Ontario called L'Orignal, mentioned in a few articles, so the regex should exclude this if possible. Colonies Chris 08:23, 9 November 2006 (UTC)
[edit] Problem with "definitions"
When presented with the misspelling "defintions" it tries to replace it with "definitons" which is still not the correct spelling. I took a look at the RegEx and I am not quite sure how to fix this problem, so if somebody with more experience can fix it, that would be great. --Maelnuneb (Talk) 19:49, 10 November 2006 (UTC)
- OK, fixed, thanks.--BillFlis 19:58, 10 November 2006 (UTC)
[edit] Firsthand
I am getting a ton of false-positives with this one. Card game pages are a real big source of false-positives. I am going to remove it from the list due to this. Code for the RegEx was: <Typo word="Firsthand" find="\b(F|f)irst[ -]hand\b" replace="$1irsthand" /> Possible fix: only match first-hand, but I'm not positive that version isn't an acceptable spelling. Any comment on that would be great. --Maelnuneb (Talk) 20:59, 13 November 2006 (UTC)
- After looking up first-hand on [1], it suggested firsthand, so I will add checking for "first-hand" back into the system, but not "first hand" as the possibility of a false positive for "first-hand" is non-existent. If people believe that "first hand" should be included still, please debate here. --Maelnuneb (Talk) 21:05, 13 November 2006 (UTC)
- Given that, I would agree to not have firsthand in the list of typos. I personally didn't write the rule in the first place, just tweaked it to get rid of false positives and then did a quick search to see if "first-hand" was a correct spelling, running on the assumption that the original contributor that added the rule for firsthand was in fact correct. Centrx, thank you very much for finding evidence of the other spellings and bringing them here. --Maelnuneb (Talk) 17:46, 15 November 2006 (UTC)
-
Also, this list really does need to be restricted to typos, not bad usage, because quotations and normal sentences will be filled with cases that should not be "corrected". Also, with compound words there are common sentences (such as actually referring to the first hand of something, as in a game of cards or something about physiology) that would never warrant changing. —Centrx→talk • 06:34, 16 November 2006 (UTC)
- Typos would still show up in those cases unfortunately. That is the entire reason that the process of fixing typos is not automated. Your point about "first hand" was exactly why I changed the rule to match only "first-hand" actually. I was getting tired of fixing false positives, so I changed the rule to prevent it. --Maelnuneb (Talk) 18:00, 17 November 2006 (UTC)
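To make the difference concrete, here is a Python sketch comparing the removed rule with the hyphen-only narrowing that was tried in the interim. The narrowed pattern is written out from the description in this thread, not copied from the list, and apply_rule is a hypothetical helper that converts the $1-style references.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

original = (r"\b(F|f)irst[ -]hand\b", "$1irsthand")
# Assumed form of the narrowed rule: hyphenated spelling only
narrowed = (r"\b(F|f)irst-hand\b", "$1irsthand")

card_text = "he plays all cards from his first hand"
account = "a first-hand account"

false_positive = apply_rule(*original, card_text)
left_alone = apply_rule(*narrowed, card_text)
still_changed = apply_rule(*narrowed, account)

print(false_positive, "|", left_alone, "|", still_changed)
```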
[edit] referrences -> referencces
<Typo word="Reference" find="\b(R|r)efe(?:rr?a|rre)n(ce[ds]?|cing|ts?)\b" replace="$1eferenc$2" />
should likely be
<Typo word="Reference" find="\b(R|r)efe(?:rr?a|rre)n(ce[ds]?|cing|ts?)\b" replace="$1eferen$2" />
~ BigrTex 20:19, 15 November 2006 (UTC)
Thank you for your suggestion! When you feel an article needs improvement, please feel free to make those changes. Wikipedia is a wiki, so anyone can edit almost any article by simply following the Edit this page link at the top. You don't even need to log in (although there are many reasons why you might want to). The Wikipedia community encourages you to be bold in updating pages. Don't worry too much about making honest mistakes — they're likely to be found and corrected quickly. If you're not sure how editing works, check out how to edit a page, or use the sandbox to try out your editing skills. New contributors are always welcome. ~ BigrTex 20:00, 16 November 2006 (UTC)
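The doubled "c" in the section title can be reproduced with a short Python sketch: the captured ending already starts with "c", so the replacement must not add another. apply_rule is a hypothetical helper that converts the $1-style references.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

find = r"\b(R|r)efe(?:rr?a|rre)n(ce[ds]?|cing|ts?)\b"

# Current replacement: "eferenc" plus an ending that itself begins with "c"
current = apply_rule(find, "$1eferenc$2", "referrences")

# Proposed replacement: stop at "eferen" and let the captured ending supply the "c"
proposed = apply_rule(find, "$1eferen$2", "referrences")

print(current, proposed)
```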
[edit] Society, abundant
- Societ -> Society
- abundandt - >abundant
- abundandtly -> abundantly
I stumbled across "Societ" today, and I have a tendency to add an unnecessary d to abundant as well, but I don't know how to add these to the filters myself. --Lethargy 00:14, 16 November 2006 (UTC)
- I have just added <Typo word="Abundant" find="\b(A|a)bundand(t|tly)\b" replace="$1bundan$2" /> Tankred 00:38, 16 November 2006 (UTC)
[edit] <Typo word="Oft(en)times" find="\b(O|o)ft(|en)[- ]times\b" replace="$1ft$2times" />
Often Times to Oftentimes ???
It might be me, but that seems like a use that would be sparsely used?
Or is it just me?
Reedy Boy 15:32, 19 November 2006 (UTC)
[edit] New additions section
Can we be more explicit about whether new additions should be put at the beginning or at the end of the "New additions" section? People put them in both places, which makes the chronology of the section a bit difficult to follow. The section is fairly large now, and it would perhaps be a good idea to check the oldest additions again and then move them to the main body. Tankred 16:55, 19 November 2006 (UTC)
[edit] Increase
Suggested addition: While fixing other typos I stumbled upon 'increse' (missing a).
<Typo word="Increase" find="\b(I|i)ncres(e|ed|ing|ingly)\b" replace="$1ncreas$2" />
Thanks. ChrisCork 06:51, 28 November 2006 (UTC)
- Added, with the handling of "Decrease" as well.--BillFlis 12:52, 28 November 2006 (UTC)
[edit] Super Bowl
Superbowl -> Super Bowl. I see that one a lot, not just on the Wiki. I'm not sure how to add listings that split into two words, so I'm adding it here. --cholmes75 (chit chat) 20:56, 28 November 2006 (UTC)
- Done!--BillFlis 21:02, 28 November 2006 (UTC)
[edit] Guerilla
<Typo word="Guerilla" find="\b(G|g)uer(?:r?i|ril?)l(as?)\b" replace="$1uerill$2" />
We are replacing Guerrilla with Guerilla, even though the article spells it the 'wrong' way. I have removed the line. ~ BigrTex 00:12, 1 December 2006 (UTC)
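The behaviour described here can be seen in a short Python sketch: the double-r spelling matches and is rewritten, while the single-r spelling is left alone. apply_rule is a hypothetical helper that converts the $1-style references.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

find = r"\b(G|g)uer(?:r?i|ril?)l(as?)\b"
replace = "$1uerill$2"

# The spelling used by the article title is the one that gets changed
double_r = apply_rule(find, replace, "Guerrilla warfare")
single_r = apply_rule(find, replace, "guerilla")

print(double_r, "|", single_r)
```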
[edit] Problem with kW, kJ, Hz
I'm getting problems with kW, kJ, Hz because AWB now changes (eg on the Bible page)
- [[kw:Bibel]] to [[kW:Bibel]]
- [[kj:Ombibeli]] to [[kJ:Ombibeli]]
- [[hz:Ombeibela]] to [[Hz:Ombeibela]]
They then get moved out of sequence. I suggest the regex be amended to exclude situations where the word is preceded by square brackets and followed by a colon.
Sorry haven't got time to do it at present - I'm rushing off to work!
Cheers - Euchiasmus 07:08, 1 December 2006 (UTC)
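One way the suggested exclusion might look is sketched below in Python. The actual kW rule is not quoted in this thread, so both patterns are assumptions made up for illustration. The guarded version is also slightly stricter than the suggestion: it skips a match when it is preceded by "[[" or followed by ":", rather than only when both hold.

```python
import re

# Assumed shape of the existing rule (not quoted in the discussion)
naive_find = r"\bkw\b"
# Hypothetical guard: not directly after "[[" and not directly before ":"
guarded_find = r"(?<!\[\[)\bkw\b(?!:)"

text = "a 5 kw motor [[kw:Bibel]]"

naive_result = re.sub(naive_find, "kW", text)
guarded_result = re.sub(guarded_find, "kW", text)

print(naive_result)
print(guarded_result)
```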
[edit] Rule Problems
- The rule as written changes governement to governmen. -- Saaber 04:07, 4 December 2006 (UTC)
- The rule as written changes quanity to quantituanit. -- Saaber 11:02, 4 December 2006 (UTC)
- The rule as written changes 'dominican' to 'Dominica' -- ChrisCork 15:48, 15 December 2006 (UTC)
[edit] Miniscule
... is cool, listed as a variant of "minuscule" here and here.--BillFlis 12:50, 9 December 2006 (UTC)
- The misspelling has become so widespread that some authorities are listing it as an alternative. However, there is still a clear majority in favour of the correct spelling. I vote we go with the majority and stick to minuscule. Euchiasmus 16:07, 9 December 2006 (UTC)
-
- Dictionary.com shows "miniscule" in three different sources here, which makes a total of at least four, since M-W isn't one of them. Given the policy against changing from one spelling of the same word to another, I don't think we should be automatically changing this. —Krellis 17:31, 11 December 2006 (UTC)
- Whatever you do, don't change the occurrences of "miniscule" in the minuscule article. This article does indeed say that "miniscule" has been "traditionally regarded as a spelling mistake," although no reference is offered for this contention. Some discussion with references may be found here.--BillFlis 19:03, 11 December 2006 (UTC)
-
[edit] Changing ordinals to cardinals in dates
Please can we remove the ordinal to cardinal conversion in dates? Maybe the Americans don't habitually use dates like "1st May", but we British do use them and I can't see anything wrong with them. When I read "1 May" it looks very strange, especially in narrative prose.
Here in the UK the use of st|nd|rd|th is very common in dates. For example, glancing through filed correspondence I find that the majority of my documents (insurance policies, bank statements, nominet registration, etc) use ordinal numbers in dates. With other regional variations WP allows alternative forms - why not in dates?
Euchiasmus 14:18, 10 December 2006 (UTC)
- I personally have mixed feelings about adding things to the typo list that aren't typos or misspellings, but the intention here was clearly to go with the Manual of Style guideline on ordinal suffixes in dates (relevant section here). So you'd really probably be better off bringing it up there. Hope this helps. --Galaxiaad 19:02, 10 December 2006 (UTC)
- Here are a couple of points:
- because WP:DATE is a guideline, consensus was reached about the date format to be used. While a guideline is not a rule, we should be striving towards the suggestions given unless there is a strong push for a change, which would mean that there is no longer consensus. Therefore, while consensus still exists, there is no reason to remove the rules removing ordinals from dates.
- A note to users of WP:AWB/T: be careful not to remove ordinals in direct quotes. --Maelnuneb (Talk) 17:44, 12 December 2006 (UTC)
[edit] Error in proclaim rule?
The current rule for proclaim:
<Typo word="Proclaim" find="\b(P|p)roclam(e[dsr]?|ing)\b" replace="$1roclaim$2" />
changes proclame to proclaime. Was this intended? Euchiasmus 11:36, 17 December 2006 (UTC)
- I think not. The "?" shouldn't be there.--BillFlis 13:36, 17 December 2006 (UTC)
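A quick Python sketch of the effect of dropping the "?": with it, a bare trailing "e" is captured and carried into the replacement; without it, only "ed", "es" and "er" endings match. The second pattern is my reading of the suggestion, and apply_rule is a hypothetical helper that converts the $1-style references.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

replace = "$1roclaim$2"
with_optional = r"\b(P|p)roclam(e[dsr]?|ing)\b"
# Assumed reading of the fix: the letter after "e" becomes mandatory
without_optional = r"\b(P|p)roclam(e[dsr]|ing)\b"

bad_output = apply_rule(with_optional, replace, "proclame")
left_alone = apply_rule(without_optional, replace, "proclame")
past_tense = apply_rule(without_optional, replace, "proclamed")
gerund = apply_rule(without_optional, replace, "Proclaming")

print(bad_output, left_alone, past_tense, gerund)
```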
[edit] 'Receive' typo
I see there've been some recent changes to the way 'receive' is corrected, but unfortunately it's now broken. I'm not too hot on regexp, so could someone take a look for me please? ^_^ ShakingSpirittalk 07:18, 19 December 2006 (UTC)
[edit] New words
I'm looking at Wikipedia:Lists of common misspellings and am trying to fix some of them, using AWB. As such, I'd like someone more skilled with regexes than me to add:
- Sacrifice
- Satellite
- Sandwich
- Sergeant
Come to think about it, someone with enough time on their hands could just go ahead and look through everything in Wikipedia:Lists of common misspellings. Obviously, I was looking at S, but there's probably a lot missing elsewhere too. Thank you! Jobjörn (Talk ° contribs) 02:00, 25 December 2006 (UTC)
- Jobjörn: I am currently working through all the 'S' typos myself. I am about halfway through a dump of the 30-Nov-2006 database. It might make more sense for us not to duplicate this effort - would you mind working on another letter? There are plenty to go round. If you wish I can help you with a whole bunch of regexes. Personally, I like to work on a set of regexes to make sure that there are not too many errors or false positives before submitting them to the Wikipedia:AutoWikiBrowser/Typos list. Still, I have added sacrifice, sandwich and satellite for you - but not sergent because it generates false positives against a common surname. You might like to try this regex for lowercase only:
-
- "sargant(s?)" --> "sergeant$1"
- Let me know what you think - but it is Christmas and I will be away for a few days! Gaius Cornelius 13:25, 25 December 2006 (UTC)
-
- No, definitely. I'll grab some other letter. Jobjörn (Talk ° contribs) 17:06, 25 December 2006 (UTC)
[edit] Targetting/targeting
I don't have the right dictionaries handy to confirm, but AFAIK 'targetting' and 'targetted' are accepted spellings in UK English (and possibly Australian English as well). Could somebody with access to the OED and/or Macquarie please check this and remove them from the list if this is so? --Calair 05:13, 30 December 2006 (UTC)
[edit] Typicaly & Essentialy
If someone could add 'typicaly' (typically) & 'essentialy' (essentially) to the regex list that would be great, there seem to be a lot of these errors at the moment.--Hooperbloob 07:31, 4 January 2007 (UTC)
- Done. Gaius Cornelius 21:11, 4 January 2007 (UTC)
[edit] Manoeuver
I just merged (Out)Manoeuver into Maneuver as (Out)Maneuver. This is the line I deleted:
<Typo word="(Out)Manoeuver" find="\b([Oo]utm|M|m)an(?:[oeu]{1,2})ver(s?|ing|e[dr]|abl[ey]|ability)\b" replace="$1anoeuver$2" />
If someone could double-check my merge, I'd appreciate it. ~ BigrTex 21:23, 5 January 2007 (UTC)
- AFAIK, the British spelling is 'manoeuvre' (corrected from 'manouevre'), so it's probably not a good idea to auto-correct a spelling halfway between the two to the US option without checking context. --Calair 23:19, 5 January 2007 (UTC)
- My big American dictionary here has "manoeuvre" and "manoeuver" (but not "manouevre"--are you sure that's right?) as variants of "maneuver", without any indication that they are only British spellings. However, this dictionary says "manoeuvre" is "Chiefly British"; no listing for "manoeuver".--BillFlis 00:05, 6 January 2007 (UTC)
- Oops, typo fixed, thanks :-)
- I don't have good references handy, but as per American_and_British_English_spelling_differences#-re_.2F_-er the usual UK spelling is 'manoeuvre' and the US spelling is 'maneuver'. (This comes from a combination of US/UK differences on whether to end words with '-re' or '-er', combined with different rules on rendering the ligature 'œ' in a modern alphabet - UK spellings tend to split it into two letters, US spellings go with a single phonetic 'e'.)
- 'Manoeuver' is halfway between the two; it probably should be corrected where it appears, but I'd recommend checking context (i.e. the subject matter of the article, and failing that the style of the rest of it) to judge which way the correction should go. --Calair 01:31, 6 January 2007 (UTC)
[edit] Prepubescent or pre-pubescent
I'm not sure which is the correct format but both exist in quantity here.--Hooperbloob 03:02, 6 January 2007 (UTC)
[edit] Comital
While it's a common misspelling for "committal", "comital" is also a legitimate word meaning "pertaining to the count". I don't know enough about regexps to fix this, but perhaps something should be done; I've seen this change made twice in the past month or so. Choess 15:56, 12 January 2007 (UTC)
[edit] Sponser
Over 300 of these last time I checked. Should be 'sponsor', 'co-sponsor', 'sponsored', 'sponsoring', etc. --Hooperbloob 08:01, 14 January 2007 (UTC)
- Only just under 100 in mainspace, according to wikisearch, I'll take a stab at them and report back. —Krellis 23:13, 15 January 2007 (UTC)
- All of these in mainspace and Images: should now be taken care of. —Krellis 00:05, 16 January 2007 (UTC)
[edit] Trailor
trailor -> trailer --Hooperbloob 08:25, 14 January 2007 (UTC)
[edit] "_Strange" Pattern
I just removed the following pattern:
- <Typo word="_Strange" find="(?<!\b([A-Z][a-z]*))(\s[Ss])tange\b" replace="$1trange" />
For two reasons:
- "Stange" is a last name that I've run across a number of times, particularly in Major League Baseball articles.
- The pattern is broken, replacing "Stange" with "trange" - the negative lookbehind assertion appears to be capturing, so the $1 would need to be $2.
Replacing "stange" to "strange" is probably fine, as long as we don't replace the capitalized version. I don't quite understand why this pattern has the lookbehind stuff, rather than just using word boundaries like other patterns, so I don't feel comfortable replacing it - if the original author (or anyone else) wants to do so, please go ahead, as long as you preserve "Stange" and make sure it replaces the right captured string. —Krellis 20:53, 15 January 2007 (UTC)
- I originally added this fix. The purpose of the lookbehind was to eliminate instances of Stange preceded by a word that begins with a capital letter - which may be a first name. I found this pretty effective at reducing false positives. Gaius Cornelius 21:09, 15 January 2007 (UTC)
-
- Aha, okay, that makes so much more sense now. My brain just wasn't in a regex parsing mood earlier, I guess. Unfortunately, I've come across at least four or five false positives in the past few days - many articles use just the last name to identify individuals once they have been introduced. At least some of the FPs I've seen have been at the beginning of a sentence or line, so matching that in a lookbehind could theoretically help avoid some more, though of course it would probably prevent legitimate errors from being found as well. Given the advice of "don't add if there is one (false positive)" at the top of the list, I would suggest "Stange" be considered a lost cause, and just the lower case version be re-added. —Krellis 23:01, 15 January 2007 (UTC)
[edit] ((In)De/In/Af)Finite misbehaves!
The list of typos includes the almost impossibly complicated:
<Typo word="((In)De/In/Af)Finite" find="\b([Ii]n|)(F|f|[Dd]ef|[Aa]ff)(?:finite?|f?in[ae]te?|f?init)(s?|ly|ness|y)\b" replace="$1$2init$3" />
It changes infinetly to infinitly - (for example, try it with the Home Construction article).
If I could work out what it was doing right and what it was doing wrong, I would correct it! I think my example is not the only thing it does wrong. Somebody please help! Thanks. - Euchiasmus 20:17, 25 January 2007 (UTC)
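The reported behaviour can be reproduced with a short Python sketch. As far as I can tell, the replacement always writes "init", so the final "e" of "finite" is dropped whenever the rule fires. apply_rule is a hypothetical helper that converts the $1-style references, and the extra sample words are my own.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

find = (r"\b([Ii]n|)(F|f|[Dd]ef|[Aa]ff)"
        r"(?:finite?|f?in[ae]te?|f?init)(s?|ly|ness|y)\b")
replace = "$1$2init$3"

# Each output is still misspelled because the stem is rebuilt without its "e"
samples = ["infinetly", "definately", "infinate"]
results = {word: apply_rule(find, replace, word) for word in samples}

for word, out in results.items():
    print(word, "->", out)
```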
[edit] Light Year
I ended up having a problem with the light year regex, so I removed it. Here is the original code: <Typo word="_Light year" find="(?<!\b(Buzz ))(L|l)ig?h?tyea(rs?)\b" replace="$1ight yea$2" />
This is what was happening when it ran for me: AWB found "lightyears" and wanted to replace it with "ight yeal". Obviously a problem with the substitution. I tried changing the $1→$2 and the $2→$3, but that did not end up working for me, which does not make any sense to me. If somebody with more experience can attempt to fix this one, that would be great. --Maelnuneb (Talk) 20:26, 26 January 2007 (UTC)
- My fix was actually correct. I just had a cache problem getting in the way of having an updated set of typo rules. Problem solved. --Maelnuneb (Talk) 20:31, 26 January 2007 (UTC)
- This dictionary says that it's "light-year", with a hyphen.--BillFlis 13:22, 27 January 2007 (UTC)
- My home dictionary gives it as two words whereas the wikipedia article says it is either one word or hyphenated. I guess that typo fix had better come out. Gaius Cornelius 19:04, 27 January 2007 (UTC)
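For the record, the "ight yeal" output comes from group numbering: the parentheses inside the lookbehind count as group 1, which shifts the other two groups along by one. The Python sketch below mirrors this, assuming Python 3.5 or later, where a group that took no part in the match is substituted as an empty string. apply_rule is a hypothetical helper that converts the $1-style references.

```python
import re

def apply_rule(find, replace, text):
    """Apply one find/replace pair, translating $1-style references for Python."""
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(find, py_replace, text)

find = r"(?<!\b(Buzz ))(L|l)ig?h?tyea(rs?)\b"

# Group 1 sits inside the lookbehind and is empty on a match; group 2 is "l"
broken = apply_rule(find, "$1ight yea$2", "lightyears")

# Shifting both references by one uses the intended groups
shifted = apply_rule(find, "$2ight yea$3", "lightyears")

# The lookbehind still protects the character name
protected = apply_rule(find, "$2ight yea$3", "Buzz Lightyear")

print(broken, "|", shifted, "|", protected)
```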
[edit] Peleton
peleton -> peloton
Thanks, Mk3severo 00:55, 2 February 2007 (UTC)
[edit] ususally --> usually
Please add this typo to the list. Harryboyles 05:03, 2 February 2007 (UTC)
- Added. Wow, a quick search turned up 380 instances of this weird misspelling!--BillFlis 13:02, 2 February 2007 (UTC)
[edit] Simalar -> similar
not really sure how to add that... -ΖαππερΝαππερ BabelAlexandria 14:18, 13 February 2007 (UTC)
<Typo word="Similar" find="\b(S|s)imalar\b" replace="$1imilar" />
- Reedy Boy 14:43, 13 February 2007 (UTC)
- Just Looked, there is
<Typo word="(Dis)Similar" find="\b(S|s|[Dd]iss)im(?:mi|u)lar(|ly|ity)\b" replace="$1imilar$2" />
So, possibly incorporate it into that?
<Typo word="(Dis)Similar" find="\b(S|s|[Dd]iss)im(?:mi|u|a)lar(|ly|ity)\b" replace="$1imilar$2" />
I think. Addition of |a to the middle of the word Reedy Boy 14:46, 13 February 2007 (UTC)
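A quick way to check the combined rule, sketched in Python (the `re` module standing in for AWB's .NET engine, which is an assumption; `$1` becomes `\g<1>`). The sample words are illustrative only.

```python
import re

# Combined "(Dis)Similar" rule with the extra "a" alternative.
FIND = r"\b(S|s|[Dd]iss)im(?:mi|u|a)lar(|ly|ity)\b"
REPLACE = r"\g<1>imilar\g<2>"

def fix_similar(text):
    return re.sub(FIND, REPLACE, text)

for word in ["simalar", "Simmilarly", "dissimularity", "similar"]:
    print(word, "->", fix_similar(word))
```

The correct spelling "similar" does not match, because after "im" the pattern requires one of "mi", "u" or "a" before "lar".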
[edit] Moniter -> Monitor
Need to handle moniter, monitering, monitered, etc.--Hooperbloob 23:48, 28 February 2007 (UTC)
[edit] Misspellings to be added
Should new misspellings go here or in the "Misspellings to be Added" section of the main project page? Regardless, here's about 90 that I've amassed. I'd add them myself, but some of those regexes are pretty complex and scare me. I've verified that all these aren't acceptable by dictionary.com and that there are at least 10 instances of each in Wikipedia. False positives haven't been checked for, however. And there are probably prefixes/suffixes that can be added to most of them.
(Can someone please add some of these? --Thiseye 07:02, 2 March 2007 (UTC))
- jeapordy → jeopardy
- likley → likely
- liqour → liquor
- literaly → literally
- minsitry → ministry
- mountian → mountain
- newstands → newsstands
- nobilty → nobility
- oppenent → opponent
- orginial → original
- personna → persona
- editted → edited
- posibility → possibility
- precip(a|ia)tion → precipitation
- prepatory → preparatory
- pricipal → principal
- recruting → recruiting
- reliquish → relinquish
- reminicent → reminiscent
- replacment → replacement
- responed → responded
- sectretary → secretary
- signiture → signature
- similarily → similarly
- similiar → similar
- unsheath → unsheathe
- valiently → valiantly
- wherupon → whereupon
- wheter → whether
- protray → portray
- protrayed → portrayed
[edit] Questioned
- widly → widely
- Might be a typo for "wildly" instead of "widely" -- JHunterJ 11:27, 13 April 2007 (UTC)
- intitution → institution
- Might be a typo for "intuition" instead of "institution" -- JHunterJ 16:39, 22 June 2007 (UTC)
- summery -> summary --John 23:32, 23 July 2007 (UTC)
[edit] Reliable sources
Is dictionary.com a reliable source?--Andeh 06:04, 11 August 2006 (UTC)
- Nope. See here. alphaChimp laudare 06:19, 11 August 2006 (UTC)
- OK, what about Microsoft Word 2000's or higher dictionary?--Andeh 06:25, 11 August 2006 (UTC)
This looks like a good source for misspellings: http://www.misspelled.com/common/a.htm --BillFlis 10:45, 27 August 2006 (UTC)
[edit] Full stops, commas, colons, brackets and double spaces
I feel that the following mistakes are too common (especially in stubs) to ignore:
- c denotes any alphanumeric character
- s denotes a space character
Mistake | Correction | Suggested code |
---|---|---|
c.c | c.sc | <Typo find="\b(a-zA-Z).(a-zA-Z)\b" replace="$1. $2" /> |
cs.c | c.sc | <Typo find="\b(a-zA-Z) .(a-zA-Z)\b" replace="$1. $2" /> |
cs.sc | c.sc | <Typo find="\b(a-zA-Z) . (a-zA-Z)\b" replace="$1. $2" /> |
c,c | c,sc | <Typo find="\b(a-zA-Z),(a-zA-Z)\b" replace="$1, $2" /> |
cs,c | c,sc | <Typo find="\b(a-zA-Z) ,(a-zA-Z)\b" replace="$1, $2" /> |
cs,sc | c,sc | <Typo find="\b(a-zA-Z) , (a-zA-Z)\b" replace="$1, $2" /> |
c;c | c;sc | <Typo find="\b(a-zA-Z);(a-zA-Z)\b" replace="$1; $2" /> |
cs;c | c;sc | <Typo find="\b(a-zA-Z) ;(a-zA-Z)\b" replace="$1; $2" /> |
cs;sc | c;sc | <Typo find="\b(a-zA-Z) ; (a-zA-Z)\b" replace="$1; $2" /> |
c(c | cs(c | And so forth |
c(sc | cs(c | And so forth |
cs(sc | cs(c | And so forth |
c)c | c)sc | And so forth |
cs)c | c)sc | And so forth |
cs)sc | c)sc | And so forth |
ss | s | And so forth |
Note: Suggested code is based on my preliminary understanding of the pattern of the working code at Wikipedia:AutoWikiBrowser/Typos, and I am very sure it is wrong and needs to be corrected.
Szhaider 15:39, 9 October 2006 (UTC)
- These are indeed common mistakes, but unfortunately, in my experience there are too many legitimate exceptions, such as ".NET". The other mistakes may not have so many exceptions, though. Martin 16:16, 9 October 2006 (UTC)
- Yeah, and what about U.S.A.? Or T.S. Eliot? Also, semi-colon is part of many HTML entities, like "—" etc., which will butt right up against letters.--BillFlis 02:11, 10 October 2006 (UTC)
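To make the objections concrete, here is a rough Python sketch. The pattern is not the one from the table: it is a hypothetical single regex that uses a real character class (`[A-Za-z]` rather than `(a-zA-Z)`), an explicit punctuation class, and lookarounds, and Python's `re` stands in for AWB's .NET engine. It covers the ".", "," and ";" rows and then runs straight into the exceptions mentioned above. The `&mdash;` sample is an assumed example of an HTML entity.

```python
import re

# Hypothetical pattern covering the "c.c", "c .c" and "c . c" rows for ".", "," and ";".
# Lookarounds keep the neighbouring letters out of the match, so adjacent
# hits such as the two inner dots of "U.S.A." are both found.
SPACING = re.compile(r"(?<=[A-Za-z]) ?([.,;]) ?(?=[A-Za-z])")

def fix_spacing(text):
    """Force 'letter, punctuation, one space, letter' spacing."""
    return SPACING.sub(r"\1 ", text)

print(fix_spacing("the end.Next one"))   # intended fix
print(fix_spacing("Microsoft .NET"))     # false positive
print(fix_spacing("U.S.A."))             # false positive
print(fix_spacing("T.S. Eliot"))         # false positive
print(fix_spacing("&mdash;word"))        # breaks an HTML entity
```

A rule like this would need a long list of exclusions before it could run unattended, which seems to be the point of the replies above.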
[edit] facilitate
The new entry for facilitate is not correct. It's changing facilitate to facilitatli. I think it should have $3 instead of $2. --Thiseye 00:44, 1 March 2007 (UTC)
- Thanks for reporting; fixed. -- intgr 00:47, 1 March 2007 (UTC)
[edit] secretarty -> secretary
found in Marita Ulvskog. Jobjörn (Talk ° contribs) 01:21, 8 March 2007 (UTC)
- Added to existing "Secretary" entry.--BillFlis 22:33, 8 March 2007 (UTC)
[edit] RETF oddities
I noticed something strange that could be a bug in AWB. I've noticed in several articles that if a typo is in wiki tags [[]], then RETF will not catch this. I assumed this was because it's not excluding the brackets as part of the word so it wasn't matching the regex. But then I noticed in the Akshay Pratap Singh article, that the FAR does catch typos within wiki tags. In this article, "politican" is misspelled. I had a FAR entry to correct this which I recently added to RETF. However, I noticed when I disabled the FAR entry, it would no longer be corrected. I updated the FAR regex to exactly that of the RETF regex, and still FAR would correct it, but RETF would not. --Thiseye 22:43, 11 March 2007 (UTC)
- I believe this has been discussed a few times over on the AWB talk pages; it has been set up like this on purpose. There are reasons for doing it both ways, and I think we are looking into having it check more. Post it on the AWB talk page. Reedy Boy 17:55, 12 March 2007 (UTC)
[edit] Not sure if anyone will see this...
I was wondering if the AWB could include the often misused words "reoccur", "reoccured", and "reoccuring". These are not actual words (contrary to popular assumption)! They should all be changed to "recur", "recurred", and "recurring". Mahalo. --Ali'i 20:44, 13 March 2007 (UTC)
- Oops, they already are included:
<Typo word="(Re(o)c/Re)currence" find="\b([Rr]eoc|[Oo]c|Re)curran(ces?|t|tly)\b" replace="$1curren$2" />
<Typo word="Recurr(ed/ing)" find="\b(R|r)ec(?:cur?|u)r(ed|ing|ent|ently)\b" replace="$1ecurr$2" />
- Sorry about that. Thanks anyway. --Ali'i 20:47, 13 March 2007 (UTC)
[edit] Includeing -> Including
As above, suggest replacing includeing with including. Harryboyles 05:59, 17 March 2007 (UTC)
[edit] Asian needs to be updated
There is a misspelling in Kai Chen, "asain"; the current rule accounts for "aisian".
[edit] Dependant vs. Dependent
It appears that "dependant" is acceptable in British English, esp. as a noun. If people concur, it should be removed from the typo list IMHO. —Wknight94 (talk) 15:21, 23 March 2007 (UTC)
- It's not just British. An American dictionary http://www.m-w.com/dictionary/dependant lists it too.--BillFlis 18:10, 23 March 2007 (UTC)
- So it should be removed, no? —Wknight94 (talk) 14:05, 24 March 2007 (UTC)
- It definitely needs to be removed. As a noun a dependant is a person looked after by another e.g. a father's dependants are his children (sorry for the approximate definition). Dependant may well be incorrectly used e.g. 'dependant on the weather ...' but can't be fixed this way. Rjwilmsi 19:19, 26 March 2007 (UTC)
[edit] Regex/CPU question
I know that we want to reduce the number of regexes to reduce the amount of CPU time used to process them all. I'm assuming this means that there is little to no CPU cost associated with adding a variant to an existing regex compared to adding a completely new entry. Should we avoid adding variants to an existing regex that don't occur too often, or does that matter?
Also, it seems we avoid "catching" the correct spelling within the regex. Is that the standard we should go by? And to what extent should we go to avoid that situation? I've seen some regexes that do catch the correct spelling, so should I try to rework these, or is this sometimes acceptable ("available" is an example). Further, should we avoid trying to catch certain variants of typos to avoid catching the correct spelling? Should we avoid adding a new entry to try to catch a variant to avoid catching the correct spelling ("Vancouver" is an example)? --Thiseye 18:28, 25 March 2007 (UTC)
- I guess since I received no feedback regarding this, what do people think about creating a "sub-list" of typos that people could manually synch with in AWB to use. This typo list will contain regexes that catch a lot of typos, but occasionally catch false positives. In other words, a list where we're not necessarily trying to get "100% accuracy" like the current list, but still accurate more often than not. A user using this list would have to be more careful of the substitutions made. I keep such a separate list already, but it'd be nice to have a common one. The "New Jersey" post below is one such candidate. --Thiseye 01:29, 8 July 2007 (UTC)
[edit] Combining regexes that catch missing "e" before "ly" suffix
I wanted to get others' thoughts on combining several regexes (and incorporating some new ones). The catch is that if we want to add other variants to these, we'd probably want to separate them out again.
<Typo word="(Accurate/Active/Affectionate/Alternate/Appropriate/(Ab/Re)solute/Collective/Consecutive/Desperate/Exclusive/Extensive/False/Large/Separate/Severe)ly" find="\b((A|a)(ccurat|ctiv|ffectionat|lternat|ppropriat)|([Aa]b|[Rr]e)solut|(C|c)o(llec|nsecu)tiv|(D|d)esperat|(E|e)x(clu|ten)siv|(F|f)als|(L|l)arg|(S|s)e(parat|ver))ly\b" replace="$1ely" />
--Thiseye 00:01, 26 March 2007 (UTC)
- I think this is a good idea, I have been using some regexes like this personally and they can work pretty well. Gaius Cornelius 00:05, 26 March 2007 (UTC)
- Good idea, but I have a suggestion. No English words end in "ivly" or "avly". This:
<Typo word="-(a/i)vely" find="(a|i)vly\b" replace="$1vely" />
- catches your "-ively" words and over a thousand more. I went ahead and added this and a few others under New Additions; I'll let them cook for a while to see if any unforeseen problems arise before deleting any existing entries.--BillFlis 10:29, 26 March 2007 (UTC)
[edit] 'infinate' fixed to 'infinit'
The typo correction ((In)De/In/Af)Finite fixes 'infinate' to 'infinit'. I'm not competent enough with regex to fix it. Rjwilmsi 19:16, 26 March 2007 (UTC)
- Fixed, but I had to take out the case of "infinity".--BillFlis 19:33, 26 March 2007 (UTC)
- Thanks. And another: ballon can't be corrected to balloon as 'ballon' exists in French and is quoted e.g. Ballon D'or in the Roberto Baggio article.
- That sounds questionable since this is the English Wikipedia. That's one that would need to be rejected manually by the WP:AWB user but shouldn't be removed from the typo list. (My opinion anyway). —Wknight94 (talk) 21:21, 26 March 2007 (UTC)
- Yes, but if you search for "ballon", you get not just Ballon D'Or but a host of articles with that word in the title. On the other hand, we could certainly keep the corrections of "balloning", "ballonist", etc. On the third hand, there aren't a lot of these errors.--BillFlis 10:24, 28 March 2007 (UTC)
[edit] 'responsable(s)' fix needs to be removed
Responsable(s) exists in French so needs to be removed from the "(Ir)Responsible" correction. Rjwilmsi 20:27, 27 March 2007 (UTC)
tPA is corrected to TPa, but it's correct in articles such as Serpin. Rjwilmsi 20:37, 27 March 2007 (UTC)
- Sorry to push back again (as I did above) but this is the English Wikipedia. Shouldn't French words be occurring very very rarely? To me, that's better to cover as an exception by the WP:AWB user (which is what this list is for). —Wknight94 (talk) 22:03, 27 March 2007 (UTC)
- While I tend to agree, the RETF project page does state that the "lofty goal of RETF is to be completely automatic. That is, 100% accuracy." So something's got to give. We can't really have it both ways. I have a couple of ideas that I'm going to propose soon to alleviate this. --Thiseye 04:27, 28 March 2007 (UTC)
For phrases in a language other than English, use {{lang}} for the phrase, for example {{lang|fr|Responsable}}, where the second parameter is the ISO 639 code. It stops AWB changing the text, but I'm not sure about WikEd (if not, it probably should). mattbr 10:53, 28 March 2007 (UTC)
- Thanks. That's a really useful tip I didn't know about. I'll probably go through and tag all French 'responsable's like that. Rjwilmsi 17:25, 28 March 2007 (UTC)
[edit] Typica
Typica exists (in English!) but is corrected to Typical. Wasn't sure how to fix the regex myself. Rjwilmsi 07:03, 28 March 2007 (UTC)
- I have removed the regex doing this ((A)Typically). Other changes in the removed regex appear to already be covered in (A)Typical, but someone please update it if not. Thanks, mattbr 10:53, 28 March 2007 (UTC)
Another: In (fact/the/a/an) corrects the name Ina
- Removed "ina" and "inan" from regex because of name false positives. I'd also be concerned "inan" would be a typo of "inane". --Thiseye 01:24, 29 March 2007 (UTC)
[edit] Nation name capitalization
What do folks think about taking out some of the capitalizations since there are so many animal species that use lower-case versions of words that would ordinarily be upper-case (see this edit for an example of the mistakes that are often made). —Wknight94 (talk) 22:03, 27 March 2007 (UTC)
- "gum arabic" too. -- Euchiasmus 20:17, 7 April 2007 (UTC)
[edit] Millenium Hall
Proposing to remove "Millennium_" since there is a well-known 18th century book, Millenium Hall. —Wknight94 (talk) 00:06, 30 March 2007 (UTC)
There's a band called 'Agression', so the 'agression' -> aggression fix needs to be edited. Rjwilmsi 06:24, 31 March 2007 (UTC)
[edit] Official
There is currently an entry for Official, but I'm not sure if it corrects "Offical" --> "Official". Can someone either please add this or let me know that it is in there already? --After Midnight 0001 05:09, 1 April 2007 (UTC)
- I added that case, as well as a couple more word endings.--BillFlis 11:17, 1 April 2007 (UTC)
[edit] .coms
I couldn't get negative lookahead to work properly on the .com's (OK, brainfart Harvard would be .edu anyway). Try 1 and Try 2. I'm trying to get it to ignore URLs and emails (ex NSAKEY). Can somebody take a peek? I was reloading the file with click/unclick of the RETF option. — RevRagnarok Talk Contrib 17:40, 1 April 2007 (UTC)
- AWB ignores external http: links (and from the next release https:, ftp: and mailto:), so these shouldn't be a problem. In regular text, I can't think of a situation where you would write a web or email address outside a link. Could you point me to where you are having the problem? You can try out a regex using the find-and-replace option in AWB, and I don't think clicking/unclicking the checkbox reloads the list, but you can from the last option on the 'General' menu. mattbr 18:12, 1 April 2007 (UTC)
- The developers told me click/unclick reloads and that seems to work. The test article is listed above - NSAKEY has the public key for an email @microsoft.com. — RevRagnarok Talk Contrib 18:18, 1 April 2007 (UTC)
- That fixes this case, but on a side note, I'd like to know why the regex didn't work. — RevRagnarok Talk Contrib 18:35, 1 April 2007 (UTC)
- Ticking and unticking the box just enables and disables it; it doesn't refresh the typo list. I've just committed a change so that if you use the option on the general menu, it will reload them. Reedy Boy 18:41, 1 April 2007 (UTC)
- Two weeks ago you said it did reload the typo page. Guess there was a misunderstanding somewhere. Either way, I <pre> tagged the one spot anyway per Matt. — RevRagnarok Talk Contrib 18:52, 1 April 2007 (UTC)
- Sorry about that, I thought (as it was a bit of a quick fix) that it did. When I looked over the code just now, I realised that unless the declaration for the typos was blank (i.e. = null), it wouldn't load them. I've now put a parameter on that, so that you can force a reload, and that works. Sorry for the confusion/lack of complete attention on my part; for the next release, it definitely has been sorted!! Reedy Boy 19:01, 1 April 2007 (UTC)
Re the regex, sorry bit of a regex novice. Can anyone else help? mattbr 18:50, 1 April 2007 (UTC)
[edit] august > August
Since august is a word, should this correction be removed, or improved to fix <number> august > <number> August only? Rjwilmsi 17:53, 3 April 2007 (UTC)
- Good point. Probably, but I was having some problems with lookahead in the past (see above). — RevRagnarok Talk Contrib 18:10, 3 April 2007 (UTC)
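One possible shape for the narrower rule, sketched in Python (the `re` module standing in for AWB's .NET engine; the pattern itself is hypothetical and untested against the live list). It only capitalises "august" when a one- or two-digit day number comes directly before it, so the adjective is left alone; it needs no lookahead at all.

```python
import re

# Hypothetical rule: capitalise "august" only after a day number.
FIND = r"\b(\d{1,2}) august\b"
REPLACE = r"\g<1> August"

def fix_august(text):
    return re.sub(FIND, REPLACE, text)

print(fix_august("born on 12 august 2006"))
print(fix_august("an august gathering"))
```

Dates written month-first ("august 12") are not covered by this sketch.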
[edit] discribed -> described
As in [2]? Jobjörn (Talk ° contribs) 12:06, 4 April 2007 (UTC)
- Added to "Describe", which is now "(De/Pre)scribe".--BillFlis 19:49, 4 April 2007 (UTC)
[edit] strengtened > strengthened
as here. Jobjörn (Talk ° contribs) 14:16, 4 April 2007 (UTC)
- Added to "Strength".--BillFlis 19:43, 4 April 2007 (UTC)
[edit] "significatly" --> "significately" ???
The rule <Typo word="-(b/c/d/g/i/m/s/t/v)ately_" find="([bcdgimstv])atly\b" replace="$1ately" /> converts significatly to significately.
Surely that can't be what the inventor intended?
--Euchiasmus 20:13, 7 April 2007 (UTC)
- I added this case to the existing rule for "Significant", and moved the general rules to the end, so this will be treated as a special case before the general rules kick in.--BillFlis 19:39, 9 April 2007 (UTC)
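The effect of the reordering can be sketched in Python (with `re` standing in for AWB's .NET engine). The general rule is the one quoted above; the special-case pattern is hypothetical, since the actual "Significant" entry is not quoted in this thread. Rules are applied in list order, so whichever rule sees the word first decides the outcome.

```python
import re

# Hypothetical special-case rule; the real "Significant" entry is not quoted here.
SPECIFIC = (r"\b(S|s)ignificatly\b", r"\g<1>ignificantly")
# General rule as quoted above.
GENERAL = (r"([bcdgimstv])atly\b", r"\g<1>ately")

def apply_rules(text, rules):
    """Apply each (find, replace) pair in order, feeding the result forward."""
    for find, replace in rules:
        text = re.sub(find, replace, text)
    return text

print(apply_rules("significatly", [GENERAL, SPECIFIC]))  # general rule wins
print(apply_rules("significatly", [SPECIFIC, GENERAL]))  # special case wins
print(apply_rules("immediatly", [SPECIFIC, GENERAL]))    # general rule still works
```

With the special case first, the word no longer ends in "atly" by the time the general rule runs, so the general rule has nothing left to match.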
[edit] "distictively" --> "districtively" ???
The word "districtively" doesn't even exist.
Let's have rules that rectify a recognised and bounded set of incorrect words, rather than trying to make the rules too general. What do you think? Euchiasmus 20:30, 7 April 2007 (UTC)
- Agreed as your other significatly example demonstrates. —Wknight94 (talk) 21:19, 7 April 2007 (UTC)
- As the "inventor" of these attempts at general rules, may I ask, what is the harm in replacing one type of error by another? If you did not have the general rule, you would still leave an error. At least its presence in this case alerted you that we need separate rules for these exceptional misspellings. I'll add a rule for "(Di/In)stinctive" to handle your clever discovery!--BillFlis 19:11, 9 April 2007 (UTC)
- It turns out that there was an existing rule to handle "distictively" but it was down in the D's, behind the general rules. I've now moved the general rules to the end, to allow the special cases to be handled first. I also modified the previous "Distinction" to "(Di/In)stinctive".--BillFlis 19:17, 9 April 2007 (UTC)
[edit] "other than"
The regexp for other than would change "Will have to agree with each other then convince the rest." Another regexp to change "(another|(?:the|each|some)) other then" to "$1 other, then" first and then apply the "then" to "than" fix would avoid it. This could also be extended to handle "better then" and "worse then". Or the line could be removed. Is it too much processing for these cases? -- JHunterJ 11:31, 13 April 2007 (UTC)
[edit] KPA vs. kPa
KPA is being changed to kPa because a rule in the Wikipedia:AutoWikiBrowser/Typos#Abbreviations of SI units section is too general. (I've run into other problems with the SI units but I haven't seen them in a while. I'll bring them up next time I see them.) Can we do a (k[pP][Aa]|[Kk][pP]a|KpA) rule instead? —Wknight94 (talk) 21:58, 15 April 2007 (UTC)
- Unrelated: Canarian Black Oystercatcher subspecies' scientific name is "Haematopus niger meade-waldoi" but "niger" gets changed to "Niger". —Wknight94 (talk) 02:31, 16 April 2007 (UTC)
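On the kPa proposal above, a small Python check of which capitalisations the suggested pattern would touch. The word boundaries are an assumption added for the sketch, and Python's `re` stands in for AWB's .NET engine.

```python
import re
from itertools import product

# Pattern proposed above, wrapped in word boundaries (an assumption).
PROPOSED = re.compile(r"\b(k[pP][Aa]|[Kk][pP]a|KpA)\b")

def casings(word):
    """Return every upper/lower-case combination of the given letters."""
    return ["".join(chars) for chars in product(*((c.lower(), c.upper()) for c in word))]

matched = [w for w in casings("kpa") if PROPOSED.fullmatch(w)]
unmatched = [w for w in casings("kpa") if not PROPOSED.fullmatch(w)]

print("would be changed to kPa:", matched)
print("left alone:", unmatched)
```

The all-capitals form is the only one left alone. The pattern also matches the correct "kPa" itself, which is harmless if the replacement is "kPa" but does mean the rule would register as a hit on already-correct text.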
[edit] Easter
I have found a false positive for the capitalization of "easter" and that is "easter egg" in the sense of Easter egg (virtual). After looking at the what links here page, there are between 250 and 500 links to that page, so there are a fair number of instances of this false positive out there. I personally cannot think of a way to alter the rule to fix this problem. I will be removing the rule. If anyone can think of a way to fix it, feel free to add it back. For future reference, this was the rule: <Typo word="Easter" find="\beaster\b" replace="Easter" /> --Maelnuneb (Talk) 17:07, 17 April 2007 (UTC)
- According to Easter egg (virtual), that usage is capitalized as well. I don't see the problem. -- JHunterJ 17:14, 17 April 2007 (UTC)
- As an aside, if it had been needed, I believe
<Typo word="Easter" find="\beaster(?! egg)\b" replace="Easter" />
- would have accomplished the exception-handling desired. -- JHunterJ 17:29, 17 April 2007 (UTC)
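A quick Python check of that negative lookahead (with `re` standing in for AWB's .NET engine, which is an assumption; the sample sentences are made up):

```python
import re

# Rule as suggested above: capitalise "easter" unless " egg" follows.
FIND = r"\beaster(?! egg)\b"
REPLACE = "Easter"

def fix_easter(text):
    return re.sub(FIND, REPLACE, text)

print(fix_easter("celebrated at easter time"))
print(fix_easter("a hidden easter egg in the game"))
print(fix_easter("several easter eggs"))
```

Because the lookahead only tests for the prefix " egg", the plural "easter eggs" is skipped as well.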
- Meh. Inconsistency in that article is the problem to fix first, IMO. Which I've done, just now. I'd still say the original rule should be restored here, but I'll see if someone else agrees. -- JHunterJ 21:09, 17 April 2007 (UTC)
[edit] "Comprised of" rule
I'm a little confused about these rules:
<Typo word="comprises" find="\bis comprised (?:up )?of\b" replace="comprises" />
<Typo word="comprise" find="\bare comprised (?:up )?of\b" replace="comprise" />
<Typo word="comprised" find="\b(?:was|were|been) comprised (?:up )?of\b" replace="comprised" />
<Typo word="comprising" find="\b([Cc])omprised (?:up )?of\b" replace="$1omprising" />
Could somebody with a little more English grammar knowledge please explain these. I don't remember there being a problem with "comprised of" but I could be wrong. --Maelnuneb (Talk) 16:33, 19 April 2007 (UTC)
If X, Y, and Z compose a thing (or a thing is composed of X, Y, and Z), that thing comprises X, Y, and Z. See wikt:comprise -- JHunterJ 16:58, 19 April 2007 (UTC)
- Thanks for looking that one up. One would think that I would have known to look there before asking questions, but apparently not. --Maelnuneb (Talk) 17:07, 19 April 2007 (UTC)
- There was an earlier question about this, which I answered on the RegExTypoFix talk page under the heading Urgh!! -- Euchiasmus 21:26, 19 April 2007 (UTC)
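For anyone else puzzling over the four rules, here is a Python sketch of what they do when applied in the listed order (with `re` standing in for AWB's .NET engine, which is an assumption; the sample sentences are made up). The first three rules fold the auxiliary verb into the right form of "comprise", and the last one catches whatever "comprised of" remains.

```python
import re

# The four rules quoted above, in order, with "$1" written as "\g<1>".
RULES = [
    (r"\bis comprised (?:up )?of\b", "comprises"),
    (r"\bare comprised (?:up )?of\b", "comprise"),
    (r"\b(?:was|were|been) comprised (?:up )?of\b", "comprised"),
    (r"\b([Cc])omprised (?:up )?of\b", r"\g<1>omprising"),
]

def fix_comprised(text):
    """Apply the rules in list order, feeding each result into the next rule."""
    for find, replace in RULES:
        text = re.sub(find, replace, text)
    return text

for sample in [
    "The set is comprised of three parts",
    "The teams are comprised of volunteers",
    "It has been comprised of two wings",
    "a panel comprised of experts",
]:
    print(fix_comprised(sample))
```

The order matters: if the last rule ran first, "is comprised of" would become "is comprising" rather than "comprises".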
It has been suggested on my Talk that the replacement for "is comprised of" be "is composed of" instead of "comprises". I tend to prefer keeping the base word, and as a bonus I like active voice over passive voice. Any other suggestions or agreements with either choice? -- JHunterJ 11:01, 26 April 2007 (UTC)
[edit] Significantion?
signification -> significantion? Looked weird to me so I didn't save the change. --Guinnog 23:04, 26 April 2007 (UTC)
- It was wrong. The pattern for "significant" was lacking the word boundaries. Fixed. -- JHunterJ 23:22, 26 April 2007 (UTC)
- Wow, that was fast! Thank you. --Guinnog 23:26, 26 April 2007 (UTC)
[edit] Distinguish
I think this is wrong:
replace="$1istinguis$2" />
Shouldn't it be
replace="$1istinguish$2" />
It seems to be changing 'distinguish' to 'distinguis'. Colonies Chris 13:14, 28 April 2007 (UTC)
- It looks like this has been fixed. --Thiseye 01:26, 30 April 2007 (UTC)
[edit] Turks
I think I just found a minor regexp bug while editing this revision of self-loading rifle. The suggested edit for "turks" was "Turks$4" (ie. with the variable in the string). Cheers, -- Seed 2.0 14:51, 1 May 2007 (UTC)
[edit] Question
Sorry I don't know much about programming or anything, but I'm guessing that we should copy the codes on the page somewhere on our AWB so that it can fix the mistakes when we're using the program, right? I was wondering how we could do that, like where & how do we copy all the typo codes in the program to "make it work" if you see what I mean... Thanks in advance. Zouavman Le Zouave (Talk to me!) 13:06, 3 May 2007 (UTC)
- Nope, just set the option and it will be "on" -- AWB reads it from this article itself. -- JHunterJ 13:12, 3 May 2007 (UTC)
Thanks a lot for such a fast answer! ^^ Zouavman Le Zouave (Talk to me!) 13:14, 3 May 2007 (UTC)
[edit] comprised of
AWB changes comprised of to composed of. This is not a typo--one of the meanings of comprised is "to constitute, to make up, to compose", or, pass. "to be composed of, to consist of". Could someone explain why AWB is effectively making a word choice change under the guise of fixing a typo? Miss Mondegreen talk 00:42, 4 May 2007
- One of the meanings of "comprised" is as you say. "Comprised of" is informal or incorrect though (see wikt:comprise and my Talk page), and should be changed either to "comprising" or "composed of". I added it as "comprising", but received some complaints that that was hard to understand, so I switched it to "composed of". I certainly don't mind switching it back, and will do so now. -- JHunterJ 11:02, 4 May 2007 (UTC)
- The day I turn to wikitionary as a dictionary is the day I...well, I don't know what, but something as drastic as hell freezing and pigs flying but less cliche.
- Considering that we aren't supposed to cite Wikipedia or other wiki sites as references, I'll cite the OED:
8. Of things: a. To take up, fully occupy (a space). Obs. rare.
b. To constitute, make up, compose.
c. pass. To be composed of, to consist of.
- "Comprised of" is not incorrect, nor is it listed as informal.
- "Comprised of" isn't listed at all in that excerpt. If "comprise" means "to be composed of", "comprised of" therefore means "to be composed of of", which is why it's wrong (or at best informal). Does the OED not have any usage information for "comprise"? Merriam Webster does [3], as do the American Heritage Dictionary [4], Bartleby's [5], and the Random House Word of the Day [6]. -- JHunterJ 12:33, 4 May 2007 (UTC)
- There does seem to be a lot of fervor about this usage. Googling not only gets a variety of definitions that do or don't include the usage, it also yields a number of grammar junkies lecturing on it. This may mainly come from the fact that it is a fairly recent form of the word. The first known occurrence comes a century after "comprising" (same meaning), and didn't become well used until halfway through the twentieth century. Regardless, it's both correct, and not listed in the dictionary as being an informal usage. American Heritage alludes to it, but the definition there covers scarcely a third of the usages and meanings of the word that the OED covers. I've removed comprised altogether, and unless I've done so incorrectly or there's an issue with the source I provided, it should stay that way.
- I also think it's a bad idea for word changes, i.e. corrections of incorrect word usage, to be made under the guise of typo fixing. Even if the change is absolutely correct, the AWB edit summary reflects a typo change, unless the edit summary is manually changed. I'm also concerned that apparently, definitions and usages for words are being obtained from wiktionary, or at best, online dictionaries that do not list complete definitions and usages. Is it too much to ask that people actually go and look the word up in a comprehensive dictionary before making an edit that sets in motion changes throughout Wikipedia? Changes that cannot necessarily be easily undone. This seems to me to be the height of irresponsibility. Miss Mondegreen talk 05:00, 4 May 2007 (UTC)
- Well, it's certainly turned out to be contentious, so I don't object to its removal, but I did look it up in other dictionaries first (although I don't have ready access to an OED), so please don't cast the move as irresponsible or hasty. I choose to point to the Wiktionary definition first because it is, like this, a Wikimedia project. -- JHunterJ 12:33, 4 May 2007 (UTC)
- One thing that I am 100% sure of is that this change was made with the best intentions and with a lot of debate. This rule has been talked about a lot recently. There are 2 sections now on this page, one of which I personally started, one on JHunterJ's talk page, and one on this page. As far as I am concerned, this rule has been pretty well defended. JHunterJ's talk page and the talk page for RegExTypoFix have more complete information to look at. Please make sure to look at these pages. --Maelnuneb (Talk) 16:55, 4 May 2007 (UTC)
- I saw the debate on the talk page. And I understand that it was made with good intentions. However, I'm still concerned. Wiktionary, wikipedia, wikimedia projects are NOT acceptable sources per WP:V, WP:RS etc. An issue with the change was raised, and no one really had the answer. Information was taken from Wiktionary, both incomplete and not ok due to policy, and from a seriously incomplete American Heritage entry. I understand not having access to sources, but if you don't have access to information, then don't make changes based on something you can't cite or prove. All the discussions show me is that people fought about something factual without using verified facts; no one pulled out a complete dictionary until I came to the discussion. And that's just irresponsible. If this was important to you, to any of the people who believed in this change, if you honestly thought that it was incorrect and AWB should be fixing it, then you really needed to get access to a dictionary somehow. I find it very hard to believe that none of you could have gone to a library, or at least hunted down a fellow wiki user with access to something like the OED. Look, if someone creates userboxes for access to online databases, I'll be the first person to put them on my userpage and field requests. Half of the articles I do get involved in are because I do have access and I show up to stop a "he said she said" argument about something factual, where all someone needs to do is go and look it up.
- But aside from the particulars of this case, I'm concerned in general. Pretend that this is right. Who on earth thinks that replacing a misused word is fixing a typo? Changing a word, misused or no is always going to have subtleties that you can't program into code. The people who use AWB move a mile a minute, and while they check the proposed edits to make sure that they make sense, you're asking them to know a fair amount to catch stuff like this. And I'm betting that when they don't see them not making sense, they just let it go ahead. And that's a problem. Because now you have a machine correcting grammar based on programming by users who don't always use dictionaries when programming that machine, and that's a really bad kind of self-correcting wiki.
- Having a mistake in AWB can't really be undone. How many edits get performed with the error before it's caught? How many correct spellings of "dependant" were changed to a different spelling, and therefore different meaning by AWB and had to, or still have to be caught by hand? That's not saying that AWB doesn't serve a good purpose, but if word changes are an even greater liability and if they are going to continue to be considered typos, they should at least be kept in a separate section, so that a closer eye can be kept on them. Miss Mondegreen talk 14:49, 5 May 2007 (UTC)
- Please quote the OED passage that uses the phrase "comprised of" as acceptable in other than informal usage before continuing with the idea of the "mistake" done by AWB. I've "really" had an answer for each objection raised so far. -- JHunterJ 22:02, 5 May 2007 (UTC)
[restarting indent] I was not using an OED passage to rebut the American Heritage passage. I was using the OED definition. The OED will list the definition as informal or slang or archaic, if it is in fact informal or slang or archaic, and you can see above that definition 8 was archaic. The definition and usage I was referring to had no such listing--the OED does not list it as any of these things. I'm including the quotes, and spelling and etymology, and everything for the definition and usage that is being discussed here. Then, you'll have everything I have. Miss Mondegreen talk 16:41, 5 May 2007 (UTC)
(kəmˈpraɪz) Also 5-7 compryse, 5 Sc. compris, 7-9 comprize. [f. F. comprendre (pa. pple. and pret. Ind. compris):—L. comprendĕre, contr. from comprehendĕre to COMPREHEND. Probably formed by association with emprise, and possibly with enterprise, both of which verbs were derivatives from Eng. ns. of the same form (repr. F. emprise, entreprise, fem. ns. from pa. pple.), but being used as the Eng. reprs. of emprendre, entreprendre, formed a precedent for the analogous representation of other compounds of -prendre by verbs in -prise: cf. apprise, surprise. (Many of the early passages in which this word occurs are so vague that it is difficult to gather the exact sense.)]
b. To constitute, make up, compose.
1794 G. ADAMS Nat. & Exp. Philos. II. xvi. 238 The wheels and pinions comprizing the wheel-work. 1794 PALEY Evid. I. ix. (1817) 169 The propositions which comprise the several heads of our testimony. 1850 W. S. HARRIS Rudimentary Magnetism iv. 73 These substances which we have termed diamagnetic..and which comprise a very extensive class of bodies. 1907 H. E. SANTEE Anat. Brain & Spinal Cord (1908) iii. 237 The fibres comprising the zonal layer have four sources of origin. 1925 Brit. Jrnl. Radiology XXX. 148 The various fuses etc. comprising the circuit. 1950 M. PEAKE Gormenghast (1968) xiv. 94 Who, by the way, do comprise the Staff these latter days? 1959 Chambers's Encycl. XIII. 653/1 These fibres also comprise the main element in scar tissue. 1969 W. HOOPER in C. S. Lewis Sel. Lit. Ess. p. xix, These essays together with those contained in this volume comprise the total of C. S. Lewis's essays on literature. 1969 N. PERRIN Dr. Bowdler's Legacy (1970) i. 20 As to who comprised this new reading public, Jeffrey..guessed in 1812 that there were 20,000 upper-class readers in Great Britain.
c. pass. To be composed of, to consist of.
1874 Art of Paper-Making ii. 10 Thirds, or Mixed, are comprised of either or both of the above. 1928 Daily Tel. 17 July 10/7 The voluntary boards of management, comprised..of very zealous and able laymen. 1964 E. PALMER tr. Martinet's Elem. Gen. Ling. i. 28 Many of these words are comprised of monemes. 1970 Nature 27 June 1206/2 Internally, the chloroplast is comprised of a system of flattened membrane sacs.
9. The participles are used absolutely: = Including, included (cf. F. y compris); so the gerund.
1653 H. COGAN tr. Pinto's Trav. vii. 21 He had lost above three thousand and five hundred men, not comprising the wounded. 1663 GERBIER Counsel 37 One quarter of the Ionick Column, the Base and Capital comprised. Ibid. 56 Brick-layers will work..the inside for thirty three shillings, arches comprised. 1887 W. G. PALGRAVE Ulysses, Phra Bat, The edifice..is square, about thirty feet in dimension each way, without comprising the outer colonnade.
Hence comˈprised ppl. a., comˈprising vbl. n. and ppl. a.
c1575 SIR J. BALFOUR Practicks (1754) 147 Redemptioun of comprysit landis. Marg. Difference betwix comprysit landis and wodset landis. 1603 FLORIO Montaigne (1634) 295 If he be in himselfe, they are also two, the comprizing and the comprized. 1609 SKENE Reg. Maj. 110 Comprisings of lands. 1691 E. TAYLOR tr. Behmen 316 Which breaketh the comprized Life again. 1879 SIR G. SCOTT Lect. Archit. I. 229 The subdivisions..three or four under one comprising arch.
[edit] Other rules
- Thanks. It's the "c" definition above that I was looking for. I'm surprised to learn that it doesn't address the usage question that arises in the other sources. And I am happy to have removed the rule that was replacing a form of comprise with a form of compose. Just to be sure, are you objecting to the other rules (is comprised of -> comprises, etc) or no? I'd still like to replace them, even if both are correct according to the OED, under the "Try to find words that are common to all" part of the style guidelines, but if they're also at issue, they should be removed as well. -- JHunterJ 00:59, 6 May 2007 (UTC)
- I'm a little surprised too, but I find time and time again when an issue arises that the OED is so much more complete than other sources that I just go back to it. I suspect that "comprised of" is regarded sometimes as informal because it came into existence later--a whole century after comprises. And it's not like there was no other way to say "comprised of"--there were a few other ways to say it just with the word comprised alone, and in this meaning comprised is practically a synonym for composed, so the usage most likely didn't become integrated into the language quickly the way that other usages and words do when there is a need for them to. However, it's not listed as informal by the OED, the only dictionary I've found to actually list all of the definitions and usages, and I read the stuff you linked to, and the way that the issue is written about seems to be of historical note, though I agree--there are always going to be people who prefer one usage over another and enforce that wherever they can.
- My issue with the other rules is that I'd prefer not to mess with people's grammar or writing. I assume that you're referring to Wikipedia:Manual of Style#National varieties of English? Maybe I'm being completely dense, but I really fail to see how on earth that applies to this at all. Can you explain? The thing is at this point, with the remaining rules that you're referring to, is that both are correct, in most instances (unlike spelling, I won't say all). But fixing with AWB could potentially fix something that was correct to something that isn't, or something that read nicely to something that sounds really clumsy because of the sentence structure. Each article is written by different people and they're going to have slightly different tones and be written in different fashions and I think that switching wording like that is a bad idea. There is only a certain extent to which you can copy-edit blindly--there is an art to editing, and it can't be done with an automated browser. Miss Mondegreen talk 02:54, May 6 2007
- The "national varieties" reads to cover variations in usage national and otherwise, and this seems to fit its description, if not its heading. While I have come across replacements that would have been wrong to use "comprised of" -> "comprising", I haven't yet found any that would be rendered incorrect by the other rules, "is comprised of" -> "comprises", etc., and I don't think there would be any. Could there be? -- JHunterJ 12:04, 6 May 2007 (UTC)
- Sure. "Manjung's land area is predominantly comprised of agricultural land" That's actually the change that brought this to my attention. This is why I'm so against fixing grammar automatically--it's hard enough for a human to do. English grammar is complex, obscure, complicated and bizarre--humans have immense difficulty with it. I'm not sure it can be programmed--what absolutes are there? And even then, the programming is dependent on the rest of the article being correct, which is ironic, since it's meant to fix errors. Maybe the minor grammar error that AWB detects and attempts to fix is really a grammar edit elsewhere, but it triggers that phrase that AWB is programmed with. In terms of grammar and word usage, phrases and sentences and paragraphs have to be looked at as ever increasing wholes, until you get to the article as a whole. I just don't think that this is possible. Miss Mondegreen talk 13:05, May 6 2007
- That was a change of "comprised of" to "composed of", and would be eliminated by the elimination of the "comprised of" rule. Is there a potential problem with "is comprised of" -> "comprises"? -- JHunterJ 17:26, 6 May 2007 (UTC)
- Ooh, sorry, I misread that. Uhh...I'm trying examples in my head. I'm not sure if it makes it incorrect, but there are certainly cases where it makes it clumsy, though I'll admit that the wording I'm using to begin with is clumsy already. For example, "a fruit salad is comprised of apples, oranges and grapes" -- "a fruit salad comprises apples, oranges and grapes" -- "a fruit salad is composed of apples, oranges and grapes".
- Now really, I wouldn't use any of these wordings, but composed of and comprised of are best, and comprises is just awful here, though it may be technically correct. But everything I said before, with the wrong example about not wanting to correct grammar with AWB still stands, and it will stand for every instance. English grammar is ridiculously complex and there are so many ifs and ors and buts and we use different spellings and dialects and there are so many variables that I can't see a machine doing this by absolutes, when it is so hard for humans to do this with each individual scenario. Do you really think that AWB can work with grammar the way it does with spelling? Miss Mondegreen talk 21:15, May 6 2007
- Well, in that example, "comprises" and "is composed of" are best to my ear. I don't think substituting "comprises" for "is comprised of" reaches the level of grammar fixing, any more than replacing "I ain't" with "I am not" would. It's still just a rote copy edit. (I can go on like this all day, and wouldn't mind the exchange. If you're still not swayed, though, you can edit the list to remove them, or say so here and I'll remove them.) :-) -- JHunterJ 11:08, 7 May 2007 (UTC)
- Hmmm, then clearly some people are familiar with some usages, because to me, comprises sounds painful there, even though technically, I know.... I don't think it should be in the list though: since all are technically correct, what you are or are not familiar with is closer to a dialect issue than a grammar issue, and AWB definitely shouldn't correct for that. Could you remove it? I'm sure I could, but it's code I'm really not familiar with and I noticed you fixed my removal last time.
- By the way, I was serious about the whole userbox thing before. I don't know if anyone is interested in making them, but if so, let me know. Miss Mondegreen talk 10:40, May 8 2007
[edit] Capitalization of state names
I just noticed that we seem to have a rule to %s/georgia/Georgia/gcI but not for other states. I haven't gone through the regexp list but we're at least missing the Carolinas and from the looks of it a few other states. *insert semi-obscure Friends quote about getting 56 states here* ;). -- Seed 2.0 01:35, 5 May 2007 (UTC)
- You must mean "state names of the United States of America", whereas the Georgia you found is a state of the former Soviet Union. Since we have that, we don't need to duplicate it in the long-but-incomplete list of Geographical Place Names of the United States.--BillFlis 12:01, 5 May 2007 (UTC)
[edit] Mineral, suggestion
miniral -> mineral, came across it the other day. Pax:Vobiscum 22:51, 9 May 2007 (UTC)
[edit] Stratagy -> stratey?
Should go to strategy, of course. I don't know regexes well so I can't really fix it myself. —Dark•Shikari[T] 13:51, 10 May 2007 (UTC)
Also directer -> director should be added. —Dark•Shikari[T] 21:20, 10 May 2007 (UTC)
[edit] efectiv -> effectiveive
Just a quick heads up. I just noticed that the suggested fix for the 'efectiv' on Silver Nanoparticles was 'effectiveive' and figured that I'd rather just report it than mess with the regexp myself. -- Seed 2.0 10:39, 17 May 2007 (UTC)
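One way a doubled ending like this can arise is when the replacement writes out the whole corrected word and then appends the captured suffix as well. The sketch below is a hypothetical rule written for illustration, not the rule from the typo list; it uses Python's `re` module, which writes `\1` where the typo list writes `$1`.

```python
import re

# Hypothetical rule for illustration only; the real typo-list entry is not
# quoted in this discussion. Group 1 is the first letter, group 2 the ending.
FIND = re.compile(r"\b([Ee])fect(ive(?:ly|ness)?)\b")

def buggy(text: str) -> str:
    # The replacement already spells "ffective", then appends group 2 ("ive...")
    # again, so the ending is doubled.
    return FIND.sub(r"\1ffective\2", text)

def fixed(text: str) -> str:
    # Only the stem is literal; the ending comes from group 2 exactly once.
    return FIND.sub(r"\1ffect\2", text)
```

Under these assumptions, `buggy("efective")` yields the doubled form and `fixed("efective")` yields the intended word.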
[edit] out added as a prefix to {{infobox}}
Can someone explain why AWB would have made this change? Miss Mondegreen talk 09:02, May 18 2007
- I think that's going to be user error. The cursor starts in the upper left, and he may have not realized that he was typing in the AWB window. Note the edit summaries in this sequence:
- 13:59, 12 May 2007 (hist) (diff) One Piece Grand Battle! (Typo fixing, Typos fixed: american → American, english → English, using AWB) (top)
- 13:59, 12 May 2007 (hist) (diff) InuYasha the Movie: Fire on the Mystic Island (Typo fixing using AWB)
- 13:58, 12 May 2007 (hist) (diff) Yotsuya Kaidan (Typo fixing, Typos fixed: the the → the, using AWB) (top)
If the user enters text manually, he loses the "Typos fixed:" portion of the automatic edit summary. -- JHunterJ 10:55, 18 May 2007 (UTC)
[edit] Leftfield
What should be done about regexes that are likely to generate false positives? I mean specifically this one:
<Typo word="(Center/Left/Right) field" find="\b([Cc]enter|[Ll]eft|[Rr]ight)f(?:ie|ei)ld(|ers?)\b" replace="$1 field$2" />
It changes "leftfield" to "left field" which is problematic in case of the Leftfield duo. Jogers (talk) 11:44, 22 May 2007 (UTC)
- In the case where the false positive is a proper noun, just remove the relevant capital letter:
<Typo word="(Center/Left/Right) field" find="\b([Cc]enter|left|[Rr]ight)f(?:ie|ei)ld(|ers?)\b" replace="$1 field$2" />
- That will remove the false positives and some of the real positives, which can be added back in as a separate rule:
<Typo word="Left field" find="\bLeftf(?:eild|ield(ers?))\b" replace="Left field$1" />
- (untested). -- JHunterJ 12:14, 22 May 2007 (UTC)
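Since the split rules above are marked untested, here is a minimal sketch that exercises them with Python's `re` module. The two patterns are copied from the rules above; the function name `fix_field` is made up for the example, and Python's engine is only assumed to behave like AWB's for these simple patterns.

```python
import re

# Patterns copied from the two rules above. Python writes \1 where the
# typo list writes $1; everything else is unchanged.
GENERAL = re.compile(r"\b([Cc]enter|left|[Rr]ight)f(?:ie|ei)ld(|ers?)\b")
LEFT_CAPITAL = re.compile(r"\bLeftf(?:eild|ield(ers?))\b")

def fix_field(text: str) -> str:
    """Apply the general rule, then the separate capital-L rule."""
    text = GENERAL.sub(r"\1 field\2", text)
    # Group 1 is unset when "Leftfeild" matches; Python 3.5+ substitutes
    # an empty string for an unmatched group.
    return LEFT_CAPITAL.sub(r"Left field\1", text)
```

In this sketch the correctly spelled "Leftfield" is left alone, while "Leftfeild", "Leftfielder" and the lowercase forms are still split.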
[edit] francophone --> Francophone and anglophone --> Anglophone
I was advised by another user that the capitalisation of these words and their derivatives is not used in all variants of English - see WP:CAPITAL#Anglo-_and_similar_prefixes. Therefore I think it would be appropriate to remove / comment out these corrections. Opinions? Rjwilmsi 01:21, 2 June 2007 (UTC)
- Just the "-one" section? Yes, I think that would definitely be appropriate. I think commenting out the "-ile" and "-obe" entries would also be appropriate, since they should remain lowercase on Canada-related articles. -- JHunterJ 11:01, 2 June 2007 (UTC)
[edit] Problem with "operational" typo fix
My AWB just replaced "opperational" with "operationional" here, so I think the regex could use a second look. TomTheHand 15:29, 4 June 2007 (UTC)
- Thanks. I adjusted it. -- JHunterJ 15:34, 4 June 2007 (UTC)
[edit] Duplicated words
I collapsed the duplicated words into one entry. It could be made even more generic:
<Typo word="Duplicated words" find="\b(\w+)\b\s+\1\b" replace="$1" />
but that'll have more false positives. If you want to be careful with it, add it explicitly to your personal Find & Replace section in AWB. -- JHunterJ 00:18, 10 June 2007 (UTC)
- I think your elegant rule is a good contribution, but it doesn't work when the first of the duplicated words is capitalized, as at the beginning of a sentence, which the old clumsy rules were able to deal with. I don't see how to handle all those cases in a general rule.--BillFlis 00:55, 10 June 2007 (UTC)
- The rule as written just fixed By by -> by here. -- JHunterJ 00:59, 10 June 2007 (UTC)
- Of course, that was in the AWB Find & Replace section, not in the Typos, so maybe it behaves differently in the Typo list. -- JHunterJ 01:00, 10 June 2007 (UTC)
- Ah, if that's the case, as I see it is, it seems that AWB is using a very non-standard type of regular expressions!--BillFlis 01:49, 10 June 2007 (UTC)
In my experience of using the duplicate words rules so far, if we only correct lowercase entries there are fewer false positives (say hardly any compared to a few), so perhaps it's better than separate rules for each word. I agree that the above generic line is far too broad for inclusion in the typo list (just consider 'had had', 'in in'), but is useful for very careful use by an individual. Rjwilmsi 07:48, 10 June 2007 (UTC)
- BTW, I found the case-insensitive solution:
- <Typo word="Duplicated words" find="\b(?i:(\w+)\b\s+\1)\b" replace="$1" />
- but I'll just leave it here based on Rjwilmsi's note. -- JHunterJ 16:59, 26 June 2007 (UTC)
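The difference between the two generic patterns in this thread can be sketched with Python's `re` module, which also supports the scoped `(?i:...)` form (Python 3.6 or later). This shows how the patterns behave in Python only; whether AWB's Typo list and its Find & Replace section apply the same case rules is not established here. The helper name `dedupe` is made up for the example.

```python
import re

# Both patterns are copied from the rules in this thread.
CASE_SENSITIVE = re.compile(r"\b(\w+)\b\s+\1\b")
CASE_INSENSITIVE = re.compile(r"\b(?i:(\w+)\b\s+\1)\b")

def dedupe(pattern: re.Pattern, text: str) -> str:
    """Replace a repeated word with its first occurrence (\\1 here, $1 in the list)."""
    return pattern.sub(r"\1", text)
```

With these definitions, the case-sensitive pattern leaves "By by" untouched while the case-insensitive one collapses it to "By". Both also collapse "had had", which is the kind of false positive raised above.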
[edit] Using the ?: part
If you need to use parentheses for grouping but not for capturing, it's a good idea to use the (?:blah|yadda) form. This allows subsequent capturing parentheses to be accessible in order ($1 and $2 instead of $1 and $3). Even if there are no subsequent capturing parentheses in the regexp, it's a good idea because it (a) alerts future readers/maintainers that the group is not used in the replacement and (b) allows a future editor to add a trailing capture without having to figure out what number it is -- the next $x number can be assumed. In my opinion; that's how I do it in my non-Wikipedia programming. -- JHunterJ 22:39, 18 June 2007 (UTC)
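A small sketch of the numbering effect, using Python's `re` module (which writes `\1` where the typo list writes `$1`). The patterns are adapted from the "field" rule discussed earlier on this page and are for illustration only.

```python
import re

# Plain parentheses around (ie|ei): the suffix lands in group 3.
CAPTURING = re.compile(r"\b([Cc]enter|[Rr]ight)f(ie|ei)ld(|ers?)\b")
# Same pattern with (?:ie|ei): the suffix stays in group 2.
NON_CAPTURING = re.compile(r"\b([Cc]enter|[Rr]ight)f(?:ie|ei)ld(|ers?)\b")

text = "the rightfeilders"
with_capturing = CAPTURING.sub(r"\1 field\3", text)          # must skip group 2
with_non_capturing = NON_CAPTURING.sub(r"\1 field\2", text)  # groups in order
```

Both substitutions give the same text; the only difference is that the non-capturing version lets the replacement count its groups in order.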
[edit] Febuary ->> February
A typo I usually do, Febuary ->> February
37 Pages have that typo.
-Flubeca (t) 16:31, 23 June 2007 (UTC)
- Thanks, we've already got that one listed as a correction. I'll do a search for it later today to correct any articles containing it. Rjwilmsi 16:17, 24 June 2007 (UTC)
- Update: corrected two more articles. I ran the correction about a month ago using a Google search and got most of them. We'll need to wait for the Google cache to reparse the pages before a Google search is clean (mainspace articles only). Rjwilmsi 21:03, 24 June 2007 (UTC)
[edit] Affluent (false positive)
Affluent should NOT correct to Afluent.
Affluent - being rich and wealthy --Breno talk 14:22, 27 June 2007 (UTC)
- Fixed. -- JHunterJ 18:31, 27 June 2007 (UTC)
[edit] Intension
I suppose that intension should not be changed to intention. Jogers (talk) 17:32, 1 July 2007 (UTC)
- Fixed. -- JHunterJ 19:09, 1 July 2007 (UTC)
[edit] Centerfield
Changing "Centerfield" to "Center field" produces false positives. Jogers (talk) 17:44, 1 July 2007 (UTC)
- Fixed. -- JHunterJ 19:09, 1 July 2007 (UTC)
[edit] Cristian → Christian
Cristian is a given name and place and shouldn't be corrected to Christian. Thanks, mattbr 19:37, 2 July 2007 (UTC)
[edit] New Jersey
One more, new jersey should not auto-capitalise.
The soccer player got his new jersey today. --Breno talk 13:18, 3 July 2007 (UTC)
- Did you actually come across that in wikipedia? It doesn't sound like a very encyclopedic sentence, and ought to be copy-edited.--BillFlis 13:31, 3 July 2007 (UTC)
- Yeah, on Australia national rugby union team. The actual quote is "The new jersey, custom-designed by Canterbury, was also designed in consultation..." I hit save on it without checking the sentence context and someone pulled me up on it. --Breno talk 12:51, 6 July 2007 (UTC)
- Tsk, that's even worse! "Custom-designed"? "Was also designed"? I've cleaned it up a bit.--BillFlis 13:17, 6 July 2007 (UTC)
[edit] ablilities
- Abilites & abilitis -> abilities ( i or e to ie) Harryboyles 07:36, 4 July 2007 (UTC)
[edit] Three new ones you might want to consider
- league, instead of leauge
- science, instead of sciene
- wonder, instead of woner
There aren't many (if any) on Wikipedia right now, because I fixed them all by myself before I learned of this wonderful thing known as RegexTypoFix. Before I fixed them though, there were a good number of each.
Alex 22:32, 8 July 2007 (UTC)
- I won't do "sciene" -> "Science", could be a typo of "scene" instead.
- Same "woner" -> "wonder", could be a typo of "owner"
- If I understand how it works :
<Typo word="League" find="\b(L|l)eauge\b" replace="$1eague" />
-FlubecaTalk 21:39, 10 July 2007 (UTC)
- "League" is a subset or special case of "(Col)League" under New Additions.--BillFlis 21:53, 10 July 2007 (UTC)
[edit] Request: Nassarawa → Nasarawa
I was directed to Wikipedia:Bot requests for requesting this typo be fixed, and from there I have been sent here. Would it be possible to add the change: "Nassarawa" → "Nasarawa"? See the first line of Nasarawa State for an explanation. Thanks! Picaroon (Talk) 19:02, 15 July 2007 (UTC)
- Not overly sure if it should be added to the list... As it won't be that common. Did do a wikisearch, and found 30 odd pages with it on, so I'm just currently using AWB to fix them for you. See Special:Contributions/Reedy Boy Reedy Boy 19:13, 15 July 2007 (UTC)
[edit] pf
"pf" should not automatically correct to "pF". It's a common misspelling of "of". --Breno talk 15:01, 19 July 2007 (UTC)
as well as a firewall and notation for "piano forte". — gogobera (talk) 20:44, 30 July 2007 (UTC)
- and abbrev of picoFarad. -- Alan Liefting talk 06:20, 25 August 2007 (UTC)
[edit] capitalization of species' names
In the Binomial nomenclature, the species name is not capitalized. I just caught a change that made a mistake because of it. I can't think of any good way to keep AWB from making this mistake. One way would be to check, whenever capitalizing a word, if the previous word is in a list of genus names. I can't say that this would be a good way, though. Just thought I'd point it out. Thanks. — gogobera (talk) 20:52, 30 July 2007 (UTC)
- One solution is to tag the Latin species names as Latin language, that way the English language typo script will ignore it e.g. use {{lang|la|Hyoscyamus niger}}. Rjwilmsi 17:35, 1 August 2007 (UTC)
That seems like the right idea, regardless of AWB issues. Any thought on how to get people doing it? — gogobera (talk) 03:18, 3 August 2007 (UTC)
[edit] diferent -> different
diferent -> different :) -- Stwalkerster talk 12:08, 3 August 2007 (UTC)
- Added as special case of "(In)Different".--BillFlis 13:03, 3 August 2007 (UTC)
[edit] supercede -> supersede
I've arguably seen the prior spelling more often (though both are valid). Wiktionary notates them as alternative but both correct spellings. Is there a policy on this, like there may or may not be for ise/ize?
[The alteration was noted on National Rugby League (2007 Season).]
- Agreed, my desktop dictionary as well as Merriam Webster online lists supercede as an accepted variant spelling "since the 17th century". It is probably not a big deal, except I am seeing several articles where the only change is supercede to supersede. In a group of typos there may not be resistance to the change, but changing an article for a single typo which is not a typo may cause friction, given the strong feelings about article content adopted by a number of editors. Perhaps AWB might rethink the change? -- Michael Devore 18:26, 6 August 2007 (UTC)
- Interesting. I have an older dead-tree M-W (7th ed.), which has only SUPERSEDE. This online American Heritage Dict. has only SUPERSEDE too.--BillFlis 19:45, 6 August 2007 (UTC)
[edit] Typos currently not caught by AWB
I have gone over this talkpage, to check whether any of the suggested typos haven't been implemented. Here is the list of typos which are currently not recognized (together with a google count).
- likley → likely (297) - 25 corrections. Rjwilmsi 21:32, 12 August 2007 (UTC), added it now Voorlandt 19:19, 16 August 2007 (UTC)
- signiture → signature (273) - added to list & run through ~30 corrections. Rjwilmsi 21:32, 12 August 2007 (UTC)
- similarily → similarly (233) - added to list & run through. Rjwilmsi 22:05, 12 August 2007 (UTC)
- wheter → whether (186) - done Rjwilmsi 17:51, 14 August 2007 (UTC)
- literaly → literally (149) - added to list & run through. Rjwilmsi 21:32, 12 August 2007 (UTC)
- orginial → original (109) - added to list & run through. Rjwilmsi 21:49, 12 August 2007 (UTC)
- posibility → possibility (107) - added to list & run through. Rjwilmsi 21:49, 12 August 2007 (UTC)
- responed → responded (100) - added to list & run through. Rjwilmsi 17:51, 14 August 2007 (UTC)
- prepatory → preparatory (99) - added to list & run through. Rjwilmsi 17:26, 15 August 2007 (UTC)
- mountian → mountain (84) - added to list & run through. Rjwilmsi 17:26, 15 August 2007 (UTC)
- abilites → abilities (77) - added to list & run through. Rjwilmsi 17:26, 15 August 2007 (UTC)
- replacment → replacement (72) - run through. Rjwilmsi 17:26, 15 August 2007 (UTC), added it now Voorlandt 19:19, 16 August 2007 (UTC)
- pricipal → principal (65) - added to list & run through. Rjwilmsi 17:26, 15 August 2007 (UTC)
- protrayed → portrayed (65) - added to list & run through. Rjwilmsi 21:39, 12 August 2007 (UTC)
- infinate → infinite (55) - done Rjwilmsi 22:05, 12 August 2007 (UTC)
- personna → persona (52) - done Rjwilmsi 19:31, 17 August 2007 (UTC)
- newstands → newsstands (47) - done Rjwilmsi 19:31, 17 August 2007 (UTC)
- protray → portray (40) - added to list & run through. Rjwilmsi 21:39, 12 August 2007 (UTC)
- jeapordy → jeopardy (36) none to fix Rjwilmsi 19:31, 17 August 2007 (UTC)
- nobilty → nobility (31) - done. Rjwilmsi 13:30, 8 September 2007 (UTC)
- includeing → including (31) - done. Rjwilmsi 13:30, 8 September 2007 (UTC)
- minsitry → ministry (24) - done & added to list. Rjwilmsi 13:30, 8 September 2007 (UTC)
- unsheath → unsheathe (23) - done. Rjwilmsi 13:30, 8 September 2007 (UTC)
- oppenent → opponent (19) - done & added to list. Rjwilmsi 13:30, 8 September 2007 (UTC)
- wherupon → whereupon (18) - done & added to list. Rjwilmsi 13:30, 8 September 2007 (UTC)
- precipation → precipitation (18) - done. Rjwilmsi 13:30, 8 September 2007 (UTC)
- reliquish → relinquish (15) - done. Rjwilmsi 13:30, 8 September 2007 (UTC)
- valiently → valiantly (10) - done. Rjwilmsi 13:30, 8 September 2007 (UTC)
I might try my luck on regex, otherwise could someone please add the most important ones? If you want to test AWB, this list is also on User:Voorlandt/Sandbox Voorlandt 08:24, 7 August 2007 (UTC)
- Thanks for pointing these out, I'll work through them over the next couple of days. You can see how many I fix by looking at my contributions. Thanks Rjwilmsi 21:39, 12 August 2007 (UTC)
- Thanks a lot for this, I tried my luck on one (as you can see in the history), but I got discouraged since it didn't work when I ran AWB through my sandbox (it contains this list) and nothing showed up. Now I tried it again, with your additions to the Regex, but it still doesn't detect any of these typos. Could you try it on my sandbox to see if it works on your end? Voorlandt 22:04, 12 August 2007 (UTC)
- It works now, maybe it was a cache issue? (AWB still using the old regexes?) Voorlandt 07:22, 13 August 2007 (UTC)
[edit] anerobic > anaerobic
This is not a common misspeelin but it annoys me because it MUST be right for the pedantic science types... -- Alan Liefting talk 06:26, 25 August 2007 (UTC)
[edit] Seperate -> separate (vs separte)
awb tried to change it to separte instead of separate --dputig07 20:26, 29 August 2007 (UTC)
[edit] Łódź -> Lodz
I know how AWB works, although I rarely use it anymore. But I'm expanding four articles in relation to the TV show Carnivàle where a (major?) character is named Lodz. And each time a wikipedian comes by with AWB, he replaces Lodz with the town name Łódź, which has to be undone by hand in order to not revert the real typo fixes. So I'd like to either suggest removing
<Typo word="Łódź" find="\bLodz\b" replace="Łódź" />
from Wikipedia:AutoWikiBrowser/Typos, or (if it's possible) ask whether Carnivàle, Avatars (Carnivàle), Characters of Carnivàle and List of Carnivàle episodes can be excluded from typo-autofixing (for this word). Thank you. – sgeureka t•c 16:49, 4 September 2007 (UTC)
- It's only really a list of pages where no typo fixing should happen at all. The general way to do it is to remove that line from the typo fixing. Reedy Boy 17:07, 4 September 2007 (UTC)
- I don't understand (or I'm not sure that I understand correctly). Just remove "<Typo word="Łódź" find="\bLodz\b" replace="Łódź" />" from Wikipedia:AutoWikiBrowser/Typos, or what did you mean? I would prefer if someone else does what needs to be done and just lets me know that the Carnivàle articles will no longer be bothered by "Łódź". :-) – sgeureka t•c 17:48, 4 September 2007 (UTC)
[edit] Use XML instead of 'pre' for typo list markup?
I noticed that on the French RETF list, they use <source lang="xml"> and </source> instead of <pre> and </pre> and I think the colour markup looks better, and is maybe helpful for reviewing the regex. Opinions? Rjwilmsi 09:32, 9 September 2007 (UTC)
- I agree, and I've changed it as such. Good thinking! Reedy Boy 11:11, 9 September 2007 (UTC)
[edit] episiode (episoide) ->episode
current regex doesn't account for these 2 misspellings dputig07 18:26, 16 September 2007 (UTC)
- Added.--BillFlis 12:00, 20 September 2007 (UTC)
[edit] acessdate and accesdate
Should be accessdate. These 2 words are commonly misspelled when creating references using the 'web cite' template.[7] [8]. Could someone help me go through these. Maybe by using some sort of bot? MahangaTalk 02:33, 20 September 2007 (UTC)
- OK, I added a rule here so that AWB will make those corrections.--BillFlis 11:57, 20 September 2007 (UTC)
[edit] march
The changing of 'march' to 'March' is problematic. 'march' is also a verb. ssepp(talk) 00:22, 21 September 2007 (UTC)
- Is it? The match uses numbers to try to distinguish the verb from the month. Did it generate a false positive? -- JHunterJ 11:03, 27 September 2007 (UTC)
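The actual rule is not quoted in this thread, so the following is only a hypothetical sketch of what "using numbers" could look like: capitalise "march" only when a digit sits next to it. The pattern and the function name `fix_march` are assumptions made for illustration, written for Python's `re` module.

```python
import re

# Hypothetical pattern, not the rule from the typo list: match lowercase
# "march" only when preceded by "<digit><space>" or followed by "<space><digit>".
MONTH = re.compile(r"(?<=\d )march\b|\bmarch(?= \d)")

def fix_march(text: str) -> str:
    return MONTH.sub("March", text)
```

A heuristic like this leaves "soldiers march north" alone and fixes "5 march 2007", but it would still capitalise the verb in "they march 20 miles", so a number test narrows the false positives without removing them.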
[edit] Referer
Perhaps referer->referrer should be removed per HTTP referer: Referer is a common misspelling of the word referrer. It is so common, in fact, that it made it into the official specification of HTTP – the communication protocol of the World Wide Web – and has therefore become the standard industry spelling when discussing HTTP referers. ssepp(talk) 00:25, 23 September 2007 (UTC)
- I think the correction should stay as 'referer' is a typo in all other situations - Merriam Webster doesn't list it. Rjwilmsi 17:24, 26 September 2007 (UTC)
- Removed the "referer" match. I don't think a regexp to determine which situation we're in is likely. -- JHunterJ 11:03, 27 September 2007 (UTC)
- Okay, but 'referer' is still 'corrected' by the "(Re/De/In/Trans/Con/Pre)ferred" rule. Rjwilmsi 17:52, 28 September 2007 (UTC)
- Aha. Fixed too. -- JHunterJ 21:12, 28 September 2007 (UTC)
[edit] Catalog(u)ing
Interesting fact: Cataloging ('incorrect') gets 10 million google hits, while cataloguing ('correct') gets 4 million google hits. Do we still consider it a spelling error if it has this widespread usage? ssepp(talk) 16:12, 26 September 2007 (UTC)
- While Merriam Webster accepts 'cataloged' and 'catalogued', only 'dialogued' is accepted, so I vote we leave the correction as it is, since it correctly fixes other variants. Rjwilmsi 17:21, 26 September 2007 (UTC)
- Fixed by splitting, so other variants will be handled correctly still. (Note there is no voting -- false positives are false positives and are to be removed regardless.) -- JHunterJ 11:03, 27 September 2007 (UTC)
[edit] useable -> usable
I'm not comfortable enough to change this. Thanks Yngvarr (t) (c) 20:23, 27 September 2007 (UTC)
[edit] How to use
Might be stupid but how to use Regex? Is there something to do in AWB? Thanks! --Bombastus 19:55, 3 October 2007 (UTC)
- In the 'set options' menu in the bottom panel, check "enable regextypofix". ssepp(talk) 21:13, 4 October 2007 (UTC)
[edit] Vigourously?
It currently changes vigourously → vigorously. I'm from the U.S., so I'm not sure, but isn't this the British spelling? Rocket000 22:54, 5 October 2007 (UTC)
- Does it? Sample diff? The "vigorous" entry doesn't seem to make that sub, but perhaps one of the other rules does. -- JHunterJ 23:36, 5 October 2007 (UTC)
- Well, I ran into this but I didn't save the changes, so there's no diff. I can produce one if you want. Rocket000 23:42, 5 October 2007 (UTC)
[edit] rarified → rarefied
Sample diff Isn't this an acceptable variant? [10] Rocket000 23:51, 5 October 2007 (UTC)
- I removed the entry based on that. Thanks! -- JHunterJ 02:47, 6 October 2007 (UTC)
[edit] Abysinnian
can something like this be added?
Abysinnian → Abyssinian
Thanks. Rocket000 09:55, 7 October 2007 (UTC)
[edit] comfirmed → confirmed
Example. Can someone please add this, thanks. --Closedmouth 04:37, 9 October 2007 (UTC)
- Added "Conf(i/o)rm".--BillFlis 10:25, 9 October 2007 (UTC)
[edit] Proffesor → Professor
Proffesor → Professor and Profesor → Professor. Please add these misspellings. Tirkfl 11:50, 12 October 2007 (UTC)
- There seem to be some legitimate occurrences with one S, e.g., El Profesor Hippie.--BillFlis 18:23, 12 October 2007 (UTC)
[edit] Cristian → Christian (again)
Please can Cristian be removed as a typo for Christian, as there are numerous false positives because it is a given name and there are also places of this name. See the prefix search for pages beginning with Cristian. I would remove it myself but the regex it appears to stem from looks complicated and makes many other (apparently valid) corrections. Thanks, mattbr 13:59, 13 October 2007 (UTC)
- I removed this.--BillFlis 12:22, 15 October 2007 (UTC)
[edit] -fuly -> -fully
I did a dictionary search and couldn't find any words that end in "fuly". How about adding a general rule something like this? find="fuly\b" replace="fully"--Thiseye 19:28, 14 October 2007 (UTC)
- I added this rule. "Usefully" is now just a special case.--BillFlis 12:23, 15 October 2007 (UTC)
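The general rule proposed above is small enough to try outside AWB. A minimal Python sketch, assuming the same find and replace strings as in the proposal (`fix_fuly` is a hypothetical helper name):

```python
import re

# Same find/replace as the proposal: "fuly" at the end of a word becomes "fully".
FULY = re.compile(r"fuly\b")


def fix_fuly(text):
    """Apply the proposed -fuly -> -fully rule to a piece of text."""
    return FULY.sub("fully", text)


if __name__ == "__main__":
    print(fix_fuly("He carefuly checked it."))
```

Because the pattern has no leading word boundary, it covers any word ending in "fuly", and correctly spelled words such as "usefully" are left alone since they do not contain "fuly".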
[edit] Illegible superscripts.
Would it be possible to delete the automatic reformatting of superscripts, for example the replacement of 3 by a tiny illegible 3? See for example the superscript in the AWB edit of Lanthanide contraction at 14:30 27 October 2007, which I have just reverted manually to 3. Yes, the source code is now longer, but the article is legible, which is more important. Another alternative would be to have AWB change to a much larger 3.
And similarly for 2 which I have seen replaced by a tiny illegible 2. Dirac66 20:07, 27 October 2007 (UTC)
- I can't seem to find the rule here that you're talking about. As I understand it, AWB users can also write their own personal editing rules, which they then apply not entirely automatically, as they have to okay every change that AWB makes. You might want to complain to the AWB user who made the edits in question. But I'm curious: how did you know it was a "3" if it was illegible?--BillFlis 09:26, 28 October 2007 (UTC)
Thanks. I haven't used AWB myself so I thought the edit was due to application of an automatic rule. Since you find no such rule, I'll leave a note on the talk page of the user who made the edit. As for identifying the illegible "3", the previous text (before the edit) had a clearly legible 3 superscript so I assumed the editor must have inserted a 3 also, and I also checked by increasing my screen text size to maximum so that the 3 became (barely) legible. However readers who are not editors should be able to read the article without either checking previous edits or increasing screen text size. Dirac66 17:59, 28 October 2007 (UTC)
- See - Wikipedia_talk:AutoWikiBrowser#Superscript Reedy Boy 21:15, 28 October 2007 (UTC)
[edit] CalTrans → Caltrans; also Caltrain
I see a lot of this, not just in Wikipedia, but on public agency websites that work with Caltrans. The confusion may also be due to the logo being just a "c" and a "t". The California Department of Transportation writes its abbreviated name with a lowercase "t".
In the wake of the frenetic 1960s, the 1970s were a time of austerity. The then-current political philosophy urged alternatives to highway building, a trend that would continue into the 1980s. Such thinking led to a new name for the department, Caltrans, short for the California Department of Transportation. The name change was emblematic of new thinking, and a rise in the concept that while highways have long been vital to the state, other forms of transportation were emerging to complement roadways.
On a different note, there may also be confusion between Caltrans and Caltrain. Oh, and another note: CalTrain was actually an official old name for Caltrain. --Geopgeop (T) 11:55, 30 October 2007 (UTC)
[edit] critcism → criticism
A while back, I fixed a case where criticism was spelled with the 2nd I omitted (critcism). I just searched Wikipedia, and I found at least 8 other non-talk pages that appear to still be uncorrected.
<Typo word="Criticism" find="\b(C|c)ritisi[sz]?(ms?|e[ds]?|ing)\b" replace="$1riticis$2" />
It appears the current rule for criticism does not correct this spelling error, so I'd like to suggest that this be added to the list. --Smiller933 21:00, 31 October 2007 (UTC)
- BillFlis kindly added this correction last week and I've fixed all mainspace articles with this error. Thanks Rjwilmsi 21:57, 11 November 2007 (UTC)
[edit] payed → paid
Payed is an obsolete spelling. In non-quote situations "paid" should be used. Mbisanz 15:54, 8 November 2007 (UTC)
- Wiktionary agrees that 'payed' is obsolete as you say, but to include this correction would introduce a lot of false positives, so I don't think we should do it. Thanks Rjwilmsi 22:02, 11 November 2007 (UTC)
- My American Heritage Dictionary says that "payed" is an acceptable spelling (not obsolete) for the past of "paying" out a line (rope). Also, here's a link at Merriam Webster that says "payed" is OK.--BillFlis 00:18, 12 November 2007 (UTC)
- Here is a partial OED extract: "Past tense and past participle paid, (chiefly in nautical senses) payed." So unless it's being used in a nautical sense, paid would be the appropriate usage. Maybe this is something better done by hand rather than built into the spellchecker. Mbisanz 22:16, 13 November 2007 (UTC)
[edit] A few suggestions for inclusion
Here's some misspellings I ran across that AWB missed:
- mimicing → mimicking
- catholic → Catholic
- anglophone → Anglophone
- parmesan → Parmesan
Sorry I don't trust my regex skills. -Rocket000 00:47, 12 November 2007 (UTC)
- I say no to anglophone and catholic, Merriam-Webster accepts them as both lower and uppercase - anglophone and catholic. I've added the other two. Thanks Rjwilmsi 07:55, 12 November 2007 (UTC)
[edit] Why doesn't this work?
Anyone have an idea why this doesn't work?
<Typo word="Triplets" find="([aeiou])([bdfgklmnprstvz])\2\2+(ed|[eo]rs?|ings?)\b" replace="$1$2$2$3" />
It's supposed to fix triple letter errors like "lettter" and "errrors" but it seems to match nothing. Is the \2 backreference feature not supported? It works fine if I plug it into the standard "Find and replace" of AWB. —Wknight94 (talk) 05:12, 13 November 2007 (UTC)
- Maybe the backslash is getting "eaten"? Try \\2\\2 there. -- JHunterJ 00:26, 14 November 2007 (UTC)
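For reference, the "Triplets" pattern itself can be checked in a plain regex engine. The sketch below uses Python's `re` module, which is an assumption for illustration only (AWB's engine and its `$1` replacement syntax are different, so this says nothing about why the rule fails inside the typo list); `fix_triplets` is a hypothetical helper name.

```python
import re

# The find pattern from the question, unchanged; \2 is a backreference
# to the doubled consonant captured by the second group.
TRIPLETS = re.compile(r"([aeiou])([bdfgklmnprstvz])\2\2+(ed|[eo]rs?|ings?)\b")


def fix_triplets(text):
    """Collapse three or more repeated consonants down to two."""
    # Python writes the replacement as \1\2\2\3 rather than $1$2$2$3.
    return TRIPLETS.sub(r"\1\2\2\3", text)


if __name__ == "__main__":
    print(fix_triplets("a lettter full of errrors"))
```

Correctly spelled words such as "letter" are not matched, because the pattern needs at least three copies of the consonant.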
[edit] Some to be added
[Cc]incinati → Cincinnati
cincinnati → Cincinnati
[Cc]inncinati → Cincinnati
Thanks, jj137 (Talk) 02:06, 22 November 2007 (UTC)
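One possible way to cover the three requested forms with a single pattern is sketched below in Python. This is only an assumption about how such a rule could look, not the rule that was or would be added to the list; `CINCINNATI` and `fix_cincinnati` are hypothetical names.

```python
import re

# Assumption: allow one or more "n" in both positions, any case for the first
# letter, and use a negative lookahead so the correct spelling is not matched.
CINCINNATI = re.compile(r"\b(?!Cincinnati\b)[Cc]in+cin+ati\b")


def fix_cincinnati(text):
    """Normalise the requested misspellings to 'Cincinnati'."""
    return CINCINNATI.sub("Cincinnati", text)


if __name__ == "__main__":
    for word in ("Cincinati", "cincinnati", "Cinncinati", "Cincinnati"):
        print(word, "->", fix_cincinnati(word))
```

The lookahead keeps the rule from "correcting" text that is already right, which avoids wasted matches on the correct spelling.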
[edit] British
Could someone look over the British entries. I got a weird error that resulted in a spelling of Britiish being entered from what I think was Brititish. Mbisanz (talk) 06:57, 25 November 2007 (UTC)
- I looked over "Britain" and "British", and they seem correct. Are you sure it was one of these? Or could it have been some other rule?--BillFlis (talk) 12:12, 25 November 2007 (UTC)
[edit] Enmore
Please stop Enmore → Emmore. As a locality the spelling is correct, eg Enmore, New South Wales. Many thanks. --Breno talk 13:34, 27 November 2007 (UTC)
[edit] AWB's RegexTypoFix on other wiki(pedia)s
Is it possible to make something similar to use on another wiki(pedia)? And to choose language in AWB? 20:54, 28 November 2007 (UTC) - pl:user:Matma Rex
[edit] AWB Typo Profiling
Hi Guys, Just a heads up, and a pointer for some reworking for you - Wikipedia:AutoWikiBrowser/Typos/Profiling
MaxSem has added a typo profiler to AWB in debug.
The time, on the left hand side, in milliseconds, is the time for the runs over the page text, and therefore the time taken to "run" the typo fixing...
If people could work on reducing some of the larger times, it'll help speed up AWB's operation, and the initial page processing time.
MaxSem would probably be able to answer any more in-detail questions....
—Reedy Boy 16:23, 4 December 2007 (UTC)
- Interesting and useful. I have scanned a recent database dump for instances of the most time consuming check which is "(A/Air/In/...)field". I found only four instances which I have since fixed. I propose to simplify the search, removing some of the more obscure cases for the sake of efficiency. Gaius Cornelius (talk) 09:11, 8 December 2007 (UTC)
- That rule has been around for a while, which is surely why you found only four instances left. If we divide that long rule into several shorter rules, they'll each probably only take as much time as other, shorter rules. But is that what we really want to do? Anyway, I went ahead and shortened the rule and eliminated a few rare words (proper names).--BillFlis (talk) 14:31, 8 December 2007 (UTC)
- Well the output in general puzzles me. Why is the "field" expression more than a hundred times longer than [36, \blatin(|[ao]s?|ate|is[mt]s?|i[sz](e[sd]?|ing))\b > Latin$1]? Are the (aaa|bbb) constructs really so expensive? If so, we need to rethink the overall approach it seems. —Wknight94 (talk) 16:12, 8 December 2007 (UTC)
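The kind of per-rule timing described above can be approximated with a few lines of Python. This is a rough sketch only: `profile_rules` and the two sample rules are hypothetical, and Python's regex engine is not the one AWB uses, so absolute and relative timings may not carry over.

```python
import re
import time


def profile_rules(rules, text, repeats=5):
    """Time each rule over the text and return (milliseconds, name), slowest first."""
    timings = []
    for name, pattern in rules.items():
        compiled = re.compile(pattern)
        start = time.perf_counter()
        for _ in range(repeats):
            # Walk every match so the whole text is scanned each time.
            for _match in compiled.finditer(text):
                pass
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timings.append((elapsed_ms, name))
    return sorted(timings, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample rules, loosely modelled on entries discussed here.
    sample_rules = {
        "Latin": r"\blatin(|[ao]s?|ate|is[mt]s?)\b",
        "field": r"\b(air|in|out|mid)feild\b",
    }
    page = "the airfeild near the latin quarter " * 200
    for ms, name in profile_rules(sample_rules, page):
        print(f"{ms:8.3f} ms  {name}")
```

Running the same measurement before and after simplifying a rule gives a quick check on whether the change actually helped.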
[edit] Misspellings in actual quotations shouldn't be corrected
I'm not sure if this is the proper place to ask about this, but I've noticed that typos have been corrected with AWB on the Mitchell Map, but the "typos" are direct quotes from the map, odd spellings and all. It seems like a case where one wants the words spelled "wrong" and not fixed. So I wonder if there is an easy way to mark text like this so that it doesn't get "fixed"? Thanks. Pfly (talk) 23:28, 8 December 2007 (UTC)
- You could cheat by using tags like {{lang|fr|the word}} to identify the text as not modern English - see List_of_ISO_639-2_codes, or add a [sic] (even as a comment) to remind users, or simply remind the user who made the edit to take a little more care. Thanks Rjwilmsi (talk) 19:49, 12 December 2007 (UTC)
- And that works because things inside templates are not spell-checked, right? I'd like to get away from that though and have only things marked specifically as {{do not spellcheck}} not be spell-checked. I find myself also putting regexes into the find-and-replace section so templated areas are caught as well. —Wknight94 (talk) 20:08, 12 December 2007 (UTC)
[edit] Cret
So in the article Pennine Alps, it IDs "Cret" as something that should be changed to "Correct". Isn't this a stretch? Mbisanz (talk) 19:50, 9 December 2007 (UTC)
[edit] Question
Is there a way to have certain articles excluded from specific spelling corrections? Earlier today, an AWB user edited the article on Robert Cliche to correct Cliche to "Cliché", per the inclusion of that here as a common spelling error. However, the politician's surname was definitively Cliche (cleesh) rather than "Cliché", so I had to revert it. Is there an exclusions list that I can have this article added to, or a code I can insert into the article to flag the typo bot to skip this article when looking for "cliche" → "cliché" corrections? Bearcat (talk) 01:55, 10 December 2007 (UTC)
- Yo. Anybody home? Another alternative, if possible, would be to have the typo bot skip "cliche → cliché" if the article contains the phrases Robert Cliche, Cliche Commission or Robert-Cliche Regional County Municipality. Bearcat (talk) 23:39, 11 December 2007 (UTC)
- Wikipedia talk:AutoWikiBrowser/Dev may be a better venue to bring this up. It seems we need some support for excluding certain words from spell checking. Perhaps by wrapping such words in some template? —Wknight94 (talk) 12:12, 12 December 2007 (UTC)
[edit] unspayed
"unspayed" is being changed to "unpaid" Mbisanz (talk) 04:01, 10 December 2007 (UTC)
[edit] Steps
AWB is suggesting that words in the form step-x be corrected to stepx as in step-son becoming stepson. Another user mentioned that this really isn't a misspelling or even a poor usage of the word. Could we pull it out of the regex? Mbisanz (talk) 09:42, 12 December 2007 (UTC)
- I'd support removing it as a legitimate variant spelling. However, I would keep grand-father→grandfather etc in. Inconsistent I know... — iridescent 00:13, 15 December 2007 (UTC)
- I need to weigh in on the step issue and thank Iridescent for leading me here. All my life my step-bro was my step (hyphen) brother, not stepbrother. I agree the grandX should remain granddaughter, grandmother, et al, but the stepX/step-X should not be part of AWB "bad words" and I'm glad to see others have noticed. KellyAna (talk) 04:48, 15 December 2007 (UTC)
[edit] referenses -> references
Can someone do it? I'm having difficulty correcting the "Refer" entry myself. -- Magioladitis (talk) 14:20, 14 December 2007 (UTC)
- Additionally, it is now catching the correct versions, i.e., it will try to turn "reference" into "reference" which is a waste of time. We try to avoid that. I'll look into catching your new case. —Wknight94 (talk) 14:42, 14 December 2007 (UTC)
[edit] destroied -> destroyed
In this diff here [12] , I had to manually correct AWB's output to correct the error. Can this be built into regex? Mbisanz (talk) 07:08, 17 December 2007 (UTC)
[edit] disicplined -> dissicplined?
AWB recommended this change on this page. It looks like the same change was made on this diff. Both appear to be a typo of disciplined. KathrynLybarger (talk) 04:00, 21 December 2007 (UTC)
- I expanded the Discipline rule to catch this. Since it wasn't being covered there, it was falling through to the Diss- beginning rule. —Wknight94 (talk) 04:20, 21 December 2007 (UTC)
[edit] appriciated → apppreciated
The RegexTypoFix suggests the fix: appriciated → apppreciated on Jungle Run. Could someone correct the regex? BTW, I'd like the RegexTypoFix to be able to fix loosly → loosely and makeing → making too. Thanks, Warut (talk) 11:41, 21 December 2007 (UTC)
[edit] uber →
I am seeing this stolen prefix used frequently without the umlaut. When transLITERATED into English, the German "ü" must be transcribed to "ue".
Transliterations should follow accepted rules such as those established by the M.L.A., Chicago, etc., regardless of a user's knowledge of German (in this case), transliteration, or the history of a word's origins.
I suggest using the ü, as opposed to the ue transliteration, which I think might further confuse the issue to those unfamiliar with transliterating rules. 76.180.174.222 (talk) 13:23, 26 December 2007 (UTC)
- In American English at least, the prefixes uber- and über- are both acceptable. I think ueber- would confuse many readers. (I have never seen the latter used in English.) Of course, if German language is being quoted, the umlaut should just be used here; no transliteration is necessary. And nobody has "stolen" anything, English merely borrowed it; AFAIK, Germans still use "über" and "über-"!--BillFlis (talk) 13:52, 26 December 2007 (UTC)
[edit] More requests
- (in)definately → (in)definitely
- fianlly → finally
- Lousiana → Louisiana
Thanks, Warut (talk) 11:31, 28 December 2007 (UTC)
- Definately was already done. I added Fianlly and Lousiana. —Wknight94 (talk) 12:23, 28 December 2007 (UTC)
-
- Now I know why I asked for indefinately: AWB cannot detect indefinately in List of General Hospital characters. But I don't understand why it can't. Warut (talk) 18:09, 28 December 2007 (UTC)
- AWB is a bit odd in what it will and won't fix. It won't fix typos inside links - internal or external. Maybe it won't fix yours because it's indented. You can test it out by adding it into the find-and-replace portion of AWB (and check regular expressions on). If it changes it after you do that, then it is in a section that is being excluded for some reason. "Indefinately" is definitely in the typo list because I tried it this morning. —Wknight94 (talk) 19:22, 28 December 2007 (UTC)
[edit] "Musicial" to "musical"
I suggest changing all uses of "musicial" to "musical" -- I found 17 such misspellings on Wikipedia. Thanks, --Skb8721 (talk) 23:14, 3 January 2008 (UTC)
[edit] "march" to "March"
Per this diff here [13] the word march as in a parade is being picked up as the month. Can this be fixed. MBisanz talk 21:57, 9 January 2008 (UTC)
[edit] Teh is very difficult to fix
Many people and things have the word "teh" in them; it appears both uppercase and lowercase. Is there a way to filter out these "legitimate" uses of "teh"? -- King of ♥ ♦ ♣ ♠ 06:07, 14 January 2008 (UTC)
- Yeah, it's a problem, but when I've done Google searches of Wikipedia, there are so many valid chemical uses of teh, plus the whole article on teh as a different spelling. I don't think it's possible. On the other hand, you could write a custom replacement for teh->the for individual use. MBisanz talk 19:23, 14 January 2008 (UTC)
[edit] Critised
Sorry I can't add these myself. I'm not good with regexes and wouldn't want to screw things up.
Currently: critised → criticized
Could it go to the Commonwealth English "criticised" rather than the US English spelling? Websters 1996 [14] —Preceding unsigned comment added by Breno (talk • contribs) 07:22, 15 January 2008 (UTC)
[edit] Arctic -> arctic
Perhaps Arctic should not be changed to arctic, since arctic as an adjective is lowercase according to wikt:arctic. Arthena(talk) 18:26, 16 January 2008 (UTC)
[edit] correspondance -> correspondence
The rule correspondance -> correspondence gives false positives because correspondance is French for correspondence, and the French word is used in many articles. A search for correspondance [15] shows many legitimate uses of the word. Arthena(talk) 18:26, 16 January 2008 (UTC)
[edit] Typo lists regarding American and British English
American and British English spelling differences#Compounds and hyphens should probably be cross-referenced on our typo list. I recently noticed (and a fellow editor also pointed out) that words like extracurricular are being marked as wrong if spelled as extra-curricular, which is correct for British English (this extends to Australia, Hong Kong, India, and sometimes the Philippines). Is there a way we can do this? - Jameson L. Tai talk ♦ contribs 05:10, 19 January 2008 (UTC)
- Would be better placed here. The "typo moderators" are more likely to see it now! —Reedy Boy 10:31, 19 January 2008 (UTC)
- Thanks for redirecting me here. :-) - Jameson L. Tai talk ♦ contribs 08:01, 20 January 2008 (UTC)
[edit] Louisianian → Louisianan
I've noticed that RegexTypoFix always changes Louisianian → Louisianan. However, both Louisianian and Louisianan are valid according to the list of U.S. state residents names. So this may need a fix. Thanks, Warut (talk) 11:58, 20 January 2008 (UTC)
- What's more, this dictionary lists both spellings.--BillFlis (talk) 16:28, 21 January 2008 (UTC)
[edit] The negative lookbehinds used in the regex lists
We've got four instances of the use of negative lookbehinds (a ?<! to exclude a string) in the regex typo list. wikEd now uses the list directly, and these can't be supported by wikEd as JavaScript, which wikEd uses, apparently doesn't support lookbehinds. I've tried and failed to find replacement regex for these, can anybody else come up with one? Cacycle has commented the four occurrences with // "invalid quantifier" JS error:. I'll ask him/her for help too. Thanks Rjwilmsi 21:56, 7 September 2007 (UTC)
- There's no regular expression equivalent for zero-width look-behind assertions. Commenting them out means that AWB can't use them either, it appears. Can we uncomment them now to restore functionality to AWB? Or invoke some alternate "tagging" so that the wikEd program can recognize them and ignore them, or AWB can recognize and include them? -- JHunterJ (talk) 00:48, 5 February 2008 (UTC)
[edit] + emplyed -> employed
Can someone please add this? Example. Thanks. --Closedmouth (talk) 08:13, 28 January 2008 (UTC)
[edit] runnning -> rngnning
This edit is a rather odd mistake that I didn't notice at the time. I'm not good enough at reading regex to figure out what caused it, could someone look into it?--Dycedarg ж 21:18, 28 January 2008 (UTC)
- Hmmm, I've seen something similar with a prior version of AWB. Has to do with all regexes being auto surrounded by parentheses in the code. Backreference numbering gets screwed up. I changed it to use named backreferences and now things appear to work. Sorry for the confusion. —Wknight94 (talk) 21:58, 28 January 2008 (UTC)
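The effect described above, where an extra pair of parentheses around a rule shifts the group numbers, can be reproduced in a small Python sketch. The rule, the `wrap` helper and the function names are all hypothetical, and Python's `\1` / `\g<name>` syntax stands in for the `$1` style used in the typo list; this illustrates the general mechanism rather than AWB's actual code.

```python
import re

# Hypothetical rule: group 1 is meant to be the first letter of the word.
FIND = r"\b(R|r)unnn+ing\b"
NAMED_FIND = r"\b(?P<first>R|r)unnn+ing\b"


def wrap(pattern):
    """Mimic a tool that surrounds every rule with one extra pair of parentheses."""
    return "(" + pattern + ")"


def apply_numbered(text):
    # After wrapping, group 1 is the whole match, so \1 no longer means
    # "the first letter" and the output is garbled.
    return re.sub(wrap(FIND), r"\1unning", text)


def apply_named(text):
    # A named group keeps its meaning however many groups are added around it.
    return re.sub(wrap(NAMED_FIND), r"\g<first>unning", text)


if __name__ == "__main__":
    print(apply_numbered("runnning"))
    print(apply_named("runnning"))
```

Named groups make a rule independent of whatever wrapping the host program applies, at the cost of a slightly longer pattern.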
[edit] summery -> summary
The summary fix is incorrectly changing summery to summary. I don't see easily how to change it without dropping the fix for 'sumary'. Ideas? Thanks Rjwilmsi (talk) 22:58, 4 February 2008 (UTC)
- I may be misunderstanding your concern... how about this? —Wknight94 (talk) 23:04, 4 February 2008 (UTC)
- Hmm, my point was that that regex now misses words matching [Ss]ummer(i[sz](ation|e[ds]?|ing)) (i.e. all endings except just y to make summery) which we would want to correct to summar$1 etc. Ideas? Rjwilmsi (talk) 19:59, 6 February 2008 (UTC)
- Actually I think the followup edit to mine accomplished more of what you had in mind. Catches all summerxxx words except summery. (Or I assume it does - I didn't actually try it). —Wknight94 (talk) 20:10, 6 February 2008 (UTC)
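The behaviour asked for above (fix the summeris-/summeriz- forms but leave "summery" alone) can be sketched in Python as follows. This is an illustration built from the pattern quoted in the discussion, not the rule that ended up in the list; `SUMMER_ENDINGS` and `fix_summer` are hypothetical names.

```python
import re

# Only the -is-/-iz- endings are matched, so "summery" and "summer" never qualify.
SUMMER_ENDINGS = re.compile(r"\b([Ss])ummer(i[sz](?:ation|e[ds]?|ing))\b")


def fix_summer(text):
    """Change summeris*/summeriz* to summaris*/summariz*."""
    return SUMMER_ENDINGS.sub(r"\1ummar\2", text)


if __name__ == "__main__":
    for word in ("summerised", "Summerization", "summery"):
        print(word, "->", fix_summer(word))
```

A separate rule would still be needed for "sumary", since this pattern only deals with the "summer" stem.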
[edit] interupt -> interrupt
Lemon Interupt is the alternate name of this band. From a search of google it appears that the name is used in only seven different articles. Is it worth removing the typo from the list or modifying it somehow?--Dycedarg ж 11:44, 20 February 2008 (UTC)
- I added a lookbehind assertion to allow Lemon Interupt (and Lemon Interupts, for that matter, but I don't think that's a big problem). -- JHunterJ (talk) 12:24, 20 February 2008 (UTC)
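A lookbehind exception of this kind can be sketched in Python. The patterns below are assumptions for illustration, not the entry actually used in the typo list, and all names are hypothetical. The second function shows one lookbehind-free variant for engines that lack the feature: it captures the optional prefix and decides in a callback, so it is not a pure find/replace pair.

```python
import re

# Variant 1: negative lookbehind skips the word when "Lemon " comes before it.
WITH_LOOKBEHIND = re.compile(r"(?<!Lemon )\b([Ii])nterupt(|s|ed|ing|ion)\b")

# Variant 2: no lookbehind; the protected prefix is captured as an optional group.
WITHOUT_LOOKBEHIND = re.compile(r"(Lemon )?\b([Ii])nterupt(|s|ed|ing|ion)\b")


def fix_with_lookbehind(text):
    return WITH_LOOKBEHIND.sub(r"\1nterrupt\2", text)


def fix_without_lookbehind(text):
    def replace(match):
        if match.group(1):
            # The band name was matched, so hand the text back unchanged.
            return match.group(0)
        return match.group(2) + "nterrupt" + match.group(3)

    return WITHOUT_LOOKBEHIND.sub(replace, text)


if __name__ == "__main__":
    sample = "Lemon Interupt had to interupt the show"
    print(fix_with_lookbehind(sample))
    print(fix_without_lookbehind(sample))
```

Both variants also leave "Lemon Interupts" untouched, matching the behaviour noted in the reply above.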
[edit] Broken regex
I'm being told regex is broken and I don't know how to fix it. MBisanz talk 02:57, 29 February 2008 (UTC)
[edit] Looks like AWB replaces "manouvers" with "manoeuvers".
Isn't the latter a misspelling? Shouldn't it be "manoeuvres"? I would say "manouvers" should be replaced with "maneuvers" and "manoeuvers" with "manoeuvres" ... But I could be wrong because English is a second language for me. TestPilot 11:02, 10 March 2008 (UTC)
[edit] Imtrec Aviation -> Intrec Aviation
Can someone add as exception? Imtrec Aviation is a legitimate company. Should "imtrec"->"intrec" rule be kept at all? TestPilot 15:02, 10 March 2008 (UTC)
- Looks like this one got fixed by User:BillFlis. Thanx. TestPilot 07:05, 11 March 2008 (UTC)
[edit] Imdadkhani
The script wants to replace perfectly good "Imdadkhani"(28 pages in WP) with nonexistent "Indadkhani" for some reason. TestPilot 16:24, 11 March 2008 (UTC)
[edit] Retuned
It is not a valid word - it is a misspelling of returned. TestPilot 17:57, 11 March 2008 (UTC)
- Re+tuned, see the usage[16]. MaxSem(Han shot first!) 18:03, 11 March 2008 (UTC)
- Oops. Yes. Correct, sorry. TestPilot 18:05, 11 March 2008 (UTC)
[edit] in so far → insofar?
Looks like "in so far" is a legitimate spelling. Should we really replace it? TestPilot 14:05, 10 March 2008 (UTC)
- I can see a lot of false positives with that. Also [17]. Rocket000 (talk) 23:06, 15 March 2008 (UTC)
[edit] AutoCorrect database
I have created a page with a huge list of typo corrections from AutoCorrect software. RegExTypoFix already covers lots of the entries, but far from all. The list itself was originally based on an old list of wiki typo corrections, and it was created by the AHK community. The easiest way to check it out in AWB is to create a list from "what links here" for the Zelavin article. Make sure you enable user space pages. Second, today I started to work on my own utility for typo autocorrection on the fly. It is sort of working already, as I type :), and the good news is that it checks against 2200 regular expressions (all that were on the AutoWikiBrowser/Typos page) in the blink of an eye, or even faster than that, on a relatively old computer. So it does look like we can expand the regex list something like tenfold without having to worry too much about performance. TestPilot 02:24, 13 March 2008 (UTC)
- I cleaned out list and updated typo page with new rules. TestPilot 03:58, 14 March 2008 (UTC)
[edit] heavly → heavily
In this edit, it somehow used avly → avely. Can this be fixed? Thanks. — E talk 23:44, 20 March 2008 (UTC)
- Removed. MaxSem(Han shot first!) 13:03, 23 March 2008 (UTC)
[edit] Thru -> through
Given the number of legitimate uses (including in article titles - see Special:Prefixindex/Thru), should this be an automatic correction? Black Falcon (Talk) 22:41, 22 March 2008 (UTC)
[edit] Inbhir -> Imbhir
I'm not sure which line is causing the change, but I think there are too many false positives associated with this change of "In" to "Im". Examples of articles on which this would cause errors include Ayr and Cullen. – Black Falcon (Talk) 16:49, 24 March 2008 (UTC)
- A similar issue takes place with replacement of "En" with "Em" (e.g. "Enman" -> "Emman", in the article William George Barker). Black Falcon (Talk) 21:36, 24 March 2008 (UTC)
The Ayr and Cullen articles were incorrectly tagged. I've fixed them [18] and [19]. Thanks Rjwilmsi (talk) 00:07, 30 March 2008 (UTC)
I've added an exception so 'Enman' isn't caught - [20]. Rjwilmsi (talk) 00:13, 30 March 2008 (UTC)
- Thanks. Black Falcon (Talk) 06:47, 6 April 2008 (UTC)
[edit] Consitution > Constitution
Hm? Jobjörn (talk) 14:42, 7 April 2008 (UTC)