Wikipedia talk:AutoWikiBrowser/Typos

Misspellings to be added

Should new misspellings go here or in the "Misspellings to be Added" section of the main project page? Regardless, here's about 90 that I've amassed. I'd add them myself, but some of those regexes are pretty complex and scare me. I've verified that all these aren't acceptable by dictionary.com and that there are at least 10 instances of each in Wikipedia. False positives haven't been checked for, however. And there are probably prefixes/suffixes that can be added to most of them.

(Can someone please add some of these? --Thiseye 07:02, 2 March 2007 (UTC))

  • committe → committee
  • comsumption → consumption
  • confict → conflict
  • controvesy → controversy
  • depatment → department
  • detemine → determine
  • differenciate → differentiate
  • elligible → eligible
  • erronous → erroneous
  • girfriend → girlfriend
  • helicoptor → helicopter
  • highten → heighten - Updated Height Reedy Boy 21:24, 24 March 2007 (UTC)
    • Not correct. This now handles "highthen" (after I fixed it), but not "highten". --Thiseye 23:34, 24 March 2007 (UTC)
  • immedately → immediately
  • immensly → immensely
  • inpenetrable → impenetrable
  • intitution → institution
  • itslef → itself
  • jeapordy → jeopardy
  • likley → likely
  • liqour → liquor
  • literaly → literally
  • minsitry → ministry
  • mountian → mountain
  • newstands → newsstands
  • nobilty → nobility
  • oppenent → opponent
  • orginial → original
  • peform → perform
  • perfomance → performance
  • personna → persona
  • editted → edited
  • posibility → possibility
  • precip(a|ia)tion → precipitation
  • prepatory → preparatory
  • pricipal → principal
  • recruting → recruiting
  • reliquish → relinquish
  • reminicent → reminiscent
  • replacment → replacement
  • responed → responded
  • sectretary → secretary
  • signiture → signature
  • similarily → similarly
  • similiar → similar
  • unsheath → unsheathe
  • valiently → valiantly
  • wherupon → whereupon
  • wheter → whether
  • widly → widely

Added

  • (out)manoeuvered → outmaneuvered - Added Reedy Boy 21:31, 24 March 2007 (UTC)
  • (out)manuever(s) → outmaneuvers - Added Reedy Boy 21:31, 24 March 2007 (UTC)
  • Lousiana → Louisiana - Already Added Reedy Boy 21:24, 24 March 2007 (UTC)
  • catapault → catapult - Added Reedy Boy 20:52, 24 March 2007 (UTC)
  • centeral → central - Added Reedy Boy 20:52, 24 March 2007 (UTC)
  • (un)consitutional → unconstitutional
    • mostly handled by "constitute" entry. "un" portion is not handled --Thiseye 07:02, 2 March 2007 (UTC)
      • Added to "(Un)Constitute", etc.--BillFlis 13:30, 24 March 2007 (UTC)
  • barbituate → barbiturate
    • Added.--BillFlis 14:05, 24 March 2007 (UTC)
  • catastrophies → catastrophes
  • dissention → dissension
    • Added.--BillFlis 14:05, 24 March 2007 (UTC)
  • intergalatic → intergalactic
    • Added as "(Inter)Galactic".--BillFlis 13:45, 24 March 2007 (UTC)
  • negociations → negotiations
    • Added.--BillFlis 14:05, 24 March 2007 (UTC)
  • noticably → noticeably
    • Added.--BillFlis 14:05, 24 March 2007 (UTC)
  • weilds → wields
    • Added as "(W/Y)ield".--BillFlis 14:05, 24 March 2007 (UTC)
  • charcter → character
  • comandeer → commandeer
    • Added to "Command(eer/o)".--BillFlis 14:05, 24 March 2007 (UTC)
  • (I|i)nagura(tion|tions|te|ted|tes|l) → $1naugura$2
    • added --Thiseye 07:02, 2 March 2007 (UTC)
  • (O|o)ccaisio(n|ns|nal|nally) → $1ccasio$2
    • added --Thiseye 07:46, 10 March 2007 (UTC)
  • abscence → absence
    • added --Thiseye 07:46, 10 March 2007 (UTC)
  • accompained → accompanied
    • added --Thiseye 07:46, 10 March 2007 (UTC)
  • additionaly → additionally
  • Alburquerque → Albuquerque
  • aprove → approve
  • beated → beat
  • celing → ceiling
  • constitutent → constituent
    • added --Thiseye 04:48, 11 March 2007 (UTC)
  • constrast → contrast
    • added --Thiseye 07:02, 2 March 2007 (UTC)
  • situtation → situation
  • explination → explanation
    • added --Thiseye 04:29, 11 March 2007 (UTC)
  • platnium → platinum
    • added --Thiseye 04:29, 11 March 2007 (UTC)
  • politican → politician
    • added --Thiseye 04:29, 11 March 2007 (UTC)
  • responsbility → responsibility
    • added --Thiseye 04:29, 11 March 2007 (UTC)
  • restuarant → restaurant
    • added --Thiseye 04:29, 11 March 2007 (UTC)
  • rythm → rhythm
    • already handled --Thiseye 07:46, 10 March 2007 (UTC)
  • sattelite → satellite
    • already handled --Thiseye 07:46, 10 March 2007 (UTC)
  • thier → their
    • added --Thiseye 04:29, 11 March 2007 (UTC)
  • unaminous → unanimous
    • added --Thiseye 07:46, 10 March 2007 (UTC)
  • unprecented → unprecedented
  • Vancover → Vancouver
  • wieght → weight

Thiseye 16:09, 31 December 2006 (UTC)
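
The two capture-group entries in the list above can be tried outside AWB. What follows is a minimal sketch in Python, assuming its `re` module handles these simple patterns the same way AWB's regex engine does. The `\b` anchors and the `fix` helper are additions for illustration, and Python writes the `$1`/`$2` of the list as `\1`/`\2`. The occasion replacement is written here with `ccasio` rather than `ccasion`, because the captured ending already begins with `n`.

```python
import re

# Patterns copied from the list; the \b anchors are an assumption.
RULES = [
    (re.compile(r"\b(I|i)nagura(tion|tions|te|ted|tes|l)\b"), r"\1naugura\2"),
    (re.compile(r"\b(O|o)ccaisio(n|ns|nal|nally)\b"), r"\1ccasio\2"),
]

def fix(text):
    """Apply each rule in turn; group 1 keeps the original capitalisation."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

print(fix("The Inaguration was an occaisional event."))
```

Capturing the first letter as its own group is what lets one rule cover both the capitalised and lower-case forms.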

Reliable sources

Is dictionary.com a reliable source?--Andeh 06:04, 11 August 2006 (UTC)

Nope. See here. alphaChimp laudare 06:19, 11 August 2006 (UTC)
OK, what about Microsoft Word 2000's or higher dictionary?--Andeh 06:25, 11 August 2006 (UTC)

This looks like a good source for misspellings: http://www.misspelled.com/common/a.htm --BillFlis 10:45, 27 August 2006 (UTC)

Full stops, commas, colons, brackets and double spaces

I feel that the following mistakes are too common (especially in stubs) to ignore:

  • c denotes any alphanumeric character
  • s denotes a space character
Mistake | Correction | Suggested code
c.c | c.sc | <Typo find="\b(a-zA-Z).(a-zA-Z)\b" replace="$1. $2" />
cs.c | c.sc | <Typo find="\b(a-zA-Z) .(a-zA-Z)\b" replace="$1. $2" />
cs.sc | c.sc | <Typo find="\b(a-zA-Z) . (a-zA-Z)\b" replace="$1. $2" />
c,c | c,sc | <Typo find="\b(a-zA-Z),(a-zA-Z)\b" replace="$1, $2" />
cs,c | c,sc | <Typo find="\b(a-zA-Z) ,(a-zA-Z)\b" replace="$1, $2" />
cs,sc | c,sc | <Typo find="\b(a-zA-Z) , (a-zA-Z)\b" replace="$1, $2" />
c;c | c;sc | <Typo find="\b(a-zA-Z);(a-zA-Z)\b" replace="$1; $2" />
cs;c | c;sc | <Typo find="\b(a-zA-Z) ;(a-zA-Z)\b" replace="$1; $2" />
cs;sc | c;sc | <Typo find="\b(a-zA-Z) ; (a-zA-Z)\b" replace="$1; $2" />
c(c | cs(c | And so forth
c(sc | cs(c | And so forth
cs(sc | cs(c | And so forth
c)c | c)sc | And so forth
cs)c | c)sc | And so forth
cs)sc | c)sc | And so forth
ss | s | And so forth

Note: Suggested code is based on my preliminary understanding of the pattern of the working code at Wikipedia:AutoWikiBrowser/Typos, and I am very sure it is wrong and needs to be corrected.

Szhaider 15:39, 9 October 2006 (UTC)

These are indeed common mistakes, but unfortunately, in my experience, there are too many legitimate exceptions, such as ".NET". The other mistakes may not have so many exceptions, though. Martin 16:16, 9 October 2006 (UTC)
Yeah, and what about U.S.A.? Or T.S. Eliot? Also, the semi-colon is part of many HTML entities, like "&mdash;" etc., which will butt right up against letters.--BillFlis 02:11, 10 October 2006 (UTC)
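
For anyone who wants to experiment, here is a rough sketch in Python of what the first suggested rule ("c.c") might look like with real character classes and an escaped dot. This is one reading of the intent, not something tested in AWB, and `MISSING_SPACE` and `add_space_after_dot` are made-up names. The `\b` anchors are dropped because, with a single letter captured on each side, they would limit the match to one-letter words. The samples include the exceptions raised in the replies.

```python
import re

# Assumed intent of the "c.c" rule: letter, literal dot, letter.
# Square brackets make a character class; the backslash keeps "." literal.
MISSING_SPACE = re.compile(r"([a-zA-Z])\.([a-zA-Z])")

def add_space_after_dot(text):
    # Python writes the $1/$2 of the Typo syntax as \1/\2.
    return MISSING_SPACE.sub(r"\1. \2", text)

samples = ["It ended.Then it began.", "U.S.A.", "T.S. Eliot", "example.com"]
for sample in samples:
    print(repr(sample), "->", repr(add_space_after_dot(sample)))
```

The first sample is repaired as intended, but the abbreviations and the domain name are altered too, which is the false-positive problem described above.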

facilitate

The new entry for facilitate is not correct. It's changing facilitate to facilitatli. I think it should have $3 instead of $2. --Thiseye 00:44, 1 March 2007 (UTC)

Thanks for reporting; fixed. -- intgr 00:47, 1 March 2007 (UTC)

secretarty -> secretary

found in Marita Ulvskog. Jobjörn (Talk ° contribs) 01:21, 8 March 2007 (UTC)

Added to existing "Secretary" entry.--BillFlis 22:33, 8 March 2007 (UTC)

RETF oddities

I noticed something strange that could be a bug in AWB. I've noticed in several articles that if a typo is in wiki tags [[]], then RETF will not catch this. I assumed this was because it's not excluding the brackets as part of the word so it wasn't matching the regex. But then I noticed in the Akshay Pratap Singh article, that the FAR does catch typos within wiki tags. In this article, "politican" is misspelled. I had a FAR entry to correct this which I recently added to RETF. However, I noticed when I disabled the FAR entry, it would no longer be corrected. I updated the FAR regex to exactly that of the RETF regex, and still FAR would correct it, but RETF would not. --Thiseye 22:43, 11 March 2007 (UTC)

I believe this has been discussed a few times over on the AWB talk pages; it has been set up like this on purpose. There are reasons for doing it both ways, and I think we are looking into having it check more. Post it on the AWB talk page. Reedy Boy 17:55, 12 March 2007 (UTC)

Not sure if anyone will see this...

I was wondering if the AWB could include the often misused words "reoccur", "reoccured", and "reoccuring". These are not actual words (contrary to popular assumption)! They should all be changed to "recur", "recurred", and "recurring". Mahalo. --Ali'i 20:44, 13 March 2007 (UTC)

Oops, they already are included:

<Typo word="(Re(o)c/Re)currence" find="\b([Rr]eoc|[Oo]c|Re)curran(ces?|t|tly)\b" replace="$1curren$2" />
<Typo word="Recurr(ed/ing)" find="\b(R|r)ec(?:cur?|u)r(ed|ing|ent|ently)\b" replace="$1ecurr$2" />

Sorry about that. Thanks anyway. --Ali'i 20:47, 13 March 2007 (UTC)
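
A quick way to see what the two quoted rules actually match is to run them outside AWB. The sketch below uses Python's `re` module, on the assumption that it behaves like AWB's regex engine for patterns this simple. `apply_rules` is a made-up helper, and `$1`/`$2` become `\1`/`\2`.

```python
import re

# The two rules quoted above, copied verbatim apart from the \1/\2 syntax.
RULES = [
    (r"\b([Rr]eoc|[Oo]c|Re)curran(ces?|t|tly)\b", r"\1curren\2"),
    (r"\b(R|r)ec(?:cur?|u)r(ed|ing|ent|ently)\b", r"\1ecurr\2"),
]

def apply_rules(word):
    for find, replace in RULES:
        word = re.sub(find, replace, word)
    return word

for word in ["occurrance", "recured", "reccuring", "reoccured"]:
    print(word, "->", apply_rules(word))
```

In this check the rules repair "occurrance", "recured" and "reccuring", but "reoccured" comes back unchanged, so it may be worth confirming in AWB whether the "reoccur" forms are really covered.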

Includeing -> Including

As above, suggest replacing includeing with including. Harryboyles 05:59, 17 March 2007 (UTC)

Asian needs to be updated

There is a misspelling in Kai Chen, "asain"; the current rule only accounts for "aisian".

Dependant vs. Dependent

It appears that "dependant" is acceptable in British English, esp. as a noun. If people concur, it should be removed from the typo list IMHO. —Wknight94 (talk) 15:21, 23 March 2007 (UTC)

It's not just British. An American dictionary http://www.m-w.com/dictionary/dependant lists it too.--BillFlis 18:10, 23 March 2007 (UTC)
So it should be removed, no? —Wknight94 (talk) 14:05, 24 March 2007 (UTC)
It definitely needs to be removed. As a noun a dependant is a person looked after by another e.g. a father's dependants are his children (sorry for the approximate definition). Dependant may well be incorrectly used e.g. 'dependant on the weather ...' but can't be fixed this way. Rjwilmsi 19:19, 26 March 2007 (UTC)
I removed it shortly after my last message. —Wknight94 (talk) 21:21, 26 March 2007 (UTC)

Regex/CPU question

I know that we want to reduce the number of regexes to reduce the amount of CPU time used to process them all. I'm assuming this means that there is little to no CPU cost associated with adding a variant to an existing regex compared to adding a completely new entry. Should we avoid adding variants to an existing regex that don't occur too often, or does that matter?

Also, it seems we avoid "catching" the correct spelling within the regex. Is that the standard we should go by? And to what extent should we go to avoid that situation? I've seen some regexes that do catch the correct spelling, so should I try to rework these, or is this sometimes acceptable ("available" is an example). Further, should we avoid trying to catch certain variants of typos to avoid catching the correct spelling? Should we avoid adding a new entry to try to catch a variant to avoid catching the correct spelling ("Vancouver" is an example)? --Thiseye 18:28, 25 March 2007 (UTC)
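
To make the "catching the correct spelling" question concrete, here is a small sketch in Python. Both patterns are hypothetical and are not taken from the typo list; `rule_touches_correct_spelling` is a made-up helper. It only illustrates the trade-off and says nothing about CPU cost.

```python
import re

def rule_touches_correct_spelling(find, correct_word):
    """Return True if the pattern also matches the already-correct word."""
    return re.search(find, correct_word) is not None

# Hypothetical patterns for illustration only.
BROAD = r"\bVanc?ou?ver\b"   # covers more variants, but also "Vancouver" itself
NARROW = r"\bVancover\b"     # covers one variant and leaves "Vancouver" alone

for name, pattern in [("broad", BROAD), ("narrow", NARROW)]:
    print(name, rule_touches_correct_spelling(pattern, "Vancouver"))
```

A check like this could be run over each entry with its own correct word to list the rules that fire on correct text.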

Combining regexes that catch missing "e" before "ly" suffix

I wanted to get some others' thoughts on combining several regexes (and incorporating some new ones). The thing is that if we want to add other variants to these, we'd probably want to separate them out again.

<Typo word="(Accurate/Active/Affectionate/Alternate/Appropriate/(Ab/Re)solute/Collective/Consecutive/Desperate/Exclusive/Extensive/False/Large/Separate/Severe)ly" find="\b((A|a)(ccurat|ctiv|ffectionat|lternat|ppropriat)|([Aa]b|[Rr]e)solut|(C|c)o(llec|nsecu)tiv|(D|d)esperat|(E|e)x(clu|ten)siv|(F|f)als|(L|l)arg|(S|s)e(parat|ver))ly\b" replace="$1ely" />

--Thiseye 00:01, 26 March 2007 (UTC)

I think this is a good idea, I have been using some regexes like this personally and they can work pretty well. Gaius Cornelius 00:05, 26 March 2007 (UTC)
Good idea, but I have a suggestion. No English words end in "ivly" or "avly". This:
<Typo word="-(a/i)vely" find="(a|i)vly\b" replace="$1vely" />
catches your "-ively" words and over a thousand more. I went ahead and added this and a few others under New Additions; I'll let them cook for a while to see if any unforeseen problems arise before deleting any existing entries.--BillFlis 10:29, 26 March 2007 (UTC)
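
The two approaches in this thread can be compared side by side. This is a sketch in Python rather than in AWB, assuming the `re` module treats these patterns like AWB's engine. The patterns are copied from the comments above, the helper names are made up, and `$1` is written as `\1`.

```python
import re

# The combined rule proposed above; group 1 wraps the whole stem.
COMBINED = re.compile(
    r"\b((A|a)(ccurat|ctiv|ffectionat|lternat|ppropriat)|([Aa]b|[Rr]e)solut"
    r"|(C|c)o(llec|nsecu)tiv|(D|d)esperat|(E|e)x(clu|ten)siv|(F|f)als"
    r"|(L|l)arg|(S|s)e(parat|ver))ly\b"
)
# The shorter suffix rule from the reply.
SUFFIX = re.compile(r"(a|i)vly\b")

def fix_combined(word):
    return COMBINED.sub(r"\1ely", word)

def fix_suffix(word):
    return SUFFIX.sub(r"\1vely", word)

for word in ["activly", "separatly", "largly", "massivly"]:
    print(word, "->", fix_combined(word), "/", fix_suffix(word))
```

The suffix rule picks up "-ivly" words that are not in the combined list, while stems such as "separat" and "larg" still need the explicit list.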

'infinate' fixed to 'infinit'

The typo correction ((In)De/In/Af)Finite fixes 'infinate' to 'infinit'. I'm not competent enough with regex to fix it. Rjwilmsi 19:16, 26 March 2007 (UTC)

Fixed, but I had to take out the case of "infinity".--BillFlis 19:33, 26 March 2007 (UTC)
Thanks. And another: ballon can't be corrected to balloon as 'ballon' exists in French and is quoted e.g. Ballon D'or in the Roberto Baggio article.
That sounds questionable since this is the English Wikipedia. That's one that would need to be rejected manually by the WP:AWB user but shouldn't be removed from the typo list. (My opinion anyway). —Wknight94 (talk) 21:21, 26 March 2007 (UTC)
Yes, but if you search for "ballon", you get not just Ballon D'Or but a host of articles with that word in the title. On the other hand, we could certainly keep the corrections of "balloning", "ballonist", etc. On the third hand, there aren't a lot of these errors.--BillFlis 10:24, 28 March 2007 (UTC)

'responsable(s)' fix needs to be removed

Responsable(s) exists in French so needs to be removed from the "(Ir)Responsible" correction. Rjwilmsi 20:27, 27 March 2007 (UTC)

tPA is corrected to TPa but it's correct in articles such as Serpin. Rjwilmsi 20:37, 27 March 2007 (UTC)

Sorry to push back again (as I did above) but this is the English Wikipedia. Shouldn't French words be occurring very very rarely? To me, that's better to cover as an exception by the WP:AWB user (which is what this list is for). —Wknight94 (talk) 22:03, 27 March 2007 (UTC)
While I tend to agree, the RETF project page does state that the "lofty goal of RETF is to be completely automatic. That is, 100% accuracy." So something's got to give. We can't really have it both ways. I have a couple of ideas that I'm going to propose soon to alleviate this. --Thiseye 04:27, 28 March 2007 (UTC)
From that goal, anytime someone runs across any change in WP:AWB that they need to roll back, they should remove it from the list, right? I'll do that then. Thanks. —Wknight94 (talk) 11:21, 28 March 2007 (UTC)

For phrases in a language other than English, use {{lang}} for the phrase, for example {{lang|fr|Responsable}}, where the second parameter is the ISO 639 code. It stops AWB changing the text, but I'm not sure about WikEd (if not, it probably should). mattbr 10:53, 28 March 2007 (UTC)

Thanks. That's a really useful tip I didn't know about. I'll probably go through and tag all French 'responsable's like that. Rjwilmsi 17:25, 28 March 2007 (UTC)
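
The idea of protecting {{lang}} spans can be sketched as follows. This is not AWB's implementation, which is not shown anywhere in this discussion; it is only an illustration in Python of skipping template text before applying a rule. The `TYPO` pattern is a hypothetical stand-in for the "(Ir)Responsible" entry, and `fix_outside_lang` is a made-up name.

```python
import re

# Matches a simple, non-nested {{lang|...}} template. The capturing group
# makes split() keep the templates in its result.
LANG_TEMPLATE = re.compile(r"(\{\{lang\|[^{}]*\}\})")
# Hypothetical stand-in for the real typo rule.
TYPO = re.compile(r"\b(R|r)esponsable\b")

def fix_outside_lang(text):
    parts = LANG_TEMPLATE.split(text)
    # With one capturing group, the templates land at the odd indexes.
    return "".join(
        part if index % 2 else TYPO.sub(r"\1esponsible", part)
        for index, part in enumerate(parts)
    )

print(fix_outside_lang("He was responsable; in French, {{lang|fr|responsable}}."))
```

Only the untagged occurrence is changed; the tagged French word is passed through as written.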

Typica

Typica exists (in English!) but is corrected to Typical. Wasn't sure how to fix the regex myself. Rjwilmsi 07:03, 28 March 2007 (UTC)

I have removed the regex doing this ((A)Typically). Other changes in the removed regex appear to already be covered in (A)Typical, but someone please update it if not. Thanks, mattbr 10:53, 28 March 2007 (UTC)

Another: In (fact/the/a/an) corrects the name Ina

Removed "ina" and "inan" from regex because of name false positives. I'd also be concerned "inan" would be a typo of "inane". --Thiseye 01:24, 29 March 2007 (UTC)

Nation name capitalization

What do folks think about taking out some of the capitalizations, since there are so many animal species that use lower-case versions of words that would ordinarily be upper-case (see this edit for an example of the mistakes that are often made)? —Wknight94 (talk) 22:03, 27 March 2007 (UTC)

"gum arabic" too. -- Euchiasmus 20:17, 7 April 2007 (UTC)

Millenium Hall

Proposing to remove "Millennium_" since there is a well-known 18th century book, Millenium Hall. —Wknight94 (talk) 00:06, 30 March 2007 (UTC)

There's a band called 'Agression', so the 'agression' -> aggression fix needs to be edited. Rjwilmsi 06:24, 31 March 2007 (UTC)

Official

There is currently an entry for Official, but I'm not sure if it corrects "Offical" --> "Official". Can someone either please add this or let me know that it is in there already? --After Midnight 0001 05:09, 1 April 2007 (UTC)

I added that case, as well as a couple more word endings.--BillFlis 11:17, 1 April 2007 (UTC)

.coms

I couldn't get negative lookahead to work properly on the .com's (OK, brainfart Harvard would be .edu anyway). Try 1 and Try 2. I'm trying to get it to ignore URLs and emails (ex NSAKEY). Can somebody take a peek? I was reloading the file with click/unclick of the RETF option. — RevRagnarok Talk Contrib 17:40, 1 April 2007 (UTC)

AWB ignores external http: links (and from the next release https:, ftp: and mailto:), so these shouldn't be a problem. In regular text, I can't think of a situation where you would write a web or email address outside a link. Could you point me to where you are having the problem? You can try out a regex using the find-and-replace option in AWB, and I don't think clicking/unclicking the checkbox reloads the list, but you can from the last option on the 'General' menu. mattbr 18:12, 1 April 2007 (UTC)
The developers told me click/unclick reloads and that seems to work. The test article is listed above - NSAKEY has the public key for an email @microsoft.com. — RevRagnarok Talk Contrib 18:18, 1 April 2007 (UTC)
Sorry missed that. Wrap the text in <pre></pre> rather than using a space at the beginning. AWB will then ignore them. mattbr 18:30, 1 April 2007 (UTC)
That fixes this case, but on a side note, I'd like to know why the regex didn't work. — RevRagnarok Talk Contrib 18:35, 1 April 2007 (UTC)
Ticking and unticking the box just enables and disables it; it doesn't refresh the typo list. I've just committed a change so that if you use the option on the General menu, it will reload them. Reedy Boy 18:41, 1 April 2007 (UTC)
Two weeks ago you said it did reload the typo page. Guess there was a misunderstanding somewhere. Either way, I <pre> tagged the one spot anyway per Matt. — RevRagnarok Talk Contrib 18:52, 1 April 2007 (UTC)
Sorry about that. I thought (as it was a bit of a quick fix) that it did. When I looked over the code just now, I realised that unless the declaration for the typos was blank (i.e. = null), it wouldn't load them. I've now put a parameter on that so that you can force a reload, and that works. Sorry for the confusion and the lack of complete attention on my part; for the next release, it definitely has been sorted! Reedy Boy 19:01, 1 April 2007 (UTC)

Re the regex: sorry, I'm a bit of a regex novice. Can anyone else help? mattbr 18:50, 1 April 2007 (UTC)

august > August

Since august is a word, should this correction be removed, or improved to fix <number> august > <number> August only? Rjwilmsi 17:53, 3 April 2007 (UTC)

Good point. Probably, but I was having some problems with lookahead in the past (see above). — RevRagnarok Talk Contrib 18:10, 3 April 2007 (UTC)
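
The "<number> august" idea does not strictly need lookahead; capturing the day number and putting it back is enough. A minimal sketch in Python, assuming the date is written day-first with a single space. `DAY_AUGUST` and `fix_august` are made-up names, and the pattern has not been tried in AWB.

```python
import re

# One or two digits, a space, then lower-case "august" as a whole word.
DAY_AUGUST = re.compile(r"\b(\d{1,2}) august\b")

def fix_august(text):
    # \1 puts the captured day number back in front of the month.
    return DAY_AUGUST.sub(r"\1 August", text)

print(fix_august("He was born on 12 august 1990."))
print(fix_august("It was an august gathering."))
```

This leaves the adjective alone, at the cost of missing month-first forms such as "august 12".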

discribed -> described

As in [1]? Jobjörn (Talk ° contribs) 12:06, 4 April 2007 (UTC)

Added to "Describe", which is now "(De/Pre)scribe".--BillFlis 19:49, 4 April 2007 (UTC)

strengtened > strengthened

as here. Jobjörn (Talk ° contribs) 14:16, 4 April 2007 (UTC)

Added to "Strength".--BillFlis 19:43, 4 April 2007 (UTC)

"significatly" --> "significately" ???

The rule <Typo word="-(b/c/d/g/i/m/s/t/v)ately_" find="([bcdgimstv])atly\b" replace="$1ately" /> converts significatly to significately.

Surely that can't be what the inventor intended?

--Euchiasmus 20:13, 7 April 2007 (UTC)

Yeah, that needs to go away. —Wknight94 (talk) 21:19, 7 April 2007 (UTC)
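
The behaviour reported above is easy to reproduce outside AWB. A short sketch in Python, assuming the `re` module matches this pattern the same way. The pattern is the one quoted in this section, `apply_rule` is a made-up helper, and `$1` is written as `\1`.

```python
import re

# The rule quoted above: any of these letters followed by "atly" at a word end.
RULE = re.compile(r"([bcdgimstv])atly\b")

def apply_rule(word):
    return RULE.sub(r"\1ately", word)

for word in ["immediatly", "significatly"]:
    print(word, "->", apply_rule(word))
```

The rule works for "immediatly" but, because it only looks at the last few letters, it treats "significatly" the same way and produces "significately".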

"distictively" --> "districtively" ???

The word "districtively" doesn't even exist.

Let's have rules that rectify a recognised and bounded set of incorrect words, rather than trying to make the rules too general. What do you think? Euchiasmus 20:30, 7 April 2007 (UTC)

Agreed as your other significatly example demonstrates. —Wknight94 (talk) 21:19, 7 April 2007 (UTC)
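
One way to read the "recognised and bounded set" suggestion is a rule that lists its stems explicitly, so nothing outside the list is touched. The sketch below is in Python and every pattern in it is hypothetical; none of them is an entry from the typo list, and `fix` is a made-up helper.

```python
import re

# Hypothetical bounded rules: each one names the words it is allowed to change.
RULES = [
    (re.compile(r"\b([Ii]mmedi|[Ff]ortun|[Pp]riv)atly\b"), r"\1ately"),
    (re.compile(r"\b([Ss])ignificatly\b"), r"\1ignificantly"),
]

def fix(word):
    for pattern, replacement in RULES:
        word = pattern.sub(replacement, word)
    return word

for word in ["immediatly", "significatly", "distictively"]:
    print(word, "->", fix(word))
```

A word the list does not know about, such as "distictively", is simply left for a human editor instead of being turned into a different non-word.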