Wikipedia talk:AutoWikiBrowser/Typos
Vitaly → Vitally
I recently ran AWB on Category:Soviet actors (123 articles) and encountered three false positives (Boris Babochkin, Vasily Livanov, and Vitaly Solomin) with "Vitaly" → "Vitally". While "vitaly" is probably a common misspelling of "vitally" (and, thus, the fix for it is useful), the fix could cause errors in articles about Russian people. Since names are likely to be written in upper-case, is there any way to restrict the change to lower-case instances of "vitaly" only? If not, is there some other way to reduce the potential for false positives while preserving the typo fix? Black Falcon (Talk) 06:29, 6 April 2008 (UTC)
- Hopefully fixed. TestPilot 10:20, 16 April 2008 (UTC)
- It seems to be working: I just tried AWB on the articles that produced the false positives and was not prompted for any typo fixes. Thanks! Black Falcon (Talk) 16:49, 16 April 2008 (UTC)
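The simplest way to get what Black Falcon asked for is to keep the pattern case-sensitive and match only the lower-case form. The following is a minimal Python sketch of such a rule; it is hypothetical (not necessarily the entry TestPilot committed), and AWB itself runs .NET regular expressions rather than Python's `re`. The trade-off is that a sentence-initial "Vitaly important" would no longer be fixed.

```python
import re

# Hypothetical rule: only the lower-case misspelling is matched. No
# re.IGNORECASE flag is passed, so the given name "Vitaly" is left alone.
VITALY_RULE = re.compile(r"\bvitaly\b")

def fix_vitaly(text: str) -> str:
    """Replace lower-case 'vitaly' with 'vitally'."""
    return VITALY_RULE.sub("vitally", text)

print(fix_vitaly("It is vitaly important."))
print(fix_vitaly("Vitaly Solomin was a Soviet actor."))
```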
Capitalization in Wikipedia DNS
Noticed it changed wikipedia to Wikipedia in Wikiquote in the dns addresses listed there. Convention dictates they remain lowercase. - Kaobear (talk) 15:05, 8 April 2008 (UTC)
- Yeah, I agree, strings "wikipedia.org", "wikipedia.com", "wiktionary.org" and "microsoft.com" should not be capitalized - too many false positives. But, unfortunately, I don't know how to fix that. TestPilot 10:34, 16 April 2008 (UTC)
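One common way to stop a capitalisation rule from firing inside a domain name is a negative lookahead. A Python sketch, assuming a hypothetical "wikipedia" → "Wikipedia" rule; the `.org`/`.com` suffixes come from the comment above, and the real AWB entry may be written differently:

```python
import re

# Hypothetical rule: capitalise the word "wikipedia" unless it is
# immediately followed by ".org" or ".com", i.e. part of a domain name.
WIKIPEDIA_RULE = re.compile(r"\bwikipedia\b(?!\.(?:org|com)\b)")

def capitalise_wikipedia(text: str) -> str:
    return WIKIPEDIA_RULE.sub("Wikipedia", text)

print(capitalise_wikipedia("I read it on wikipedia."))
print(capitalise_wikipedia("See http://en.wikipedia.org/wiki/Wikiquote"))
```

The same lookahead could be extended to the other strings TestPilot lists, such as "wiktionary.org".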
esp. --> especially
I was thinking:
<Typo word="especially" find="\b(Esp|esp)\.([ \t])\b" replace="$1ecially$2" />
..but am open to corrections... Ling.Nut (talk) 19:29, 26 April 2008 (UTC)
- Most likely it will be encountered in quotations, where it shouldn't be changed. MaxSem(Han shot first!) 19:57, 26 April 2008 (UTC)
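The proposed entry can be tried outside AWB by translating the .NET replacement tokens `$1`/`$2` into Python's `\g<1>`/`\g<2>`. This is only an approximation of how AWB applies the rule, but it illustrates MaxSem's point: the pattern cannot tell whether the text is inside a quotation.

```python
import re

# Ling.Nut's proposed pattern, unchanged apart from Python syntax.
ESP_RULE = re.compile(r"\b(Esp|esp)\.([ \t])\b")

def expand_esp(text: str) -> str:
    # AWB's "$1ecially$2" corresponds to r"\g<1>ecially\g<2>" in Python.
    return ESP_RULE.sub(r"\g<1>ecially\g<2>", text)

print(expand_esp("Popular, esp. in winter."))
# A quotation is rewritten just the same: the false positive MaxSem describes.
print(expand_esp('He wrote "esp. the early works".'))
```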
WP:MOS fixes, such as "no spaces around mdashes"
Is there a reason why AWB doesn't do the more mechanical WP:MOS fixes? Ling.Nut (talk) 09:32, 28 April 2008 (UTC)
- These types of fixes can be proposed at Wikipedia talk:AutoWikiBrowser/Feature requests. I'm not sure whether a spacing fix could (or should) be incorporated into this page... Black Falcon (Talk) 20:15, 28 April 2008 (UTC)
- Thanks! Ling.Nut (talk) 02:12, 29 April 2008 (UTC)
Fix may be needed
While I was adding a nav box and also doing the general and typo fixes, AWB changed the spelling of the word "succedded" to "succeededd" in the preview. But when I checked the diff after saving, it was "Succeeded". Can someone look at this problem? --SMS Talk 16:57, 28 April 2008 (UTC)
Souffle
It seems to change Souffle into Souffléouffl. Interesting word, but not strictly a correction... -- 20.133.0.13 (talk) 09:40, 29 April 2008 (UTC)
- Thanks, this has already been corrected [2] by BillFlis. Thanks Rjwilmsi (talk) 23:42, 30 April 2008 (UTC)
Entries to move hyphens to en dashes
Per WP:DASH, I'd like to add some entries here that will convert hyphens to en dashes. This is a bit of a departure, though, so I wanted to discuss it first. I've tested these extensively and have not encountered any false positives (I have others that do have a lot of false positives, but I'm not adding them here).
<Typo word="en dash in page ranges" find="(pages\ ?=\ ?|pp\.?\ )([0-9]+)-([0-9]+)" replace="$1$2–$3" />
<Typo word="en dash in date ranges" find="(\[?\[?(January|February|March|April|May|June|July|August|September|October|November|December)\ [1-3]?[0-9]\]?\]?,\ \[?\[?[1-2][0-9][0-9][0-9]\]?\]?)\ ?-\ ?(\[?\[?(January|February|March|April|May|June|July|August|September|October|November|December)\ [1-3]?[0-9]\]?\]?,\ \[?\[?[1-2][0-9][0-9][0-9]\]?\]?)" replace="$1–$3" />
<Typo word="en dash in money ranges" find="(\$[1-9]?[0-9]?[0-9]?[0-9])\ ?-\ ?(\$?[1-9]?[0-9]?[0-9]?[0-9])" replace="$1–$2" />
<Typo word="en dash in measurement ranges" find="([1-9]?[0-9])\ ?-\ ?([1-9]?[0-9])(\ |\ )(years|months|weeks|days|hours|minutes|seconds|kg|mg|kb|km|GHz|Hz|kHz|miles|mi\.|%|MPH|mph)\b" replace="$1–$2$3$4" />
<Typo word="en dash in time ranges" find="([0-1]?[0-9]:[0-5][0-9]\ ?([AaPp][Mm])?)\ ?-\ ?([0-1]?[0-9]:[0-5][0-9]\ ?([AaPp][Mm])?)" replace="$1–$3" />
<Typo word="en dash in age ranges" find="([Aa]ge[sd])\ ([1-9]?[0-9])\ ?-\ ?([1-9]?[0-9])" replace="$1 $2–$3" />
So let me know what you think...—Chowbok ☠ 17:29, 6 May 2008 (UTC)
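The page-range entry can be exercised the same way. This Python sketch translates Chowbok's first rule, rewriting `$1$2–$3` as a Python replacement string. It writes the literal en dash character (U+2013); substituting the HTML entity instead is a one-line change, which is exactly what the thread below argues about.

```python
import re

# Chowbok's page-range pattern: "pages=" / "pages = " (template parameter)
# or "pp. " / "pp ", then two numbers joined by a hyphen.
PAGE_RANGE = re.compile(r"(pages\ ?=\ ?|pp\.?\ )([0-9]+)-([0-9]+)")

EN_DASH = "\u2013"

def dash_page_ranges(text: str) -> str:
    # Equivalent of AWB's replace="$1$2–$3".
    return PAGE_RANGE.sub(r"\g<1>\g<2>" + EN_DASH + r"\g<3>", text)

print(dash_page_ranges("{{cite book |pages=12-15}}"))
print(dash_page_ranges("pp. 3-9"))
print(dash_page_ranges("ISBN 0-19-852663-6"))  # no "pages"/"pp" prefix, so unchanged
```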
- Since Wikipedia is now UTF-8–compatible, why don't you replace the hyphens with the single en-dash character "–", rather than the lame old HTML entity "&ndash;", which takes up seven times the space?--BillFlis (talk) 17:40, 6 May 2008 (UTC)
- Because the edit box is (for most people) in a monospaced font, which makes it impossible to tell the difference between a hyphen, an en dash, and an em dash. You'll also note that the dash characters are not converted to UTF-8 automatically by AWB, for the same reason.—Chowbok ☠ 17:45, 6 May 2008 (UTC)
- I'm with BillFlis in preferring that the single character be used rather than the html entity. If AWB can only support the html entity, I'd rather not see this implemented. older ≠ wiser 18:05, 6 May 2008 (UTC)
- Sigh. Did you read what I just wrote? At least try to address my point...—Chowbok ☠ 18:23, 6 May 2008 (UTC)
- I don't really see why the monospaced font display is an issue. I venture that most editors couldn't care less about the difference, and we shouldn't be unnecessarily filling the edit screen with techno-jargon. If AWB is unable to make the distinction, I don't think we should be using AWB to implement such a "solution". older ≠ wiser 18:30, 6 May 2008 (UTC)
- AWB is capable of putting in the UTF-8 character, I'm not sure how you got that it isn't. Anyway, the monospaced font is very much an issue, and editors that know the difference between the dashes absolutely need to be able to see which has been implemented. It's ridiculous to say that it's not a big deal that commonly-confused characters look identical in the edit box.—Chowbok ☠ 18:34, 6 May 2008 (UTC)
- A bad assumption perhaps, because it is rather inconceivable why anyone would want to clutter the articles up with HTML entities when there is a perfectly good UTF character available. If it is so very important for editors to be able to distinguish them, then why does the MOS make no mention whatsoever of the distinction, let alone indicate any sort of preference? Now that you indicate AWB is capable of inserting the UTF character, I very very strongly oppose having it insert the kludgey HTML entity. older ≠ wiser 19:02, 6 May 2008 (UTC)
- I don't see why it's "inconceivable" when I've explained it several times now. The reason is that editors need to be able to see if something is a hyphen, en dash, or em dash when editing an article. The advantage of doing it this way is that it allows that. The disadvantage is that you think HTML entities are ugly. Sorry, I'm not convinced that's the better argument.—Chowbok ☠ 19:22, 6 May 2008 (UTC)
- Well, if, as you say, it is so important to see the distinction, then why are the MOS and other editing guidelines silent on this point? If it is simply a matter of your preference vs. mine, that is certainly something that should be more widely discussed before encoding it into AWB. older ≠ wiser 19:41, 6 May 2008 (UTC)
This certainly seems like a worthwhile fix for AWB to do, but I think it would be better as an AWB general fix so it's available to all AWB users, not just those doing typo fixing. Therefore I suggest you post it at Wikipedia talk:AutoWikiBrowser/Feature requests. Thanks Rjwilmsi (talk) 17:52, 6 May 2008 (UTC)
Just beneath the edit window are all these special characters, which an editor can simply click on to insert. Guess what's the very first one? An en-dash character. The second is an em-dash. If we're not supposed to use them, then why are they there?--BillFlis (talk) 22:28, 6 May 2008 (UTC)
- I'm not saying it's a policy to use the entities, just good practice. Let me ask you and Bkonrad a question. Suppose I'm editing a page by hand, and I see 1941—1945 in the edit box. What should I do to quickly determine if the correct dash is being used?—Chowbok ☠ 18:42, 7 May 2008 (UTC)
- Hmm, well just eyeballing it in my edit window it looks to me like an endash. And confirmed by using Firefox's search function. older ≠ wiser 19:29, 7 May 2008 (UTC)
- Put a hyphen, an em dash, and an en dash in an edit window. Assuming you're using a monospaced font, I guarantee two of those will be identical.—Chowbok ☠ 22:21, 7 May 2008 (UTC)
- Yep, I did that. I did nothing special to configure Firefox. The difference between them was pretty easy to spot. older ≠ wiser 00:36, 8 May 2008 (UTC)
- Well, I don't know what font you're using, but in Courier, these look the same:
- –
- —
- —Chowbok ☠ 03:01, 8 May 2008 (UTC)
- Hmm, I misspoke. 1st, when in response to your example of 1941—1945 I said it looked like an endash in the edit window, on more careful examination it is an mdash. 2nd, a regular hyphen and an ndash do appear identical in the edit window (immediately above, you show an endash and an mdash, which are clearly different even to my not particularly acute vision). But the Firefox search function does find the correct characters. In any case, you have not responded to my query about why, if it is so important for editors to be able to make this distinction (based on using the HTML entities), no mention is made of it in the MOS or other editing guidelines. older ≠ wiser 12:23, 8 May 2008 (UTC)
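Chowbok's question about quickly telling the dashes apart has a mechanical answer outside the edit box: look at the code point. A small Python sketch using the standard `unicodedata` module; the function name is illustrative. It also shows the size difference BillFlis mentions: the entity "&ndash;" is seven characters, the character is one (three bytes in UTF-8).

```python
import unicodedata

# Characters that commonly get confused in a monospaced edit box.
DASHES = {"\u002d", "\u2013", "\u2014", "\u2212"}

def describe_dashes(text: str) -> list:
    """Return the code point and Unicode name of every dash-like character."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text if ch in DASHES]

print(describe_dashes("1941\u20141945"))
print(len("&ndash;"), len("\u2013"), len("\u2013".encode("utf-8")))
```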
suggesting a change
Here, AWB changed "reciding" to "resideing". Using the link to Dictionary.com on User:Mboverload/RegExTypoFix/rejectedwords, the word "resideing" is not a real word. I suggest that the typo fix for "reciding" be changed to "residing".--Rockfang (talk) 12:33, 7 May 2008 (UTC)
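Rockfang's suggested correction could be expressed as an entry along these lines. This is a hypothetical rule, not the one on the list; the capture group keeps the original capitalisation, and AWB's `$1` becomes `\g<1>` in the Python check.

```python
import re

# Hypothetical rule: "reciding"/"Reciding" -> "residing"/"Residing".
RECIDING_RULE = re.compile(r"\b([Rr])eciding\b")

def fix_reciding(text: str) -> str:
    return RECIDING_RULE.sub(r"\g<1>esiding", text)

print(fix_reciding("Reciding in Paris, she was reciding alone."))
```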
Of fornames and feilds
It's currently changing "forname" to "oref" and "feilding" to "field$S" which, while both interesting words, are probably not correct; can someone who understands these things fix it? — iridescent 20:57, 8 May 2008 (UTC)
- I fixed the "feilding" one. Someone else is going to have to tackle the other one.--BillFlis (talk) 23:01, 8 May 2008 (UTC)
- I think I've now fixed the forname one too, but I can't test it properly until this evening. Rjwilmsi (talk) 08:52, 16 May 2008 (UTC)
- It's still doing this - could someone have another look (or remove it from the regex entirely as an interim measure)? Thanks! — iridescent 20:50, 17 May 2008 (UTC)
- The typo list is fixed e.g. [3] but there is a bug with AWB in that the released version is stuck loading some old version of the typo list. I reported the bug, it's been fixed in the SVN version, but no new official update has been released. I have re-requested a new release - Wikipedia_talk:AutoWikiBrowser/Dev#Release_next_version_please. I would suggest you politely petition the developers for a release to fix this! Thanks Rjwilmsi (talk) 22:04, 17 May 2008 (UTC)
- And to clarify, even if you remove the entry from the typo list (which isn't necessary as it's now correct), the released version of AWB will not pick up the new version of the typo list! Rjwilmsi (talk) 22:06, 17 May 2008 (UTC)
Jewelery
Can someone remove "jewellery" → "jewelery" from the typo list? "Jewelery" is an Americanism; in the rest of the world the correct spelling is with two l's. — iridescent 16:16, 11 May 2008 (UTC)
- Merriam Webster says 'jewelery' isn't a word [4] (jewellery is the British version, jewelry is the American one), so which of these do you think is wrong?
<Typo word="jewellery" find="\b(J|j)ewelery\b" replace="$1ewellery"/>
<Typo word="Jewelery" find="\b(J|j)ewl(|le)ry\b" replace="$1ewel$2ry" />
Thanks Rjwilmsi (talk) 17:38, 11 May 2008 (UTC)
The second is correct, as it replaces "jewlery", which definitely isn't a word, with "jewelry"; the first should go, as it just converts British to American English. I'd add one to convert "jewllery" to "jewellery", too. — iridescent 17:45, 11 May 2008 (UTC)
Ignore me (I can never understand regexes) - the corrections should be "jewllery" to "jewellery", "jewelery" to "jewelry" and "jewlery" to "jewelry". I think. — iridescent 17:48, 11 May 2008 (UTC)
- Okay, 1 is already corrected, 2 is actually corrected to jewellery which I think is better so haven't changed and 3 is now corrected. Thanks Rjwilmsi (talk) 18:21, 11 May 2008 (UTC)
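The two entries Rjwilmsi quoted can be checked against the spellings iridescent lists. A Python sketch with both rules translated from .NET syntax; applying them in list order is an assumption. As quoted, the second rule handles "jewlry" and "jewllery" but neither entry matches "jewlery", and both leave the correct spellings alone.

```python
import re

# The two quoted entries, translated to Python replacement syntax.
RULES = [
    (re.compile(r"\b(J|j)ewelery\b"), r"\g<1>ewellery"),
    (re.compile(r"\b(J|j)ewl(|le)ry\b"), r"\g<1>ewel\g<2>ry"),
]

def apply_rules(text: str) -> str:
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text

for word in ["jewelery", "Jewlry", "jewllery", "jewlery", "jewelry", "jewellery"]:
    print(word, "->", apply_rules(word))
```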
Targetting and Targetted...
are perfectly fine in British, Canadian and other kinds of English. Please remove them from the list asap. --Slp1 (talk) 21:19, 11 May 2008 (UTC)
- Do you have a link to support this? "Targetted" etc. are not listed with double t's at Wiktionary, Merriam-Webster or Dictionary.com. Thanks Rjwilmsi (talk) 21:47, 11 May 2008 (UTC)
- well, well. How very, very interesting. I have to confess that I can't find any. The OED does include examples of the double tt, but from centuries ago. However major media such as the BBC,[5][6] CBC, [7] Globe and Mail,[8] reputable publishers [9] [10] and scholarly journals [11][12], all use the spelling regularly. It is a fascinating example of dictionaries as prescriptive rather than descriptive. I wonder how long dictionaries can possibly continue not to include it as a frequently used variant given its wide use by reputable sources. What do you guys do in situations like this? I guess it is probably desirable to stick to what dictionaries say, no matter how extensively the variant is used, but I do think some care needs to be taken: there are a number of books [13][14] and articles [15] [16], for example, that use the "incorrect" spelling, and fixing them as typos would not be right obviously. --Slp1 (talk) 14:32, 12 May 2008 (UTC)
Remove references to "Encyclopedia of Cajun Culture"?
I believe that several Wikipedia entries cite my personal web site, the Encyclopedia of Cajun Culture, located at www.cajunculture.com, as a source of information.
However, I have discontinued the Encyclopedia of Cajun Culture and now use the domain in question for other purposes.
As such, could someone create an AWB that would remove all references in Wikipedia to my website, whether it's listed as "Encyclopedia of Cajun Culture" or as "www.cajunculture.com" or even some combination of the two? (I manually deleted one such reference, which included not only "Encyclopedia of Cajun Culture" and "www.cajunculture.com", but also my personal name and that of my co-author.)
Sincerely, --Skb8721 (talk) 01:19, 14 May 2008 (UTC)
- Your request sounds reasonable if the website's content is now not relevant to the articles in which it is referenced. However, this is the talk page for typo fixing, so I suggest you re-post your request on the appropriate page – the AWB talk page. Thanks Rjwilmsi (talk) 09:17, 14 May 2008 (UTC)
Ukulele
Can someone remove "ukelele"→"ukulele" from the regex please? "Ukelele" is the correct spelling in British English, and an acceptable variant in the US. Thanks! — iridescent 01:33, 14 May 2008 (UTC)
- fixed – wiktionary agrees that ukelele is a valid variant – wikt:ukelele. Rjwilmsi (talk) 09:29, 14 May 2008 (UTC)
Compleat vs. Complete
In the past two months, there have been three edits to the "Weird Al" Yankovic page using AWB that say the word 'Compleat' was changed to 'Complete'. Looking at the diffs, the first time the word was actually changed (said change was reverted); the last two times it wasn't. 'Compleat' is not a misspelling and should not be treated as such, whether or not a change is actually made. My regex skillz are not enough to correct this myself or I would. Hopefully some kind soul can help out.
Here are the diffs in question:
- http://en.wikipedia.org/w/index.php?title=%22Weird_Al%22_Yankovic&diff=215604728&oldid=215311246
- http://en.wikipedia.org/w/index.php?title=%22Weird_Al%22_Yankovic&diff=210886176&oldid=210722790
- http://en.wikipedia.org/w/index.php?title=%22Weird_Al%22_Yankovic&diff=202596617&oldid=202596373
-- BullWikiWinkle 02:40, 29 May 2008 (UTC)
- There's a known bug in AWB that explains the second and third cases. I've added an exception to the typo list so 'Compleat' will not be changed in the future. Thanks Rjwilmsi (talk) 07:07, 29 May 2008 (UTC)
iii, www, xxx
Not sure what's changed with the 3 letters → 2 letters rule, but can exemptions be made for iii, xxx and www? At the moment AWB is attempting to shorten Roman numerals and website addresses. Thanks... — iridescent 19:04, 29 May 2008 (UTC)
- Yes, I thought I'd added exceptions for iii and www already. I've tweaked them and added xxx [18] please refresh status in AWB to test. Thanks Rjwilmsi (talk) 20:21, 29 May 2008 (UTC)
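The actual 3 letters → 2 letters entry is not quoted in this thread, so the following is a hypothetical Python version of such a rule. Instead of a single regex it checks each word against an exception set, which makes the iii/www/xxx exemptions explicit. Longer numerals such as "xxxiii" would still be altered and would need their own exceptions.

```python
import re

# Whole words that must never be shortened (from the request above).
EXCEPTIONS = {"iii", "www", "xxx"}

# A lower-case letter repeated three or more times in a row.
TRIPLED = re.compile(r"([a-z])\1{2,}")

# Runs of ASCII letters; the dots in "www.example.com" split it into words.
WORD = re.compile(r"[A-Za-z]+")

def collapse_triples(text: str) -> str:
    """Shorten letter runs of three or more to two, skipping excepted words."""
    def fix_word(match):
        word = match.group(0)
        if word.lower() in EXCEPTIONS:
            return word
        return TRIPLED.sub(r"\1\1", word)
    return WORD.sub(fix_word, text)

print(collapse_triples("tooo late for the Balllet"))
print(collapse_triples("Act iii, see www.example.com"))
```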
legitmacy → llegitimacy
I was recently swapping over some templates when I came across Mid-Sha'ban. AWB suggested a change of legitmacy → llegitimacy. I wasn't sure if this was a bug or not.--Rockfang (talk) 17:13, 30 May 2008 (UTC)
- You're very polite to only suggest that this might be a bug ;) It is, and I've fixed the erroneous entry. Thanks Rjwilmsi (talk) 23:09, 30 May 2008 (UTC)
Illinios -> Illinois
Could the spell check fix this?--DAW0001 (talk) 13:14, 6 June 2008 (UTC)
<Typo word="Illinois" find="\b(?:[Ii]l(?:[li]a?noi|ll+[ai]noi?|l+[ai]ni?o|l+ioni)s|illinois)\b" replace="Illinois" />
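The Illinois entry can be checked with Python's `re`, which accepts this pattern unchanged. Tracing it shows that "Illinios" is caught by the `l+[ai]ni?o` branch, lower-case "illinois" by the last alternative, and the correct "Illinois" is matched by neither, so it is left alone.

```python
import re

ILLINOIS = re.compile(
    r"\b(?:[Ii]l(?:[li]a?noi|ll+[ai]noi?|l+[ai]ni?o|l+ioni)s|illinois)\b"
)

def fix_illinois(text: str) -> str:
    return ILLINOIS.sub("Illinois", text)

for sample in ["Chicago, Illinios", "Ilinois", "Illlinois", "illinois", "Illinois"]:
    print(sample, "->", fix_illinois(sample))
```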
recieve -> receive
Could the spell check fix this?--DAW0001 (talk) 13:14, 6 June 2008 (UTC)
<Typo word="(Re/De/(Mis/Pre)Per/(Mis)Con/Trans)ceive" find="\b([RrDd]e|[Pp]er|[Mm]isper|[Cc]on|[Mm]iscon|[Pp]recon|[Tt]rans)ce?iev(e[sd]?|ers?|ing|ership|ables?)\b" replace="$1ceiv$2" />
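Likewise for the -ceive entry. The prefix group `$1` preserves capitalisation, and because the pattern requires "iev", correctly spelled words such as "deceive" or "receiver" do not match. The Python translation below only changes `$1`/`$2` to `\g<1>`/`\g<2>`.

```python
import re

CEIVE = re.compile(
    r"\b([RrDd]e|[Pp]er|[Mm]isper|[Cc]on|[Mm]iscon|[Pp]recon|[Tt]rans)"
    r"ce?iev(e[sd]?|ers?|ing|ership|ables?)\b"
)

def fix_ceive(text: str) -> str:
    # AWB's replace="$1ceiv$2".
    return CEIVE.sub(r"\g<1>ceiv\g<2>", text)

for sample in ["recieve", "Recieved", "percieving", "misconcieved", "deceive"]:
    print(sample, "->", fix_ceive(sample))
```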